Text Mining with Python
Overview
This course covers:
- An introduction to Machine Learning, including how Statistics, Business Analytics, Data Science, Machine Learning, Deep Learning and Artificial Intelligence differ from one another
- The machine learning project life cycle and the generalized architecture of a text mining project life cycle
2 Days
Pre-Requisites
Basic knowledge of Python
Course Outline
Introduction to the Analytics Tool (Python)
- What is Python & History
- Installing Python & Python Environment
- Basic commands in Python
- Data Types and Operations
- Python packages
- Loops
- My first Python program
- If-then-else statement
- Functions in Python
- User defined Functions
- NumPy
- SciPy
- pandas
- Matplotlib
- scikit-learn (sklearn)
- NLTK
- Data importing
- Connecting to External data sources
- Working with datasets
- Manipulating the datasets
- Merging
- Exporting the datasets into external files
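The import, merge and export topics above can be illustrated with a minimal pandas sketch. The two in-memory CSV tables, their column names and their values are hypothetical stand-ins for real external files, not course material.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV data standing in for external files.
orders_csv = io.StringIO("order_id,customer_id,amount\n1,10,250\n2,11,90\n3,10,40\n")
customers_csv = io.StringIO("customer_id,name\n10,Asha\n11,Ravi\n")

# Data importing: read each source into a DataFrame.
orders = pd.read_csv(orders_csv)
customers = pd.read_csv(customers_csv)

# Merging: attach the customer name to every order via the shared key.
merged = orders.merge(customers, on="customer_id", how="left")

# Manipulating: total order amount per customer.
totals = merged.groupby("name", as_index=False)["amount"].sum()

# Exporting: write the result as CSV (to a buffer here; a file path also works).
out = io.StringIO()
totals.to_csv(out, index=False)
```

Passing a file path instead of a `StringIO` buffer to `read_csv` and `to_csv` reads from and writes to disk.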
Basic Descriptive Statistics
- Population and Sample
- Data Types
- Measures of Central Tendency
- Measures of Dispersion
- Percentiles & Quartiles
- Box plots and outlier detection
- Creating Graphs and Reporting
- Probability Distributions
- Hypothesis testing
- Exploratory Data Analysis
- Data Validation rules
- Data Cleaning techniques
- Deal with missing data
- Add default values
- Remove incomplete rows
- Deal with error-prone columns
- Fixing NaN values and string/float confusion
- Data Preparation for analysis
- Normalize data types
- Change casing
- Creating new variables
- Feature Scaling
- Feature Standardization
- Label Encoding
- One-Hot Encoding
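The data cleaning and preparation topics above could look roughly like this in pandas. It is a sketch under stated assumptions: the toy table, its column names and the chosen default values are all made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "city": ["Pune", "pune ", None, "DELHI"],
    "age": ["34", "n/a", "29", "41"],   # numbers stored as strings
    "plan": ["basic", "pro", "basic", None],
})

# Deal with missing data: add a default value for one column...
df["plan"] = df["plan"].fillna("basic")
# ...and remove rows that are still incomplete in a required column.
df = df.dropna(subset=["city"]).reset_index(drop=True)

# Fix string/float confusion: unparseable entries become NaN, then get the median.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["age"] = df["age"].fillna(df["age"].median())

# Change casing and strip stray whitespace.
df["city"] = df["city"].str.strip().str.lower()

# Feature scaling (min-max) and feature standardization (z-score).
age_range = df["age"].max() - df["age"].min()
df["age_scaled"] = (df["age"] - df["age"].min()) / age_range
df["age_std"] = (df["age"] - df["age"].mean()) / df["age"].std(ddof=0)

# Label encoding: map each category to an integer code.
df["plan_code"] = df["plan"].astype("category").cat.codes

# One-hot encoding: one indicator column per distinct city.
df = pd.get_dummies(df, columns=["city"], prefix="city")
```

Label encoding implies an order between categories, so one-hot encoding is usually the safer choice for nominal variables fed to linear models.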
Algorithms used in Machine Learning
- Supervised Machine learning algorithms
- Unsupervised Machine learning algorithms
Logistic Regression
- The need for Logistic Regression
- Logistic regression models
- Validation of logistic regression models
- Multicollinearity in logistic regression
- Individual Impact of variables
- Confusion Matrix
- Case study (Spam filtering)
Text Mining and NLP
- What is text mining
- The NLTK package
- Preparing text for analysis
- Information retrieval
- Text Pre-processing
- Text summarisation
- Sentiment analysis
- Text classification
- News data classification
- Topic Modelling
- LDA
- LDA in Python
- Enterprise Business Intelligence/Data Mining, Competitive Intelligence
- E-Discovery, Records Management
- National Security/Intelligence
- Scientific discovery, especially Life Sciences
- Sentiment Analysis Tools, Listening Platforms
- Natural Language/Semantic Toolkit or Service
- Publishing
- Automated ad placement
- Search/Information Access
- Social media monitoring
- Text Mining best practice
