Predictive Analytics
• Analytics refers to the use of data, statistical analysis, and modeling to solve problems, gain
insights, and make decisions.
• Organizations generate huge amounts of data; analytics helps convert this data into
actionable insights.
• Used in areas like marketing, finance, operations, and HR to make better decisions.
5. Power of Analytics
6. Predictive Analytics
• A branch of analytics that uses historical data and statistical models to predict future
outcomes.
• Classification Models
➤ Predict categories (e.g., spam or not spam)
➤ Used in email filters, fraud detection
• Regression Models
➤ Predict numeric values (e.g., sales, revenue)
➤ Used in forecasting
• Clustering
➤ Groups similar data points (not a direct prediction technique, but used for segmentation)
➤ Used in market research
• Decision Trees
➤ Easy-to-understand visual models for decision-making
• Manufacturing
• Healthcare
• Telecommunication
• Network optimization
• Fraud detection
• Supply Chain
• Inventory forecasting
• Information Technology
• IT resource planning
• Definition:
• Digital Analytics is the analysis of digital data from websites, mobile apps, social media, etc.,
to optimize user experience and business outcomes.
• Tools Used:
• Google Analytics
• Adobe Analytics
MODULE 3
• Simple Linear Regression (SLR) is a statistical method to predict the value of a dependent
variable (Y) using one independent variable (X): Y = β₀ + β₁X + ε, where:
o β₀ = intercept
o β₁ = slope
o ε = error term
2. Importance of SLR
• Ordinary Least Squares (OLS): a method to find the best-fitting line by minimizing the sum of
squared errors (differences between actual and predicted Y)
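The least-squares idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name fit_simple_linear_regression and the sample data are hypothetical, and the closed-form slope and intercept are the standard solution for a single predictor.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line Y = 1 + 2X
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)
```

With noisy real data the fitted line will not pass through every point; the same formulas still give the line with the smallest sum of squared errors.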
6. Model Interpretation
• R² (R-squared): The proportion of the variation in Y that the model explains (ranges from 0 to 1;
values closer to 1 indicate a better fit)
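As a rough sketch of how R² is computed, the snippet below compares the residual sum of squares with the total sum of squares. The function name r_squared and the numbers are hypothetical. The 0 to 1 range holds for a least-squares fit with an intercept evaluated on its own training data; for arbitrary predictions the value can fall below 0.

```python
def r_squared(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical actual values and model predictions
actual = [3, 5, 7, 9]
predicted = [2, 6, 6, 10]
score = r_squared(actual, predicted)
print(score)
```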
7. Model Validation
• Check assumptions:
Linearity
Independence
Homoscedasticity (constant variance)
Normality of residuals
• Use metrics:
o R²
MODULE 4
• Multiple Linear Regression (MLR) is used to predict the value of a dependent variable (Y) using
two or more independent variables (X₁, X₂, ..., Xn).
• General equation:
Y = β₀ + β₁X₁ + β₂X₂ + ... + βnXn + ε
Where:
o Y = Dependent variable
o ε = Error term
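To make the general equation concrete, here is a small sketch that evaluates it for one observation once the coefficients are known. The function name predict_mlr and the coefficient values are hypothetical; the error term ε is not part of a prediction, so it is left out.

```python
def predict_mlr(intercept, coefficients, features):
    """Evaluate Y = β₀ + β₁X₁ + ... + βnXn for one observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical fitted model with two independent variables
beta0 = 10.0
betas = [2.0, 0.5]
y_hat = predict_mlr(beta0, betas, [3.0, 4.0])
print(y_hat)
```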
3. Model Diagnostics
Dummy Variables
Derived Variables
Interaction Variables
5. Multicollinearity
• Detected using:
• Solution:
6. Model Deployment
• Steps:
1. Finalize model
4. Monitor performance
7. Demo Using Software
MODULE 5
Logistic regression is a supervised machine learning algorithm used for classification tasks, where
the goal is to predict the probability that an instance belongs to a given class. It is a
statistical method that analyzes the relationship between a set of input variables and a categorical
outcome. The article explores the fundamentals of logistic regression, its types, and implementations.
• The outcome can be either Yes or No, 0 or 1, True or False, etc., but instead of giving an exact
value of 0 or 1, the model gives probabilistic values that lie between 0 and 1.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function whose output is bounded by two limiting values (0 and 1).
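The "S"-shaped logistic function can be sketched as follows. This is an illustrative snippet only: the names sigmoid and predict_probability and the coefficients b0 and b1 are hypothetical, and the two-branch form is just one way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, this equivalent form avoids overflow in exp()
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, b0, b1):
    """Probability of the positive class for a single predictor x."""
    return sigmoid(b0 + b1 * x)


# Hypothetical coefficients; a common rule labels the instance 1 when p >= 0.5
p = predict_probability(2.0, b0=-1.0, b1=0.5)
print(p)
```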
MODULE 6
• A decision tree is a flowchart-like model used for classification and regression tasks.
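The flowchart idea can be illustrated with a tiny hand-built tree stored as nested dictionaries. Everything here (the features income and age, the thresholds, and the labels) is hypothetical and chosen only to show how a sample is routed from the root to a leaf; a real tree would be learned from data.

```python
# Each internal node tests one feature against a threshold; leaves carry a label.
tree = {
    "feature": "income",
    "threshold": 50000,
    "left": {"label": "decline"},
    "right": {
        "feature": "age",
        "threshold": 30,
        "left": {"label": "review"},
        "right": {"label": "approve"},
    },
}


def predict(node, sample):
    """Follow the branches until a leaf is reached, then return its label."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


print(predict(tree, {"income": 60000, "age": 40}))
```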
5. Introduction to CHAID
• Two types:
MODULE 7
1. Introduction to Unstructured Data Analysis
• Examples:
Why is it important?
2. Sentiment Analysis
Definition:
Example:
Applications:
Overview:
• Called "naïve" because it assumes all features (words) are independent of each other given the class
Formula:
Applications:
• Spam detection
• Email filtering
• Sentiment classification
• Text categorization
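A minimal sketch of a Naive Bayes text classifier for an application such as spam detection is shown below. It assumes a multinomial word-count model with add-one (Laplace) smoothing, which these notes do not specify; the function names and the four training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (text, label) pairs. Returns the counts needed to classify."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log prior plus summed log word likelihoods."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one smoothing keeps unseen word/class pairs from zeroing the score
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages
training = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status meeting", "ham"),
]
model = train_naive_bayes(training)
print(classify("free prize offer", model))
```

Summing the per-word log likelihoods is where the independence assumption enters: each word contributes to the score without regard to the others.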
MODULE 8
1. Forecasting
• Time series forecasting deals with data collected over time at regular intervals (daily, monthly, yearly).
• Additive Model:
Time series = Trend + Seasonality + Noise
Use when seasonal variations are constant over time
• Multiplicative Model:
Time series = Trend × Seasonality × Noise
Use when seasonal variations increase/decrease proportionally with trend
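The difference between the two models can be seen by building a short series each way. The values below are hypothetical, and noise is left out (0 in the additive case, 1 in the multiplicative case) to keep the comparison readable.

```python
# Hypothetical upward trend over four periods
trend = [100, 110, 120, 130]

# Additive: the seasonal effect is a fixed offset
seasonal_offsets = [5, -5, 5, -5]
additive = [t + s for t, s in zip(trend, seasonal_offsets)]

# Multiplicative: the seasonal effect is a factor applied to the trend
seasonal_factors = [1.1, 0.9, 1.1, 0.9]
multiplicative = [t * f for t, f in zip(trend, seasonal_factors)]

# Size of the seasonal swing around the trend in each period
additive_swings = [abs(v - t) for v, t in zip(additive, trend)]
multiplicative_swings = [abs(v - t) for v, t in zip(multiplicative, trend)]

print(additive_swings)
print(multiplicative_swings)
```

The additive swings stay the same size in every period, while the multiplicative swings grow as the trend rises.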
4. Forecasting Accuracy
• Common metrics:
• Types:
o Weighted Moving Average (WMA): Observations receive different weights, with recent data
given more importance
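A weighted moving average forecast can be sketched as below. The function name weighted_moving_average, the sales figures, and the weights are hypothetical; the only convention assumed is that the last weight applies to the most recent observation.

```python
def weighted_moving_average(values, weights):
    """Weighted average of the last len(weights) values.

    weights[-1] applies to the most recent value, so larger trailing
    weights give recent data more importance.
    """
    if len(values) < len(weights):
        raise ValueError("not enough observations for the given weights")
    window = values[-len(weights):]
    return sum(v * w for v, w in zip(window, weights)) / sum(weights)


# Hypothetical monthly sales; weights 1, 2, 3 favor the latest month
sales = [100, 120, 130, 150]
weights = [1, 2, 3]
forecast = weighted_moving_average(sales, weights)
print(forecast)
```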
• Exponential smoothing gives more weight to recent observations using a smoothing factor (α, 0 < α < 1).
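One way to apply a smoothing factor α is simple (single) exponential smoothing, sketched below. The function name, the demand figures, and the choice to start from the first observation are assumptions made for illustration.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s_t = α * x_t + (1 - α) * s_(t-1)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not values:
        raise ValueError("values must not be empty")
    smoothed = [values[0]]  # assumption: initialize with the first observation
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical demand series
demand = [10, 20, 30]
result = exponential_smoothing(demand, 0.5)
print(result)
```

A larger α makes the smoothed series react faster to recent changes; a smaller α produces a smoother, slower-moving series.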
• Types: