Predictive Analytics

The document provides a comprehensive overview of analytics, including its definition, importance, and applications in decision-making and predictive analytics. It covers various statistical methods such as linear and multiple regression, logistic regression, decision trees, and unstructured data analysis, along with their applications in different fields. Additionally, it discusses forecasting techniques, time series analysis, and accuracy metrics for predictions.

Module 1

1. Overview & Definition of Analytics

• Analytics refers to the use of data, statistical analysis, and modeling to solve problems, gain
insights, and make decisions.

• It involves collecting, processing, and interpreting data to find meaningful patterns.

2. Need for Analytics

• Organizations generate huge amounts of data; analytics helps convert this data into
actionable insights.

• Helps improve efficiency, reduce costs, and predict trends.

• Necessary for staying competitive in today’s data-driven world.

3. Analytics in Decision Making

• Analytics supports evidence-based decision making.

• Reduces guesswork and increases the accuracy of business decisions.

• Used in areas like marketing, finance, operations, and HR to make better decisions.

4. Analytics as a Game Changer and Innovator

• Transforms how businesses operate by providing real-time insights.

• Drives innovation by identifying new opportunities.

• Examples: Personalized recommendations by Amazon or Netflix.

5. Power of Analytics

• Enables data-driven strategies.

• Helps in understanding customer behavior, market trends, and operational performance.

• Supports automation and optimization of processes.

6. Predictive Analytics

• A branch of analytics that uses historical data and statistical models to predict future
outcomes.

• Common tools: Regression analysis, machine learning, forecasting models.

• Example: Predicting customer churn or future sales.


Module 2

1. Types of Predictive Analytics

• These are the main approaches to predicting future outcomes:

• Classification Models
➤ Predict categories (e.g., spam or not spam)
➤ Used in email filters, fraud detection

• Regression Models
➤ Predict numeric values (e.g., sales, revenue)
➤ Used in forecasting

• Time Series Analysis


➤ Predict future trends based on time-based data
➤ Used in stock price or demand forecasting

• Clustering
➤ Group similar data (not direct prediction but used for segmentation)
➤ Used in market research

2. Techniques of Predictive Analytics

• These are the tools and methods used:

• Linear & Logistic Regression


➤ Predict continuous and binary outcomes

• Decision Trees
➤ Easy-to-understand visual models for decision-making

• Neural Networks & Deep Learning


➤ Powerful for complex patterns like image/speech recognition

• Random Forest, XGBoost


➤ Advanced ensemble techniques for higher accuracy

• Machine Learning (ML)


➤ Used for self-learning predictive models
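
As an illustration of how these techniques are applied in practice, here is a minimal Python sketch (using scikit-learn, with a synthetic dataset invented for the example) that trains a random forest, one of the ensemble techniques listed above:

# Illustrative sketch: random forest classifier on synthetic data (all values invented)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic classification data: 1,000 rows, 10 features, 2 classes
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ensemble of 200 decision trees; averaging many trees reduces overfitting of a single tree
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))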

3. Applications of Predictive Analytics

• Manufacturing

• Healthcare
o Personalized treatment plans

• Telecommunication
o Network optimization
o Fraud detection

• Supply Chain
o Inventory forecasting
o Optimize delivery routes
o Supplier risk assessment

• Information Technology
o Cyber threat prediction
o Server failure prediction
o IT resource planning

4. Digital Analytics (Simplified)

• Definition:

• Digital Analytics is the analysis of digital data from websites, mobile apps, social media, etc.,
to optimize user experience and business outcomes.

• Key Focus Areas:

• Website traffic (page views, bounce rate)

• User behavior (clicks, time spent)

• Conversion tracking (sales, signups)

• Social media metrics (likes, shares, comments)

• Tools Used:

• Google Analytics

• Adobe Analytics

• Social media insights (Meta, X, LinkedIn)

• Heatmaps (Hotjar, Crazy Egg)

MODULE 3

1. Simple Linear Regression (SLR): Introduction

• Simple Linear Regression (SLR) is a statistical method to predict the value of one dependent variable (Y) using one independent variable (X).

• It fits a straight line to the data:


Y = β₀ + β₁X + ε
Where:

o Y = dependent variable (outcome)


o X = independent variable (predictor)

o β₀ = intercept

o β₁ = slope

o ε = error term

2. Importance of SLR

• Helps in predicting outcomes (e.g., predicting sales based on advertising spend)

• Useful for understanding relationships between variables

• Forms the foundation for more advanced regression models

3. Types of Regression (Basic Mention)

• Simple Linear Regression – one independent variable

• Multiple Linear Regression – more than one independent variable

4. SLR Model Building Steps

1. Identify variables – Define X and Y

2. Collect data – Numerical, continuous values

3. Plot the data – Scatter plot to check linearity

4. Fit the line – Use regression to find β₀ and β₁

5. Evaluate model – Check goodness of fit

5. OLS Estimation (Ordinary Least Squares)

• A method to find the best-fitting line by minimizing the sum of squared errors (the differences between actual and predicted Y values)

• Gives estimates of β₀ (intercept) and β₁ (slope)
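
• Minimizing the squared errors gives the standard closed-form estimates (X̄ and Ȳ are the sample means of X and Y):

β₁ = Σ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σ (Xᵢ − X̄)²

β₀ = Ȳ − β₁X̄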

6. Model Interpretation

• Intercept (β₀): Value of Y when X = 0

• Slope (β₁): Change in Y for a one-unit change in X

• R² (R-squared): Proportion of variation in Y explained by the model (ranges from 0 to 1)

7. Model Validation

• Residual analysis: Errors should be randomly scattered (no pattern)

• Check assumptions:
o Linearity
o Independence
o Homoscedasticity (constant variance)
o Normality of residuals

• Use metrics:

o R²

o RMSE (Root Mean Square Error)

o p-value (for statistical significance of β₁)
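
A minimal Python sketch of fitting and evaluating an SLR model with OLS (the statsmodels library is assumed; the advertising-vs-sales numbers are invented for illustration):

# Illustrative sketch: simple linear regression with OLS (invented data)
import numpy as np
import statsmodels.api as sm

ad_spend = np.array([10, 15, 20, 25, 30, 35, 40], dtype=float)   # X
sales    = np.array([25, 33, 41, 47, 58, 62, 71], dtype=float)   # Y

X = sm.add_constant(ad_spend)          # adds the intercept term β₀
model = sm.OLS(sales, X).fit()         # OLS minimizes the sum of squared errors

print(model.params)                    # β₀ (intercept) and β₁ (slope)
print("R-squared:", model.rsquared)    # goodness of fit (0 to 1)
print("p-value for slope:", model.pvalues[1])

residuals = model.resid                # residuals for validation checks
rmse = np.sqrt(np.mean(residuals ** 2))
print("RMSE:", rmse)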

MODULE 4

1. Multiple Linear Regression (MLR): Introduction

• MLR is used to predict the value of a dependent variable (Y) using two or more
independent variables (X₁, X₂, ..., Xn).

• General equation:
Y = β₀ + β₁X₁ + β₂X₂ + ... + βnXn + ε
Where:

o Y = Dependent variable

o X₁, X₂, ... Xn = Independent variables

o β₀ = Intercept, β₁...βn = Coefficients

o ε = Error term

2. Estimation of Regression Parameters

• Done using Ordinary Least Squares (OLS) method

• Objective: Minimize sum of squared residuals (errors)

• Output: Coefficients (β-values) that best fit the data

3. Model Diagnostics

Helps check if the model is valid and reliable:

• R²: How well independent variables explain variation in Y

• Adjusted R²: Better for MLR; adjusts for number of variables

• Residual plots: Should be randomly scattered


• p-values: Check if variables are statistically significant (p < 0.05)

• F-test: Tests overall significance of the model

4. Dummy, Derived & Interaction Variables

Dummy Variables

• Used to represent categorical data (e.g., Male = 0, Female = 1)

• Needed because regression requires numeric input

Derived Variables

• Created by transforming existing variables

• Example: log(salary), age²

Interaction Variables

• Show combined effect of two variables

• Example: X₁ * X₂ — effect of X₁ depends on X₂
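
A small pandas sketch of creating these three kinds of variables (the column names and values are hypothetical):

# Illustrative sketch: dummy, derived, and interaction variables with pandas (hypothetical data)
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "age":    [25, 32, 47, 51],
    "salary": [30000, 45000, 80000, 95000],
})

# Dummy variable: categorical -> numeric (drop_first avoids a redundant column)
df = pd.get_dummies(df, columns=["gender"], drop_first=True)   # creates gender_Male

# Derived variables: transformations of existing columns
df["log_salary"] = np.log(df["salary"])
df["age_sq"] = df["age"] ** 2

# Interaction variable: combined effect of two predictors
df["age_x_male"] = df["age"] * df["gender_Male"]

print(df)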

5. Multicollinearity

• When independent variables are highly correlated with each other

• It distorts regression results

• Detected using:

o VIF (Variance Inflation Factor): VIF > 10 indicates a problem

• Solution:

o Drop one of the correlated variables

o Use Principal Component Analysis (PCA)
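
A short Python sketch of checking VIF with statsmodels (the predictors below are simulated so that two of them are highly correlated):

# Illustrative sketch: detecting multicollinearity with VIF (simulated predictors)
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly a copy of x1 -> high correlation
x3 = rng.normal(size=100)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

X_const = sm.add_constant(X)                 # include the intercept when computing VIF
vif = pd.DataFrame({
    "variable": X_const.columns,
    "VIF": [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
})
print(vif)   # VIF > 10 for x1 and x2 signals multicollinearity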

6. Model Deployment

• Deploying a model means using it in real-world applications

• Steps:

1. Finalize model

2. Convert into software/script

3. Integrate into systems (e.g., websites, apps)

4. Monitor performance
7. Demo Using Software

• Tools like Excel, Python, R, SPSS, or SAS are used for MLR

• Common steps in software:


➤ Import data → Fit model → Interpret output (coefficients, R², p-values) → Plot diagnostics
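
A minimal Python version of that workflow using statsmodels' formula interface (the dataset is a tiny invented stand-in for real data):

# Illustrative sketch of the MLR workflow in Python
import pandas as pd
import statsmodels.formula.api as smf

# In practice the data would be imported, e.g. with pd.read_csv();
# here a tiny invented DataFrame stands in for it.
data = pd.DataFrame({
    "sales":    [120, 150, 170, 200, 230, 260],
    "ad_spend": [10, 15, 18, 25, 30, 34],
    "price":    [9.5, 9.0, 9.2, 8.8, 8.5, 8.3],
})

model = smf.ols("sales ~ ad_spend + price", data=data).fit()   # Fit model
print(model.summary())    # Interpret output: coefficients, R², p-values
print(model.resid)        # residuals for diagnostic plots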

MODULE 5

Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class. It is a statistical method that models the relationship between one or more predictor variables and a binary outcome. This module covers the fundamentals of logistic regression, its types, and its implementation.

• The outcome can be Yes or No, 0 or 1, True or False, etc., but instead of returning the exact values 0 and 1, the model gives probabilities that lie between 0 and 1.

• In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic (sigmoid) function whose output is bounded between 0 and 1.
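
• The S-shaped curve is the sigmoid function: p = 1 / (1 + e^(−(β₀ + β₁X)))

A minimal scikit-learn sketch with invented study-hours data (for illustration only):

# Illustrative sketch: logistic regression for a binary outcome (invented data)
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hours studied (X) vs. pass/fail (y = 1/0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict_proba([[4.5]]))   # probabilities for class 0 and class 1
print(clf.predict([[4.5]]))         # predicted class (0 or 1)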

MODULE 6

1. Overview of Decision Trees

• A decision tree is a flowchart-like model used for classification and regression tasks.

• It splits data into branches based on conditions to make predictions.

• Simple to visualize, interpret, and explain.

2. Applications of Decision Trees

Used in many fields like:

• Marketing → Predict customer behavior

• Finance → Credit risk scoring

• Healthcare → Disease diagnosis

• HR → Predict employee attrition

5. Introduction to CHAID

• CHAID = Chi-Square Automatic Interaction Detector

• Used for categorical target variables

• Splits based on Chi-square tests (checks statistical significance)

• Creates multi-way splits (unlike binary splits in CART)

6. Classification and Regression Tree (CART)

• Common decision tree algorithm


• Uses binary splits only (Yes/No)

• Two types:

o Classification Tree: For categorical output (e.g., yes/no)

o Regression Tree: For numeric output (e.g., price, salary)

• Splits based on:

o Gini Index (for classification)

o Mean Squared Error (MSE) (for regression)
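
A small sketch of a CART-style classification tree in scikit-learn, using Gini splits on the built-in iris sample data (for illustration only):

# Illustrative sketch: CART-style classification tree with Gini splits
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print("Test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))   # text view of the binary (Yes/No) splits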

MODULE 7
1. Introduction to Unstructured Data Analysis

What is Unstructured Data?

• Data that doesn’t follow a fixed format

• Examples:

o Text (emails, reviews, tweets)

o Images, videos, audio

Why is it important?

• An estimated 80–90% of the data generated today is unstructured

• Analyzing it helps discover hidden insights (e.g., customer opinions, trends)

2. Sentiment Analysis

Definition:

• A technique to determine whether text expresses positive, negative, or neutral emotion.

Example:

• “This product is amazing!” → Positive

• “Worst service ever.” → Negative

Applications:

• Customer feedback analysis

• Social media monitoring

• Brand reputation tracking


Techniques:

• Lexicon-based: Uses predefined word lists

• Machine Learning-based: Uses models like Naïve Bayes or SVM

3. Naïve Bayes Algorithm

Overview:

• A supervised machine learning algorithm based on Bayes' Theorem

• Called "naïve" because it assumes all features (words) are independent of each other

Formula:

• P(A|B) = [P(B|A) * P(A)] / P(B)

o A = class (e.g., positive/negative)

o B = input data (e.g., words in review)

How it works (for text):

1. Learn probabilities of words in each class

2. For new text, calculate probability for each class

3. Choose the class with the highest probability
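
A minimal sketch of these steps for sentiment classification with scikit-learn (the tiny training set is invented):

# Illustrative sketch: Naïve Bayes sentiment classification (invented dataset)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["this product is amazing", "great quality and fast delivery",
               "worst service ever", "terrible product, total waste"]
train_labels = ["positive", "positive", "negative", "negative"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)   # word counts per review

clf = MultinomialNB()
clf.fit(X_train, train_labels)                    # 1. learn word probabilities per class

X_new = vectorizer.transform(["amazing quality, great service"])
print(clf.predict_proba(X_new))                   # 2. probability for each class
print(clf.predict(X_new))                         # 3. class with the highest probability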

Applications:

• Spam detection

• Email filtering

• Sentiment classification

• Text categorization

MODULE 8
1. Forecasting

• Forecasting is predicting future values based on past data.

• Used in sales, finance, inventory, weather prediction, etc.

• Helps in planning and decision making.

2. Time Series Analysis

• Deals with data collected over time at regular intervals (daily, monthly, yearly).

• Goal: Identify patterns/trends to forecast future points.


• Components of time series:

o Trend: Long-term upward or downward movement

o Seasonality: Repeating patterns at fixed periods (e.g., sales peak in December)

o Cyclic: Irregular, long-term fluctuations

o Random (noise): Unpredictable variations

3. Additive & Multiplicative Models

• Additive Model:
Time series = Trend + Seasonality + Noise
Use when seasonal variations are constant over time

• Multiplicative Model:
Time series = Trend × Seasonality × Noise
Use when seasonal variations increase/decrease proportionally with trend

4. Forecasting Accuracy

• Measures how close forecasted values are to actual values.

• Common metrics:

o MAD (Mean Absolute Deviation)

o MSE (Mean Squared Error)

o MAPE (Mean Absolute Percentage Error)
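
• With Aₜ = actual value, Fₜ = forecast, and n = number of periods:

o MAD = (1/n) Σ |Aₜ − Fₜ|

o MSE = (1/n) Σ (Aₜ − Fₜ)²

o MAPE = (100/n) Σ |(Aₜ − Fₜ) / Aₜ|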

5. Moving Average Models

• Smooths the time series by averaging data points over a fixed window.

• Types:

o Simple Moving Average (SMA): Equal weights to all points

o Weighted Moving Average (WMA): Different weights, recent data given more
importance
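
A short pandas sketch of both types on an invented demand series (window size and weights chosen only for illustration):

# Illustrative sketch: simple vs. weighted moving average with pandas (invented data)
import numpy as np
import pandas as pd

demand = pd.Series([120, 130, 128, 140, 150, 145, 160, 170])

# Simple Moving Average: equal weights over a 3-period window
sma = demand.rolling(window=3).mean()

# Weighted Moving Average: most recent point weighted highest (illustrative weights)
weights = np.array([0.2, 0.3, 0.5])
wma = demand.rolling(window=3).apply(lambda x: np.dot(x, weights), raw=True)

print(pd.DataFrame({"demand": demand, "SMA": sma, "WMA": wma}))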

6. Exponential Smoothing Techniques

• Give more weight to recent observations using a smoothing factor (α, 0 < α < 1).

• Types:

o Simple Exponential Smoothing: For data without trend or seasonality


o Holt’s Linear Trend Model: For data with trend

o Holt-Winters Model: For data with trend and seasonality
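
• The basic update rule for simple exponential smoothing is Fₜ₊₁ = α·Yₜ + (1 − α)·Fₜ, where Yₜ is the latest actual value and Fₜ the previous forecast.

A short statsmodels sketch (invented data, α = 0.3 chosen only for illustration):

# Illustrative sketch: simple exponential smoothing with statsmodels (invented data)
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

sales = pd.Series([200, 215, 230, 225, 240, 255, 260, 275])

fit = SimpleExpSmoothing(sales).fit(smoothing_level=0.3, optimized=False)
print(fit.fittedvalues)      # smoothed (one-step-ahead) values
print(fit.forecast(3))       # forecast for the next 3 periods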

Quick Exam Summary:

• Forecasting: Predict future values using past data

• Time Series Components: Trend, Seasonality, Cyclic, Random

• Additive vs Multiplicative: Additive = add the components; Multiplicative = multiply the components

• Accuracy Metrics: MAD, MSE, MAPE

• Moving Average: Smooth data by averaging over a window

• Exponential Smoothing: Recent data weighted more heavily
