1 Regression Analysis

Uploaded by

Ayush Raj

BUSINESS KNOWLEDGE

Cart Abandonment Analysis

Problem:
A high fraction of your online customers add products to their cart but do not
purchase them.

Business knowledge that will be helpful:

1. Discussions with the marketing team
2. Discussions with the product team
3. A dry run of the online purchasing process to understand the customer journey
4. Research on industry reports regarding cart abandonment
5. Any previous work in your or another organization regarding cart abandonment

Start-Tech Academy
Data Exploration
The next step is to use the acquired business knowledge to search for relevant data.

Steps: Identify data need -> Plan data request -> Quality checks

1. Internal data
Data collected by your organization
E.g. usage data, sales data, promotion data

2. External data
Data acquired from external data sources
E.g. census data, external vendor data, scraped data

Cart Abandonment Analysis
1. Input from the marketing team
50% of our customers come from email marketing, 30% from organic search and the remaining 20% from
AdWords marketing
-> Gather the source website data for all customers
2. Input from the product team
We have a 3-step purchase process: cart review, address/personal details, payment
-> Gather the cart abandonment location for all customers
3. Input from industry reports regarding cart abandonment
Customers tend to keep high-value items in their cart for a long duration
-> Gather data about the total cart value of all customers
4. Input from the dry run
Encountered a survey link asking customers to rate the website experience
-> Gather survey data for all customers

DATA DICTIONARY
The next step is to understand the data. You should know each variable's definition and distribution, along with each
table's unique identifiers and foreign keys.

A comprehensive data dictionary should include:

1. Definitions of the predictors
2. The unique identifier (primary key) of each table
3. Foreign keys or matching keys between tables
4. An explanation of the values of categorical variables

https://youtu.be/76Y6Tg1glrQ

House Pricing Dataset

The data set contains 506 observations of house prices from different towns.
For each house price, data is available on 18 other variables on which the price
is suspected to depend.

Variable       Description
price          Value of the house
crime_rate     Crime rate in that neighborhood
resid_area     Proportion of residential area in the town
air_qual       Quality of air in that neighborhood
room_num       Average number of rooms in houses of that locality
age            Age of the house construction in years
dist1          Distance from employment hub 1
dist2          Distance from employment hub 2
dist3          Distance from employment hub 3
dist4          Distance from employment hub 4
teachers       Number of teachers per thousand population in the town
poor_prop      Proportion of poor population in the town
airport        Is there an airport in the city? (Yes/No)
n_hos_beds     Number of hospital beds per 1000 population in the town
n_hot_rooms    Number of hotel rooms per 1000 population in the town
waterbody      Type of natural fresh water source in the city (lake/river/both/none)
rainfall       Yearly average rainfall in centimeters
bus_ter        Is there a bus terminal in the city? (Yes/No)
parks          Proportion of land assigned as parks and green areas in the town

UNIVARIATE ANALYSIS
Univariate analysis is the simplest form of data analysis. “Uni” means “one”, so the data has only one
variable. It does not deal with causes or relationships (unlike regression); its major purpose is to describe: it takes
data, summarizes it and finds patterns in it.

Ways to describe patterns found in univariate data:

1. Central tendency
   1. Mean
   2. Mode
   3. Median
2. Dispersion
   1. Range
   2. Variance
   3. Maximum, minimum
   4. Quartiles (including the interquartile range)
   5. Standard deviation
3. Count / null count
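
The measures listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch, not the course's own tooling: the function name `describe`, the use of `None` to mark a missing value, and the sample numbers are all assumptions made for illustration. Variance and standard deviation use the sample (n - 1) convention.

```python
import statistics

def describe(values):
    """Summarize one numeric variable; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, _q2, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    return {
        "count": len(present),
        "null_count": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "iqr": q3 - q1,
        "variance": statistics.variance(present),  # sample variance
        "std_dev": statistics.stdev(present),      # sample standard deviation
    }

# Hypothetical room counts with one missing value.
summary = describe([4, 5, 5, 6, None, 7, 5, 6])
```

Running `describe` once per column gives one row of summary statistics per variable.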

EDD (EXTENDED DATA DICTIONARY)

[Figure: example of an extended data dictionary]

Missing Value Imputation
Real-world data often has missing values. Data can have missing values for a number of reasons, such as observations
that were not recorded and data corruption.

Impact
• Handling missing data is important, as many machine learning algorithms do
not support data with missing values.
Solution
• Remove rows with missing data from your dataset.
• Impute missing values with the mean/median values in your dataset.
Note
• Use business knowledge to take a separate approach for each variable
• It is advisable to impute rather than remove when the sample size is small or
a large proportion of observations have missing values

Methods
1. Impute with zero
• Impute missing values with zero
2. Impute with median/mean/mode
• For numerical variables, impute missing values with the mean or median
• For categorical variables, impute missing values with the mode
3. Segment-based imputation
• Identify relevant segments
• Calculate the mean/median/mode of each segment
• Impute each missing value according to its segment
• For example, we can say rainfall hardly varies for cities in a particular
state
• In this case, we can impute the missing rainfall value of a city with the
average of that state
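
Segment-based imputation for the rainfall example might look like the sketch below. It assumes rows are plain dictionaries and that `None` marks a missing value; the function name, the `state` key and the city records are hypothetical. A segment with no observed values at all would raise a `KeyError` here and would need a fallback such as the overall mean.

```python
from collections import defaultdict
from statistics import mean

def impute_by_segment(rows, segment_key, value_key):
    """Fill missing values (None) with the mean of the row's segment."""
    by_segment = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            by_segment[row[segment_key]].append(row[value_key])
    segment_mean = {seg: mean(vals) for seg, vals in by_segment.items()}
    # Build new rows so the input list is left unchanged.
    return [
        {**row, value_key: segment_mean[row[segment_key]]}
        if row[value_key] is None else dict(row)
        for row in rows
    ]

# Hypothetical cities; rainfall is in centimeters.
cities = [
    {"city": "A", "state": "S1", "rainfall": 100},
    {"city": "B", "state": "S1", "rainfall": 120},
    {"city": "C", "state": "S1", "rainfall": None},
    {"city": "D", "state": "S2", "rainfall": 40},
]
imputed = impute_by_segment(cities, "state", "rainfall")
```

City C receives the average of the two observed S1 cities; rows that already have a value are copied unchanged.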

Outlier Treatment
Outlier is a term commonly used by analysts and data scientists. An outlier is an observation that appears far away
from, and diverges from, the overall pattern in a sample.

Reasons
• Data entry errors
• Measurement errors
• Sampling errors, etc.
Impact
• Outliers increase the error variance and reduce the power of statistical tests
Solution
• Detect outliers using the EDD and visualization methods such as scatter plots,
histograms or box plots
• Impute outliers

Example

                      Without outlier          With outlier
Data                  6,6,6,4,4,5,5,5,5,7,7    6,6,6,4,4,5,5,5,5,7,7,300
Mean                  5.45                     30.0
Median                5                        5.5
Mode                  5                        5
Standard deviation    1.04                     85.03
Variance              1.07                     7230.73
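
The figures in this example can be recomputed with Python's `statistics` module. This sketch assumes the sample (n - 1) convention for variance and standard deviation, which is a guess about the convention behind the table; the helper name `summarize` is hypothetical.

```python
import statistics

without_outlier = [6, 6, 6, 4, 4, 5, 5, 5, 5, 7, 7]
with_outlier = without_outlier + [300]

def summarize(data):
    """Return the five statistics from the table, rounded to two decimals."""
    return {
        "mean": round(statistics.mean(data), 2),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "std_dev": round(statistics.stdev(data), 2),      # sample standard deviation
        "variance": round(statistics.variance(data), 2),  # sample variance
    }

before = summarize(without_outlier)
after = summarize(with_outlier)
```

Comparing `before` and `after` shows that a single extreme value moves the mean, standard deviation and variance substantially while the median and mode barely change.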

Methods
1. Capping and flooring
• Impute all values above 3 × P99 and below 0.3 × P1
• Impute them with the values 3 × P99 and 0.3 × P1 respectively
• You can use any multiplier instead of 3, as per your business
requirement
2. Exponential smoothing
• Extrapolate the curve between P95 and P99 and cap all values falling
outside it to the value generated by the curve
• Similarly, extrapolate the curve between P5 and P1
3. Sigma approach
• Identify outliers by capturing all values falling outside μ ± xσ
• You can use any multiplier as x, as per your business requirement
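
A minimal sketch of the capping/flooring and sigma approaches follows. The helper names, the linear-interpolation percentile and the sample data are assumptions for illustration. Note that flooring at 0.3 × P1 presumes positive data: for a negative P1 the product would lie above P1 rather than below it.

```python
import statistics

def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of pre-sorted data."""
    k = (len(sorted_values) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def cap_and_floor(values, upper_mult=3.0, lower_mult=0.3):
    """Cap values above upper_mult * P99 and floor values below lower_mult * P1."""
    ordered = sorted(values)
    cap = upper_mult * percentile(ordered, 99)
    floor = lower_mult * percentile(ordered, 1)
    return [min(max(v, floor), cap) for v in values]

def sigma_outliers(values, x=3.0):
    """Return values lying outside mean ± x * (sample standard deviation)."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [v for v in values if abs(v - mu) > x * sigma]

# Hypothetical positive data with one very small and one very large value.
data = [0.1] + list(range(2, 101)) + [5000]
capped = cap_and_floor(data)
flagged = sigma_outliers([6, 6, 6, 4, 4, 5, 5, 5, 5, 7, 7, 300], x=3.0)
```

Because the outlier itself inflates μ and σ, the sigma approach can miss outliers in small samples, so the multiplier should be chosen with the data in view.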

Bivariate Analysis
Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the relationship
between two variables: whether an association exists and how strong it is, or whether there are
differences between the two variables and how significant those differences are.

Scatter plot
• A scatter plot indicates the type (linear or non-linear) and strength of the
relationship between two variables
• We will use scatter plots to decide how to transform variables
Correlation
• Linear correlation quantifies the strength of a linear relationship between
two numerical variables.
• When there is no correlation between two variables, there is no tendency
for the values of one quantity to increase or decrease with the values of the
second quantity.
• Correlation is used to drop unusable variables

Scatter plots
[Figure: scatter plots]

Variable Transformation
Transform your existing variables to extract more information out of them

Identify
• Use your business knowledge and bivariate analysis to decide which variables to modify
Methods
• Use the mean/median of variables conveying a similar type of information
• Create ratio variables which are more relevant to the business
• Transform variables by taking the log, exponential, roots etc.
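
Two of these methods can be sketched on a single record from the house pricing data. The column names come from the data dictionary; averaging `dist1` to `dist4` into `avg_dist` and applying log(1 + x) to `crime_rate` are illustrative choices, and the record values are made up.

```python
import math
from statistics import mean

def add_transformed_columns(row):
    """Return a copy of row with derived variables added."""
    out = dict(row)
    # Combine variables that convey a similar type of information.
    out["avg_dist"] = mean(row[k] for k in ("dist1", "dist2", "dist3", "dist4"))
    # log(1 + x) compresses large values and is defined at x = 0.
    out["log_crime_rate"] = math.log1p(row["crime_rate"])
    return out

# Hypothetical record using column names from the data dictionary.
house = {"dist1": 4.0, "dist2": 5.0, "dist3": 3.0, "dist4": 4.0, "crime_rate": 0.0}
transformed = add_transformed_columns(house)
```

Whether a transformed variable is worth keeping should be checked by plotting it against the response again.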

Transformation
Depending on the shape of the scatter plot (figures not recovered), take:
• e^x instead of x
• log(1+x) instead of x
• x^n or √x instead of x

Correlation
Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. A
positive correlation indicates the extent to which those variables increase or decrease in parallel; a negative correlation
indicates the extent to which one variable increases as the other decreases.

Examples
Some examples of data that have a high correlation:
• Your caloric intake and your weight.
• The amount of time you study and your GPA.

Some examples of data that have a low correlation (or none at all):
• A dog’s name and the type of dog biscuit they prefer.
• The cost of a car wash and how long it takes to buy a soda inside the
station.

The Correlation Coefficient
Definition
• A correlation coefficient is a way to put a value on the relationship.
• Correlation coefficients have a value between -1 and 1.
• A “0” means there is no linear relationship between the variables at all,
• while -1 or 1 means that there is a perfect negative or positive correlation
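
As an illustration, the Pearson correlation coefficient (the usual choice for two numerical variables, assumed here since the slides do not name one) can be computed directly from its definition. The function name and sample data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

perfect_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
perfect_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
# y = x^2 on a symmetric range: fully dependent, yet no linear relationship.
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The last case is a reminder that a coefficient of 0 rules out a linear relationship only; a scatter plot is still needed to spot non-linear ones.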

Correlation vs Causation
Causation: the relation between something that happens and the thing that causes it. The first thing that happens is
the cause and the second thing is the effect.

[Figure: example of a spurious correlation]
Source: http://www.tylervigen.com/spurious-correlations

The Correlation Matrix
Definition
• A correlation matrix is a table showing the correlation coefficients between variables.
• Each cell in the table shows the correlation between two variables.
• A correlation matrix is used as a way to summarize data, as an input into a more
advanced analysis, and as a diagnostic for advanced analyses.
Example
[Figure: example correlation matrix]

Application
• To summarize a large amount of data where the goal is to see patterns.
• To identify collinearity in the data

Multicollinearity

Definition
• Multicollinearity exists whenever two or more of the predictors in a regression
model are moderately or highly correlated.
Effects
• Multicollinearity causes the signs as well as the magnitudes of
the partial regression coefficients to change from one sample to another.
• Multicollinearity makes it difficult to assess the relative importance of the
independent variables in explaining the variation in the dependent
variable.
Solution
• Remove highly correlated independent variables by looking at the correlation
matrix and the variance inflation factor (VIF)
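
The slides do not define the VIF. The standard definition of the variance inflation factor for predictor $X_j$ is given below, where $R_j^2$ is the $R^2$ obtained by regressing $X_j$ on all the other predictors.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A VIF of 1 means $X_j$ is uncorrelated with the other predictors, and the value grows without bound as $R_j^2$ approaches 1. Cut-offs such as 5 or 10 are common rules of thumb rather than fixed thresholds.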

Dummy Variable
A dummy variable, or indicator variable, is an artificial variable created to represent an attribute with two or
more distinct categories/levels.

Why
• Regression analysis treats all independent (X) variables in the analysis as
numerical.
• Nominal variables, or variables that describe a characteristic using two or more
categories, are commonplace in regression research, but are not always usable
in their categorical form.
• Dummy coding is a way of incorporating nominal variables into regression
analysis

How
• We can make a separate column, or variable, for each category.
• These new variables take the value 0 or 1 depending on the value of the
categorical variable

Example

Student   Favorite class   Science   Math
1         Science          1         0
2         Science          1         0
3         English          0         0
4         Math             0         1

Things to keep in mind

• The number of dummy variables necessary to represent a single attribute
variable is equal to the number of levels (categories) in that variable minus
one.
• We cannot code the categories as science = 1, math = 2 and English = 3, because
there is no such thing as an increase in favorite class: math is
not higher than science, and is not lower than English either. And even if
there were an increase, we could not quantify it.
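
The favorite-class table can be reproduced with a few lines of Python. This is a sketch: the function name `dummy_code` and the choice of English as the baseline (the category with all dummies equal to 0) are assumptions that match the table above.

```python
def dummy_code(values, baseline):
    """One 0/1 column per category except the baseline (k - 1 columns)."""
    categories = sorted(set(values) - {baseline})
    return [{c: int(v == c) for c in categories} for v in values]

favorite_class = ["Science", "Science", "English", "Math"]
dummies = dummy_code(favorite_class, baseline="English")
```

With three categories this produces two columns, `Math` and `Science`; a student whose favorite class is English has 0 in both.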

Linear Regression
Linear regression is a linear approach to modelling the relationship between a dependent variable and one or more
independent variables.

Here are a few important questions that we might seek to address:

1. Prediction question
How accurately can I predict the price of a house, given the values of all
variables?
2. Inferential question
How accurately can we estimate the effect of each of these variables on the
house price?

Simple Linear Regression
Simple linear regression is an approach for predicting a quantitative response Y on the basis of a single predictor
variable X. It assumes that there is approximately a linear relationship between X and Y.

Model equation
$Y \approx \beta_0 + \beta_1 X$
$\beta_0$ is known as the intercept
$\beta_1$ is known as the slope
Together, $\beta_0$ and $\beta_1$ are known as the model coefficients or parameters.

For the house price data
• X will represent room_num
• Y will represent price
$\mathrm{price} \approx \beta_0 + \beta_1 \times \mathrm{room\_num}$

From our training data we will get estimates of $\beta_0$ and $\beta_1$

Estimating the Coefficients
• Our goal is to obtain coefficient estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ such that the linear
model fits the available data well
• Total number of rows (data points): $n = 506$
• Data: $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_{506}, y_{506})$
• Let us call the calculated value of $y$ $\hat{y}$
$\hat{y}_1 = \hat{\beta}_0 + \hat{\beta}_1 x_1$
$\hat{y}_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2$
$\vdots$
$\hat{y}_{506} = \hat{\beta}_0 + \hat{\beta}_1 x_{506}$
• The difference between the ith observed response value and the
ith response value predicted by our linear model is known as the residual
$e_i = y_i - \hat{y}_i$

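Residual sum of squares (RSS)

$RSS = e_1^2 + e_2^2 + \cdots + e_n^2$

The least squares approach chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the RSS.

Using some calculus, one can show that the minimizers are

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

where $\bar{x}$ and $\bar{y}$ are the sample means.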
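
A least-squares fit can be sketched in plain Python using the standard closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept is the mean of y minus the slope times the mean of x. The function names and the (room_num, price) pairs are hypothetical and not taken from the real data set.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for y ≈ b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def rss(xs, ys, b0, b1):
    """Residual sum of squares of the line b0 + b1 * x on the data."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical (room_num, price) pairs, not taken from the real data set.
rooms = [4, 5, 6, 7, 8]
prices = [15, 19, 26, 29, 36]
b0, b1 = fit_simple_ols(rooms, prices)
best_rss = rss(rooms, prices, b0, b1)
```

Any other choice of intercept or slope gives a larger RSS on the same data, which is what "least squares" means.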

For our model
[Figure: fitted model for the house price data]

Assessing the Accuracy
We assume that the true relationship between X and Y takes the form $Y = f(X) + \varepsilon$ for some unknown function $f$, where
$\varepsilon$ is a mean-zero random error term.

If $f$ is to be approximated by a linear function, then we can write this relationship as

$Y = \beta_0 + \beta_1 X + \varepsilon$
$\beta_0$ is known as the intercept
$\beta_1$ is known as the slope
$\varepsilon$ is an error term

[Figure: population regression line and sample regression line]

Standard error
$\sigma^2 = \mathrm{Var}(\varepsilon)$
$\sigma^2$ is not known, but $\sigma$ can be estimated from the data. This estimate is known as
the residual standard error (RSE).

In coefficients
There is approximately a 95% chance that the interval

$[\hat{\beta}_1 - 2 \cdot SE(\hat{\beta}_1),\ \hat{\beta}_1 + 2 \cdot SE(\hat{\beta}_1)]$

will contain the true value of $\beta_1$
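
The standard-error formulas themselves did not survive on this slide. The usual textbook expressions for simple linear regression, which assume that the errors are uncorrelated and share a common variance $\sigma^2$, are:

```latex
SE(\hat{\beta}_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right],
\qquad
SE(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

In practice $\sigma$ is replaced by its estimate, the RSE. The slope is estimated more precisely when the $x_i$ are more spread out.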

Hypothesis tests
Is there any relationship between X and Y?

$Y = \beta_0 + \beta_1 X$
• If $\beta_1$ is zero, there is no relationship
H0: There is no relationship between X and Y
Ha: There is some relationship between X and Y
H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$

• To reject H0, we calculate the t-statistic
• We also compute the probability of observing any value equal to |t| or
larger, assuming $\beta_1 = 0$
• We call this probability the p-value
• A small p-value (typically less than 5% or 1%) means there is an association between the predictor and
the response
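
The slide does not show the t-statistic itself. The standard form for testing $\beta_1 = 0$ in simple linear regression is:

```latex
t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)}
```

It measures how many standard errors $\hat{\beta}_1$ lies from zero. If there is truly no relationship, $t$ follows a t-distribution with $n - 2$ degrees of freedom, and the p-value is read from that distribution.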

Quality of Fit
The quality of a linear regression fit is typically assessed using two related
quantities: the residual standard error (RSE) and the $R^2$ statistic.

Residual Standard Error (RSE)
$RSE = \sqrt{\frac{RSS}{n - 2}}$
• Roughly speaking, the RSE is the average amount that the response will deviate from the true
regression line
• The RSE is also considered a measure of the lack of fit of the model to the data
• The RSE provides an absolute measure of lack of fit, in the units of Y

$R^2$
$R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$
• $R^2$ is the proportion of variance explained
• $R^2$ always takes on a value between 0 and 1
• $R^2$ is independent of the scale of Y
• TSS: total sum of squares, $\sum_{i=1}^{n} (y_i - \bar{y})^2$
• RSS: residual sum of squares
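
Both quantities can be computed from observed and fitted values. The sketch below uses the standard definitions RSE = sqrt(RSS / (n - p - 1)) and R² = 1 - RSS/TSS, where p is the number of predictors; the function name and the numbers are hypothetical.

```python
import math

def fit_quality(ys, predictions, n_predictors=1):
    """Return (RSE, R^2) for observed ys and model predictions."""
    n = len(ys)
    y_bar = sum(ys) / n
    rss = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    tss = sum((y - y_bar) ** 2 for y in ys)
    rse = math.sqrt(rss / (n - n_predictors - 1))
    r_squared = 1 - rss / tss
    return rse, r_squared

# Hypothetical observed prices and fitted values from a one-predictor model.
observed = [15, 19, 26, 29, 36]
fitted = [14.6, 19.8, 25.0, 30.2, 35.4]
rse, r2 = fit_quality(observed, fitted)
```

The RSE is expressed in the units of the response, while R² is unitless, which is why the two are usually reported together.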

Multiple Linear Regression
In multiple linear regression, more than one predictor variable is used to predict the response variable.

The relationship for multiple linear regression can be written as

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$

$\beta_0$ is known as the intercept
$p$ is the number of predictors
$\varepsilon$ is an error term

For our model, the equation is
$\mathrm{price} = \beta_0 + \beta_1\,\mathrm{crime\_rate} + \beta_2\,\mathrm{poor\_prop} + \cdots + \beta_{16}\,\mathrm{avg\_dist}$

Estimating Regression Coefficients
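
The content of this slide was not recovered. As in the simple case, the standard least squares approach chooses the coefficients that minimize the residual sum of squares; the matrix form below assumes $X$ is the $n \times (p+1)$ design matrix with a leading column of ones and that $X^\top X$ is invertible.

```latex
RSS = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_p x_{ip} \right)^2,
\qquad
\hat{\beta} = (X^\top X)^{-1} X^\top y
```

Highly correlated predictors make $X^\top X$ close to singular, which is one way to see why multicollinearity destabilizes the coefficient estimates.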
