Multiple Linear Regression
Regression
Introduction to Machine Learning
Contents
1. What is multiple linear regression (MLR)
Design
Requirements
Sample size: >= 50 (at least 10 times as many cases as independent variables)
The formula for a multiple linear regression is:
y = B0 + B1X1 + B2X2 + … + BnXn + e
where:
• y = the predicted value of the dependent variable
• B0 = the y-intercept (value of y when all other parameters are set to 0)
• B1X1= the regression coefficient (B1) of the first independent variable (X1) (a.k.a.
the effect that increasing the value of the independent variable has on the predicted
y value)
• … = do the same for however many independent variables you are testing
• BnXn = the regression coefficient of the last independent variable
• e = model error (a.k.a. how much variation there is in our estimate of y)
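As a small illustration of how the formula produces a prediction, the sketch below plugs made-up coefficient values into y = B0 + B1X1 + … + BnXn; the numbers are purely illustrative, not taken from any fitted model:

import numpy as np

# Hypothetical coefficients and one observation (illustrative values only)
b0 = 2.0                         # intercept B0
b = np.array([0.5, -1.2, 3.0])   # B1, B2, B3
x = np.array([4.0, 1.5, 2.0])    # X1, X2, X3 for one observation

# Predicted value y = B0 + B1*X1 + B2*X2 + B3*X3 (the error term e is unobserved)
y_hat = b0 + b @ x
print(y_hat)   # 2.0 + 2.0 - 1.8 + 6.0 = 8.2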
Best-fit line
• To find the best-fit line for each independent variable, multiple linear regression calculates three things:
The regression coefficients that lead to the smallest overall model error.
The t-statistic of the overall model.
The associated p-value (how likely it is that the t-statistic would have occurred by chance if the null hypothesis of no relationship between the independent and dependent variables was true).
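As a sketch of how these three quantities can be inspected in Python (assuming the statsmodels library; the data below is randomly generated for illustration only):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                 # two independent variables
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=50)

X_const = sm.add_constant(X)       # adds the intercept column (B0)
model = sm.OLS(y, X_const).fit()   # least-squares fit: coefficients with the smallest model error

print(model.params)    # regression coefficients B0, B1, B2
print(model.tvalues)   # t-statistic of each coefficient
print(model.pvalues)   # associated p-values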
T-test
• In statistics, the t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error.
• It is used to evaluate whether two sets of data are statistically significantly different from each other.
• Q.1: Find the t-test value for the following given two sets of values:
• A = 7, 2, 9, 8 and
• B = 1, 2, 3, 4?
• Solution:
• For the first data set (A): number of terms n_1 = 4, mean = (7 + 2 + 9 + 8) / 4 = 6.5, sample variance s_1^2 = 29 / 3 ≈ 9.67
• For the second data set (B): n_2 = 4, mean = (1 + 2 + 3 + 4) / 4 = 2.5, sample variance s_2^2 = 5 / 3 ≈ 1.67
• Pooled variance: s^2 = ((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2) / (n_1 + n_2 - 2) = (29 + 5) / 6 ≈ 5.67
• t = (mean_A - mean_B) / sqrt(s^2 (1/n_1 + 1/n_2)) = 4 / 1.68 ≈ 2.38
• Higher values of the t-value, also called t-score, indicate that a large difference exists
between the two sample sets. The smaller the t-value, the more similarity exists
between the two sample sets.
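The worked example above can be cross-checked in Python; this is a sketch assuming SciPy and a pooled two-sample t-test with equal variances:

from scipy import stats

A = [7, 2, 9, 8]
B = [1, 2, 3, 4]

# Two-sample t-test with pooled (equal) variances
t_stat, p_value = stats.ttest_ind(A, B, equal_var=True)
print(round(t_stat, 4))    # approximately 2.3764, matching the hand calculation
print(round(p_value, 4))   # two-sided p-value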
P-value
• P-value is the lowest significance level that results in rejecting the null hypothesis.
Example: Coin toss
Two possible outcomes
H0 = This is a fair coin
H1 = This is not a fair coin
• The P-value test will assume that the H0
hypothesis is true i.e., the coin is fair
• Let us assume our threshold value to be 5%
i.e., 0.05
• Let us assume the observed outputs are:
First toss output is Tail (probability = 0.5)
First toss output is Tail and second toss output is also Tail (probability = 0.25)
First two outputs same as before, third toss output is also Tail (probability = 0.125)
First three outputs same as before, fourth toss output is also Tail (probability = 0.0625)
First four outputs same as before, fifth toss output is also Tail (probability = 0.03125)
First five outputs same as before, sixth toss output is also Tail (probability = 0.015625)
After the fifth output the statistical test becomes significant, since a P-value of less than 5% indicates that the hypothesis H0 is rejected and the hypothesis H1 is accepted, i.e., the coin is not fair.
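The same reasoning can be reproduced with a few lines of Python, computing the probability of seeing only Tail outcomes after each toss under H0 (a fair coin):

alpha = 0.05                        # significance threshold (5%)
for n_tosses in range(1, 7):
    p = 0.5 ** n_tosses             # probability of n Tails in a row under H0
    print(n_tosses, round(p, 5), "significant" if p < alpha else "not significant")

# The probability first drops below 0.05 at the fifth toss (0.03125), so H0 is rejected there.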
Selecting the independent variables being used
• Five strategies are available for selecting the independent variables
All in
Backward Elimination
Forward Selection
Bi-directional elimination
Score Comparison (All possible combinations)
All in
• Use all features
• Prior knowledge (a data domain expert) tells you which features to keep and which to discard
Backward Elimination
1. Select a significance level (SL) for the P-value, e.g. 5% (0.05)
2. Fit the model with all predictors
3. Consider the predictor with the highest P-value. If P > SL, go to step 4; otherwise finish (all remaining predictors stay in the feature set)
4. Remove the predictor with P > SL
5. Fit the model without this variable and return to step 3 (terminate if no predictors remain)
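A minimal sketch of these steps using statsmodels p-values; the data is synthetic and the significance level of 0.05 follows the steps above:

import numpy as np
import statsmodels.api as sm

SL = 0.05
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                  # four candidate predictors
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=50)

cols = list(range(X.shape[1]))                # predictors currently in the model
while cols:
    model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()   # steps 2/5: fit with current predictors
    pvals = model.pvalues[1:]                 # skip the intercept
    worst = int(np.argmax(pvals))             # step 3: predictor with the highest P-value
    if pvals[worst] > SL:
        cols.pop(worst)                       # step 4: remove it and refit
    else:
        break                                 # all remaining predictors are significant
print("Remaining predictors:", cols)          # predictors 0 and 2 are likely to survive here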
Forward Selection
1. Select a significance level (SL) for the P-value, e.g. 5% (0.05)
2. Fit simple regression models y -> xn, one predictor at a time, and select the one with the lowest P-value
3. Keep this variable and fit all possible models with one extra predictor, i.e., add one predictor to the variables you already have
4. Consider the new predictor with the lowest P-value. If P < SL, go to step 3; otherwise finish (keep the previous model)
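A comparable sketch of forward selection on synthetic data (again assuming statsmodels; column indices stand in for feature names):

import numpy as np
import statsmodels.api as sm

SL = 0.05
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = 1.0 + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=50)

selected, remaining = [], list(range(X.shape[1]))
while remaining:
    pvals = {}
    for j in remaining:                       # steps 2/3: try adding each candidate in turn
        model = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
        pvals[j] = model.pvalues[-1]          # p-value of the newly added predictor
    best = min(pvals, key=pvals.get)          # step 4: candidate with the lowest P-value
    if pvals[best] < SL:
        selected.append(best)
        remaining.remove(best)
    else:
        break                                 # no candidate enters; keep the previous model
print("Selected predictors:", selected)       # predictor 1 is likely to be selected here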
Bi-directional elimination
1. Select a significance level to enter (SL_Enter) and a significance level to stay (SL_Stay) in the model
2. Perform the next step of Forward Selection (new predictors must have P < SL_Enter to enter)
3. Perform all steps of Backward Elimination (existing predictors must have P < SL_Stay to stay)
4. Repeat steps 2 and 3 until no new predictors can enter and no existing predictors can leave
Python Implementation
Importing the dataset
Dataset
• Total of 50 samples
• 80/20 train/test split
Training and testing the model
Evaluating the model
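The code from the original slides is not reproduced here, so below is a minimal sketch of the workflow they describe: importing a dataset, an 80/20 train/test split, training a multiple linear regression model, and evaluating it. The file name "dataset.csv" and the assumption that the last column is the target are placeholders, not the actual dataset used in the lecture:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Importing the dataset (assumed layout: all columns numeric, last column is the target)
data = pd.read_csv("dataset.csv")
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# 80/20 split (with 50 samples: 40 for training, 10 for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Training the model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Testing and evaluating the model
y_pred = regressor.predict(X_test)
print(r2_score(y_test, y_pred))   # R^2 score on the test set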
Q: Do we need to normalize the data in MLR?
• A: No, we do not need to perform normalization for MLR, since the coefficients b0, b1, b2, … in the MLR model adapt to the scale of each feature and compensate for it automatically.
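As a quick illustration of this point, the sketch below (synthetic data, scikit-learn assumed) shows that standardizing the features changes the fitted coefficients but leaves the MLR predictions unchanged:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(loc=[10.0, 1000.0], scale=[2.0, 300.0], size=(50, 2))   # features on very different scales
y = 5.0 + 0.8 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(scale=0.5, size=50)

raw = LinearRegression().fit(X, y)
X_scaled = StandardScaler().fit_transform(X)
scaled = LinearRegression().fit(X_scaled, y)

print(raw.coef_, scaled.coef_)                                  # coefficients absorb the feature scales
print(np.allclose(raw.predict(X), scaled.predict(X_scaled)))    # True: identical predictions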