Multiple Linear Regression

The document discusses multiple linear regression models. It provides examples using kidney function data and burger taste/texture data to demonstrate multiple linear regression analysis. Key points include:

- Multiple linear regression allows for analysis of relationships between a dependent variable (Y) and multiple independent variables (X1, X2, X3, etc.).
- It provides regression summaries including significance tests of each predictor, R² values, and regression equations to explain relationships.
- Additional analyses examine multicollinearity, outliers, normality, and model selection to refine the model.

MULTIPLE LINEAR REGRESSION MODEL

• Similar to simple linear regression, but with more than one predictor variable X
• A single scatterplot may not be applicable here because there are more than 2 variables; instead, plot Y against each predictor separately

REGRESSION SUMMARY: Statistics > Multiple Regression > Variables > input Y as the dependent variable and X1, X2, X3, ... as the independent variables > OK > Quick > Summary: Regression results
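The menu path above follows the statistics package used in class. For readers working outside it, here is a minimal sketch of the same summary in Python with statsmodels; the file name and column names (Y, X1, X2, X3) are placeholders, not part of the original notes.

```python
# Minimal multiple linear regression sketch (hypothetical file/column names).
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("data.csv")                   # placeholder file name
X = sm.add_constant(df[["X1", "X2", "X3"]])    # add the intercept column
model = sm.OLS(df["Y"], X).fit()               # ordinary least squares fit
print(model.summary())                         # coefficients, p-values, R², overall F-test
```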

REDUNDANCY — there should be no multicollinearity among the predictor variables (X)

VARIANCE INFLATION FACTOR (VIF) — the tolerance (1/VIF) must be ≥ 0.1 to say that there is no redundancy
• There must be no redundancy among the predictors

VIF_k < 5           — no multicollinearity
5 ≤ VIF_k ≤ 10      — moderate to severe multicollinearity
VIF_k > 10          — very severe multicollinearity

REDUNDANCY: Multiple Regression > Advanced (bottom left) > Redundancy
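As a rough equivalent of the redundancy output, here is a sketch of the VIF/tolerance computation in Python; again, the file and column names are placeholders.

```python
# VIF / tolerance sketch (hypothetical file/column names).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("data.csv")                   # placeholder file name
X = sm.add_constant(df[["X1", "X2", "X3"]])    # intercept included for proper VIFs
for i, name in enumerate(X.columns[1:], start=1):
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.3f}, tolerance = {1.0 / vif:.4f}")  # tolerance = 1/VIF
```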

EXAMPLE #1: Kidney Function data

• n = 33 male subjects
• Y = creatinine clearance (an important measure of kidney function)
• X1 = serum creatinine concentration
• X2 = age (years)
• X3 = weight (kg)

[Scatterplots of Y against X1, X2, and X3]

• Regression Summary
o Above the table:
p < 0.0001
Reject Ho (the MLRM does not fit the data)
∴ The MLRM fits the data (at least 1 predictor is significant)
*Not necessarily all variables are significant

o p-values on the table:
X1 = 0.0001
X2 = 0.0001
X3 = 0.001
Reject Ho (there are no significant predictors for Y)
∴ All are significant predictors for Y

• R² = 0.8548
∴ About 85% of the total variability in Y can be explained by its linear relationship with X1, X2, and X3

• MLR Equation (b)
ŷ = 120.0473 − 39.9393(X1) − 0.7368(X2) + 0.7764(X3)
∴ X1 — y is expected to decrease by 39.9393 for every 1-unit increase in X1, holding X2 and X3 constant
∴ X2 — y is expected to decrease by 0.7368 for every 1-unit increase in X2, holding X1 and X3 constant
∴ X3 — y is expected to increase by 0.7764 for every 1-unit increase in X3, holding X1 and X2 constant

• Redundancy (tolerance)
X1 = 0.7665
X2 = 0.7690
X3 = 0.9766
∴ All are ≥ 0.1; no redundancy

Residual Analysis

• Durbin-Watson
d = 2.349150 ≈ 2
• Serial Corr.
Serial Corr. = −0.194180 ≈ 0
∴ Independent

• Standard Residual
All standardized residuals fall within ±3
∴ No outliers

• Cook's Distance
Minimum = 0.0001
Maximum = 0.350465
∴ All are < 1; no outliers

• Kolmogorov-Smirnov Test
p > 0.20
• Lilliefors' Test
p > 0.20
Do not reject Ho
∴ The residuals follow a normal distribution
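The residual checks above (Durbin-Watson, standardized residuals, Cook's distance, normality) can be reproduced roughly as follows in Python; this sketch assumes `model` is a fitted OLS result as in the earlier snippet.

```python
# Residual diagnostics sketch, assuming a fitted statsmodels OLS result `model`.
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import OLSInfluence
from statsmodels.stats.diagnostic import lilliefors

print("Durbin-Watson d =", durbin_watson(model.resid))       # d near 2: independence

infl = OLSInfluence(model)
std_resid = infl.resid_studentized_internal                  # standardized residuals
print("All within ±3:", (abs(std_resid) < 3).all())          # outlier screen

cooks_d, _ = infl.cooks_distance                             # influence per observation
print("Max Cook's distance:", cooks_d.max())                 # < 1: no influential points

stat, p = lilliefors(model.resid)                            # Lilliefors normality test
print("Lilliefors p-value:", p)                              # large p: do not reject normality
```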


EXAMPLE #2: Burger Example data

• n = 12 hamburger brands
• Y = flavor and texture score (0 to 100)
• X1 = price per burger
• X2 = number of calories per burger
• X3 = amount of fat per burger
• X4 = amount of sodium per burger

• Regression Summary
o Above the table:
p < 0.00075
Reject Ho (the MLRM does not fit the data)
∴ The MLRM fits the data (at least 1 predictor is significant)

o p-values on the table:
X1 = 0.744695
Do not reject Ho
∴ Not a significant predictor for Y

X2 = 0.401317
Do not reject Ho
∴ Not a significant predictor for Y

X3 = 0.080951
Do not reject Ho
∴ Not a significant predictor for Y

X4 = 0.025675
Reject Ho
∴ A significant predictor for Y

• R² = 0.866558
o When comparing 2 models to see which one is better, the adjusted R² must be compared
o R² increases whenever a predictor is added; the adjusted R² also considers the significance and the model complexity of the predictor (it goes down if the added predictor is not significant)

• Reduce the model
o Principle of Parsimony — when there are 2 models that provide almost the same information, go for the simpler one
o Drop/remove the least significant predictor (highest p-value) until all remaining p-values are less than 0.05 (level of significance)
o Drop predictors one by one only (a backward-elimination sketch follows the table below)

p-values and R² as predictors are dropped:

        X1, X2, X3, X4    X2, X3, X4    X3, X4
X1      0.744695          -             -
X2      0.401317          0.38572       -
X3      0.080951          0.06500       0.22738
X4      0.025675          0.01175       0.00047
R²      0.866558          0.8813        0.8834
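The drop-one-at-a-time reduction in the table can be written as a simple backward-elimination loop; this is an illustrative sketch, with the 0.05 cutoff taken from the notes and everything else assumed.

```python
# Backward elimination sketch: drop the least significant predictor
# (highest p-value) one at a time until all p-values are below alpha.
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(df, response, predictors, alpha=0.05):
    predictors = list(predictors)
    while predictors:
        X = sm.add_constant(df[predictors])
        model = sm.OLS(df[response], X).fit()
        pvals = model.pvalues.drop("const")     # ignore the intercept
        worst = pvals.idxmax()                  # least significant predictor
        if pvals[worst] < alpha:
            return model                        # all remaining predictors significant
        predictors.remove(worst)                # drop it and refit
    return None

# e.g. backward_eliminate(df, "Y", ["X1", "X2", "X3", "X4"])
```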
• MLR Equation (b)
ŷ = 0.893815 + 2.1592(X3) + 0.111397(X4)
∴ X3 — the taste score is expected to increase by 2.1592 for every 1-unit increase in fat, holding X4 constant
∴ X4 — the taste score is expected to increase by 0.111397 for every 1-unit increase in sodium, holding X3 constant

• R² = 0.9046
∴ About 90% of the total variability in taste score can be explained by its linear relationship with fat content and sodium content

• Redundancy (tolerance)
X3 = 0.873937
X4 = 0.873937
∴ No redundancy

DUMMY VARIABLES — indicator variables coded 1 and 0

• A categorical variable with K categories needs K − 1 dummy variables
• Reference Category — the category coded 0 on all dummies
o It is what the other categories are compared to
o Interpretations read "XXX lower compared to the *Reference Category*"

DUMMY VARIABLE: Double click variable > Text Labels > Assign 0 and 1 > OK

SYSTEMATIC WAY OF LABELING 1 & 0: Click column > Data > Recode > condition "v1 <= 6", value "1", other value (bottom right) "0"

EXAMPLE:

a. Gender (K = 2)
K − 1 = 1; thus, 1 dummy variable

Category    X1
Female      1
Male        0
Comparison: Female vs Male (*Male is the Reference Category)

b. Skin Tone (K = 3)
K − 1 = 2; thus, 2 dummy variables

Category    X1   X2
Light       1    0
Fair        0    1
*Dark       0    0
Comparisons: Light vs Dark, Fair vs Dark (*Dark is the Reference Category)

c. Taste (K = 4)
K − 1 = 3; thus, 3 dummy variables

Category    X1   X2   X3
Bitter      1    0    0
Sour        0    1    0
Sweet       0    0    1
*Salty      0    0    0
Comparisons: Bitter vs Salty, Sour vs Salty, Sweet vs Salty (*Salty is the Reference Category)
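In Python, the K − 1 coding above can be produced with pandas; `drop_first=True` drops one level, which then acts as the reference category. The example values are illustrative only.

```python
# Dummy-variable sketch: a K-level category becomes K-1 indicator columns.
import pandas as pd

df = pd.DataFrame({"skin_tone": ["Light", "Fair", "Dark", "Fair"]})  # toy data
dummies = pd.get_dummies(df["skin_tone"], drop_first=True)
# "Dark" (first alphabetically) is dropped here, so it is the reference
# category; the remaining columns compare each level against it.
print(dummies)
```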
EXAMPLE #2: Burger Example data (continuation)

• Dummy Variable
o X5 = 1 — local
o X5 = 0 — imported

ADDING DUMMY VARIABLE: Multiple Linear Regression > Variables > select the significant predictors and the dummy variable > Continue with current selection > OK > OK > Summary: Regression > OK

• X5 p-value
X5 = 0.142937
Don't reject Ho (X5 is not a significant predictor)
∴ X5 is not a significant predictor

• Equation (the dummy variable should really not be included, but it is kept here for the sake of discussion)
ŷ = 0.8283 + 1.8163(X3) + 0.1215(X4) − 4.6585(X5)
∴ X4 — there is an expected increase of 0.1215 units in taste score for every 1-unit increase in X4 (sodium content), holding the other predictors constant
∴ X5 = {1 local, 0 imported} — the expected taste score is lower by 4.6585 units for a local burger compared to an imported burger, holding the other predictors constant
*Local compared to imported (local vs imported)

EXAMPLE #3: Lung Pressure Data

• n = 19 mild to moderate chronic obstructive pulmonary disease (COPD) patients
• Y = invasive measure of systolic pulmonary arterial pressure
• X1 = emptying rate of blood into the pumping chamber of the heart
• X2 = ejection rate of blood pumped out of the heart into the lungs
• X3 = blood gas

• Model p-value
p < 0.00250
Reject Ho (the MLRM does not fit the data)
∴ The MLRM fits the data (at least 1 predictor is significant)

• p-values on the table
X1 = 0.172055
Do not reject Ho
∴ Not a significant predictor for Y
X2 = 0.046204
Reject Ho
∴ A significant predictor for Y
X3 = 0.8485
Do not reject Ho
∴ Not a significant predictor for Y

• Reduce the model

        X1, X2, X3    X1, X2      X2
X1      0.172055      0.117801    -
X2      0.046204      0.021488    0.000369
X3      0.84846       -           -

• Equation
ŷ = 71.4352 − 0.65192(X2)
∴ The y value is expected to decrease by 0.65192 for every 1-unit increase in X2

• Introducing a dummy variable

Age Group    D1   D2
Young        1    0
Middle       0    1
*Old         0    0
(*Old is the Reference Category)

• Model p-value
p < 0.00217
Reject Ho (the MLRM does not fit the data)
∴ The MLRM fits the data (at least 1 predictor is significant)

• p-values on the table
X2 = 0.000688
Reject Ho
∴ A significant predictor for Y
D1 = 0.149326
Don't reject Ho
∴ Not a significant predictor for Y
D2 = 0.181069
Don't reject Ho
∴ Not a significant predictor for Y

• Equation
ŷ = 62.80103 − 0.6271(X2) + 12.23211(D1) + 11.51596(D2)
∴ X2 — the y value is expected to decrease by 0.6271 units for every 1-unit increase in X2, holding the other predictors constant
∴ D1 — the expected value is higher by 12.2321 among young patients compared to the old patients, holding the other predictors constant
∴ D2 — the expected value is higher by 11.51596 among middle-aged patients compared to the old patients, holding the other predictors constant
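For the age-group dummies in this example, the formula interface of statsmodels can build the D1/D2 coding automatically, with the reference level set explicitly. The data frame below is a toy illustration, not the actual lung-pressure data.

```python
# Sketch: dummy coding via the formula API, with "Old" as the reference level.
import pandas as pd
import statsmodels.formula.api as smf

# Toy illustrative values only; column names mirror the notes.
df = pd.DataFrame({
    "Y":         [45, 52, 38, 60, 41, 55],
    "X2":        [30, 25, 40, 20, 35, 22],
    "age_group": ["Young", "Middle", "Old", "Young", "Old", "Middle"],
})

# Treatment coding with reference "Old" reproduces the D1 (Young vs Old)
# and D2 (Middle vs Old) comparisons used above.
model = smf.ols("Y ~ X2 + C(age_group, Treatment(reference='Old'))", data=df).fit()
print(model.summary())
```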
EXAMPLE #4: Surgical Unit Data

• n = 54 patients
• Y = survival time
• X1 = blood clotting score
• X2 = prognostic index
• X3 = enzyme function test score
• X4 = liver function test score

[Scatterplots of Y against X1, X2, and X4: not homoscedastic; megaphone-shaped, i.e., the variance is not constant]

• Use ln y (or y′) as the response because of the non-constant variance

• Redundancy — all tolerances are above the 0.1 limit
∴ No redundancy

• p-value on the table
X4 = 0.833248
Don't reject Ho
∴ Not a significant predictor for Y

• Equation
ln ŷ = 1.113582 + 0.159405(X1) + 0.121401(X2) + 0.021928(X3)
∴ X1 — there is an expected increase of 0.159405 in ln ŷ (logarithm of survival time) for every 1-unit increase in the blood clotting score, holding the other predictors constant
∴ X2 — there is an expected increase of 0.121401 in ln ŷ (logarithm of survival time) for every 1-unit increase in the prognostic index (pindex), holding the other predictors constant
∴ X3 — there is an expected increase of 0.021928 in ln ŷ (logarithm of survival time) for every 1-unit increase in the enzyme function score, holding the other predictors constant
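The megaphone-shaped scatterplots motivate the ln y transformation; here is a sketch of fitting the model on the log scale, with hypothetical file and column names.

```python
# Log-transformed response sketch for heteroscedastic data
# (hypothetical file/column names).
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("surgical.csv")               # placeholder file name
X = sm.add_constant(df[["X1", "X2", "X3"]])
model = sm.OLS(np.log(df["Y"]), X).fit()       # regress ln(Y) on the predictors
print(model.summary())                         # coefficients are on the ln scale
```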
