Multiple Linear Regression
REDUNDANCY: no multicollinearity among the predictor variables (X)
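One common way to check for redundancy is the tolerance of each predictor, 1 − R²_j, where R²_j comes from regressing X_j on the remaining predictors; values near 1 suggest little redundancy, values near 0 suggest multicollinearity. The sketch below is a minimal NumPy illustration. The function name `tolerances` and the simulated predictors are assumptions made for the example, not part of the original analysis.

```python
import numpy as np

def tolerances(X):
    """Return the tolerance 1 - R^2_j of each column of X.

    R^2_j comes from regressing column j on the other columns (with intercept).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        sse = resid @ resid
        sst = ((y - y.mean()) ** 2).sum()
        out[j] = sse / sst  # equals 1 - R^2_j
    return out

# Simulated predictors (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)  # nearly a copy of x1, so redundant
tol = tolerances(np.column_stack([x1, x2, x3]))
vif = 1.0 / tol  # variance inflation factors
```

In this simulated case x1 and x3 should show tolerances near 0, while x2 stays near 1.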
n = 33 male subjects
Y = creatinine clearance (an important
measure of kidney function)
X1 = serum creatinine concentration
X2 = age (years)
X3 = weight (kg)
Scatterplot of Y against X1
Scatterplot of Y against X2
Model p-value (above the table)
p < 0.0001
Reject Ho (Ho: MLRM does not fit the data)
p-value
X1 = 0.0001
X2 = 0.0001
X3 = 0.001
Reject Ho (Ho: there are no significant predictors for Y)
∴ All are significant predictors for Y
R² = 0.8548
Cook's Distance
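Cook's distance flags observations whose removal would noticeably shift the fitted coefficients. A minimal NumPy sketch of the usual formula, D_i = e_i² h_ii / (p · MSE · (1 − h_ii)²), follows. The function name `cooks_distance` and the simulated data are assumptions for illustration; this is not the creatinine data.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = A.shape[1]                        # number of estimated parameters
    H = A @ np.linalg.pinv(A)             # hat matrix
    h = np.diag(H)                        # leverages h_ii
    e = y - H @ y                         # residuals
    mse = e @ e / (n - p)
    return e**2 * h / (p * mse * (1 - h) ** 2)

# Simulated data with one deliberately distorted point (hypothetical)
rng = np.random.default_rng(3)
X = rng.normal(size=(33, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=33)
X[0] = [4.0, 4.0, 4.0]  # give observation 0 high leverage
y[0] += 10.0            # and a large error
D = cooks_distance(X, y)
```

Here observation 0 should stand out with the largest distance, which is the kind of point a Cook's distance plot is meant to reveal.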
n = 12 hamburger brands
Y = flavor and texture score (0 to 100)
X1 = price per burger
X2 = number of calories per burger
X3 = amount of fat per burger
X4 = amount of sodium per burger
Regression Summary
X1 = 0.7665
X2 = 0.7690
X3 = 0.9766
∴ No redundancy
Model p-value (above the table)
p < 0.00075
Reject Ho (Ho: MLRM does not fit the data)
∴ MLRM fits the data (at least 1 predictor is significant)
Residual Analysis
Durbin-Watson
d = 2.349150 ≈ 2
Serial Corr. = −0.194180 ≈ 0
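The Durbin-Watson statistic d and the lag-1 serial correlation r of the residuals are linked by d ≈ 2(1 − r), so a d near 2 goes with an r near 0. The sketch below computes both from a residual vector. The function names and the simulated residuals are assumptions for illustration, and the software that produced the output may define the serial correlation slightly differently.

```python
import numpy as np

def durbin_watson(resid):
    """d = sum of squared successive differences / sum of squared residuals."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def lag1_serial_corr(resid):
    """Lag-1 serial correlation: sum(e_t * e_{t-1}) / sum(e_t^2)."""
    e = np.asarray(resid, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

# Approximation applied to the serial correlation reported above
d_approx = 2 * (1 - (-0.194180))  # about 2.39, close to the reported d = 2.349150

# Simulated independent residuals (hypothetical data)
rng = np.random.default_rng(1)
e = rng.normal(size=500)
d = durbin_watson(e)
r = lag1_serial_corr(e)
```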
Do not reject Ho
∴ Not a significant predictor for Y
X4 = 0.025675
Reject Ho
R² = 0.9046
∴ About 90% of the total variability in taste score can be explained by its linear relationship with fat content and sodium content
Redundancy
p-value
X5 = 0.142937
Do not reject Ho (Ho: X5 is not a significant predictor)
∴ X5 is not a significant predictor
p-value
X1 = 0.172055
Do not reject Ho
∴ Not a significant predictor for Y
X2 = 0.046204
Reject Ho
Do not reject Ho
∴ Not a significant predictor for Y
ŷ = 0.8283 + 1.8163(X3) + 0.1215(X4) − 4.6585(X5)
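To see how the fitted equation is used, the sketch below plugs predictor values into ŷ = 0.8283 + 1.8163(X3) + 0.1215(X4) − 4.6585(X5). The coefficients are copied from the equation above; the function name `predict` and the input values are hypothetical, chosen only to show the arithmetic.

```python
# Coefficients copied from the fitted equation
INTERCEPT = 0.8283
B3, B4, B5 = 1.8163, 0.1215, -4.6585

def predict(x3, x4, x5):
    """Predicted y-hat for given values of X3, X4 and X5."""
    return INTERCEPT + B3 * x3 + B4 * x4 + B5 * x5

# Hypothetical predictor values (not taken from the data)
y_hat = predict(x3=2.0, x4=10.0, x5=1.0)

# Raising X3 by one unit, with X4 and X5 held fixed, changes y-hat by B3
delta = predict(3.0, 10.0, 1.0) - predict(2.0, 10.0, 1.0)
```

The difference `delta` equals the X3 coefficient, which is what "holding the other predictors constant" means when a slope is interpreted.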
∴ X4: There is an expected increase by
∴ The y value is expected to decrease by 0.65192 for every 1 unit increase in X2, holding the other predictors constant
Introducing a dummy variable
Age Group   D1   D2
Young       1    0
Middle      0    1
Old*        0    0
*Reference Category
Reduce the model
∴ D1: The expected value is higher by 12.2321 among young patients compared to the old patients
∴ D2: The expected value is higher by 11.51596 among middle-aged patients compared to the old patients, holding the other predictors constant
EXAMPLE #4: Surgical Unit Data
n = 54 patients
Y = Survival time
X1 = Blood clotting score
X2 = Prognostic index
X3 = Enzyme function test score
X4 = Liver function test score
Model p-value
p < 0.00217
Reject Ho (Ho: MLRM does not fit the data)
p-value
X2 = 0.000688
Reject Ho
p-value
X4 = 0.833248
Do not reject Ho
∴ No redundancy
Scatterplot of Y against X1 (not homoscedastic; megaphone-shaped)
Use ln y (y′) as the response
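A megaphone-shaped scatter usually means the spread of Y grows with its level, and refitting with ln Y as the response is one common remedy. The sketch below illustrates the idea on simulated data. The data-generating model, the seed and all names are assumptions; the surgical unit data are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 54
x = rng.uniform(1, 10, size=n)
# Multiplicative errors: the spread of y grows with its mean (megaphone shape)
y = np.exp(1.0 + 0.3 * x + rng.normal(scale=0.3, size=n))

def fit_line(x, y):
    """Least-squares intercept and slope, plus the residuals."""
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, y - A @ beta

beta_raw, res_raw = fit_line(x, y)          # response on the original scale
beta_log, res_log = fit_line(x, np.log(y))  # response transformed to ln y

def spread_ratio(x, resid):
    """Residual SD in the upper half of x divided by that in the lower half."""
    hi = x > np.median(x)
    return resid[hi].std() / resid[~hi].std()

ratio_raw = spread_ratio(x, res_raw)  # well above 1 when the variance fans out
ratio_log = spread_ratio(x, res_log)  # closer to 1 after the transform
```

A spread ratio closer to 1 after the transform is a rough sign that the residual variance has been stabilized; a residual plot remains the more informative check.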