STA302 Week09 Full
1/39
Last Week
• Review on matrices
• Simple linear regression model in matrix form.
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n
2/39
Last Week (contd..)
• If \varepsilon \sim N(0, \sigma^2 I), then we have
  • Y \sim N(X\beta, \sigma^2 I)
  • b \sim N(\beta, \sigma^2 (X'X)^{-1})
  • fit: \hat{Y} \sim N(X\beta, \sigma^2 H)
  • e \sim N(0, \sigma^2 (I - H))
  • \hat{Y}_h \sim N(X_h' \beta, \sigma^2 X_h' (X'X)^{-1} X_h)
• ANOVA in matrix form
SSTO = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - (\sum Y_i)^2 / n = Y'(I - \tfrac{1}{n}J)Y
SSE = e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y = Y'(I - H)Y
SSR = \sum (\hat{Y}_i - \bar{Y})^2 = b'X'Y - \tfrac{1}{n} Y'JY = Y'[H - \tfrac{1}{n}J]Y
• Mean response and prediction of new observation
V(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right],
\Rightarrow s^2(\hat{Y}_h) = MSE \cdot X_h'(X'X)^{-1} X_h, \quad s^2_{pred} = MSE \cdot (1 + X_h'(X'X)^{-1} X_h)
3/39
Week 09 - Learning Objectives & Outcomes
4/39
Chapter 6:
Multiple Linear Regression
5/39
Multiple Regression Models
(Handwritten sketch: the general multiple regression model
E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{p-1} X_{p-1}.)
6/39
First-order Models with Two Predictor Variables
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i, \quad i = 1, \dots, n
(Sketch: the regression plane E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2; for fixed X_2 = k_2, the surface reduces to a straight line in X_1.)
8/39
First-order Models with Two Predictor Variables (contd.)
9/39
Multiple linear regression (MLR) model
11/39
MLR Model with More than One Predictor Variable
12/39
MLR model with dummy variable of binary levels.
• \beta_0 = E(Y \mid D = 0), \quad \beta_1 = E(Y \mid D = 1) - E(Y \mid D = 0)
• Reg. coef. estimators:
  b_0 = \hat{\beta}_0 = \bar{Y}(D = 0), \quad b_1 = \hat{\beta}_1 = \bar{Y}(D = 1) - \bar{Y}(D = 0)
• MLR model:
  Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 D_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)
  • \beta_0 = E(Y \mid X_1 = 0, D = 0)
  • \beta_1 = E(Y \mid X_1 = x + 1, D = d_0) - E(Y \mid X_1 = x, D = d_0), \; d_0 = 0 \text{ or } 1
  • \beta_2 = E(Y \mid X_1 = x_0, D = 1) - E(Y \mid X_1 = x_0, D = 0)
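A minimal sketch of the dummy-variable fact above, on simulated data (all values illustrative): when Y is regressed on a single binary dummy D, the least-squares estimates are exactly the group means, b_0 = \bar{Y}(D=0) and b_1 = \bar{Y}(D=1) - \bar{Y}(D=0).

```python
# Simulated data: single binary dummy; LSE reproduces the group-mean estimators.
import numpy as np

rng = np.random.default_rng(1)
D = np.repeat([0.0, 1.0], 20)                 # 20 observations per group
Y = 3.0 + 1.5 * D + rng.normal(size=40)       # true beta0 = 3, beta1 = 1.5

X = np.column_stack([np.ones(40), D])         # design matrix [1, D]
b = np.linalg.solve(X.T @ X, X.T @ Y)         # b = (X'X)^{-1} X'Y

assert np.isclose(b[0], Y[D == 0].mean())                     # b0 = Ybar(D=0)
assert np.isclose(b[1], Y[D == 1].mean() - Y[D == 0].mean())  # b1 = mean difference
```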
13/39
MLR model with dummy variable of binary levels
The model Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 D_i + \varepsilon_i, \; \varepsilon_i \sim N(0, \sigma^2), gives two sub-models:
• D = 0: \; Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i
• D = 1: \; Y_i = (\beta_0 + \beta_2) + \beta_1 X_{i1} + \varepsilon_i
(Sketch: for fixed X_1, \beta_2 is the vertical distance between the two parallel sub-model lines.)
14/39
MLR model with factor variable of multiple levels.
• \alpha = E(Y_{\text{blue-collar}})
• \gamma_2 = E(Y_{\text{white-collar}}) - E(Y_{\text{blue-collar}})
• \gamma_3 = E(Y_{\text{professional}}) - E(Y_{\text{blue-collar}})
• Regression coefficient estimators:
  • b_0 = \hat{\alpha} = \bar{Y}(\text{blue-collar})
  • b_1 = \hat{\gamma}_2 = \bar{Y}(\text{white-collar}) - \bar{Y}(\text{blue-collar})
  • b_2 = \hat{\gamma}_3 = \bar{Y}(\text{professional}) - \bar{Y}(\text{blue-collar})
15/39
MLR model with factor variable of multiple levels (contd.)
• This model describes three parallel regression planes, which can differ in their intercepts:
  • Blue-collar: Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i
  • White-collar: Y_i = (\alpha + \gamma_2) + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i
  • Professional: Y_i = (\alpha + \gamma_3) + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i
• Interpretation of regression coefficients:
  • \alpha = E(Y \mid X_1 = 0, X_2 = 0, X_3 = \text{blue-collar}): gives the intercept for the blue-collar model.
  • \gamma_2 = E(Y \mid D_2 = 1, \text{others fixed}) - E(Y \mid D_2 = 0, \text{others fixed}): represents the constant vertical distance between the parallel regression planes for white-collar and blue-collar occupations (fixing the values of education and income).
  • \gamma_3 = E(Y \mid D_3 = 1, \text{others fixed}) - E(Y \mid D_3 = 0, \text{others fixed}): represents the constant vertical distance between the parallel regression planes for professional and blue-collar occupations (fixing the values of education and income).
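A minimal sketch on simulated data (occupation labels from the slides; all numeric values illustrative): dummy-coding a 3-level factor with blue-collar as the baseline. In the factor-only model (no education/income covariates), the least-squares estimates are exactly the baseline mean and the two mean contrasts.

```python
# Simulated data: 3-level factor via two dummies, blue-collar as baseline.
import numpy as np

rng = np.random.default_rng(2)
occ = np.repeat(["blue", "white", "prof"], 15)
true_mean = {"blue": 10.0, "white": 12.0, "prof": 15.0}  # illustrative group means
Y = np.array([true_mean[o] for o in occ]) + rng.normal(size=45)

D2 = (occ == "white").astype(float)            # dummy for white-collar (gamma2)
D3 = (occ == "prof").astype(float)             # dummy for professional (gamma3)
X = np.column_stack([np.ones(45), D2, D3])
b = np.linalg.solve(X.T @ X, X.T @ Y)          # (alpha-hat, gamma2-hat, gamma3-hat)

assert np.isclose(b[0], Y[occ == "blue"].mean())
assert np.isclose(b[1], Y[occ == "white"].mean() - Y[occ == "blue"].mean())
assert np.isclose(b[2], Y[occ == "prof"].mean() - Y[occ == "blue"].mean())
```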
16/39
MLR model with factor variable of multiple levels (contd.)
17/39
MLR model with factor variable of multiple levels (contd.)
(Sketch: three parallel regression planes:
blue-collar E(Y) = \alpha + \beta_1 X_1 + \beta_2 X_2;
white-collar E(Y) = (\alpha + \gamma_2) + \beta_1 X_1 + \beta_2 X_2;
professional E(Y) = (\alpha + \gamma_3) + \beta_1 X_1 + \beta_2 X_2.)
18/39
MLR model: Polynomial Regression (Ch8)
• Quadratic: E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2
• Cubic: E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3
• Higher order:
  E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \dots
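A minimal sketch on simulated data (all coefficients illustrative): a polynomial model is still a linear regression, because it is linear in the \beta's; the design matrix simply gains an X^2 column.

```python
# Simulated data: quadratic regression fit with the ordinary MLR least-squares formula.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-2.0, 2.0, 60)
y = 1.0 - 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.2, size=60)

X = np.column_stack([np.ones_like(x), x, x**2])   # columns: 1, X, X^2
b = np.linalg.solve(X.T @ X, X.T @ y)             # same LSE as any MLR fit
print(b)                                          # close to the true [1.0, -0.5, 2.0]
```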
19/39
Example: Polynomial Regression
• Data
• Y: female steroid level.
• X: age.
• Fitted regression model
20/39
MLR with interaction term
• Two-way interaction
  E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2
21/39
Visualization: two-way interaction effect
22/39
Example: two-way interaction effect
• Response (Y): income.
• Predictors: X_1 denotes the education years; X_2 denotes the gender (D = 1 for male, D = 0 for female).
(Sketch: in panels (a), (b) the lines are parallel \Rightarrow no interaction; non-parallel lines \Rightarrow an interaction effect exists.
D = 1: E(Y) = (\alpha + \gamma) + (\beta + \delta) X; \quad D = 0: E(Y) = \alpha + \beta X.)
24/39
MLR with Transformed Variables
25/39
MLR with Interaction Effects and Combination of Cases
• By cross-product term:
26/39
MLR with Interaction Effects and Combination of Cases
(contd.)
27/39
Meaning of Linear in MLR model
The model Y_i = \beta_0 \exp(\beta_1 X_i) + \varepsilon_i is not a linear model: "linear" refers to linearity in the parameters \beta, not in X.
28/39
Matrix approach to MLR
29/39
MLR in Matrix Form
• In matrix form: \underset{n \times 1}{Y} = \underset{n \times p}{X} \, \underset{p \times 1}{\beta} + \underset{n \times 1}{\varepsilon}

\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix}
1 & X_{11} & X_{12} & \dots & X_{1(p-1)} \\
1 & X_{21} & X_{22} & \dots & X_{2(p-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{n1} & X_{n2} & \dots & X_{n(p-1)}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
30/39
MLR in Matrix Form (contd.)
\underset{n \times 1}{Y} = \underset{n \times p}{X} \, \underset{p \times 1}{\beta} + \underset{n \times 1}{\varepsilon}
• Y: vector of responses.
• \beta: vector of parameters.
• X: matrix of constants (design matrix).
• \varepsilon: vector of independent normal random variables,
  \varepsilon \sim N(0, \sigma^2 I_{n \times n})
• E(Y) = X\beta; \quad Var(Y) = \sigma^2 I_{n \times n}
• That is, Y \sim N(X\beta, \sigma^2 I_{n \times n}).
31/39
Estimation of Regression Coefficients in MLR model
• The MLR model
Y_i = \beta_0 X_{i0} + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_{p-1} X_{i(p-1)} + \varepsilon_i, \quad i = 1, \dots, n
• The least squares method: the values of \beta_0, \dots, \beta_{p-1} minimize Q:
Q = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = (Y - Xb)'(Y - Xb)
\Rightarrow \frac{\partial Q}{\partial b} = -2 Y'X + 2 b'X'X
• Setting this equation to zero: -2 Y'X + 2 b'X'X = 0
• Transposing both sides of the equation, we have the normal equations
\underset{p \times p}{X'X} \, \underset{p \times 1}{b} = \underset{p \times 1}{X'Y}
• Solving for \hat{\beta}, assuming X'X is invertible:
\Rightarrow \text{LSE}: \; \underset{p \times 1}{b} = (X'X)^{-1} X'Y
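A minimal sketch of the derivation above, on simulated data: solving the normal equations X'X b = X'Y and checking the result against NumPy's least-squares solver.

```python
# Simulated data: LSE via the normal equations vs. NumPy's lstsq.
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)            # b = (X'X)^{-1} X'Y
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)  # library QR/SVD-based solution

assert np.allclose(b, b_lstsq)                   # both minimize Q = ||Y - Xb||^2
```

In practice, solvers like `lstsq` are preferred over explicitly forming and inverting X'X, since they are numerically more stable when the predictors are nearly collinear.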
32/39
Estimation of Regression Coefficients in MLR model
(contd.)
L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp \left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_{p-1} X_{i(p-1)})^2 \right\}
Maximizing L over \beta minimizes the same sum of squares, so the MLE of \beta equals the LSE b = (X'X)^{-1} X'Y.
33/39
Estimation of Regression Coefficients in MLR model
(contd.)
b = (X'X)^{-1} X'Y
• E(b) = E((X'X)^{-1} X'Y) = (X'X)^{-1} X' E(Y) = (X'X)^{-1} X'X \beta = \beta
• Var(b) = Var((X'X)^{-1} X'Y) = (X'X)^{-1} X' \, Var(Y) \, X (X'X)^{-1}
  = \sigma^2 (X'X)^{-1} X'X (X'X)^{-1} = \sigma^2 (X'X)^{-1}
  (using that X'X, and hence (X'X)^{-1}, is symmetric)
• Var(b) is estimated by s^2(b) = MSE \cdot (X'X)^{-1}
• b \sim N(\beta, \sigma^2 (X'X)^{-1})
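A minimal Monte Carlo sketch on simulated data (design and coefficients illustrative): resampling Y with the design X held fixed shows that b is unbiased for \beta with covariance \sigma^2 (X'X)^{-1}, matching the derivation above.

```python
# Simulated data: sampling distribution of b over repeated draws of Y.
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 30, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design
beta = np.array([2.0, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.array([
    XtX_inv @ X.T @ (X @ beta + rng.normal(scale=sigma, size=n))
    for _ in range(20_000)
])

assert np.allclose(draws.mean(axis=0), beta, atol=0.01)             # E(b) = beta
assert np.allclose(np.cov(draws.T), sigma**2 * XtX_inv, atol=0.01)  # Var(b) = sigma^2 (X'X)^{-1}
```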
34/39
Fitted Values and Residuals
• The vector of fitted values:
\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY, \quad H = X(X'X)^{-1}X' \text{ (the hat matrix)}, \quad \hat{Y}_i = \sum_j h_{ij} Y_j
• E(\hat{Y}) = E(HY) = HX\beta = X\beta
• Var(\hat{Y}) = Var(HY) = \sigma^2 H
• \hat{Y} \sim N(X\beta, \sigma^2 H)
• The vector of residuals:
\underset{n \times 1}{e} = Y - \hat{Y} = (I - H)Y
• E(e) = E((I - H)Y) = X\beta - HX\beta = X\beta - X\beta = 0
• Var(e) = Var((I - H)Y) = \sigma^2 (I - H), estimated by s^2(e) = MSE \cdot (I - H)
• e \sim N(0, \sigma^2 (I - H))
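A minimal numerical sketch (simulated X and Y) of the hat-matrix facts used above: H is symmetric and idempotent, H projects onto the column space of X (so HX = X), trace(H) = p, and the residuals e = (I - H)Y are orthogonal to every column of X.

```python
# Simulated data: verify the standard hat-matrix identities numerically.
import numpy as np

rng = np.random.default_rng(5)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix H = X(X'X)^{-1}X'
e = (np.eye(n) - H) @ Y                  # residuals e = (I - H)Y

assert np.allclose(H, H.T)               # symmetric
assert np.allclose(H @ H, H)             # idempotent (projection)
assert np.allclose(H @ X, X)             # H leaves col(X) fixed
assert np.allclose(X.T @ e, 0)           # X'e = 0: residuals orthogonal to predictors
assert np.isclose(np.trace(H), p)        # trace(H) = p
```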
35/39
Analysis of Variance Results (ANOVA)
SSTO = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - (\sum Y_i)^2 / n = Y'(I - \tfrac{1}{n}J)Y
SSE = e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y = Y'(I - H)Y
SSR = \sum (\hat{Y}_i - \bar{Y})^2 = b'X'Y - \tfrac{1}{n} Y'JY = Y'[H - \tfrac{1}{n}J]Y
SSTO = SSR + SSE
MSR = \frac{SSR}{p - 1}, \qquad MSE = \frac{SSE}{n - p}
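A minimal sketch on simulated data: the quadratic-form expressions for SSTO, SSE, and SSR above, checked against their summation definitions and against the decomposition SSTO = SSR + SSE.

```python
# Simulated data: ANOVA sums of squares as quadratic forms in Y.
import numpy as np

rng = np.random.default_rng(6)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -0.7]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
J = np.ones((n, n))                      # n x n matrix of ones

SSTO = Y @ (np.eye(n) - J / n) @ Y       # Y'(I - J/n)Y
SSE = Y @ (np.eye(n) - H) @ Y            # Y'(I - H)Y
SSR = Y @ (H - J / n) @ Y                # Y'(H - J/n)Y

assert np.isclose(SSTO, ((Y - Y.mean()) ** 2).sum())   # matches summation form
assert np.isclose(SSE, ((Y - H @ Y) ** 2).sum())       # matches sum of squared residuals
assert np.isclose(SSTO, SSR + SSE)                     # decomposition holds

MSR, MSE = SSR / (p - 1), SSE / (n - p)  # mean squares with df p-1 and n-p
```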
36/39
Analysis of Variance Results (ANOVA) (contd.)
• E\{MSE\} = \sigma^2
• For p - 1 = 2, i.e., E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2:
E\{MSR\} = \sigma^2 + \frac{1}{2} \left[ \beta_1^2 \sum (X_{i1} - \bar{X}_1)^2 + \beta_2^2 \sum (X_{i2} - \bar{X}_2)^2 + 2 \beta_1 \beta_2 \sum (X_{i1} - \bar{X}_1)(X_{i2} - \bar{X}_2) \right]
38/39
Upcoming topics
• Chapter 6:
• F test for regression coefficients.
• Coefficient of Multiple Determination.
• Inferences about Regression Parameters.
• Interval Estimation of \beta_k, E(Y_h).
• F test of lack of fit
• Chapter 7:
• The extra sum of squares
39/39