IE266 S25 Week12
$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$
Regression coefficient $\beta_j$:
Expected change in $Y$ per unit increase in $x_j$, when all other independent variables are held constant.
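Regression model
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$, with $\varepsilon \overset{\text{iid}}{\sim} N(0, \sigma^2)$
Regression equation
$E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$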
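In vector notation:
$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}_{n \times 1}, \quad
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}_{n \times 1}, \quad
x = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}_{n \times (k+1)}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}_{(k+1) \times 1}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}_{n \times 1}
$$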
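Regression model
$$
Y = x\beta + \varepsilon \;\Rightarrow\;
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$
Regression equation
$E[Y] = x\beta$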
METU, IE266, S25 4
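Step 3 - Estimate $\beta_0, \beta_1, \cdots, \beta_k$
Least squares: choose $\beta$ to minimize $L = \sum_{i=1}^{n} \varepsilon_i^2$.
In vector notation
$$
\begin{aligned}
\min_{\beta} L &= (y - x\beta)^T (y - x\beta) \\
&= y^T y - (x\beta)^T y - y^T x\beta + (x\beta)^T x\beta \\
&= y^T y - 2\beta^T x^T y + \beta^T x^T x \beta
\end{aligned}
$$
$$
\frac{\partial L}{\partial \beta} = -2x^T y + 2x^T x\beta = 0 \;\Rightarrow\; 2x^T x\beta = 2x^T y
$$
$$
\hat{\beta} = (x^T x)^{-1} x^T y
$$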
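The estimator above can be tried numerically. The following is a minimal sketch, assuming NumPy is available; the data values are hypothetical (not from the slides) and are noise-free so that the fit should return the coefficients used to generate them.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 regressors (not from the slides).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2  # noise-free response built from beta = (1, 2, 3)

# Design matrix x: a column of ones for beta_0, then one column per regressor.
x = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equations (x^T x) beta = x^T y without forming the inverse.
beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
print(beta_hat)
```

Solving the linear system is used here instead of computing $(x^T x)^{-1}$ explicitly; both give the same $\hat{\beta}$ when $x^T x$ is invertible, and the solve is usually the more stable choice.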
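$$
E[\hat{\beta}] = E\left[(x^T x)^{-1} x^T Y\right] = E\left[(x^T x)^{-1} x^T (x\beta + \varepsilon)\right] = \beta
$$
$$
cov(\hat{\beta}) = \sigma^2 (x^T x)^{-1}
$$
$$
var(\hat{\beta}_j) = \sigma^2 \left[(x^T x)^{-1}\right]_{jj}
$$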
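The intermediate steps behind these two results can be written out as follows. This is a sketch under the usual assumptions that $x$ is fixed (non-random), $E[\varepsilon] = 0$ and $cov(\varepsilon) = \sigma^2 I$; the identity matrix $I$ is notation introduced here, not taken from the slides.

```latex
\begin{aligned}
E[\hat{\beta}] &= (x^T x)^{-1} x^T x \beta + (x^T x)^{-1} x^T E[\varepsilon] = \beta \\
cov(\hat{\beta}) &= (x^T x)^{-1} x^T \, cov(Y) \, x (x^T x)^{-1}
  = (x^T x)^{-1} x^T (\sigma^2 I) x (x^T x)^{-1}
  = \sigma^2 (x^T x)^{-1}
\end{aligned}
```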
Find $\hat{\beta}_0$ and $\hat{\beta}_1$ using matrix operations.
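$$
SSE = (y - x\hat{\beta})^T (y - x\hat{\beta}) = y^T y - y^T x\hat{\beta} - (x\hat{\beta})^T y + \hat{\beta}^T x^T x \hat{\beta}
$$
Since $x^T x \hat{\beta} = x^T y$, the last term equals $\hat{\beta}^T x^T y$ and cancels one of the cross terms:
$$
SSE = y^T y - y^T x\hat{\beta} = y^T y - y^T \hat{y}
$$
$$
MSE = \frac{SSE}{n - k - 1} = \hat{\sigma}^2
$$
Degrees of freedom: $n - k - 1 = n - p$, with $p = k + 1$.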
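A short numerical check of the SSE shortcut and of MSE, as a sketch with NumPy. The data are hypothetical (one regressor, so $p = 2$); the same lines also give $\hat{\beta}_0$ and $\hat{\beta}_1$ by matrix operations.

```python
import numpy as np

# Hypothetical data (not from the slides): n = 6, k = 1 regressor, so p = 2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
x = np.column_stack([np.ones_like(x1), x1])
n, p = x.shape

beta_hat = np.linalg.solve(x.T @ x, x.T @ y)  # (beta_0 hat, beta_1 hat)
y_hat = x @ beta_hat

resid = y - y_hat
sse_direct = resid @ resid       # (y - x beta_hat)^T (y - x beta_hat)
sse_short = y @ y - y @ y_hat    # y^T y - y^T y_hat
mse = sse_direct / (n - p)       # estimate of sigma^2

print(sse_direct, sse_short, mse)
```

The two SSE values agree up to floating-point rounding; the shortcut subtracts two large numbers, so the direct residual form is the safer one in practice.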
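$\hat{\beta}_j \sim N\left(\beta_j, \; \sigma^2 \left[(x^T x)^{-1}\right]_{jj}\right)$
Steps of HT:
0. Parameter: $\beta_j$
1. $H_0: \beta_j = 0$
   $H_a: \beta_j \neq 0$
2. Test statistic: $t_0 = \dfrac{\hat{\beta}_j - 0}{\sqrt{MSE \left[(x^T x)^{-1}\right]_{jj}}} \sim t_{n-p}$
Tests the significance of $\beta_j$, given that other variables are in the model.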
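The test statistic can be computed for every coefficient at once. The sketch below uses NumPy only; the helper name `t_statistics` and the data are hypothetical. Critical values $t_{\alpha/2, n-p}$ are not computed here and would come from a t table or a statistics package.

```python
import numpy as np

def t_statistics(x, y):
    """Return (beta_hat, t0), with one t0 per coefficient for H0: beta_j = 0."""
    n, p = x.shape
    xtx_inv = np.linalg.inv(x.T @ x)
    beta_hat = xtx_inv @ x.T @ y
    resid = y - x @ beta_hat
    mse = resid @ resid / (n - p)
    se = np.sqrt(mse * np.diag(xtx_inv))  # sqrt(MSE * [(x^T x)^{-1}]_jj)
    return beta_hat, beta_hat / se

# Hypothetical data: n = 8, k = 2, response roughly 1 + 2*x1 + 3*x2 plus noise.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([9.2, 7.8, 19.1, 17.9, 29.3, 28.0, 38.9, 38.1])
x = np.column_stack([np.ones_like(x1), x1, x2])

beta_hat, t0 = t_statistics(x, y)
print(beta_hat)
print(t0)  # compare |t0[j]| with t_{alpha/2, n-p} to decide on H0: beta_j = 0
```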
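Step 5 - Tests on individual regression coefficients
$H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$ vs. $H_a: \beta_2 \neq 0$
....
$H_0: \beta_k = 0$ vs. $H_a: \beta_k \neq 0$
With 10 such tests, each at $\alpha = 0.05$ and treated as independent:
$P(\text{fail to reject all } H_0 \mid \text{all } H_0 \text{ true}) = 0.95^{10} \cong 0.6$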
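The arithmetic behind the 0.6 figure, as a small sketch. It assumes the individual tests are independent, each at level 0.05, and that there are 10 of them; those values are read off the expression on the slide.

```python
alpha = 0.05  # per-test significance level (from the 0.95 on the slide)
m = 10        # number of individual tests (from the exponent 10)

# Probability that none of the m tests rejects when every H0 is true,
# assuming the tests are independent.
p_no_false_rejection = (1 - alpha) ** m
p_at_least_one = 1 - p_no_false_rejection

print(round(p_no_false_rejection, 4))  # 0.5987
print(round(p_at_least_one, 4))        # 0.4013
```

So the chance of at least one false rejection across the individual tests is about 0.4, which is the motivation for a single overall test of significance.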
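$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_a$: $H_0$ is not true
Degrees of freedom: $n - 1 = k + (n - p)$ for $SST = SSR + SSE$
Steps of HT:
1. $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
   $H_a$: $H_0$ is not true
2. Test statistic: $F_0 = \dfrac{MSR}{MSE} = \dfrac{SSR / k}{SSE / (n - p)} \sim F_{k, n-p}$
5. Critical value: reject $H_0$ if $f_0 > f_{\alpha, k, n-p}$
Note: $F_0 = \dfrac{R^2 / k}{(1 - R^2) / (n - p)}$, where $R^2 = \dfrac{SSR}{SST}$ is the multiple coefficient of determination.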
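The two expressions for $F_0$ can be checked against each other. This is a sketch with hypothetical sums of squares; the function names are illustrative only, and $p = k + 1$ is assumed as elsewhere in the slides.

```python
def f_from_sums(ssr, sse, k, n):
    """F0 = (SSR / k) / (SSE / (n - p)), with p = k + 1."""
    p = k + 1
    return (ssr / k) / (sse / (n - p))

def f_from_r2(r2, k, n):
    """F0 = (R^2 / k) / ((1 - R^2) / (n - p)), with p = k + 1."""
    p = k + 1
    return (r2 / k) / ((1 - r2) / (n - p))

# Hypothetical values (not from the slides).
ssr, sse, k, n = 80.0, 20.0, 3, 24
sst = ssr + sse
r2 = ssr / sst

f_a = f_from_sums(ssr, sse, k, n)
f_b = f_from_r2(r2, k, n)
print(f_a, f_b)  # both forms give the same F0 up to rounding
```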
Example. Home size vs other factors
A model is to be built to relate Home Size (Y) to several factors:
Income, Family Size and Education.
$$
R^2_{adj} = 1 - \frac{SSE / (n - p)}{SST / (n - 1)}
$$
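A small sketch comparing $R^2$ with $R^2_{adj}$. The numbers are hypothetical and the function name is illustrative.

```python
def r2_and_adjusted(sse, sst, n, p):
    """Return (R^2, adjusted R^2) for a model with p = k + 1 parameters."""
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    return r2, r2_adj

# Hypothetical values (not from the slides): n = 24 observations, p = 4.
r2, r2_adj = r2_and_adjusted(sse=20.0, sst=100.0, n=24, p=4)
print(r2, r2_adj)
```

Because SSE is divided by $n - p$, adding a regressor raises $R^2_{adj}$ only if the drop in SSE outweighs the lost degree of freedom, whereas $R^2$ never decreases.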
[Figure residue: two panels with axis label $\hat{y}_i$, each annotated "Same as in SLR".]
Analysis of Variance
Source            DF     SS      MS      F
Regression        k      SSR     MSR     MSR/MSE
Residual Error    n-p    SSE     MSE
  Lack of Fit     m-p    SSLOF   MSLOF   MSLOF/MSPE
  Pure Error      n-m    SSPE    MSPE
Total             n-1    SST
$VIF_j = \dfrac{1}{1 - R_j^2}$,
where $R_j^2 = SSR_j / SST_j$ is the multiple coefficient of determination for the regression of $x_j$ on the other independent variables.
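One way to compute $VIF_j$ numerically is sketched below with NumPy. It takes $R_j^2$ to be the coefficient of determination from regressing $x_j$ on the remaining regressors (with an intercept), which is the standard definition; the function name `vif` and the data are hypothetical.

```python
import numpy as np

def vif(regressors):
    """regressors: n x k array WITHOUT the column of ones. Returns k VIF values."""
    n, k = regressors.shape
    out = np.empty(k)
    for j in range(k):
        target = regressors[:, j]
        others = np.delete(regressors, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        fitted = design @ coef
        sse_j = np.sum((target - fitted) ** 2)
        sst_j = np.sum((target - target.mean()) ** 2)
        r2_j = 1 - sse_j / sst_j  # equals SSR_j / SST_j
        out[j] = 1 / (1 - r2_j)
    return out

# Orthogonal regressors: no linear dependence between columns.
orthogonal = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
# Nearly collinear regressors (hypothetical values).
collinear = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1]])

vif_orthogonal = vif(orthogonal)
vif_collinear = vif(collinear)
print(vif_orthogonal)  # VIF of 1 for each regressor
print(vif_collinear)   # large values signal multicollinearity
```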
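$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$
model with all variables included, possibly with transformed variables, e.g. $x_1$ replaced by $1/x_1$ and $x_2$ replaced by $\ln x_2$
$E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 \left(\dfrac{1}{x_2}\right)$
Add: $\beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$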
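Transformed and interaction terms simply become extra columns of the design matrix, so the model stays linear in the coefficients. The sketch below builds the six columns of the model above with NumPy; the data and the coefficient vector `beta_true` are hypothetical.

```python
import numpy as np

# Hypothetical regressor values (not from the slides).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
x2 = np.array([2.0, 4.0, 1.0, 5.0, 8.0, 3.0, 10.0])

design = np.column_stack([
    np.ones_like(x1),  # beta_0
    x1,                # beta_1 * x1
    x1 ** 2,           # beta_2 * x1^2
    1.0 / x2,          # beta_3 * (1 / x2)
    x1 * x2,           # beta_4 * x1 * x2
    x1 ** 2 * x2,      # beta_5 * x1^2 * x2
])

# Noise-free response from assumed coefficients, then an ordinary least squares fit.
beta_true = np.array([1.0, 0.5, -0.2, 3.0, 0.1, 0.05])
y = design @ beta_true
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

print(design.shape)
print(beta_hat)  # should match beta_true up to rounding if the columns are independent
```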
kink
$$
R^2_{adj}(p) = 1 - \frac{MSE(p)}{SST / (n - 1)}
$$
$C_p$ is an estimator of $\Gamma_p$: $C_p = (n - p)\dfrac{MSE(p)}{MSE(full)} - (n - 2p)$.
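A small sketch of the $C_p$ formula. The function name and the numbers are hypothetical; the second call illustrates that the full model always has $C_p = p$, since $MSE(p) = MSE(full)$ there.

```python
def c_p(mse_p, mse_full, n, p):
    """C_p = (n - p) * MSE(p) / MSE(full) - (n - 2p)."""
    return (n - p) * mse_p / mse_full - (n - 2 * p)

# Hypothetical values (not from the slides): n = 20, full model has 5 parameters.
n = 20
mse_full = 2.0

cp_candidate = c_p(mse_p=2.5, mse_full=mse_full, n=n, p=3)  # 3-parameter candidate
cp_full = c_p(mse_p=mse_full, mse_full=mse_full, n=n, p=5)  # the full model itself

print(cp_candidate)  # 7.25
print(cp_full)       # 5.0
```

A candidate whose $C_p$ is close to its own $p$ shows little evidence of bias; here 7.25 against $p = 3$ would suggest the smaller model is missing something.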
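$$
\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left[ \left(E[\hat{y}_i] - E[y_i]\right)^2 + var(\hat{y}_i) \right]
$$
For simple linear regression ($p = 2$) with no bias:
$$
\begin{aligned}
\Gamma_2 &= \frac{1}{\sigma^2} \sum_{i=1}^{n} \left[ 0 + \sigma^2 \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}} \right) \right]
= \frac{1}{\sigma^2} \, \sigma^2 \left( 1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{S_{xx}} \right) \\
&= 1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{S_{xx}} = 1 + 1 \;\Rightarrow\; \Gamma_2 = 2
\end{aligned}
$$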
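$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ (given $\beta_4, \beta_5$)
$H_a$: $H_0$ not true
$$
\begin{aligned}
SSR(\beta_1, \beta_2, \beta_3 \mid \beta_0, \beta_4, \beta_5)
&= SSR(\beta_1, \beta_2, \beta_3, \beta_4, \beta_5 \mid \beta_0) - SSR(\beta_4, \beta_5 \mid \beta_0) \\
&= SSE(x_4, x_5) - SSE(x_1, x_2, x_3, x_4, x_5)
\end{aligned}
$$
$$
F_0 = \frac{SSR(\beta_1, \beta_2, \beta_3 \mid \beta_0, \beta_4, \beta_5) / 3}{SSE(x_1, \ldots, x_5) / (n - 5 - 1)} \sim F_{3, \, n-5-1}
$$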
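Example.
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
$H_0: \beta_1 = \beta_2 = 0$ (given $\beta_3$)
$H_a$: $H_0$ not true
Using the $SSR(\cdot \mid \beta_0)$ values:
$$
F_0 = \frac{(60.76 - 49.54) / 2}{(68.405 - 60.76) / 11} = \frac{5.61}{MSE} = 8.07
$$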
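The arithmetic of this example can be reproduced in a few lines. The variable names are an interpretation, not stated on the slide: 60.76 is read as the regression sum of squares of the full model, 49.54 as that of the reduced model containing only $x_3$, 68.405 as SST, and 11 as the error degrees of freedom.

```python
# Values from the example; the labels below are assumed interpretations.
ssr_full = 60.76     # assumed: SSR(beta_1, beta_2, beta_3 | beta_0)
ssr_reduced = 49.54  # assumed: SSR(beta_3 | beta_0)
sst = 68.405         # assumed: total sum of squares
r = 2                # number of coefficients tested (beta_1, beta_2)
df_error = 11        # error degrees of freedom of the full model

extra_ss = ssr_full - ssr_reduced  # SSR(beta_1, beta_2 | beta_0, beta_3)
sse_full = sst - ssr_full
mse = sse_full / df_error
f0 = (extra_ss / r) / mse

print(round(extra_ss / r, 2))  # 5.61
print(round(f0, 2))            # 8.07
```

The value $f_0$ would then be compared with $f_{\alpha, 2, 11}$ from an F table.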
Nb of locati
Forward selection
0.15
3. There may not be a single best model but several good models. One
may verify with alternative variable selection methods.