Discussion on Multiple Regression
Shimeng Huang
Agenda
Introduction
Shimeng Huang
• 5th year PhD candidate in risk and insurance
Recap
$R^2$
Adjusted $R^2$
Dummy Variable
Interaction
Regression analysis
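• Simple linear regression: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• Multiple linear regression with k independent variables:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
o $\beta_0$: Y-intercept
o $\beta_1, \beta_2, \ldots, \beta_k$: population slopes
o $\varepsilon_i$: random error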
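To make the multiple regression model concrete, here is a minimal least-squares sketch in Python using only the standard library. It is an illustration, not the textbook's or any software package's method: the function names `fit_multiple_regression` and `predict` and the small data set are hypothetical. The data are generated exactly from $Y = 1 + 2X_1 + 3X_2$ with no random error, so the fit should recover $b_0 = 1$, $b_1 = 2$, $b_2 = 3$.

```python
# Minimal least-squares sketch (standard library only) for
#   Y_i = b0 + b1*X1i + ... + bk*Xki + e_i
# It solves the normal equations (X'X) b = X'y by Gaussian elimination.

def fit_multiple_regression(xs, y):
    """xs: one list of k predictor values per observation; y: responses.
    Returns the estimated coefficients [b0, b1, ..., bk]."""
    X = [[1.0] + [float(v) for v in row] for row in xs]  # leading 1s give the Y-intercept
    p = len(X[0])                                         # k + 1 coefficients
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    a = [row + [v] for row, v in zip(xtx, xty)]           # augmented matrix [X'X | X'y]
    for col in range(p):
        # Partial pivoting: swap up the row with the largest entry in this column
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    b = [0.0] * p                                         # back substitution
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b


def predict(b, x_row):
    """Predicted value Y-hat = b0 + b1*x1 + ... + bk*xk."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x_row))


# Illustrative data built exactly from Y = 1 + 2*X1 + 3*X2 (no random error)
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
b = fit_multiple_regression(xs, y)   # approximately [1.0, 2.0, 3.0]
y_hat = predict(b, [2, 2])           # approximately 11.0
```

With real data the residuals are not zero, and statistical software would also report standard errors and $R^2$; this sketch only shows how the coefficients and a predicted value arise.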
Recap - Important Concepts
Coefficient of Multiple Determination ($R^2$)
o Reports the proportion of total variation in Y explained by all X variables taken together
Adjusted $R^2$
o Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used
Penalizes excessive use of unimportant independent variables
Adjusted $R^2$ can decrease when a new X variable is added to the model
Use adjusted $R^2$ to compare models with different numbers of independent variables
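The two measures can be written as short Python functions. This is a sketch using the standard textbook formulas $r^2 = 1 - SSE/SST$ and $r^2_{adj} = 1 - (1 - r^2)\frac{n-1}{n-k-1}$; the function names and the $r^2$ values in the comparison are hypothetical, chosen only to show that adjusted $r^2$ can fall when a weak X variable is added.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: r^2 = 1 - SSE/SST (equal to SSR/SST for a
    least-squares fit that includes an intercept)."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation in Y
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # variation left unexplained
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1), for n observations
    and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical comparison: adding a third X variable raises r^2 slightly,
# but adjusted r^2 falls because the gain does not offset the lost degree of freedom.
adj_two_vars = adjusted_r_squared(0.800, n=20, k=2)    # about 0.7765
adj_three_vars = adjusted_r_squared(0.805, n=20, k=3)  # about 0.7684
```

Because $r^2$ can never decrease when a variable is added, comparing models by adjusted $r^2$ is what reveals that the third variable is not worth including here.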
Recap - Important Concepts
Confidence Level
o 95% confidence level: if we repeated the sampling many times and built an interval from each sample, about 95% of those intervals would contain the true value of the population parameter
Significance Level ($\alpha$)
o Probability of rejecting the null hypothesis when it is true
$\text{Confidence level} = 1 - \text{Significance level} = 1 - \alpha$
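The repeated-sampling meaning of a confidence level, and its link to $\alpha$, can be checked with a small simulation. This sketch is a deliberate simplification: it builds intervals for a population mean with a known standard deviation (a z-interval), not for a regression slope, and the function name and parameter values (`mu`, `sigma`, `n`, `reps`, `seed`) are hypothetical. With $\alpha = 0.05$, roughly 95% of the simulated intervals should contain the true mean.

```python
import random
from statistics import NormalDist


def coverage_rate(alpha=0.05, mu=10.0, sigma=2.0, n=25, reps=10_000, seed=1):
    """Share of intervals x_bar +/- z * sigma / sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n   # one new sample mean
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps


rate = coverage_rate()   # close to 1 - alpha = 0.95
```

Each individual interval either contains `mu` or it does not; the 95% describes the long-run share of intervals produced by the procedure.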
Recap - Hypothesis Testing (Two-Tailed)
• Null hypothesis: $H_0: \beta_j = 0$; alternative hypothesis: $H_1: \beta_j \neq 0$
• Significance level: $\alpha$
• Three equivalent ways to decide
o $p$-value: the probability, assuming $H_0$ is true (no linear relationship between $X_j$ and Y), of obtaining a $t_{STAT}$ at least as extreme as the one observed. Compare it with the significance level: reject $H_0$ if the $p$-value is less than $\alpha$.
o $t$-statistic: compare $t_{STAT}$ with the critical values $\pm t_{\alpha/2,\,n-k-1}$ from the t-table. Reject $H_0$ if $t_{STAT}$ falls beyond either critical value.
o Confidence interval: e.g., a 95% confidence interval gives a range of plausible values for $\beta_j$, whose true value we never know. Reject $H_0$ at $\alpha = 0.05$ if the interval does not include 0.
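The decision rules can be expressed as one small function. This is a sketch: the name `slope_decisions` is hypothetical, the critical value is passed in (as if read from a t-table) rather than computed, and the $p$-value, if supplied, is assumed to come from regression output. The $t$ rule and the interval rule always agree, because $|b_j / S_{b_j}| > t_{\alpha/2}$ exactly when 0 lies outside $b_j \pm t_{\alpha/2} S_{b_j}$.

```python
def slope_decisions(b, s_b, t_crit, p_value=None, alpha=0.05):
    """Two-tailed test of H0: beta_j = 0 vs H1: beta_j != 0.
    b: estimated slope; s_b: its standard error (> 0);
    t_crit: upper critical value t_{alpha/2, n-k-1} from a t-table;
    p_value: optional, as reported by regression software.
    Returns t_STAT, the confidence interval, and 'reject H0?' for each rule."""
    t_stat = b / s_b
    lower, upper = b - t_crit * s_b, b + t_crit * s_b
    decisions = {
        "t-statistic": abs(t_stat) > t_crit,               # beyond either critical value
        "confidence interval": not (lower <= 0 <= upper),  # 0 is outside the interval
    }
    if p_value is not None:
        decisions["p-value"] = p_value < alpha             # p-value below alpha
    return t_stat, (lower, upper), decisions


# Hypothetical slope and standard error: t_STAT = 1.25 is inside +/-2.11,
# and the interval [-0.344, 1.344] contains 0, so neither rule rejects H0.
t_stat, ci, decisions = slope_decisions(b=0.5, s_b=0.4, t_crit=2.11)
```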
Practice Time
Practice 1
Question 5 in Chapter 2 (page 70)
• Data
o VinhoVerde
• Independent variable
• Dependent variable
• Regression equation
• Interpretation of coefficient
• Predicted value
Practice 2
Question 15 in Chapter 2 (page 74)
• Data
o Nickels26Weeks
• $R^2$
• Adjusted $R^2$
• Interpretation of $R^2$
Practice 3
Question 27 in Chapter 2 (page 79)
• Data
o VinhoVerde
• Confidence interval
• Significant contribution
Practice 4
Question 43 in Chapter 2 (page 95)
• Data
o Moving
• Dummy variable
• Interpretation of coefficients
• For a given number of cubic feet moved, a move from an apartment building with an elevator is estimated to take a mean of 4.5283 fewer labor hours than a move from a building without an elevator.
• Holding constant whether the building has an elevator, each additional cubic foot moved is estimated to increase mean labor hours by 0.0482.
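The two interpretations follow directly from the fitted equation, which a short sketch can confirm. The slopes 0.0482 and -4.5283 are the values interpreted above; the intercept `B0` is a hypothetical placeholder (the textbook output is not reproduced here), and the coding elevator = 1, no elevator = 0 is an assumption consistent with the interpretation. Neither difference depends on the intercept.

```python
# Slopes from the interpretation above; B0 is a HYPOTHETICAL intercept,
# used only so the equation can be evaluated.
B0 = 1.0        # hypothetical placeholder, not from the textbook output
B1 = 0.0482     # labor hours per cubic foot moved
B2 = -4.5283    # elevator dummy (assumed coding: 1 = elevator, 0 = no elevator)


def predicted_hours(cubic_feet, elevator):
    """Y-hat = B0 + B1 * cubic_feet + B2 * elevator."""
    return B0 + B1 * cubic_feet + B2 * elevator


# Same cubic feet, elevator vs. no elevator: the gap equals B2
gap = predicted_hours(500, 1) - predicted_hours(500, 0)
# Same elevator status, one more cubic foot: the change equals B1
step = predicted_hours(501, 0) - predicted_hours(500, 0)
```

This is why a dummy-variable coefficient is read as a shift in the intercept: the two groups share the same slope for cubic feet and differ by a constant amount.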
Practice 5
Question 47 in Chapter 2 (page 96)
• Data
o VinhoVerde
• Interaction
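The key idea behind an interaction term is that the slope of one X variable changes with the level of another. Since the VinhoVerde interaction output is not reproduced here, this sketch uses hypothetical coefficients `b0` to `b3` in the model $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$; none of the numbers come from the textbook.

```python
# HYPOTHETICAL coefficients for Y-hat = b0 + b1*X1 + b2*X2 + b3*(X1*X2)
b0, b1, b2, b3 = 2.0, 0.5, 1.0, 0.25


def predict_with_interaction(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2   # X1*X2 is the interaction term


def slope_of_x1(x2):
    """Effect of a one-unit increase in X1, which depends on the value of X2."""
    return b1 + b3 * x2


# One-unit change in X1, measured at two different levels of X2
change_at_x2_0 = predict_with_interaction(1, 0) - predict_with_interaction(0, 0)  # b1 = 0.5
change_at_x2_4 = predict_with_interaction(1, 4) - predict_with_interaction(0, 4)  # b1 + 4*b3 = 1.5
```

Without the interaction term ($b_3 = 0$), both changes would equal $b_1$; testing whether $b_3$ differs significantly from 0 is how one decides whether the interaction belongs in the model.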
Practice 6
Question 77 in Chapter 2 (page 108)
• Data
o Baseball
• Multiple regression equation
• Interpretation of coefficient
• Predicted value
• Interpretation of $R^2$
• Significant contribution
• $p$-value
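• The $p$-value for ERA:
o The probability of obtaining a $t_{STAT}$ of $-8.37$ or less, or $8.37$ or more, is 0.000 (that is, less than 0.0005) if the null hypothesis is true.
• The $p$-value for runs scored per game:
o The probability of obtaining a $t_{STAT}$ of $4.94$ or more, or $-4.94$ or less, is 0.000 (that is, less than 0.0005) if the null hypothesis is true.
• Both $p$-values are below $\alpha = 0.05$, so both independent variables make a significant contribution to the model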
Practice 7 (By Hand)
Question 24 in Chapter 2 (page 78)
• $t$-statistic
• Confidence interval
• Significant contribution
• Test statistic: $t_{STAT} = \dfrac{b_j}{S_{b_j}}$ with $df = n - k - 1$
• The slope of $X_1$ in units of a $t$-statistic is 3.33
• The slope of $X_2$ in units of a $t$-statistic is 3.75, so $X_2$ has the larger slope in these units
• 95% confidence interval for $\beta_1$ is [1.4682, 6.5318]
• $\alpha = 0.05$
• $df = n - k - 1 = 20 - 2 - 1 = 17$, so the critical value is $t_{0.025,\,17} = 2.11$
• $X_1$: $t_{STAT} = \dfrac{4}{1.2} = 3.33 > 2.11$, reject $H_0$
• $X_2$: $t_{STAT} = \dfrac{3}{0.8} = 3.75 > 2.11$, reject $H_0$
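The by-hand arithmetic for this question can be double-checked in a few lines of Python. The inputs are the values used on the slide ($n = 20$, $k = 2$, $b_1 = 4$, $S_{b_1} = 1.2$, $b_2 = 3$, $S_{b_2} = 0.8$); the critical value $t_{0.025,17} = 2.1098$ is typed in from a t-table (the slide rounds it to 2.11) rather than computed, and the variable names are illustrative.

```python
# Inputs used on the slide for Question 24
n, k = 20, 2
coefs = {"X1": (4.0, 1.2), "X2": (3.0, 0.8)}   # (slope b_j, standard error S_bj)
df = n - k - 1                                  # 17 degrees of freedom
t_crit = 2.1098   # t_{0.025, 17} read from a t-table (rounded to 2.11 on the slide)

results = {}
for name, (b, s_b) in coefs.items():
    t_stat = b / s_b                              # slope in units of a t statistic
    ci = (b - t_crit * s_b, b + t_crit * s_b)     # 95% confidence interval for beta_j
    reject_h0 = abs(t_stat) > t_crit              # two-tailed test at alpha = 0.05
    results[name] = (round(t_stat, 2), tuple(round(v, 4) for v in ci), reject_h0)
```

The $X_1$ entry reproduces $t_{STAT} = 3.33$ and the interval [1.4682, 6.5318]; the $X_2$ interval, computed the same way, also excludes 0, which agrees with rejecting $H_0$ for both variables.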
Summary
• Key concepts
o Interpretation of estimated coefficients
o $R^2$ and adjusted $R^2$
o Hypothesis testing
H0 & H1
Compare the t-statistic with the critical value (or check the confidence interval or p-value)
Decide whether to reject H0
o Dummy variables
o Interaction
• Any questions on multiple regression?
o Email
shimeng.huang@wisc.edu
• Office Hours: 1293 Grainger
Oct 1st: 8 AM – 9:30 AM
Q&A
Questions
Thanks!
Shimeng Huang| Wisconsin School of Business