Discussion+on+Multiple+Regression ShimengHuang

The document discusses multiple regression analysis, focusing on concepts such as the coefficient of multiple determination (R²), adjusted R², hypothesis testing, and the use of dummy variables and interactions in regression models. It includes practice questions to reinforce understanding of these concepts. The presentation is led by Shimeng Huang, a PhD candidate at the Wisconsin School of Business.

GB 307 | Business Analytics II

Discussion on Multiple Regression

Shimeng Huang | Wisconsin School of Business

September 27th, 2024

1
Agenda

• Introduction
• Recap
• Practice
• Summary
• Q&A

2
Introduction

3
Introduction

Shimeng Huang
• Fifth-year PhD candidate in risk and insurance

4
Recap

$R^2$

Adjusted $R^2$

Multiple Regression Hypothesis Testing

Dummy Variable

Interaction

5
Regression analysis
• Simple linear regression: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• Multiple linear regression:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

• $k$ independent variables: $X_1, \ldots, X_k$
• Y-intercept: $\beta_0$
• Population slopes: $\beta_1, \ldots, \beta_k$
• Random error: $\varepsilon_i$

• Estimated regression equation:

$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

• Interpretation:
• Holding all other variables constant, the predicted value of $Y$ changes by $b_1$ units when $X_1$ increases by 1 unit.
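A minimal sketch of how the estimated equation and its interpretation fit together, using only the Python standard library. The data are hypothetical and noise-free, built from Y = 1 + 2*X1 + 3*X2, and the helper names `solve_linear_system`, `fit_ols`, and `predict` are illustrative choices, not part of the course material.

```python
# Minimal sketch: fit Y = b0 + b1*X1 + b2*X2 by ordinary least squares.
# The data are hypothetical and noise-free, built from Y = 1 + 2*X1 + 3*X2.

def solve_linear_system(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [a | rhs], copied so the inputs are not modified.
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(x_rows, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 = intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

x_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b0, b1, b2 = fit_ols(x_rows, y)

def predict(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# Raising X1 by one unit with X2 held fixed changes the prediction by b1.
effect_of_x1 = predict(4, 2) - predict(3, 2)
print(round(b0, 6), round(b1, 6), round(b2, 6), round(effect_of_x1, 6))
```

In practice a statistics package would do the fitting. The hand-rolled solver is here only to keep the sketch self-contained.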

6
Recap - Important Concepts
Coefficient of Multiple Determination ($R^2$)
o Reports the proportion of total variation in Y explained by all X variables taken together

o $R^2$ never decreases when a new X variable is added to the model

Adjusted $R^2$
o Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used
▪ Penalizes excessive use of unimportant independent variables
▪ Adjusted $R^2$ can decrease when a new X variable is added to the model
▪ Use adjusted $R^2$ to compare models with different numbers of independent variables

8
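The difference between $R^2$ and adjusted $R^2$ can be sketched numerically. The formulas below are the standard textbook ones (they are not printed on the slide), and every number is hypothetical: a model with k = 2 and $R^2$ = 0.800 is compared with a model that adds one weak variable and reaches $R^2$ = 0.801.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: share of the variation in y explained by the model."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observed values and fitted values from some model.
y = [3.0, 5.0, 7.0, 9.0, 11.0]
y_hat = [3.5, 4.5, 7.0, 9.5, 10.5]
r2_example = r_squared(y, y_hat)

# Hypothetical comparison with n = 20 observations:
# adding a third variable nudges R^2 up, yet adjusted R^2 goes down.
adj_two_vars = adjusted_r_squared(0.800, n=20, k=2)
adj_three_vars = adjusted_r_squared(0.801, n=20, k=3)
print(round(r2_example, 4), round(adj_two_vars, 4), round(adj_three_vars, 4))
```

The penalty comes from the factor (n - 1) / (n - k - 1), which grows with k, so a new variable has to reduce the unexplained variation enough to offset it.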
Recap - Important Concepts
Confidence Level
o 95% confidence level: If the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true value of the population parameter
Significance Level ($\alpha$)
o Probability of rejecting the null hypothesis when it is true
Confidence Level = 1 − Significance Level
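One way to see what a 95% confidence level means is to simulate repeated sampling and count how often the interval captures the true value. The sketch below is a simplification under stated assumptions: it builds an interval for a population mean with a known standard deviation (a z-interval) rather than for a regression slope, and the population values, sample size, and seed are all hypothetical.

```python
import random
from statistics import NormalDist, mean

random.seed(307)  # arbitrary seed so the run is repeatable

TRUE_MEAN = 50.0  # hypothetical population mean (unknown in practice)
SIGMA = 10.0      # population standard deviation, assumed known here
N = 30            # observations per sample
REPS = 2000       # number of repeated samples
CONF = 0.95       # confidence level

alpha = 1 - CONF                          # significance level
z = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
half_width = z * SIGMA / N ** 0.5

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    centre = mean(sample)
    # Does this sample's interval contain the true mean?
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        covered += 1

coverage = covered / REPS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95. Any single interval either contains the true value or does not, so the 95% describes the procedure, not one interval.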

9
Recap - Hypothesis Testing (Two-Tailed)
• Null Hypothesis: $H_0: \beta_j = 0$; Alternative Hypothesis: $H_1: \beta_j \neq 0$
• Significance level
• Three Ways
o p-value: The probability, assuming $H_0$ is true (no relationship, only random sampling variation), of obtaining a test statistic at least as extreme as the one observed
o t-statistic

o Confidence interval
▪ e.g., a 95% confidence interval gives a range of plausible values for $\beta_j$ (whose true value we never know)

• Critical Value (t-table)


10
Recap - Hypothesis Testing (Two-Tailed)
• Critical Value (t-table: Confidence level or significance level $\alpha$ & Degrees of freedom)
• Confidence level
o 95% confidence level
o 99% confidence level
• Degrees of freedom
o $n - k - 1$ ($n$ observations, $k$ independent variables)

11
Recap - Hypothesis Testing (Two-Tailed)
• Null Hypothesis: $H_0: \beta_j = 0$; Alternative Hypothesis: $H_1: \beta_j \neq 0$
• Significance level
• Three Ways
o p-value: The probability, assuming $H_0$ is true, of obtaining a test statistic at least as extreme as the one observed → compare with the significance level.
o t-statistic: → compare with the critical value from the t-table.

o Confidence interval: → see if the interval contains zero.


▪ e.g., a 95% confidence interval gives a range of plausible values for $\beta_j$
• Critical Value (t-table: Confidence level or significance level α & Degrees of freedom)
• Decision Rule
o p-value: How low should the p-value go?
▪ p-value < α → Reject $H_0$ (Convention: α = 0.05; can change: e.g., α = 0.01)
o t-test: Compare the t-statistic with the critical value
▪ |t-statistic| > Critical Value → Reject $H_0$
o Confidence interval: zero is not included → Reject $H_0$
12
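The formulas behind the t-statistic and confidence-interval methods did not survive on the slide. A sketch in standard notation follows, where $S_{b_j}$ (the standard error of $b_j$) and $t_{\alpha/2,\,n-k-1}$ (the two-tailed critical value) are symbols assumed here rather than taken from the deck:

```latex
t_{STAT} = \frac{b_j}{S_{b_j}},
\qquad
b_j \pm t_{\alpha/2,\; n-k-1}\, S_{b_j}
```

Under this notation, $H_0$ is rejected when $|t_{STAT}| > t_{\alpha/2,\,n-k-1}$, which is the same condition as the interval not containing zero.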
Recap – categorical variables
• Dummy variables:
• A categorical independent variable with two levels
• Yes or no, on or off, weekday or weekend
• Coded as 1 or 0
• Categorical variables with >2 levels
• The number of dummy variables = number of levels – 1
• e.g. quarter of year: Q1, Q2, Q3, Q4
• 3 dummy variables to represent quarter (with Q4 as the default level):
• $X_1 = 1$ if quarter is Q1; $X_1 = 0$ if quarter is Q2, Q3, or Q4
• $X_2 = 1$ if quarter is Q2; $X_2 = 0$ if quarter is Q1, Q3, or Q4
• $X_3 = 1$ if quarter is Q3; $X_3 = 0$ if quarter is Q1, Q2, or Q4

Quarter   X1   X2   X3
Q1        1    0    0
Q4        0    0    0
Q3        0    0    1
Q1        1    0    0
Q2        0    1    0
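The quarter coding above can be sketched as a small function. The name `encode_quarter` and the constants are illustrative, and the five observations are the rows of the example table.

```python
LEVELS = ["Q1", "Q2", "Q3", "Q4"]
BASELINE = "Q4"  # default level: all three dummies are 0

def encode_quarter(quarter):
    """Return (X1, X2, X3): one 0/1 indicator per non-baseline level."""
    if quarter not in LEVELS:
        raise ValueError(f"unknown quarter: {quarter!r}")
    return tuple(int(quarter == level) for level in LEVELS if level != BASELINE)

# The five example rows from the table.
observations = ["Q1", "Q4", "Q3", "Q1", "Q2"]
encoded = [encode_quarter(q) for q in observations]
for q, row in zip(observations, encoded):
    print(q, *row)
```

Four levels need only three dummies because the baseline is identified by all three being zero. A fourth dummy would be perfectly collinear with the intercept.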
13
Recap – interactions
No interaction: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$, where $X_2$ is a dummy variable
• Different intercepts with different values of $X_2$.
• Same slope with different values of $X_2$ (parallel lines)
Why add an interaction?
• When the effect of $X_1$ on $Y$ depends on the value of $X_2$.
With interaction: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 (X_1 X_2)$
• When $X_2 = 0$, $\hat{Y} = b_0 + b_1 X_1$
• $b_0$ is the intercept and $b_1$ is the slope.
• When $X_2 = 1$, $\hat{Y} = b_0 + b_1 X_1 + b_2 + b_3 X_1 = (b_0 + b_2) + (b_1 + b_3) X_1$
• $(b_0 + b_2)$ is the intercept and $(b_1 + b_3)$ is the slope.
• Different intercepts with different values of $X_2$.
• Different slopes with different values of $X_2$ (non-parallel lines)
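A short sketch of the algebra above: with an interaction term, each value of the dummy $X_2$ gives its own line in $X_1$. The coefficient values are hypothetical and the function names are illustrative.

```python
def line_for_group(b0, b1, b2, b3, x2):
    """Return (intercept, slope) of Y-hat as a function of X1, for dummy x2 in {0, 1}."""
    return (b0 + b2 * x2, b1 + b3 * x2)

def predict(b0, b1, b2, b3, x1, x2):
    """Y-hat = b0 + b1*X1 + b2*X2 + b3*(X1*X2)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * (x1 * x2)

# Hypothetical coefficients, chosen only for illustration.
coef = dict(b0=10.0, b1=2.0, b2=5.0, b3=-0.5)

line_x2_0 = line_for_group(**coef, x2=0)  # intercept b0, slope b1
line_x2_1 = line_for_group(**coef, x2=1)  # intercept b0 + b2, slope b1 + b3

# The group line reproduces the full model's prediction.
intercept, slope = line_x2_1
check = intercept + slope * 4
print(line_x2_0, line_x2_1, check, predict(**coef, x1=4, x2=1))
```

Setting `b3=0.0` gives both groups the same slope, which is the parallel-lines case without an interaction.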

14
Practice Time

15
Practice 1
Question 5 in Chapter 2 (page 70)
• Data
o VinhoVerde
• Independent variable
• Dependent variable
• Regression equation
• Interpretation of coefficient
• Predicted value

16
Practice 2
Question 15 in Chapter 2 (page 74)
• Data
o Nickels26Weeks
• $R^2$
• Adjusted $R^2$
• Interpretation of $R^2$

18
Practice 3
Question 27 in Chapter 2 (page 79)
• Data
o VinhoVerde
• Confidence interval
• Significant contribution

20
Practice 4
Question 43 in Chapter 2 (page 95)
• Data
o Moving
• Dummy variable
• Interpretation of coefficients

22
Practice 4
Question 43 in Chapter 2 (page 95)

• For a given amount of cubic feet moved, a move from an apartment building with an elevator is estimated to require a mean of 4.5283 fewer labor hours than a move from a building without an elevator.
• Holding constant whether the building has an elevator, for each additional cubic foot moved, the labor hours are estimated to increase by a mean of 0.0482.
23
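The two interpretations can be checked numerically with the coefficients quoted on the slide. This is a sketch under assumptions: the dummy is taken to be coded 1 for a building with an elevator (consistent with "below"), and the intercept is not shown on the slide, so `B0_PLACEHOLDER` is a hypothetical value that cancels out of both differences.

```python
B_CUBIC_FEET = 0.0482  # slope for cubic feet moved (from the slide)
B_ELEVATOR = -4.5283   # dummy coefficient, assuming 1 = elevator, 0 = none

def predicted_labor_hours(b0, cubic_feet, elevator):
    """Y-hat = b0 + 0.0482 * cubic_feet - 4.5283 * elevator."""
    return b0 + B_CUBIC_FEET * cubic_feet + B_ELEVATOR * elevator

B0_PLACEHOLDER = 0.0  # hypothetical: the slide does not give the intercept

# Same amount moved, with and without an elevator.
with_elevator = predicted_labor_hours(B0_PLACEHOLDER, 500, 1)
without_elevator = predicted_labor_hours(B0_PLACEHOLDER, 500, 0)
elevator_gap = with_elevator - without_elevator

# Same elevator status, 100 more cubic feet.
extra_100_cubic_feet = (
    predicted_labor_hours(B0_PLACEHOLDER, 600, 0)
    - predicted_labor_hours(B0_PLACEHOLDER, 500, 0)
)
print(round(elevator_gap, 4), round(extra_100_cubic_feet, 4))
```

Because the model has no interaction term, the elevator gap is the same at every amount moved.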
Practice 5
Question 47 in Chapter 2 (page 96)
• Data
o VinhoVerde
• Interaction

24
Practice 6
Question 77 in Chapter 2 (page 108)
• Data
o Baseball
• Multiple regression equation
• Interpretation of coefficient
• Predicted value
• Interpretation of $R^2$
• Significant contribution
• p-value

26
Practice 6
Question 77 in Chapter 2 (page 108)
• The p-value for the ERA:
o The probability of obtaining a $t_{STAT}$ at least as extreme as –8.37 is reported as 0.000 (rounded to three decimals) when the null hypothesis is true.
• The p-value for the runs scored per game:
o The probability of obtaining a $t_{STAT}$ at least as extreme as 4.94 is reported as 0.000 (rounded to three decimals) when the null hypothesis is true.
• Both are significant (each p-value is below α = 0.05)

28
Practice 7 (By Hand)
Question 24 in Chapter 2 (page 78)
• t-statistic
• Confidence interval
• Significant contribution

29
Practice 7 (By Hand)
Question 24 in Chapter 2 (page 78)
o Test-statistic

o Confidence level (95%)


o Degrees of freedom: $df = n - k - 1$
o t-table
o Confidence interval

30
Practice 7 (By Hand)
Question 24 in Chapter 2 (page 78)
• Expressed as a t-statistic, the slope of $X_1$ is 3.33
• Expressed as a t-statistic, the slope of $X_2$ is 3.75
• 95% confidence interval for $\beta_1$ is [1.4682, 6.5318]
• $\alpha = 0.05$
• $df = n - k - 1 = 20 - 2 - 1 = 17$
• $X_1$: $t_{STAT} = \frac{4}{1.2} = 3.33 > 2.11$ → Reject $H_0$
• $X_2$: $t_{STAT} = \frac{3}{0.8} = 3.75 > 2.11$ → Reject $H_0$
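The by-hand steps can be reproduced in a few lines. This sketch reads the fractions on the slide as slope over standard error ($b_1 = 4$, $S_{b_1} = 1.2$, $b_2 = 3$, $S_{b_2} = 0.8$) and hard-codes the t-table value for df = 17 and α = 0.05 (two-tailed) as 2.1098, which the slide rounds to 2.11. The function name `evaluate_slope` is illustrative.

```python
N_OBS = 20
K_VARS = 2
DF = N_OBS - K_VARS - 1  # 17
T_CRIT = 2.1098          # t-table: df = 17, alpha = 0.05, two-tailed

def evaluate_slope(b, se, t_crit):
    """Return (t statistic, CI lower, CI upper, reject H0?)."""
    t_stat = b / se
    lower = b - t_crit * se
    upper = b + t_crit * se
    reject = abs(t_stat) > t_crit
    return t_stat, lower, upper, reject

t1, lo1, hi1, reject1 = evaluate_slope(4.0, 1.2, T_CRIT)
t2, lo2, hi2, reject2 = evaluate_slope(3.0, 0.8, T_CRIT)

for name, t, lo, hi, rej in [("X1", t1, lo1, hi1, reject1),
                             ("X2", t2, lo2, hi2, reject2)]:
    print(f"{name}: t = {t:.2f}, 95% CI = [{lo:.4f}, {hi:.4f}], reject H0: {rej}")
```

The t-test and the confidence interval always agree: $|t_{STAT}|$ exceeds the critical value exactly when the interval excludes zero.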

31
Summary

32
Summary
• Key concepts
o Interpretation of estimated coefficients
o $R^2$ and adjusted $R^2$
o Hypothesis testing
▪ $H_0$ & $H_1$
▪ Compare t-statistic and critical value (or check confidence interval or p-value)
▪ Decide whether to reject $H_0$
o Dummy variables
o Interaction
• Any questions on multiple regression?
o Email
▪ shimeng.huang@wisc.edu
• Office Hours: 1293 Grainger
▪ Oct 1st: 8 AM – 9:30 AM

33
Q&A

Questions

34
Thanks!
Shimeng Huang | Wisconsin School of Business

35
