Week 3 - Multiple Regression Solutions

Uploaded by

Weiru Hou
Copyright © All Rights Reserved

Week 3

Multiple Regression Solutions

Problem 1

The owner of Showtime Movie Theaters, Inc., would like to estimate weekly gross revenue
as a function of advertising expenditures. Historical data for a sample of eight weeks follow.
Webfile: Showtime

a. Develop an estimated regression equation with the amount of television advertising
as the independent variable.

Regression Equation
Weekly Gross Revenue ($1000s) = 88.64 + 1.604 Television Advertising ($1000s)

b. Develop an estimated regression equation with both television advertising and
newspaper advertising as the independent variables.

Regression Equation
Weekly Gross Revenue ($1000s) = 83.23 + 2.290 Television Advertising ($1000s)
+ 1.301 Newspaper Advertising ($1000s)

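c. Is the estimated regression equation coefficient for television advertising
expenditures the same in part (a) and in part (b)? Interpret the coefficient in each
case.

No. The coefficient is 1.604 in part (a) and 2.290 in part (b).

In part (a), 1.604 is the estimated change in weekly gross revenue (in $1000s) for a
$1000 increase in television advertising, with no other variable in the model. In part
(b), 2.290 is the estimated change in weekly gross revenue (in $1000s) for a $1000
increase in television advertising with newspaper advertising held constant.

Coefficients, part (a)
Term                              Coef   SE Coef  T-Value  P-Value   VIF
Constant                         88.64      1.58    56.02    0.000
Television Advertising ($1000s)  1.604     0.478     3.36    0.015  1.00

Coefficients, part (b)
Term                              Coef   SE Coef  T-Value  P-Value   VIF
Constant                         83.23      1.57    52.88    0.000
Television Advertising ($1000s)  2.290     0.304     7.53    0.001  1.45
Newspaper Advertising ($1000s)   1.301     0.321     4.06    0.010  1.45
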
d. What is the estimate of the weekly gross revenue for a week when $3500 is spent on
television advertising and $1800 is spent on newspaper advertising?

Prediction for Weekly Gross Revenue ($1000s)

Regression Equation
Weekly Gross Revenue ($1000s) = 83.23 + 2.290 Television Advertising ($1000s)
+ 1.301 Newspaper Advertising ($1000s)

Settings
Variable                         Setting
Television Advertising ($1000s)  3.5
Newspaper Advertising ($1000s)   1.8

Both predictors are measured in $1000s, so $3500 and $1800 enter the equation as 3.5
and 1.8:

Fit = 83.23 + 2.290(3.5) + 1.301(1.8) = 93.59

The estimated weekly gross revenue is about $93,590. Entering the settings as 3500 and
1800 instead gives a fit of 10440.7 (more than $10 million), which Minitab flags with XX
as an extremely unusual point relative to the predictor levels used to fit the model.
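
Since both predictors are recorded in $1000s, dollar amounts have to be converted before they go into the equation. The following is a minimal sketch of that calculation, using the rounded coefficients from part (b); the function name and constants are illustrative and not part of the original solution.

```python
# Rounded coefficients copied from the part (b) regression equation.
# Both predictors and the response are measured in $1000s.
INTERCEPT = 83.23
B_TV = 2.290
B_NEWSPAPER = 1.301


def predict_revenue_thousands(tv_dollars, newspaper_dollars):
    """Return estimated weekly gross revenue in $1000s for spending given in dollars."""
    tv = tv_dollars / 1000.0                # convert dollars to $1000s
    newspaper = newspaper_dollars / 1000.0  # convert dollars to $1000s
    return INTERCEPT + B_TV * tv + B_NEWSPAPER * newspaper


fit = predict_revenue_thousands(3500, 1800)
print(f"Estimated weekly gross revenue: {fit:.2f} ($1000s)")
```

With these rounded coefficients the fit is about 93.59, that is, roughly $93,590.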

Problem 2

The National Football League (NFL) records a variety of performance data for individuals
and teams. To investigate the importance of passing on the percentage of games won by a
team, the following data show the conference (Conf), average number of passing yards per
attempt (Yds/Att), the number of interceptions thrown per attempt (Int/Att), and the
percent of games won (Win%) for a random sample of 16 NFL teams for the 2011 season
(NFL website, February 12, 2012). Webfile: NFLPassing

a. Develop an estimated regression equation that can be used to predict the
percentage of games won given the average number of passing yards per attempt.

Regression Equation
Win% = -58.8 + 16.39 Yds/Att

b. Develop an estimated regression equation that can be used to predict the
percentage of games won given the number of interceptions thrown per attempt.

Regression Equation
Win% = 97.5 - 1600 Int/Att
c. Develop an estimated regression equation that can be used to predict the
percentage of games won given the average number of passing yards per attempt
and the number of interceptions thrown per attempt.

Regression Equation
Win% = -5.8 + 12.95 Yds/Att - 1084 Int/Att

d. The average number of passing yards per attempt for the Kansas City Chiefs was 6.2
and the number of interceptions thrown per attempt was 0.036. Use the estimated
regression equation developed in part (c) to predict the percentage of games won by
the Kansas City Chiefs. (Note: For the 2011 season the Kansas City Chiefs’ record
was 7 wins and 9 losses.) Compare your prediction with the actual percentage of
games won by the Kansas City Chiefs.

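Prediction for Win%

Regression Equation
Win% = -5.8 + 12.95 Yds/Att - 1084 Int/Att

Settings
Variable  Setting
Yds/Att   6.2
Int/Att   0.036

Prediction
Fit      SE Fit   95% CI              95% PI
35.5064  4.48629  (25.8143, 45.1984)  (6.60692, 64.4058)

The predicted Win% is about 35.5%, versus an actual Win% of 7/16 = 43.75%. The model
underpredicts by about 8.2 percentage points, although the actual value lies inside the
95% prediction interval.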
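
As a rough check on this comparison, the sketch below evaluates the part (c) equation by hand and compares it with the actual winning percentage. It uses the rounded coefficients shown in the regression equation, so the result differs slightly from the Minitab fit of 35.5064; the variable names are illustrative.

```python
# Rounded coefficients copied from the part (c) regression equation.
INTERCEPT = -5.8
B_YDS_PER_ATT = 12.95
B_INT_PER_ATT = -1084.0

# Kansas City Chiefs, 2011 season (values from the problem statement).
yds_per_att = 6.2
int_per_att = 0.036
wins, losses = 7, 9

predicted_win_pct = INTERCEPT + B_YDS_PER_ATT * yds_per_att + B_INT_PER_ATT * int_per_att
actual_win_pct = 100.0 * wins / (wins + losses)
difference = actual_win_pct - predicted_win_pct

print(f"Predicted Win%: {predicted_win_pct:.2f}")
print(f"Actual Win%:    {actual_win_pct:.2f}")
print(f"Difference (actual - predicted): {difference:.2f}")
```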


Problem 3

In Problem 2, data were given on the average number of passing yards per attempt
(Yds/Att), the number of interceptions thrown per attempt (Int/Att), and the percentage of
games won (Win%) for a random sample of 16 NFL teams for the 2011 season (NFL
website, February 12, 2012).

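a. Did the estimated regression equation that uses only the average number of passing
yards per attempt as the independent variable to predict the percentage of games
won provide a good fit?

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
15.8732  57.71%  54.69%     44.88%

R-sq = 57.71%, so passing yards per attempt alone explains a little over half of the
variability in Win%. This is a moderate fit that leaves room for improvement.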

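b. Discuss the benefits of using both the average number of passing yards per attempt
and the number of interceptions thrown per attempt to predict the percentage of
games won.

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
12.6024  75.25%  71.44%     60.51%

Adding Int/Att raises R-sq from 57.71% to 75.25% and R-sq(adj) from 54.69% to 71.44%,
and lowers the standard error S from 15.87 to 12.60. Using both variables gives a
noticeably better fit than Yds/Att alone.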
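
The R-sq(adj) values in the two Model Summary tables can be reproduced from R-sq with the usual adjustment for the number of predictors. The sketch below assumes n = 16 teams (from the problem statement) and uses a helper name chosen here for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_TEAMS = 16  # sample size stated in the problem

# R-sq values copied from the two Model Summary tables.
adj_yds_only = adjusted_r_squared(0.5771, N_TEAMS, 1)
adj_yds_and_int = adjusted_r_squared(0.7525, N_TEAMS, 2)

print(f"Yds/Att only:        R-sq(adj) = {100 * adj_yds_only:.2f}%")
print(f"Yds/Att and Int/Att: R-sq(adj) = {100 * adj_yds_and_int:.2f}%")
```

The adjusted value still rises after the second predictor is added, which suggests the improvement is not just an artifact of having more terms in the model.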

Problem 4

The National Football League (NFL) records a variety of performance data for individuals
and teams. A portion of the data showing the average number of passing yards per game on
offense (OffPassYds/G), the average number of yards given up per game on defense
(DefYds/G), and the percentage of games won (Win%) for the 2011 season follows
(ESPN website, November 3, 2012). Webfile: NFL2011

a. Develop an estimated regression equation that can be used to predict the
percentage of games won given the average passing yards obtained per game on
offense and the average number of yards given up per game on defense.

Regression Equation
Win% = 60.5 + 0.3186 OffPassYds/G - 0.2413 DefYds/G
b. Use the F test to determine the overall significance of the relationship. What is your
conclusion at the 0.05 level of significance?

H0: β1 = β2 = 0
Ha: β1 ≠ 0 or β2 ≠ 0 (at least one of the coefficients is not equal to zero)

Analysis of Variance
Source          DF  Adj SS  Adj MS  F-Value  P-Value
Regression       2    6179  3089.6    13.18    0.000
  OffPassYds/G   1    6079  6079.5    25.94    0.000
  DefYds/G       1    1713  1712.6     7.31    0.011
Error           29    6797   234.4
Total           31   12976

F = 13.18 and the corresponding p-value = 0.000.

The p-value is less than α = 0.05, so reject the null hypothesis. The overall regression
relationship is significant: at least one of the regression coefficients is different
from zero.
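
The F statistic can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table. The sketch below does that and compares the result with a critical value; the critical value of about 3.33 for F(2, 29) at the 0.05 level is an assumption read from a standard F table, not something given in the output above.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 6179.0, 2
ss_error, df_error = 6797.0, 29

ms_regression = ss_regression / df_regression  # mean square due to regression
ms_error = ss_error / df_error                 # mean square error
f_statistic = ms_regression / ms_error

# Assumption: upper 5% point of the F distribution with (2, 29) degrees of
# freedom, taken from a standard F table.
F_CRITICAL = 3.33

reject_h0 = f_statistic > F_CRITICAL
print(f"F = {f_statistic:.2f}, reject H0: {reject_h0}")
```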

c. Use the t test to determine the significance of each independent variable. What is
your conclusion at the 0.05 level of significance?

Coefficients
Term          Coef     SE Coef  T-Value  P-Value  VIF
Constant      60.5     28.4     2.14     0.041
OffPassYds/G  0.3186   0.0626   5.09     0.000    1.21
DefYds/G      -0.2413  0.0893   -2.70    0.011    1.21

For OffPassYds/G:
H0: β1 = 0
Ha: β1 ≠ 0
T-value = 5.09, p-value = 0.000. The p-value is less than α, so reject the null hypothesis.
OffPassYds/G is a significant independent variable at α = 0.05.

For DefYds/G:
H0: β2 = 0
Ha: β2 ≠ 0
T-value = -2.70, p-value = 0.011. The p-value is less than α, so reject the null hypothesis.
DefYds/G is a significant independent variable at α = 0.05.

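
Each T-value in the Coefficients table is the coefficient divided by its standard error. The sketch below recomputes them and applies the critical-value form of the test; the two-tailed critical value of 2.045 for 29 degrees of freedom is an assumption taken from a standard t table, and the dictionary layout is only illustrative.

```python
# (coefficient, standard error) pairs copied from the Coefficients table.
terms = {
    "OffPassYds/G": (0.3186, 0.0626),
    "DefYds/G": (-0.2413, 0.0893),
}

# Assumption: t table value for an upper-tail area of 0.025 with 29 degrees
# of freedom (two-tailed test at alpha = 0.05).
T_CRITICAL = 2.045

t_values = {name: coef / se for name, (coef, se) in terms.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}, significant at 0.05: {significant[name]}")
```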
d. Does the model have multicollinearity problems? Explain.

VIF = 1.21 for both independent variables, indicating no multicollinearity problems.

Pearson coefficient of correlation

Correlation: DefYds/G, OffPassYds/G
Correlations
Pearson correlation 0.414

Multicollinearity is a potential problem if the absolute value of the sample correlation
coefficient exceeds 0.7 for any two independent variables. Here the correlation is 0.414,
so there should not be any multicollinearity issues.
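
With only two independent variables, the VIF and the Pearson correlation carry the same information, since VIF = 1/(1 - r^2). The sketch below checks that the reported correlation of 0.414 is consistent with the reported VIF of 1.21; the 0.7 cutoff is the rule of thumb quoted above.

```python
# Pearson correlation between DefYds/G and OffPassYds/G, from the output above.
r = 0.414

# With exactly two predictors, regressing one on the other gives R^2 = r^2,
# so both predictors share the same variance inflation factor.
vif = 1.0 / (1.0 - r ** 2)

CORRELATION_CUTOFF = 0.7  # rule of thumb quoted in the solution
potential_problem = abs(r) > CORRELATION_CUTOFF

print(f"VIF = {vif:.2f}, potential multicollinearity problem: {potential_problem}")
```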

e. Does the model meet the assumptions? Explain.

Discussion in class.
Problem 5

A 10-year study conducted by the American Heart Association provided data on how age,
blood pressure, and smoking relate to the risk of strokes. Assume that the following data
are from a portion of this study. Risk is interpreted as the probability (times 100) that the
patient will have a stroke over the next 10-year period. For the smoking variable, define a
dummy variable with 1 indicating a smoker and 0 indicating a nonsmoker. Webfile: Stroke

a. Develop an estimated regression equation that relates risk of a stroke to the
person’s age, blood pressure, and whether the person is a smoker.

Regression Equation
Smoker
No   Risk = -91.8 + 1.077 Age + 0.2518 Pressure
Yes  Risk = -83.0 + 1.077 Age + 0.2518 Pressure

Or, as a single equation with Smoker = 1 for a smoker and 0 for a nonsmoker:

Regression Equation
Risk = -91.8 + 1.077 Age + 0.2518 Pressure + 8.74 Smoker

b. Is smoking a significant factor in the risk of a stroke? Explain. Use α = .05.

Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  -91.8   15.2     -6.03    0.000
Age       1.077   0.166    6.49     0.000    1.46
Pressure  0.2518  0.0452   5.57     0.000    1.25
Smoker
  Yes     8.74    3.00     2.91     0.010    1.36

H0: β3 = 0
Ha: β3 ≠ 0
where β3 is the coefficient of the Smoker dummy variable.

t-value = 2.91, p-value = 0.010.

The p-value is less than α = 0.05, so reject the null hypothesis. Yes, smoking is a
significant factor in the risk of a stroke.
c. What is the probability of a stroke over the next 10 years for Art Speen, a 68-year-
old smoker who has blood pressure of 175? What action might the physician
recommend for this patient?

Prediction for Risk

Regression Equation
Risk = -91.8 + 1.077 Age + 0.2518 Pressure + 8.74 Smoker

Settings
Variable  Setting
Age       68
Pressure  175
Smoker    1

Prediction
Fit      SE Fit   95% CI              95% PI
34.2661  1.99785  (30.0309, 38.5014)  (21.3487, 47.1836)

Because Risk is the probability times 100, the estimated probability of a stroke over the
next 10 years is about 0.34.
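
The fit can be checked by hand with the dummy variable set to 1 for a smoker. The sketch below uses the rounded coefficients from the regression equation, so it gives a value slightly below the Minitab fit of 34.2661; the function name is illustrative.

```python
# Rounded coefficients copied from the regression equation in part (a).
INTERCEPT = -91.8
B_AGE = 1.077
B_PRESSURE = 0.2518
B_SMOKER = 8.74


def predict_risk(age, pressure, is_smoker):
    """Return estimated Risk (probability times 100) of a stroke over 10 years."""
    smoker = 1 if is_smoker else 0  # dummy variable: 1 = smoker, 0 = nonsmoker
    return INTERCEPT + B_AGE * age + B_PRESSURE * pressure + B_SMOKER * smoker


risk = predict_risk(age=68, pressure=175, is_smoker=True)
probability = risk / 100.0

print(f"Risk = {risk:.2f}, estimated probability = {probability:.2f}")
```

Calling `predict_risk(68, 175, False)` gives the same patient's estimate as a nonsmoker, which is lower by the Smoker coefficient of 8.74.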

Discussion in class.
