
This Study Resource Was: Scatterplot of Attendance Vs Team Salary

This study analyzed the relationship between team salary and attendance in Major League Baseball in 2016. A scatterplot showed a positive relationship between the two variables. The regression equation indicated that for every additional $30 million in salary, attendance would be expected to increase by about 310,000 people. Team salary accounted for about 50% of the variation in attendance. The correlation between attendance and team ERA was stronger than that between attendance and batting average, but neither relationship was statistically significant.

Uploaded by

tyler

CH13- 63.

Refer to the Baseball 2016 data, which reports information on the 2016 Major League Baseball
season. Let attendance be the dependent variable and total team salary be the independent variable.
Determine the regression equation and answer the following questions.

a. Draw a scatter diagram. From the diagram, does there seem to be a direct relationship between the
two variables?

b. What is the expected attendance for a team with a salary of $100.0 million?

c. If the owners pay an additional $30 million, how many more people could they expect to attend?

d. At the .05 significance level, can we conclude that the slope of the regression line is positive? Conduct
the appropriate test of hypothesis.

e. What percentage of the variation in attendance is accounted for by salary?

f. Determine the correlation between attendance and team batting average and between attendance
and team ERA. Which is stronger? Conduct an appropriate test of hypothesis for each set of variables.

Answer:

a.
[Figure: Scatterplot of Attendance vs Team Salary. Horizontal axis: Team Salary (40 to 240); vertical axis: Attendance (0 to 4,000,000).]

The scatterplot above shows that as team salary increases, attendance tends to increase as well. There is a positive (direct) relationship between the two variables.

b. Use the sample data to calculate the correlation coefficient, then the slope and intercept of the regression line.

This study source was downloaded by 100000829508145 from CourseHero.com on 08-09-2021 23:48:50 GMT -05:00

https://www.coursehero.com/file/61871519/week7docx/
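$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \dfrac{494162269}{(30 - 1)(40.58)(594112.41)} = 0.70677$

$b = r \left( \dfrac{s_y}{s_x} \right) = 0.70677 \left( \dfrac{594112.41}{40.58} \right) = 10347$

$a = \bar{y} - b\bar{x} \approx 1196919$, with $\bar{y} \approx 2458866.7$ and $\bar{x} \approx 122$ (rounded)

Attendance = 1196919 + 10347(Team Salary), with team salary in $ millions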

The expected attendance for a team with a salary of $100 million is

Attendance = 1196919 + 10347(100) = 2231619
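The arithmetic in part (b) can be re-run as a short script. This is a minimal sketch that assumes only the summary statistics quoted above; the variable and function names are illustrative and do not come from the original Excel workbook. Because the inputs are rounded, the recomputed r and slope can differ from the reported values in the last digit.

```python
# Summary statistics as reported in the solution (salary in $ millions).
n = 30
sum_cross = 494162269   # sum of (x - x_bar) * (y - y_bar)
s_x = 40.58             # sample standard deviation of team salary
s_y = 594112.41         # sample standard deviation of attendance

# Correlation coefficient and slope from the summary statistics.
r = sum_cross / ((n - 1) * s_x * s_y)
slope = r * (s_y / s_x)

# Coefficients of the fitted line exactly as stated in the solution.
INTERCEPT = 1196919
SLOPE = 10347


def expected_attendance(salary_millions):
    """Expected attendance for a given team salary in $ millions."""
    return INTERCEPT + SLOPE * salary_millions


print("r     =", round(r, 4))
print("slope =", round(slope, 1))
print("attendance at $100 million =", expected_attendance(100))
```

Keeping the reported coefficients as separate constants means the prediction reproduces the solution's figure even though the recomputed slope is slightly different.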

c. The expected attendance for a team with a salary of $130 million is

Attendance = 1196919 + 10347(130) = 2542029

The expected additional attendance is

2542029 - 2231619 = 310410 (equivalently, 30 x 10347)
d. The hypotheses are H0: β ≤ 0 and H1: β > 0. From the Excel regression output, the t-statistic is 5.29 and the P-value is 0.000.

The level of significance is 0.05.

The null hypothesis is rejected at the 0.05 level of significance because the P-value is less than the level of significance. There is sufficient evidence to conclude that the slope of the regression line is positive. The result is statistically significant.
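As a cross-check that does not need the Excel output, the t-statistic can be recovered from the correlation coefficient, because in simple regression the test on the slope and the test on the correlation use the same statistic. The sketch below assumes r = 0.70677 and n = 30 from part (b):

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.70677\sqrt{28}}{\sqrt{1-0.4995}}
  \approx \frac{3.740}{0.7075}
  \approx 5.29
```

With 28 degrees of freedom the one-tailed critical value at the 0.05 level is 1.701, so a statistic of 5.29 falls well inside the rejection region, in agreement with the P-value conclusion.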


e. From the Excel regression output, R-square is 49.95%, so 49.95% of the variation in attendance is accounted for by salary.


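f. From the Excel correlation output, the correlation between team ERA and attendance is -0.257 (negative, consistent with the t-statistic of -1.41 in part f (2)), and the correlation between team batting average and attendance is 0.124. In absolute value, the ERA correlation is the stronger of the two.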
f (1) For team batting average and attendance, the hypotheses are H0: ρ = 0 (the correlation in the population is 0) and H1: ρ ≠ 0. From the Excel regression output, the t-statistic is 0.662 and the P-value is 0.514.

The two-tailed critical value t(α/2) with n - 2 = 28 degrees of freedom is 2.048, and the level of significance is 0.05.

The null hypothesis is not rejected at the 0.05 level of significance because the P-value is greater than the level of significance. There is not enough evidence to conclude that batting average and attendance are correlated.


f (2) For team ERA and attendance, the hypotheses are H0: ρ = 0 (the correlation in the population is 0) and H1: ρ ≠ 0. From the Excel regression output, the t-statistic is -1.41 and the P-value is 0.17.

The two-tailed critical value t(α/2) with n - 2 = 28 degrees of freedom is 2.048, and the level of significance is 0.05.

The null hypothesis is not rejected at the 0.05 level of significance because the P-value is greater than the level of significance. There is not enough evidence to conclude that ERA and attendance are correlated.
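Both correlation tests in part (f) follow the same recipe, which can be sketched in a few lines. The example assumes n = 30 teams and the critical value 2.048 quoted above; the sign of the ERA correlation is taken as negative to match the reported t-statistic of -1.41, and the helper name `t_from_r` is illustrative.

```python
import math


def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N = 30
T_CRITICAL = 2.048  # two-tailed, alpha = 0.05, df = 28 (value from the solution)

# Sample correlations with attendance; the ERA sign is an assumption
# made so that the result matches the reported t-statistic of -1.41.
correlations = {"batting average": 0.124, "ERA": -0.257}

results = {}
for name, r in correlations.items():
    t = t_from_r(r, N)
    reject = abs(t) > T_CRITICAL
    results[name] = (round(t, 2), reject)
    print(f"{name}: t = {t:.2f}, reject H0: {reject}")
```

Neither statistic exceeds the critical value in absolute size, which is the same conclusion reached from the P-values.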

CH13- 64.

Refer to the Lincolnville School bus data. Develop a regression equation that expresses the
relationship between age of the bus and maintenance cost. The age of the bus is the independent
variable.

a. Draw a scatter diagram. What does this diagram suggest as to the relationship between the two
variables? Is it direct or indirect? Does it appear to be strong or weak?

b. Develop a regression equation. How much does an additional year add to the maintenance cost?
What is the estimated maintenance cost for a 10-year-old bus?

c. Conduct a test of hypothesis to determine whether the slope of the regression line is greater than
zero. Use the .05 significance level. Interpret your findings from parts (a), (b), and (c) in a brief report.

Answer:

a.

[Figure: Scatterplot of Age vs Maintenance Cost. Horizontal axis: Age (0 to 16); vertical axis: Maintenance Cost (0 to 12,000).]

From the diagram above, there appears to be a positive relationship between the two variables; that is, the relationship is direct. However, the relationship appears to be weak.

b. From the Excel regression output:

Maintenance cost = a + b(Age)

b = 603.2

$a = \bar{y} - b\bar{x} = 4551.8875 - 603.2(6.9875) = 337$

The required regression equation is Maintenance cost = 337 + 603.2(Age), so each additional year of age adds about 603.2 to the estimated maintenance cost.

The estimated maintenance cost for a 10-year-old bus is

Maintenance cost = 337 + 603.2(10) = 6369
c. The hypotheses are H0: β ≤ 0 and H1: β > 0. From the Excel regression output, the t-statistic is 8.84 and the P-value is 0.000.
The level of significance is 0.05.

The null hypothesis is rejected at 0.05 level of significance since the P-value is less than the level
of significance. There is sufficient evidence to indicate that the slope of the regression line is
positive. The result is statistically significant.

CH14- 35. Refer to the Lincolnville School District bus data. First, add a variable to change the type of
engine (diesel or gasoline) to a qualitative variable. If the engine type is diesel, then set the qualitative
variable to 0. If the engine type is gasoline, then set the qualitative variable to 1. Develop a regression
equation using statistical software with maintenance cost as the dependent variable and age, odometer
miles, miles since last maintenance, and engine type as the independent variables.

a. Develop a correlation matrix. Which independent variables have strong or weak correlations with the
dependent variable? Do you see any problems with multicollinearity?

b. Use a statistical software package to determine the multiple regression equation. How did you select
the variables to include in the equation? How did you use the information from the correlation analysis?

Show that your regression equation shows a significant relationship. Write out the regression equation
and interpret its practical application. Report and interpret the R-square.

c. Develop a histogram or a stem-and-leaf display of the residuals from the final regression equation
developed in part (f). Is it reasonable to conclude that the normality assumption has been met?

d. Plot the residuals against the fitted values from the final regression equation developed in part (f)
against the fitted values of Y. Plot the residuals on the vertical axis and the fitted values on the horizontal
axis.

Answer:

a.

From the correlation matrix in Excel, the dependent variable, maintenance cost, has a strong
correlation with age. In addition, two of the independent variables, age and odometer miles, are strongly correlated with each other, which may indicate a multicollinearity problem.

b. From the Excel regression output, the regression equation is

Maintenance cost = -623 + 746(Age) - 0.00915(Odometer Miles) - 0.00361(Miles Since Last Maintenance) + 3186(Engine Type)

The coefficient of determination is R² = 0.9227, so 92.27% of the variation in maintenance cost
is explained by the independent variables.
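To see how the fitted equation is used in practice, it can be wrapped in a small function. This is a sketch only: the function name is illustrative, the third coefficient is assumed to belong to miles since last maintenance, and the bus values passed in are hypothetical inputs chosen for illustration, not rows from the Lincolnville data.

```python
def estimated_maintenance_cost(age, odometer_miles, miles_since_maintenance, gasoline):
    """Evaluate the fitted equation; engine type is 0 for diesel, 1 for gasoline."""
    engine_type = 1 if gasoline else 0
    return (-623
            + 746 * age
            - 0.00915 * odometer_miles
            - 0.00361 * miles_since_maintenance
            + 3186 * engine_type)


# Hypothetical bus used only to illustrate the calculation.
diesel = estimated_maintenance_cost(8, 100_000, 5_000, gasoline=False)
gas = estimated_maintenance_cost(8, 100_000, 5_000, gasoline=True)

print("diesel estimate:  ", round(diesel, 2))
print("gasoline estimate:", round(gas, 2))
print("difference:       ", round(gas - diesel, 2))
```

Holding the other variables fixed, the two estimates differ by exactly the engine-type coefficient, which is how a 0/1 qualitative variable is read in a regression equation.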

c. For the frequency table, we first need to determine the number of classes. Using the "2 to the k" rule, k = 6 gives 2^6 = 64, which is smaller than n = 80 observations, so let k = 7, which gives 2^7 = 128, greater than n = 80. Next we need the class interval. Using i ≥ (H - L)/k, we get i ≥ (2005.361283 - (-1611.329037))/7 = 516.67, so we can use 550 as the interval. Since the minimum value
is -1611.329037, the lower limit of the first class can be -1650.
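The class-selection steps above can be sketched as a short script. It assumes only the figures quoted in the text (n = 80, the minimum and maximum residuals, the chosen width of 550 and the starting limit of -1650); the function name is illustrative.

```python
def number_of_classes(n):
    """Smallest k such that 2**k is greater than n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


n = 80
lowest, highest = -1611.329037, 2005.361283

k = number_of_classes(n)
min_width = (highest - lowest) / k   # the interval must be at least this wide

# Rounded-up choices made in the solution.
width = 550
lower_limit = -1650

# Lower and upper limit of each class.
limits = [(lower_limit + i * width, lower_limit + (i + 1) * width) for i in range(k)]

print("classes:", k)
print("minimum width:", round(min_width, 2))
for low, high in limits:
    print(low, "to", high)
```

Rounding the width up to 550 and starting at -1650 gives seven classes that cover both the smallest and the largest residual.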

Organize the information into a frequency distribution table and histogram:

Class   Interval           Frequency
1       -1650 to -1100     3
2       -1100 to -550      10
3       -550 to 0          30
4       0 to 550           23
5       550 to 1100        10
6       1100 to 1650       3
7       1650 to 2200       1


The histogram shows that the residuals are approximately normally distributed, so it is reasonable to conclude that the normality assumption has been met.

d.
[Figure: Residuals against the fitted values. Horizontal axis: Fitted Value (-2000 to 10000); vertical axis: Residuals (-2000 to 2500).]

There is no apparent pattern in the residuals, but the residual variation may be increasing with larger fitted values.
