
Topic 8 Questions

This document contains 10 questions about simple linear regression. The questions cover topics such as interpreting correlation coefficients, estimating regression equations from data, making predictions using regression models, and assessing the strength and statistical significance of linear relationships between variables.

Uploaded by

Hyondae Bae
Copyright
© All Rights Reserved

Topic 8: Simple Linear Regression Exercises

Q1
Suppose that Y = 67+27X for all values of X and Y. What can you say about the sample
correlation coefficient r between X and Y?
a) r = -1
b) r = 0
c) r = 1
d) The magnitude of r is unknown until it is estimated using sample data

Q2
Suppose that r = 0, where r is the sample correlation coefficient between X and Y. Then
a) X and Y are not related at all
b) X and Y are very closely related
c) Neither of the above is necessarily true
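One way to explore Q2 is to compute r for a small made-up data set in which Y is fully determined by X but the relationship is not linear. The sketch below is only an illustration: the data and the helper name `pearson_r` are hypothetical and not part of the exercise.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: Y = X^2, so Y depends entirely on X, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

Because r measures only the linear part of a relationship, a value of zero here coexists with an exact (quadratic) dependence.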

Q3
You want to predict expenditure on hamburger purchases Y using one of the income
measures X or Z. If rXY = 0.4, rZY = 0.3 and rXZ = 0.6, then we should use
a) X
b) Z
c) Not use this information

Q4
You want to predict the consumption of sauce W using one of the income measures X or Z.
Given the information rWX = 0.6, rXZ = 0.9 and rZW = -0.8, you should use
a) X
b) Z
c) Not use this information

Q5
Suppose we find that rXY is very close to -1 for a sample. That means
a) X and Y have very weak correlation
b) The X and Y values lie very close to a straight line with slope that equals -1
c) Neither of the above
Q6
A fuel-oil distribution company has collected data over a number of years in order to
determine the statistical relationship between the daily temperature and the consumption of
fuel-oil in single family dwellings. Given the temperature, the company would like to be able
to predict the consumption of fuel-oil in order to service their customers better. The company
draws sample observations which yield the following information:

Consumption of fuel-oil (litres)    Daily temperature (°C)

            7.0                             -6
            6.3                             -3
            5.1                              0
            4.6                              3
            3.4                              6
            2.9                              9
            1.3                             12
            1.0                             15
            0.6                             18

a) Plot the data on a scatter diagram.

b) Estimate the linear regression equation of daily fuel consumption on daily temperature.
Superimpose the estimated regression line on the graph constructed in a).

c) Relate the values of the estimated regression coefficients to your diagram constructed in
b).

d) Predict the level of consumption of fuel when the daily temperature is 7.5 °C.

e) Measure the strength of the linear relationship between the daily temperature and daily
fuel consumption in terms of the sample correlation coefficient. Interpret this value of
the coefficient which you obtain.
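Parts b), d) and e) of Q6 can be checked numerically. The following is a minimal sketch using the textbook least-squares formulas b1 = Sxy / Sxx and b0 = Ȳ - b1·X̄, with r = Sxy / sqrt(Sxx·Syy). The function name `fit_line` and the variable names are illustrative choices, not taken from the exercise.

```python
import math

# Data from the Q6 table.
temperature = [-6, -3, 0, 3, 6, 9, 12, 15, 18]            # X, degrees Centigrade
consumption = [7.0, 6.3, 5.1, 4.6, 3.4, 2.9, 1.3, 1.0, 0.6]  # Y, litres


def fit_line(x, y):
    """Return (b0, b1, r) for the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    r = sxy / math.sqrt(sxx * syy)
    return b0, b1, r


b0, b1, r = fit_line(temperature, consumption)
predicted = b0 + b1 * 7.5  # part d): temperature of 7.5 degrees

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, r = {r:.4f}")
print(f"predicted consumption at 7.5 degrees = {predicted:.4f} litres")
```

The same function can be reused for any of the other data tables in this set by swapping in the relevant X and Y lists.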

Q7
You want to develop a model to predict assessed value of homes based on gross area. A
sample of 15 single-family apartments is selected in the Midwestern district. The assessed
value (Y, in hundred thousands of dollars) and the gross area of the apartments (X, in
thousands of square feet) are recorded, with the following results of statistical analysis:

Ŷ = 51.915 + 16.633X
r = 0.812

a) Interpret the meaning of the Y-intercept and the slope of the regression model.
b) Predict the assessed value for an apartment whose gross area is 1750 square feet.
c) Interpret the meaning of the coefficient of determination in this problem.
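For parts b) and c) of Q7, the main trap is the unit of X, which is thousands of square feet. The short sketch below assumes the fitted line is Ŷ = 51.915 + 16.633X, that is, a positive slope, which is consistent with the positive r reported. That sign is an assumption here, and the variable names are illustrative.

```python
# Assumed fitted line: Y-hat = 51.915 + 16.633 X (positive slope assumed,
# consistent with r = 0.812 > 0).
b0 = 51.915
b1 = 16.633
r = 0.812

gross_area_sqft = 1750
x = gross_area_sqft / 1000      # X is measured in thousands of square feet

predicted_value = b0 + b1 * x   # in the units of Y stated in the question
r_squared = r ** 2              # coefficient of determination

print(f"X = {x}, predicted Y = {predicted_value:.3f}")
print(f"r^2 = {r_squared:.4f}")
```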
Q8
Circulation is the lifeblood of the publishing business. The larger the sales of a magazine, the
more it can charge advertisers. Recently, a circulation gap has appeared between the
publishers’ reports of magazines’ newsstand sales and subsequent audits by the Audit Bureau
of Circulations. The data in the file Circulation represent the reported and audited newsstand
yearly sales (in thousands) for the following 10 magazines (i.e. n = 10)

Magazine      Reported (X)            Audited (Y)
              independent variable    dependent variable

YM               621.0                   299.6
CosmoGirl        359.7                   207.7
Rosie            530.0                   325.0
Playboy          492.1                   336.3
Esquire           70.5                    48.6
TeenPeople       567.0                   400.3
More             125.5                    91.2
Spin              50.6                    39.1
Vogue            353.3                   268.6
Elle             263.6                   214.3
Source: Data extracted from M. Rose, “In Fight for Ads, Publishers Often Overstate Their
Sales,” The Wall Street Journal, August 6, 2003, pp. A1, A10.

For these data, b0 = 26.724, b1 = 0.5719 and Sb1 = 0.0668

a) Interpret the meaning of the slope, b1, in this problem.


b) Predict the audited newsstand sales for a magazine that reports newsstand sales of
300,000.
c) At the 0.05 level of significance, is there evidence of a linear relationship between
reported sales and audited sales?
d) Construct a 95% confidence interval estimate of the population slope β1.
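Parts b) to d) of Q8 follow directly from the three summary figures given. The sketch below uses t = b1 / Sb1 with n - 2 = 8 degrees of freedom and the interval b1 ± t·Sb1. The critical value 2.3060 is the standard t-table entry for an upper-tail area of 0.025 with 8 degrees of freedom, entered by hand here rather than computed.

```python
# Summary figures given in Q8.
b0, b1, se_b1, n = 26.724, 0.5719, 0.0668, 10

# Two-tailed critical value at alpha = 0.05 with n - 2 = 8 degrees of freedom,
# taken from a standard t table (hard-coded, not computed).
T_CRIT = 2.3060

# b) Reported sales of 300,000 correspond to X = 300 (data are in thousands).
predicted_audited = b0 + b1 * 300

# c) Test H0: slope = 0 against H1: slope != 0.
t_stat = b1 / se_b1
reject_h0 = abs(t_stat) > T_CRIT

# d) 95% confidence interval for the population slope.
ci = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

print(f"predicted audited sales = {predicted_audited:.3f} thousand")
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
print(f"95% CI for slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```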
Q9
An agent for a residential real estate company in a large city would like to be able to predict
the monthly rental cost for apartments, based on the size of the apartment, as defined by
square footage. A sample of 25 apartments in a particular residential neighborhood was
selected, and the information gathered revealed the following:
Rent    Size        Rent    Size
 950     850        1800    1369
1600    1450        1400    1175
1200    1085        1450    1225
1500    1232        1100    1245
 950     718        1700    1259
1700    1485        1200    1150
1650    1136        1150     896
 935     726        1600    1361
 875     700        1650    1040
1150     956        1200     755
1400    1100         800    1000
1650    1285        1750    1200
2300    1985

a) Use the least-squares method to find the regression coefficients b0 and b1.
b) Interpret the meaning of b0 and b1 in this problem.
c) Predict the monthly rent for an apartment that has 1,000 square feet.
d) Why would it not be appropriate to use the model to predict the monthly rent for
apartments that have 500 square feet?
e) Given the standard error of slope coefficient is 0.1376, at the 0.05 level of significance,
is there evidence of a linear relationship between the size of the apartment and the
monthly rent?
f) Construct a 95% confidence interval estimate of the population slope, β1.

Q10
A random sample of 12 companies was selected and the sales and earnings, in millions of
dollars, are reported below:

Company Sales Earnings


C1 40.2 5.3
C2 10.4 3.7
C3 18.6 4.4
C4 71.7 8.0
C5 58.6 6.6
C6 46.8 5.1
C7 17.5 2.6
C8 11.9 1.7
C9 19.6 3.5
C10 51.2 8.2
C11 28.6 6.0
C12 69.2 10.8

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.887504192
R Square             0.787663691
Adjusted R Square    0.76643006
Standard Error       1.258490697
Observations         12

ANOVA
              df    SS             MS            F             Significance F
Regression     1    58.75117833    58.7511783    37.0951014    0.000117143
Residual      10    15.83798834    1.58379883
Total         11    74.58916667

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept       1.64574594      0.728503942       2.259076233    0.047442872    0.022538011    3.26895387    0.022538011    3.268953869
X Variable 1    0.103873619     0.017054814       6.090574801    0.000117143    0.065873126    0.14187411    0.065873126    0.141874111

a) Determine a regression equation
Ŷ = a + bx
by the least-squares principle so that we can predict the value of earnings based on the
value of sales. Interpret the meaning of a and b in this problem.
b) Find the sample coefficient of correlation between sales and earnings. Is there evidence
that there exists a positive linear relationship between the two variables? Use the 0.05
level of significance.
c) What is the coefficient of determination? Interpret the meaning of this coefficient.
d) For a company with $35.0 million in sales, predict the earnings.
e) Corresponding to the regression equation in part (a), the regression model is given by
Y = α + βx + ε
where ε ~ N(0, σ²).
Test H0:   0.15 against H1:   0.15 at the 0.05 level of significance.
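For parts d) and e) of Q10, the Excel output already supplies everything needed. The sketch below computes the prediction and the test statistic for a hypothesized slope of 0.15, t = (b - 0.15) / SE(b), with n - 2 = 10 degrees of freedom. It deliberately stops at the statistic: the rejection region depends on the form of H1, and the critical values shown are standard t-table entries entered by hand.

```python
# Values read from the Q10 Excel output.
intercept = 1.64574594
slope = 0.103873619
se_slope = 0.017054814
n = 12

# d) Predicted earnings for sales of 35.0 (millions of dollars).
predicted_earnings = intercept + slope * 35.0

# e) Test statistic for a hypothesized slope of 0.15 (not 0), df = n - 2.
hypothesized_slope = 0.15
df = n - 2
t_stat = (slope - hypothesized_slope) / se_slope

# Standard t-table values for 10 degrees of freedom (hard-coded):
T_ONE_TAILED_05 = 1.8125   # upper-tail area 0.05
T_TWO_TAILED_05 = 2.2281   # upper-tail area 0.025

print(f"predicted earnings = {predicted_earnings:.4f} million")
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```

Compare `t_stat` with whichever critical value matches the alternative hypothesis being tested.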
Q11
The following table shows, for eight brands of tea, the average purchase quantity per buyer
(Y) and the purchasing frequency (X) in a year.
Y (kg) 3.6 3.3 2.8 2.6 2.7 2.9 2.0 2.6
X 24 21 22 22 18 13 9 6

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.647613
R Square             0.419402
Adjusted R Square    0.322636
Standard Error       0.396999
Observations         8

ANOVA
              df    SS          MS          F           Significance F
Regression     1    0.683101    0.683101    4.334174    0.082521
Residual       6    0.945649    0.157608
Total          7    1.62875

                Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       2.028994        0.40167           5.0514      0.00233     1.046144     3.011844     1.046144       3.011844
X Variable 1    0.04643         0.022302          2.081868    0.082521    -0.008141    0.101001     -0.00814       0.101001

a) Estimate the regression of average purchase quantity per buyer on purchase frequency.
b) Interpret the slope of the estimated regression line.
c) Estimate the average purchase per buyer for the purchasing frequency 20.
d) Test the hypothesis that the slope of the population regression line is zero against that
the slope is not zero. Use a 5% level of significance.
e) Find and interpret the coefficient of determination. Comment on the result.
Q12
In a simple linear regression model, Yi = β0 + β1Xi + εi, where εi ~ N(0, σ²), the following
sample information is obtained:

Xi 2 2 4 4 6 6
Yi 10 12 6 8 1 5

After fitting the above regression model, the following output is obtained from Excel:

                Coefficients    Standard Error    t Stat       P-value
Intercept       15              1.870828693       8.0178373    0.0013127
X Variable 1    -2              0.433012702       -4.618802    0.00989

a) Write down the prediction equation for Y. Find the predicted values (Ŷi) when Xi = 2, 4
and 6.
b) Use the observed values (Yi) and predicted values (Ŷi) to compute the SSE for this
regression model.
c) Find Ȳ and hence calculate the SST for this regression model.
d) Using the results obtained in parts (b) and (c), find and interpret the coefficient of
determination (R²).
e) Based on your result obtained in part (d), or otherwise, find the coefficient of
correlation.
f) According to the p-value(s) from the Excel output, is there any evidence showing that
variable X is an important factor affecting Y?
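Parts a) to e) of Q12 can be traced step by step from the six observations and the fitted coefficients in the Excel output. The sketch below uses SSE = Σ(Yi - Ŷi)², SST = Σ(Yi - Ȳ)² and R² = 1 - SSE/SST, and gives r the sign of the slope. The variable names are illustrative.

```python
import math

# Data and fitted coefficients from Q12.
x = [2, 2, 4, 4, 6, 6]
y = [10, 12, 6, 8, 1, 5]
b0, b1 = 15, -2

# a) Predicted values from the prediction equation Y-hat = b0 + b1 * X.
y_hat = [b0 + b1 * xi for xi in x]

# b) Sum of squared errors: observed minus predicted.
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# c) Total sum of squares: observed minus the sample mean of Y.
y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)

# d) Coefficient of determination.
r_squared = 1 - sse / sst

# e) In simple regression, r has the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), b1)

print(f"predicted values: {y_hat}")
print(f"SSE = {sse}, mean of Y = {y_bar}, SST = {sst}")
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
```

As a cross-check, sqrt(MSE / Sxx) with MSE = SSE / (n - 2) should reproduce the standard error of the slope printed in the Excel output.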
Q13
A retail company conducted a study on the cost of opening stores in 2004. The information
about number of stores (NUMSTORE), total store area (in square feet) (STORESIZE), and
corresponding total cost (in USD) to set up these stores (COST) is collected from 14 areas.
The administrator of the company wants to develop an equation that is helpful in pricing the
opening of new stores. With the data available, two regression models are constructed with
COST as the dependent variable. The Excel regression results are given below. Use the
outputs to answer the following questions (α = 0.05).

a) What is the sample linear regression equation relating COST and NUMSTORE?
b) Suppose the population linear regression equation relating COST and STORESIZE is
COST = β0 + β1 STORESIZE + ε. Find the point estimate and 95% interval estimate for
β1.
c) A claim is made that each new store adds at least USD 3000 of cost. Can you find any
evidence to support this claim?
d) Which variable, NUMSTORE or STORESIZE, is more useful to explain the variations
in store setup cost? Why?
e) Based on the regression results, describe the relationship between number of stores in
an area and the corresponding setup cost.
f) Based on the regression results, predict the total setup cost for a business area that needs
1000 square feet of store area.
Q14
In planning for an orientation gathering with new Management Science (MS) major students,
the Head of the Department wants to emphasize the importance of doing well in the major
courses in order to get better-paying jobs after graduation. To support this point, the Head
plans to show that there is a strong positive correlation between starting salaries (SALARY)
for recent MS graduates and their grade-point averages (GPA) in the major courses. Records
for seven of last year’s MS graduates are selected at random and given in the table. The Excel
regression results are given below.

a) What is the sample linear regression equation relating SALARY and GPA? Interpret the
intercept and slope of the sample linear regression equation.
b) What is the estimated starting salary for MS graduates with GPA 4.0?
c) Test whether there is a positive linear relationship between SALARY and GPA with
level of significance 0.01.
d) The personal secretary of the Head of Department has conducted another study on the
same group of graduates to investigate the relationship between SALARY and their
IELTS results (IELTS) with the coefficient of correlation 0.92. Which variable, GPA or
IELTS, is more useful to explain the variations in starting salary of MS graduates?
Explain.
Q15
In a manufacturing process, the assembly line speed, Xi (feet per minute) was thought to
affect the number of defective parts found, Yi during the inspection process. To test this
theory, managers devised a situation in which the same batch of parts was inspected visually
at a variety of line speeds. The following tables list the collected data and the Excel output.

a) Develop the estimated regression equation that relates line speed to the number of
defective parts found. Interpret the intercept and slope of the estimated regression
equation.
b) What is the estimated number of defective parts found with line speed 55 feet per
minute?
c) At a 0.05 level of significance, determine whether line speed and number of defective
parts found are negatively related.
d) Find the predicted values, Ŷi, when Xi = 20, 40 and 60. Compute the SSE for this
regression model using the observed values, Yi, and predicted values, Ŷi. Find Ȳ and
hence calculate the SST for this regression model.
e) From the results in (d), find the coefficient of determination. Compute the percentage of
the total variation unexplained by the estimated regression equation.
f) Determine and interpret the correlation coefficient.
Q16
Suppose a fire insurance company wants to relate the amount of fire damage in major
residential fires to the distance between the burning house and the nearest fire station. The
study is to be conducted in a large suburb of a major city; a sample of 15 recent fires in this
suburb is selected. The amount of damage, y, and the distance between the fire and the
nearest fire station, x, are recorded for each fire. The results are given below (α = 0.05).

a) According to the scatter diagram, describe the relationship between X and Y.
b) Find the regression equation.
c) Interpret the slope of the regression equation.
d) Is there sufficient evidence to show that X and Y are linearly correlated? Test this
hypothesis using the p-value approach with α = 0.05.
e) Find the coefficient of determination and interpret the result.
f) Can we use the above regression equation for prediction? Why?
g) Find the correlation coefficient.
h) Find the 95% confidence interval of β1.
