Topic 8 Questions
Q1
Suppose that Y = 67+27X for all values of X and Y. What can you say about the sample
correlation coefficient r between X and Y?
a) r = -1
b) r = 0
c) r = 1
d) The magnitude of r is unknown until it is estimated using sample data
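As a quick numerical check of the situation described in Q1, the sketch below computes the sample correlation coefficient for points lying exactly on the line Y = 67 + 27X. The X values and the helper name pearson_r are assumptions chosen for illustration; they are not part of the question.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.5, 4.0, 7.0, 10.0]    # arbitrary illustrative X values (assumed)
ys = [67 + 27 * x for x in xs]     # every point lies exactly on the line
r_exact_line = pearson_r(xs, ys)
print(round(r_exact_line, 6))
```

Any other set of X values with at least two distinct points should give the same result, because r depends only on how tightly the points follow a line and on the sign of the slope.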
Q2
Suppose that r = 0, where r is the sample correlation coefficient between X and Y. Then
a) X and Y are not related at all
b) X and Y are very closely related
c) Neither of the above is necessarily true
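To see why r = 0 has to be read with care, the sketch below uses a small made-up data set in which Y is completely determined by X through Y = X^2, yet the sample correlation coefficient is zero. The data and the helper name pearson_r are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]          # symmetric around zero (assumed data)
ys = [x ** 2 for x in xs]       # exact, but nonlinear, relationship
r_quadratic = pearson_r(xs, ys)
print(r_quadratic)
```

The coefficient r measures only the linear part of a relationship, so a strong curved relationship can still produce r = 0.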
Q3
You want to predict expenditure on hamburger purchases Y using one of the income
measures X or Z. If rXY = 0.4, rZY = 0.3 and rXZ = 0.6, then we should use
a) X
b) Z
c) Not use this information
Q4
You want to predict the consumption of sauce W using one of the income measures X or Z.
Given the information rWX = 0.6, rXZ = 0.9 and rZW = -0.8, you should use
a) X
b) Z
c) Not use this information
Q5
Suppose we find that rXY is very close to -1 for a sample. That means
a) X and Y have very weak correlation
b) The X and Y values lie very close to a straight line with slope that equals -1
c) Neither of the above
Q6
A fuel-oil distribution company has collected data over a number of years in order to
determine the statistical relationship between the daily temperature and the consumption of
fuel-oil in single family dwellings. Given the temperature, the company would like to be able
to predict the consumption of fuel-oil in order to service their customers better. The company
draws sample observations which yield the following information:
b) Estimate the linear regression equation of daily fuel consumption on daily temperature.
Superimpose the estimated regression line on the graph constructed in a).
c) Relate the values of the estimated regression coefficients to your diagram constructed in
b).
d) Predict the level of fuel consumption when the daily temperature is 7.5°C.
e) Measure the strength of the linear relationship between the daily temperature and daily
fuel consumption using the sample correlation coefficient. Interpret the value of
the coefficient that you obtain.
Q7
You want to develop a model to predict assessed value of homes based on gross area. A
sample of 15 single-family apartments is selected in the Midwestern district. The assessed
value (Y, in hundreds of thousands of dollars) and the gross area of the apartments (X, in
thousands of square feet) are recorded, with the following results of statistical analysis:
Ŷ = 51.915 + 16.633X
r = 0.812
a) Interpret the meaning of the Y-intercept and the slope of the regression model.
b) Predict the assessed value for an apartment whose gross area is 1750 square feet.
c) Interpret the meaning of the coefficient of determination in this problem.
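The arithmetic behind parts b) and c) of Q7 can be sketched as follows. One assumption is made loudly here: the fitted line is taken to be Ŷ = 51.915 + 16.633X, with both terms positive. A positive slope is consistent with r = 0.812, but the sign of the intercept is an assumption. The variable names are illustrative.

```python
# ASSUMPTION: fitted line is Y-hat = 51.915 + 16.633 * X (both signs positive).
b0 = 51.915    # estimated intercept
b1 = 16.633    # estimated slope
r = 0.812      # sample correlation coefficient given in the question

# X is measured in thousands of square feet, so 1750 sq ft is X = 1.75.
x_new = 1750 / 1000
y_hat = b0 + b1 * x_new      # predicted assessed value, in the units of Y

# In simple linear regression the coefficient of determination is r squared.
r_squared = r ** 2

print(round(y_hat, 3), round(r_squared, 4))
```

The main point of the sketch is the unit conversion: the gross area must be expressed in thousands of square feet before it is substituted into the equation.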
Q8
Circulation is the lifeblood of the publishing business. The larger the sales of a magazine, the
more it can charge advertisers. Recently, a circulation gap has appeared between the
publishers’ reports of magazines’ newsstand sales and subsequent audits by the Audit Bureau
of Circulations. The data in the file Circulation represent the reported and audited newsstand
yearly sales (in thousands) for the following 10 magazines (i.e. n = 10)
a) Use the least-squares method to find the regression coefficients b0 and b1.
b) Interpret the meaning of b0 and b1 in this problem.
c) Predict the monthly rent for an apartment that has 1,000 square feet.
d) Why would it not be appropriate to use the model to predict the monthly rent for
apartments that have 500 square feet?
e) Given the standard error of slope coefficient is 0.1376, at the 0.05 level of significance,
is there evidence of a linear relationship between the size of the apartment and the
monthly rent?
f) Construct a 95% confidence interval estimate of the population slope, β1.
Q10
A random sample of 12 companies was selected and the sales and earnings, in millions of
dollars, are reported below:
              Coefficients  Standard Error  t Stat       P-value      Lower 95%    Upper 95%    Lower 95.0%  Upper 95.0%
Intercept     1.64574594    0.728503942     2.259076233  0.047442872  0.022538011  3.26895387   0.022538011  3.268953869
X Variable 1  0.103873619   0.017054814     6.090574801  0.000117143  0.065873126  0.14187411   0.065873126  0.141874111
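The columns of this coefficient table are linked by simple formulas, which the sketch below reproduces. It assumes the table belongs to the 12-company sample of Q10, so the residual degrees of freedom are n - 2 = 10. The critical value 2.228139 is the standard two-sided 5% t value for 10 degrees of freedom, hard-coded here rather than computed. Variable names are illustrative.

```python
# Values copied from the coefficient table.
b0, se_b0 = 1.64574594, 0.728503942     # intercept and its standard error
b1, se_b1 = 0.103873619, 0.017054814    # slope and its standard error

# t Stat = coefficient / standard error (tests H0: coefficient = 0).
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# ASSUMPTION: n = 12 observations, so df = 10 and t(0.025, 10) = 2.228139.
t_crit = 2.228139

# 95% confidence interval for the slope: b1 +/- t_crit * SE(b1).
slope_lower = b1 - t_crit * se_b1
slope_upper = b1 + t_crit * se_b1

print(round(t_b0, 4), round(t_b1, 4))
print(round(slope_lower, 6), round(slope_upper, 6))
```

Because the interval for the slope excludes zero and the absolute t statistic exceeds the critical value, both views lead to the same conclusion at the 5% level.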
SUMMARY OUTPUT
Regression Statistics
Multiple R         0.647613
R Square           0.419402
Adjusted R Square  0.322636
Standard Error     0.396999
Observations       8
ANOVA
            df  SS        MS        F         Significance F
Regression  1   0.683101  0.683101  4.334174  0.082521
Residual    6   0.945649  0.157608
Total       7   1.62875
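Every entry under Regression Statistics, and the F statistic, can be recovered from the sums of squares and degrees of freedom in the ANOVA table. The following is a minimal sketch of those relationships using the numbers printed above; the variable names are illustrative, and Significance F is taken as given rather than recomputed.

```python
import math

# Values copied from the ANOVA table.
ss_regression, df_regression = 0.683101, 1
ss_residual, df_residual = 0.945649, 6

ss_total = ss_regression + ss_residual          # total sum of squares (SST)
n = df_regression + df_residual + 1             # number of observations

r_squared = ss_regression / ss_total            # R Square = SSR / SST
multiple_r = math.sqrt(r_squared)               # Multiple R (always non-negative)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

ms_regression = ss_regression / df_regression   # MSR
ms_residual = ss_residual / df_residual         # MSE
f_stat = ms_regression / ms_residual            # F = MSR / MSE
std_error = math.sqrt(ms_residual)              # standard error of the estimate

print(n, round(r_squared, 6), round(adj_r_squared, 6))
print(round(f_stat, 4), round(std_error, 6))
```

With a single explanatory variable, F equals the square of the slope's t statistic, so the F test and the two-sided t test on the slope give the same p-value.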
a) Estimate the regression of average purchase quantity per buyer on purchase frequency.
b) Interpret the slope of the estimated regression line.
c) Estimate the average purchase per buyer for the purchasing frequency 20.
d) Test the hypothesis that the slope of the population regression line is zero against that
the slope is not zero. Use a 5% level of significance.
e) Find and interpret the coefficient of determination. Comment on the result.
Q12
In a simple linear regression model, Yi = β0 + β1Xi + εi, where εi ~ N(0, σ²), the sample
information is obtained as follows:
Xi 2 2 4 4 6 6
Yi 10 12 6 8 1 5
After fitting the above regression model, the following output is obtained from Excel:
a) Write down the prediction equation for Y. Find the predicted values (Ŷi) when Xi = 2, 4
and 6.
b) Use the observed values (Yi) and predicted values (Ŷi) to compute the SSE for this
regression model.
c) Find Ȳ and hence calculate the SST for this regression model.
d) Using the results obtained in parts (b) and (c), find and interpret the coefficient of
determination (R²).
e) Based on your result obtained in part (d), or otherwise, find the coefficient of
correlation.
f) According to the p-value(s) from the Excel output, is there any evidence showing that
variable X is an important factor affecting Y?
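A minimal sketch of the calculations in parts a) to e) of Q12, working directly from the six (Xi, Yi) pairs listed in the question rather than from the Excel output. The variable names are illustrative, and the p-value needed for part f) is not computed here.

```python
import math

x = [2, 2, 4, 4, 6, 6]
y = [10, 12, 6, 8, 1, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Predicted values, then the error and total sums of squares.
y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst
# The correlation coefficient takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)

print(b0, b1, sse, sst, round(r_squared, 4), round(r, 4))
```

Note the last step: taking the square root of R² alone loses the sign, so the sign of r has to be taken from the estimated slope.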
Q13
A retail company conducted a study on the cost of opening stores in 2004. Information
about the number of stores (NUMSTORE), total store area in square feet (STORESIZE), and the
corresponding total cost in USD to set up these stores (COST) was collected from 14 areas.
The administrator of the company wants to develop an equation that is helpful in pricing the
opening of new stores. With the data available, two regression models are constructed with
COST as the dependent variable. The Excel regression results are given below. Use the
outputs to answer the following questions (α = 0.05).
a) What is the sample linear regression equation relating COST and NUMSTORE?
b) Suppose the population linear regression equation relating COST and STORESIZE is
COST = β0 + β1 STORESIZE. Find the point estimate and the 95% interval estimate for
β1.
c) A claim is made that each new store adds at least USD 3,000 to the cost. Can you find any
evidence to support this claim?
d) Which variable, NUMSTORE or STORESIZE, is more useful in explaining the variation
in store setup cost? Why?
e) Based on the regression results, describe the relationship between number of stores in
an area and the corresponding setup cost.
f) Based on the regression results, predict the total setup cost for a business area that needs
1000 square feet of store area.
Q14
In planning for an orientation gathering with new Management Science (MS) major students,
the Head of the Department wants to emphasize the importance of doing well in the major
courses in order to get better-paying jobs after graduation. To support this point, the Head
plans to show that there is a strong positive correlation between starting salaries (SALARY)
for recent MS graduates and their grade-point averages (GPA) in the major courses. Records
for seven of last year’s MS graduates are selected at random and given in the table. The Excel
regression results are given below.
a) What is the sample linear regression equation relating SALARY and GPA? Interpret the
intercept and slope of the sample linear regression equation.
b) What is the estimated starting salary for MS graduates with GPA 4.0?
c) Test whether there is a positive linear relationship between SALARY and GPA with
level of significance 0.01.
d) The personal secretary of the Head of Department has conducted another study on the
same group of graduates to investigate the relationship between SALARY and their
IELTS results (IELTS), and found a coefficient of correlation of 0.92. Which variable, GPA or
IELTS, is more useful to explain the variations in starting salary of MS graduates?
Explain.
Q15
In a manufacturing process, the assembly line speed, Xi (feet per minute) was thought to
affect the number of defective parts found, Yi during the inspection process. To test this
theory, managers devised a situation in which the same batch of parts was inspected visually
at a variety of line speeds. The following tables list the collected data and the Excel output.
a) Develop the estimated regression equation that relates line speed to the number of
defective parts found. Interpret the intercept and slope of the estimated regression
equation.
b) What is the estimated number of defective parts found with line speed 55 feet per
minute?
c) At a 0.05 level of significance, determine whether line speed and number of defective
parts found are negatively related.
d) Find the predicted values, Ŷi, when Xi = 20, 40 and 60. Compute the SSE for this
regression model using the observed values, Yi, and predicted values, Ŷi. Find Ȳ and
hence calculate the SST for this regression model.
e) From the results in (d), find the coefficient of determination. Compute the percentage of
the total variation unexplained by the estimated regression equation.
f) Determine and interpret the correlation coefficient.
Q16
Suppose a fire insurance company wants to relate the amount of fire damage in major
residential fires to the distance between the burning house and the nearest fire station. The
study is to be conducted in a large suburb of a major city; a sample of 15 recent fires in this
suburb is selected. The amount of damage, y, and the distance between the fire and the
nearest fire station, x, are recorded for each fire. The results are given as below (α = 0.05).