Statistics Module 11
Regression analysis
A parametric tool used to describe the linear relationship between the independent
and dependent variables.
It develops a model to predict the values of the dependent variable based on the
values of the independent variables.
Multiple regression analysis involves two or more independent variables and one
dependent variable, and the relationship among the variables is estimated by a
linear equation:
y = a + b1x1 + b2x2 + b3x3 + ... or y = b0 + b1x1 + b2x2 + b3x3 + ... (the formula
continues depending on the number of independent variables)
The difference between this module and the previous ones is that we will not compute
the multiple linear regression manually; instead, we will discuss the multiple linear
regression output produced by statistical software such as the Data Analysis tool of
MS Excel.
Problem illustrations
Given these data, we will run the regression analysis in MS Excel Data Analysis. The
results of the analysis are as follows:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.947248
R Square             0.897278
Adjusted R Square    0.867929
Standard Error       0.582776
Observations         10

ANOVA
              df    SS          MS          F          Significance F
Regression    2     20.76661    10.3833     30.5726    0.000347
Residual      7     2.377394    0.339628
Total         9     23.144

                                 Coefficients    Standard Error    t Stat      P-value
Intercept                        -0.78387        0.967542          -0.81016    0.444512
Number of miles traveled (X1)    0.059413        0.010055          5.909002    0.000594
Number of deliveries (X2)        0.920966        0.22483           4.096277    0.004595
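
The entries of the summary output are related to one another, which gives a quick way to check a regression table for consistency. The following is a minimal Python sketch, not part of the Excel output, that recomputes the derived statistics from the sums of squares and degrees of freedom printed above. The variable names are illustrative only.

```python
import math

# Values copied from the ANOVA table above.
ss_regression, df_regression = 20.76661, 2
ss_residual, df_residual = 2.377394, 7
ss_total = ss_regression + ss_residual  # agrees with the Total row (23.144)

# A mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Derived statistics reported in the summary output.
f_stat = ms_regression / ms_residual
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
n = df_regression + df_residual + 1  # number of observations
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
standard_error = math.sqrt(ms_residual)

print(f"F                 = {f_stat:.4f}")
print(f"Multiple R        = {multiple_r:.6f}")
print(f"R Square          = {r_square:.6f}")
print(f"Adjusted R Square = {adjusted_r_square:.6f}")
print(f"Standard Error    = {standard_error:.6f}")
```

Each printed value should agree with the corresponding entry of the summary output up to rounding.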
If the Significance F value is less than the .05 level of significance, reject the null
hypothesis.
If the Significance F value is greater than the .05 level of significance, do not reject
the null hypothesis.
As shown in the ANOVA table, the Significance F value of 0.000347 is less than the .05
level of significance, so the null hypothesis is rejected. This means that at least one
of the independent variables has a significant linear relationship with travel time in
hours.
Business Statistics: Module 11. Multiple Linear Regression Page 3 of 9
To determine which of the independent variables (number of miles traveled and number
of deliveries) has a significant relationship with travel time in hours, look at the
p-values shown in the table and compare them with the .05 level of significance.
If the p-value is less than the .05 level of significance, that particular independent
variable has a significant linear relationship with the dependent variable.
Both the number of miles traveled (X1) and the number of deliveries (X2) have p-values
(0.000594 and 0.004595, respectively) that are less than the .05 level of significance,
which means both independent variables have significant linear relationships with travel
time in hours. Because both coefficients are positive, this suggests that as the number
of miles traveled and the number of deliveries increase, travel time in hours increases
as well.
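
The comparison of each p-value with the level of significance can be sketched in a few lines of Python. The sketch below is illustrative, with hypothetical names; it also recomputes each t Stat as the coefficient divided by its standard error, which matches the table only up to the rounding of the printed values.

```python
ALPHA = 0.05  # level of significance used in this module

# (coefficient, standard error, p-value) copied from the coefficients table.
coefficients = {
    "Intercept": (-0.78387, 0.967542, 0.444512),
    "Number of miles traveled (X1)": (0.059413, 0.010055, 0.000594),
    "Number of deliveries (X2)": (0.920966, 0.22483, 0.004595),
}


def is_significant(p_value, alpha=ALPHA):
    """Return True when the p-value is below the level of significance."""
    return p_value < alpha


t_stats = {}
significant = {}
for name, (coef, std_err, p_value) in coefficients.items():
    t_stats[name] = coef / std_err  # t Stat = coefficient / standard error
    significant[name] = is_significant(p_value)
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t_stats[name]:.3f}, p = {p_value}, {verdict}")
```

Only the two independent variables are flagged as significant; the intercept, with a p-value of 0.444512, is not.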
To develop the multiple linear regression equation or model, focus on the coefficient
values shown in the table:
Y = -0.78387 + 0.059413X1 + 0.920966X2
This model means that travel time in hours (Y) changes by 0.059413 for every unit
change in the number of miles traveled (X1) when the number of deliveries is held
constant, and by 0.920966 for every unit change in the number of deliveries (X2) when
the number of miles traveled is held constant.
In addition, the estimated value of travel time in hours (Y) is -0.78387 if both the
number of miles traveled and the number of deliveries are zero (0).
Use the model to predict the value of travel time in hours for given values of the
number of miles traveled and the number of deliveries. For example,
what is the estimated travel time in hours if the number of miles traveled is 115 and
the number of deliveries is 6?
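
The prediction amounts to substituting the given values into the model built from the coefficients table. A minimal Python sketch of that substitution follows; the function name is hypothetical and the coefficients are copied from the summary output.

```python
def predict_travel_time(miles, deliveries):
    """Estimated travel time in hours from the fitted regression model."""
    intercept = -0.78387
    b_miles = 0.059413        # coefficient of number of miles traveled (X1)
    b_deliveries = 0.920966   # coefficient of number of deliveries (X2)
    return intercept + b_miles * miles + b_deliveries * deliveries


estimate = predict_travel_time(miles=115, deliveries=6)
print(f"Estimated travel time: {estimate:.2f} hours")  # about 11.57 hours
```

That is, Y = -0.78387 + 0.059413(115) + 0.920966(6), or approximately 11.57 hours.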
The number of miles traveled and the number of deliveries have a very strong positive
linear relationship with travel time in hours, based on the multiple R value of 0.9472.
Likewise, the R Square value suggests that 89.73% of the variation in travel time in
hours can be explained by the number of miles traveled and the number of deliveries
together, and that the remaining 10.27% is due to unexplained factors or factors which
were not included in the study.
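
The percentages quoted here follow directly from the multiple R value. As a small illustrative sketch in Python (the variable names are not from the module):

```python
multiple_r = 0.947248  # from the Regression Statistics table

r_square = multiple_r ** 2            # coefficient of determination
explained_pct = 100 * r_square        # share of variation explained by the model
unexplained_pct = 100 - explained_pct

print(f"R Square    = {r_square:.4f}")
print(f"Explained   = {explained_pct:.2f}%")
print(f"Unexplained = {unexplained_pct:.2f}%")
```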
In particular, drive-through sales has a significant positive linear relationship with the
franchisees’ net profit, as its p-value of 0.023076 is less than the .05 level of
significance, while counter sales does not, as its p-value of 0.090775 is greater than
the .05 level of significance.
The model suggests that the estimated value of net profit is -0.22098 if both counter
sales and drive-through sales are zero. It also shows that net profit changes by
0.086339 for every unit change in counter sales when drive-through sales is held
constant, and by 0.113513 for every unit change in drive-through sales when counter
sales is held constant.
What is the estimated net profit for counter sales of 9.1 and drive-through sales of 8.9
(in million dollars)?
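
As a hedged sketch, the estimate can be obtained by substituting the two sales figures into the coefficients quoted in the interpretation above. The helper below is a generic, hypothetical function that works for any number of independent variables; it is not taken from the module.

```python
def predict(intercept, slopes, values):
    """Evaluate y = intercept + b1*x1 + b2*x2 + ... for matching lists."""
    return intercept + sum(b * x for b, x in zip(slopes, values))


# Coefficients quoted in the text: intercept, counter sales, drive-through sales.
net_profit = predict(
    intercept=-0.22098,
    slopes=[0.086339, 0.113513],
    values=[9.1, 8.9],  # counter sales and drive-through sales, in million dollars
)
print(f"Estimated net profit: {net_profit:.3f} million dollars")  # about 1.575
```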
Based on the multiple R value of 0.876635, counter sales and drive-through sales have a
high positive linear relationship with the net profit of franchisees. Likewise, the
R Square value indicates that 76.85% of the variation in net profit can be explained by
counter sales and drive-through sales together, and the remaining 23.15% is due to
other factors which were not covered in this study.
End of Module Exercises
2. The owner of a chain of health spas has selected 10 of her smaller clubs for a test in
which she varies the size of the newspaper ad and the amount of the initiation fee
discount to see how this might affect the number of prospective members who visit each
club during the following week. The results were recorded, and multiple regression
analysis was performed on them. Interpret the results of the multiple regression
analysis shown below. Determine the number of new visitors for an ad column of 9 inches
and a discount amount of $75.
Regression Statistics
Multiple R           0.799957448
R Square             0.639931918
Adjusted R Square    0.537055323
Standard Error       3.54273123
Observations         10

ANOVA
              df    SS             MS          F           Significance F
Regression    2     156.143388     78.07169    6.220384    0.028012134
Residual      7     87.85661197    12.55094
Total         9     244

                      Coefficients    Standard Error    t Stat      P-value
Intercept             11.49005792     4.013760848       2.862666    0.024245
ad column - inches    2.139333977     0.612334395       3.493735    0.010078
discount amount       0.031486486     0.04511417        0.697929    0.507735
3. The Conde Nast Traveler Gold List provides ratings for the top 20 small cruise ships.
Each score represents the percentage of respondents who rated a ship as excellent
or very good on several criteria, including shore excursions and food/dining. An
overall score is also reported and used to rank the ships. The data were analyzed
using multiple regression analysis, the output of which is shown below. Interpret the
results. Predict the overall score for a cruise ship with a shore excursions score of 80
and a food/dining score of 90.
Regression Statistics
Multiple R           0.859345037
R Square             0.738473893
Adjusted R Square    0.707706116
Standard Error       1.376504389
Observations         20

ANOVA
              df    SS             MS          F           Significance F
Regression    2     90.95450632    45.47725    24.00154    1.11912E-05
Residual      17    32.21099368    1.894764
Total         19    123.1655

                    Coefficients    Standard Error    t Stat      P-value
Intercept           45.17795848     6.951847689       6.498698    5.45765E-06
shore excursions    0.252892474     0.041891237       6.036882    1.33357E-05
food/dining         0.248188929     0.061605727       4.028667    0.00087138
4. The Tire Rack, an online distributor of tires and wheels, conducts extensive testing
to provide customers with products that are right for their vehicle, driving style, and
driving conditions. In addition, The Tire Rack maintains an independent consumer
survey to help drivers help each other by sharing their long-term tire experiences
(The Tire Rack website, August 1, 2016). The survey uses a 1 to 10 rating scale, with
10 as the highest rating, for 18 high-performance all-season tires. The tread wear
variable rates quickness of wear based on the driver’s expectations; the dry traction
variable rates the grip of a tire on a dry road; the steering variable rates the tire’s
steering responses; and the buy again variable rates the driver’s desire to purchase
the same tire again. These data were analyzed using multiple regression analysis,
the results of which are shown in the table below. Interpret the results.
Regression Statistics
Multiple R           0.96956582
R Square             0.94005788
Adjusted R Square    0.926225083
Standard Error       0.57191668
Observations         17

ANOVA
              df    SS             MS          F              Significance F
Regression    3     66.68549411    22.2285     67.95862652    3.36066E-08
Residual      13    4.252152953    0.327089
Total         16    70.93764706

                Coefficients    Standard Error    t Stat      P-value
Intercept       -10.3397728     2.126333551       -4.86272    0.000310027
tread wear      1.200056035     0.146227731       8.206761    1.68927E-06
dry traction    0.621872989     0.504775274       1.23198     0.239775859
steering        0.325237913     0.424731891       0.765749    0.457504454
the result. What is the estimated overall rating for a vehicle that scores 6 on ride, 9
on handling, and 7 on driver comfort?
Regression Statistics
Multiple R           0.818288937
R Square             0.669596785
Adjusted R Square    0.504395177
Standard Error       3.099875295
Observations         10

ANOVA
              df    SS             MS          F             Significance F
Regression    3     116.8446389    38.94821    4.05320986    0.068370032
Residual      6     57.65536105    9.609227
Total         9     174.5

                  Coefficients    Standard Error    t Stat      P-value
Intercept         46.74070022     15.53545447       3.008647    0.023741891
ride              3.463894967     1.157780465       2.991841    0.024262446
handling          3.915754923     1.30102027        3.009757    0.023707941
driver comfort    -1.90809628     1.294945434       -1.4735     0.191049531