PART II (Second, Third and Final Year) Management Science MSCI 212 Statistical Methods For Business
MANAGEMENT SCIENCE
Incidents of plagiarism are recorded on a student’s file. Penalties are in line with the
institutional framework of the University.
c) In accordance with University regulations, marks are deducted from any coursework
which is not submitted by the deadline. This penalty will apply for 3 days after the
deadline and then a mark of zero will be given to any work not submitted. However,
if an extension is given then the rule applies from the date of the extension.
d) This coursework has two questions and you must answer both questions in full.
e) Both questions carry equal marks (50%) and you should be able to begin Question 1
immediately; for Question 2 some other important tools will still be covered later.
f) Each of your answers should state clearly your reasoning. Please also state clearly
any assumptions that you have made in addition to those given in the questions.
g) You may submit handwritten answers, but you should include carefully selected
extracts of your SPSS output to justify your answers. Please write neatly: if we
cannot read your handwriting, your answer will NOT be marked.
Page Limit: 20 pages – 8 pages for Q1 part a), 2 pages for Q1 part b), 10 pages for Q2.
Question 1 [Worth 50% of the marks]
An online article is being prepared on big budget movies. You have been asked to support the
article by providing statistical analyses. You have been provided with data from a random sample
of 150 movies from the top 500 movies as measured by production budget.
The SPSS data file ‘MovieStats.sav’ contains the following data for the top 500 movies as measured
by production budget:
Note that two versions of the categorical data, mpaa and genre, have been provided: the original
categories, and aggregated categories re-coded into numerical values. Both versions represent the same
variables, but the re-coded version may be more useful in some charts that you may wish to use.
Draw your random sample of size 150 from this population and investigate your sample using SPSS.
If your sample contains potential data anomalies you need to decide how to use (or not use) these
data and report any steps you take regarding them.
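The coursework expects the sample to be drawn in SPSS. Purely as an illustration of what a simple random sample without replacement involves, the following is a minimal Python sketch. The population size (500) and sample size (150) come from the brief; the seed value and the function name are hypothetical and chosen only so that the draw can be repeated.

```python
import random

POPULATION_SIZE = 500  # movies in the full data file (from the brief)
SAMPLE_SIZE = 150      # required sample size (from the brief)
SEED = 212             # hypothetical seed; any fixed integer makes the draw repeatable


def draw_sample_rows(population_size, sample_size, seed):
    """Return sorted 1-based row numbers of a simple random sample
    drawn without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


rows = draw_sample_rows(POPULATION_SIZE, SAMPLE_SIZE, SEED)
print(len(rows), rows[:5])
```

Recording the seed (or the selected row numbers) alongside your report is one way to document exactly which movies were analysed.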
a) In no more than 8 pages describe the main features of the movies in your sample, as if
reporting to the writers of the article. You should include the main features of individual
variables and of the relationships between them. You may include SPSS numerical and graphical
output and/or you may quote values from your SPSS output. (The clarity and content of your
report are both important.) [Worth about 80% of the marks for Q1]
b) Without looking at the full data set, suggest which of the patterns/features noted in your
sample (of 150) are also likely to be true of the full population of 500 movies. [No more than
2 pages. Worth about 20% of the marks for Q1]
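When judging whether a sample feature is likely to hold in the population, one relevant point is that the sample (150) is a large fraction of the population (500). The sketch below is a minimal illustration, assuming a simple random sample drawn without replacement and a normal approximation for the interval. It applies the standard finite population correction to a confidence interval for a mean. The function names and the budget values are invented for illustration and are not taken from MovieStats.sav.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def finite_population_correction(sample_size, population_size):
    """Factor that shrinks the standard error when sampling without
    replacement from a finite population."""
    return sqrt((population_size - sample_size) / (population_size - 1))


def mean_ci_fpc(values, population_size, confidence=0.95):
    """Approximate confidence interval for a population mean, using a
    normal critical value and the finite population correction."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    standard_error *= finite_population_correction(n, population_size)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(values)
    return centre - z * standard_error, centre + z * standard_error


# Invented budgets (in millions of dollars), for illustration only.
budgets = [150, 200, 175, 160, 225, 180, 190, 210, 170, 165]
low, high = mean_ci_fpc(budgets, population_size=500)
print(round(low, 1), round(high, 1))

# Correction factor for the sample and population sizes in the brief.
print(round(finite_population_correction(150, 500), 4))
```

With only ten invented values the normal approximation is rough; with a sample of 150 it is more reasonable.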
Question 2 [Worth 50% of the marks]
You should include key parts of your SPSS output in your answer. You must explain your answer
clearly and you are limited to a maximum of 10 pages.
The famous CEO of Seoul Rental Bike has hired you as an external consultant to evaluate the factors
(e.g., weather indicators such as temperature, humidity and wind) that affect the total number of
bikes rented. Specifically, he has asked you to develop a regression model that identifies the
important drivers of rental demand and can predict demand under his weather scenarios. He has
provided the data to analyse, which cover 353 days from 2017 to 2018
(Source: http://data.seoul.go.kr/, see file SeoulRentalBike.sav). The data description is as follows.
• TotRent – Total number of bike rentals each day
• Temp – Average temperature (in degrees Celsius, °C)
• Hum – Average humidity (%)
• Wind – Average wind speed (metres per second, m/s)
• Visib – Average visibility (in metres, m). The maximum visibility recorded is 2000 m.
• Dew – Average dew point temperature each day (in degrees Celsius, °C); this is an
alternative measure of humidity
• Solar – Average solar radiation (in megajoules per square metre, MJ/m²)
• Rain – Total rainfall (in millimetres, mm)
• Snow – Total snowfall (in centimetres, cm)
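Because Dew is described above as an alternative measure of humidity, some of these explanatory variables may be strongly related to one another. The following is a minimal Python sketch of a pairwise Pearson correlation check; the analysis itself is expected to be done in SPSS. The function names are hypothetical and the six daily values are invented for illustration, not taken from SeoulRentalBike.sav.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of named columns."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b])
            for a in names for b in names}


# Invented values for illustration only.
toy_data = {
    "Temp": [-5, 0, 8, 15, 22, 28],
    "Dew": [-12, -6, 1, 9, 17, 22],
    "Hum": [55, 60, 58, 65, 72, 70],
}
matrix = correlation_matrix(toy_data)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(a, b, round(r, 3))
```

A very high correlation between two explanatory variables is worth noting before any regression, since it can make individual coefficients unstable.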
a) Carry out a preliminary analysis of the data using scatterplots, correlations, or anything else
you think appropriate to demonstrate the relationship between the total number of bike rentals
each day and all explanatory variables, and any relationships between explanatory variables.
Report your preliminary findings. [Worth about 20% of the marks for Q2]
b) Use stepwise regression starting with an “all-in” model and identify “the best” model from the
output. Justify your answer. Discuss the significant and insignificant variables in the model.
[Worth about 20% of the marks for Q2]
c) Redo (b) but use stepwise regression starting with a “no-variable” model. Identify “the best”
model from the output. Justify your answer. Discuss the significant and insignificant variables
in the model. [Worth about 10% of the marks for Q2]
d) Compare the “best” models from (b) and (c) and identify which one is your preferred model.
Explain the causes for any difference between models in (b) and (c), if any. You can refer
to your preliminary analysis to explain the difference(s) between both “best” models. [Worth
about 10% of the marks for Q2]
e) Carry out a residuals analysis to check whether or not the usual regression assumptions
seem to hold for your preferred model. Carefully justify your conclusions, noting any
reservations you have about your equation. [Worth about 20% of the marks for Q2]
f) The CEO would like you to predict the number of rentals under several scenarios that he has
set up. Use your preferred model to make each prediction, and comment carefully on each
scenario in light of your residual diagnostics. The scenarios are the following:
Scenario                     Temp  Humid  Wind  Visib  Dew  Solar  Rain  Snow
Decent weather                 12     60   1.5   1400    4    0.6     1     0
Hot weather                    35     80     1   1200   20      1     0     0
Cold, windy, snowy weather    -20     20     3   1000  -25    0.3     0    30
[Worth about 20% of the marks for Q2]
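One point to consider when commenting on the scenarios in part (f) is whether a scenario lies outside the range of the data used to fit the model, since predictions there are extrapolations. The following is a minimal Python sketch of such a check. The scenario values are copied from the table in part (f), with the Humid column stored under the variable name Hum. The observed ranges are HYPOTHETICAL placeholders, apart from the maximum visibility of 2000 m stated in the data description, and should be replaced with the minimum and maximum values from your own SPSS descriptives. The function name is also hypothetical.

```python
def out_of_range(scenario, observed_ranges):
    """Return {variable: value} for scenario values that fall outside the
    observed (min, max) range. Variables without a known range are skipped."""
    flagged = {}
    for name, value in scenario.items():
        if name not in observed_ranges:
            continue
        low, high = observed_ranges[name]
        if not (low <= value <= high):
            flagged[name] = value
    return flagged


# Scenario values copied from the table in part (f).
scenarios = {
    "Decent weather": {"Temp": 12, "Hum": 60, "Wind": 1.5, "Visib": 1400,
                       "Dew": 4, "Solar": 0.6, "Rain": 1, "Snow": 0},
    "Hot weather": {"Temp": 35, "Hum": 80, "Wind": 1, "Visib": 1200,
                    "Dew": 20, "Solar": 1, "Rain": 0, "Snow": 0},
    "Cold, windy, snowy weather": {"Temp": -20, "Hum": 20, "Wind": 3,
                                   "Visib": 1000, "Dew": -25, "Solar": 0.3,
                                   "Rain": 0, "Snow": 30},
}

# HYPOTHETICAL placeholder ranges (only the 2000 m visibility maximum is
# from the brief). Replace with the min/max from your own data.
observed_ranges = {
    "Temp": (-15.0, 33.0),
    "Visib": (100.0, 2000.0),
    "Snow": (0.0, 10.0),
}

for label, values in scenarios.items():
    print(label, out_of_range(values, observed_ranges))
```

Any variable flagged by a check of this kind signals that the corresponding prediction rests on the model holding beyond the data it was fitted to, which is worth stating alongside the predicted value.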