Stat7055 T01
1. Data was collected on 105 homes in Canberra in 2003. For each house, the following
information was collected: the estimated price of the house (in dollars); the number of
bedrooms; the size of the house (in square metres); whether or not a pool was present (yes
or no); the distance from Civic; the rating of the insulation in the house (none, average
or high); the suburb; the number of bathrooms; and the type of internet connectivity
available (dialup, ADSL or the NBN, where dialup is the slowest connection and the NBN
is the fastest). Classify each variable as either nominal, ordinal, discrete or continuous.
2. You work in a country where every resident plays a sport every day. However, the only
two sports played are table tennis (when it is raining) and golf (when it is sunny). Your
job is to provide statistical analysis to the management of a company that sells “ping-pong”
(table tennis) balls directly through the internet. Over the past eight months you
have collected the following data:
For this data, the sample coefficients of variation for marketing expenditure, number of
rainy days per month, and number of sales have been calculated to be 0.642849, 0.656009,
and 1.194023, respectively.
(a) The marketing manager has told you that it simply makes sense that there is a
strong and positive correlation between marketing expenditure and the number
of sales made. Provide some analysis regarding this relationship. What do you
conclude from your results?
(b) Using the data above, calculate the correlation coefficient between the number of
rainy days per month and the number of sales. The covariance between the number
of rainy days per month and the number of sales has been calculated as 14012.23.
(c) What does the result in (b) suggest? Provide a potential reason for this
result.
Try using R to calculate the sample correlation coefficients from the raw data given in
the table.
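For part (b), the correlation coefficient is the covariance scaled by the two sample standard deviations, and each standard deviation can be recovered from the reported coefficient of variation once the corresponding sample mean has been computed from the data. A sketch of the relations, where the symbols $r_{XY}$, $s_{XY}$, $s_X$, $s_Y$, $CV_X$ and $CV_Y$ are notation assumed here rather than taken from the question:

```latex
r_{XY} = \frac{s_{XY}}{s_X \, s_Y},
\qquad
s_X = CV_X \cdot \bar{X},
\qquad
s_Y = CV_Y \cdot \bar{Y}
```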
STAT7055 Topic 01 Tutorial Questions
3. A quality control officer in a chocolate factory records the number of minutes it takes
for the company’s signature chocolate bar to melt at room temperature. He recorded
the following 11 times for 11 different chocolate bars:
14 20 20 12 9 13 35 12 11 12 46
4. There is a shortcut version for calculating the sample variance given by the following
formula:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]$$
Show that this is equivalent to the definition given in the lectures. In other words, show
that:
$$\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]$$
Bonus: Show that the shortcut version of the sample covariance given below is equivalent
to the definition given in lectures.
$$s_{XY} = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}\right]$$
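Before attempting the algebra, it can help to confirm numerically that the shortcut and the definition agree. The sketch below is a sanity check, not a proof. It is written in Python for illustration (R's built-in var() uses the same n − 1 definition). The variance check uses the 11 melting times from Question 3. The covariance check pairs two of the frog samples from Question 5 purely for illustration, and that pairing is an assumption with no meaning in the original data. All function names are hypothetical.

```python
# Numerical sanity check (not a proof) of the shortcut formulas.

def variance_definition(xs):
    """Sample variance from the definition: sum of squared deviations / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """Sample variance from the shortcut: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def covariance_definition(xs, ys):
    """Sample covariance from the definition: sum of cross-deviations / (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def covariance_shortcut(xs, ys):
    """Sample covariance from the shortcut: (sum of products - sum_x * sum_y / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n) / (n - 1)

# Melting times (minutes) from Question 3.
times = [14, 20, 20, 12, 9, 13, 35, 12, 11, 12, 46]
var_def = variance_definition(times)
var_short = variance_shortcut(times)

# Two frog samples from Question 5, paired arbitrarily for illustration only.
xs = [13, 22, 18, 16]
ys = [26, 15, 17, 15]
cov_def = covariance_definition(xs, ys)
cov_short = covariance_shortcut(xs, ys)

print("variance:  ", var_def, var_short)
print("covariance:", cov_def, cov_short)
```

The two values in each pair should agree up to floating-point rounding. The proof asked for above explains why.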
5. The Hula painted frog is an extremely rare species of frog that was thought to be extinct
but was rediscovered in 2011. Only 11 are believed to be living in the wild. Suppose the
weights of these 11 frogs are known and given in the table below (in grams):
13 26 22 16 18 28 14 15 15 17 25
Suppose now we take five random samples of size four from this population, with each
new sample being taken after returning the previous sample to the population. The five
samples, along with some sample statistics, are listed below:
Sample            $\bar{X}$    $\sum_{i=1}^{n} X_i^2$
13, 22, 18, 16    17.25        1233
26, 15, 17, 15    18.25        1415
14, 18, 15, 25    18           1370
25, 14, 16, 17    18           1366
13, 26, 25, 18    20.5         1794
(b) Calculate the sample variance for each of the five samples.
(c) Calculate the sample variance for each of the five samples, but this time using n as
the denominator, instead of n − 1. That is, calculate:
$$s^{*2} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
(d) Calculate the average of the five sample variances in part (b) and the average of
the five sample variances in part (c). What do you notice?
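The calculations in parts (b) to (d) can be cross-checked with a short script. The sketch below is in Python for illustration (in R, var() gives the n − 1 version directly). It computes both versions of the variance for each listed sample and then averages them. Including the population variance of the 11 weights as a reference point is an assumption about what the comparison in (d) is meant to highlight. The variable and function names are hypothetical.

```python
from statistics import mean, pvariance

# Population of 11 frog weights (grams) and the five samples from the question.
population = [13, 26, 22, 16, 18, 28, 14, 15, 15, 17, 25]
samples = [
    [13, 22, 18, 16],
    [26, 15, 17, 15],
    [14, 18, 15, 25],
    [25, 14, 16, 17],
    [13, 26, 25, 18],
]

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

# Part (b): denominator n - 1.  Part (c): denominator n.
var_n_minus_1 = [sum_sq_dev(s) / (len(s) - 1) for s in samples]
var_n = [sum_sq_dev(s) / len(s) for s in samples]

# Part (d): average each set of five variances.
avg_n_minus_1 = mean(var_n_minus_1)
avg_n = mean(var_n)

# Assumed reference point: variance of the full population (denominator N).
pop_var = pvariance(population)

print("n - 1 denominators:", var_n_minus_1, "average:", avg_n_minus_1)
print("n denominators:    ", var_n, "average:", avg_n)
print("population variance:", pop_var)
```

Comparing the two averages with the population variance is one way to approach the "What do you notice?" prompt.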
6. The average score for a class of 30 students was 75. The 20 male students in the class
averaged 70. The boxplots for the scores for the male and female students are given
below.
[Figure: side-by-side boxplots of scores for Male and Female students; vertical axis runs from 40 to 100.]
(a) What was the average of the 10 female students in the class?
(b) Describe the relationship between the median and the mean for both male students
and female students.
(c) Did a greater proportion of male students or female students score at least 83?
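For part (a), the overall mean is a weighted average of the two group means, so the class total must equal the sum of the group totals. A worked sketch, where $\bar{x}_F$ is notation introduced here for the female students' mean:

```latex
30 \times 75 = 20 \times 70 + 10 \, \bar{x}_F
\quad\Longrightarrow\quad
\bar{x}_F = \frac{2250 - 1400}{10} = 85
```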
7. Discussion Question
Some scientists are conducting a study to investigate the effects of exercise and caffeine
on sleep quality. A random sample of 300 people aged between 20 and 50 was included
in the study. For a particular day, each person was asked to record the number of
cups of coffee/tea they drank, the number of minutes they exercised, and the number
of hours they slept that night. The scientists have asked you to help them analyse their
data. They would like to summarise each variable in their sample data. They are also
interested in determining whether doing more exercise or consuming less caffeine is more
likely to cause the person to sleep for longer. Discuss some approaches that you could
use to help the scientists. Remember to note any important issues that need to be
considered in the analysis or in the interpretation of the analysis.
8. swirl
Work through lessons 1 and 2 of the R Programming course.