Basic Statistics Questions
The relationship between H and T can be modelled by the regression line with equation H = aT + b.
(b) Using the regression equation, estimate the number of hot chocolates that Lucy will sell on a
day when the maximum temperature is 12°C. [2]
(a) Write down the modal group for these data. [1]
(b) Use your graphic display calculator to find an estimate of the standard deviation of the weights
of mangoes from this harvest. [2]
(c) On the grid below, draw a histogram for the data in the table.
[3]
The distance in miles, M, can be converted to the distance in kilometres, K, using the formula K = (8/5)M.
(b) Write down the value of the median distance in kilometres (km). [1]
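A minimal sketch of why the median converts with the same formula as the individual distances: because K = (8/5)M is an increasing linear function, it preserves the order of the data, so the median in kilometres is the converted median in miles. The distances below are hypothetical and are not taken from the question; `miles_to_km` is an illustrative helper name.

```python
from statistics import median


def miles_to_km(miles):
    """Convert a distance in miles to kilometres using K = (8/5) M."""
    return 8 * miles / 5


# Hypothetical distances in miles (assumed for illustration only).
distances_miles = [5, 10, 15, 20, 25]

# Convert every distance, then take the median of the converted values.
distances_km = [miles_to_km(m) for m in distances_miles]
median_km = median(distances_km)

# Converting the median in miles directly gives the same value.
median_miles = median(distances_miles)
converted_median = miles_to_km(median_miles)

print(median_miles, median_km, converted_median)
```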
A total of 600 athletes from different teams compete in a 5 km race. The times the 600 athletes took to run the
5 km race are shown in the following cumulative frequency graph.
There were 400 athletes who took between 22 and m minutes to complete the 5 km race.
(d) Find m. [3]
(e) The first 150 athletes that completed the race won a prize.
Given that an athlete took between 22 and m minutes to complete the 5 km race, calculate
the probability that they won a prize. [5]
Sarah, a regular customer, visited the café on five consecutive days. The following table shows the number of
customers, x, ahead of Sarah who have already ordered and are waiting to receive their coffee and Sarah’s
waiting time, y minutes.
The relationship between x and y can be modelled by the regression line of y on x with equation y = ax + b.
(a.ii) Write down the value of Pearson’s product-moment correlation coefficient, r. [1]
(c) On another day, Sarah visits the café to order a coffee. Seven customers have already ordered
their coffee and are waiting to receive it.
Use the result from part (a)(i) to estimate Sarah’s waiting time to receive her coffee. [2]
The relationship between x and y can be modelled by the regression line of y on x with equation y = ax + b,
where a, b ∈ R.
(b) Use this model to predict the value of y when x = 18. [2]
(c) Write down the value of x̄ and the value of ȳ. [1]
(d) Draw the line of best fit on the scatter diagram. [2]
6. [Maximum mark: 14] 21M.1.SL.TZ2.7
A large school has students from Year 6 to Year 12.
A group of 80 students in Year 12 were randomly selected and surveyed to find out how many hours per week
they each spend doing homework. Their results are represented by the following cumulative frequency graph.
(a) Find the median number of hours per week these Year 12 students spend doing homework. [2]
(b) Given that 10% of these Year 12 students spend more than k hours per week doing homework,
find the value of k. [3]
(e.i) Explain why this sampling method might not provide an accurate representation of the amount
of time all of the students in the school spend doing homework. [1]
The interquartile range is 20 grams and there are no outliers in the results.
(a) Find Pearson’s product-moment correlation coefficient, r, for these data. [2]
(b) The relationship between the variables can be modelled by the regression equation
D = ah + b. Write down the value of a and the value of b. [1]
(c) One of these eight students was disappointed with her result and wished she had practised
more. Based on the given data, determine how her score could have been expected to alter had
she practised an extra five hours per week. [2]
9. [Maximum mark: 6] 22M.2.SL.TZ2.5
A random sample of nine adults was selected to see whether sleeping well affected their reaction times to a
visual stimulus. Each adult’s reaction time was measured twice.
The first measurement for reaction time was taken on a morning after the adult had slept well. The second
measurement was taken on a morning after the same adult had not slept well.
The box and whisker diagrams for the reaction times, measured in seconds, are shown below.
Consider the box and whisker diagram representing the reaction times after sleeping well.
(a) State the median reaction time after sleeping well. [1]
(c) State why it appears that the mean reaction time is greater than the median reaction time. [1]
Comment on whether these box and whisker diagrams provide any evidence that
might suggest that not sleeping well causes an increase in reaction time. [1]
The regression line of y on x for this data can be written in the form y = ax + b.
(a) Find the value of a and the value of b. [2]
(b) Write down the value of the Pearson’s product-moment correlation coefficient, r. [1]
(c) Use the equation of your regression line to predict the Science test score for a student who has a
score of 78 on the Mathematics test. Express your answer to the nearest integer. [2]
The ages of the eldest child are summarized in the following box and whisker diagram.
(a) Find the largest value of c that would not be considered an outlier. [3]
4
c + 20. The regression line of c on a is c = (1/2)a − 9.
(b.i) One of the adults surveyed is 42 years old. Estimate the age of their eldest child. [2]
(b.ii) Find the mean age of all the adults surveyed. [2]
It can be assumed that this sample is representative of all visitors to the park for the following day.
(a.i) the probability that a randomly selected visitor will ride The Dragon; [2]
(a.ii) the expected number of times a visitor will ride The Dragon. [2]
It is known that 1000 visitors will attend the amusement park on Tuesday. The Dragon can carry a maximum of 10
people each time it runs.
(b) Estimate the minimum number of times The Dragon must run to satisfy demand. [2]
(a) Given that x̄ = 7, verify that ȳ = 16. [1]
(b) Given that q − p = 3, find the value of p and the value of q. [4]
(b.ii) the number of performances where at least 80 % of the tickets were sold. [3]
After a performance, the company decides to conduct a survey to obtain feedback from the audience.
(c.i) State one disadvantage of the company surveying only the first 5 % of the audience as they
leave the theatre. [1]
(c.ii) Describe briefly how the company could collect feedback from 5 % of the audience using the
systematic sampling method. [2]
(c.iii) State the sampling method which should be used if the survey is to be representative of the
number of children and the number of adults in the audience. [1]
(d) The following box and whisker diagram displays the amount spent by the audience at the
souvenir shop when they attended the performance.
(d.i) Estimate the number of people who spent between $3 and $25. [2]
(d.ii) Half the audience spent less than $a. Estimate the value of a. [1]
(e) This year the company will again give 60 performances and expects to sell 17 additional tickets
for each performance.
(e.i) Calculate the mean number of tickets the company expects to sell this year for each
performance. [3]
(e.ii) State what effect, if any, this increase in ticket sales would have on the variance of the number of
tickets sold for each performance. [1]
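A small sketch of the effect described in part (e), under assumed data: adding the same constant to every value shifts the mean by that constant and leaves the variance unchanged, because the spread about the mean does not move. The ticket figures below are hypothetical; only the increase of 17 comes from the question.

```python
from statistics import mean, pvariance

# Hypothetical ticket sales for five performances (not the company's data).
last_year = [120, 135, 150, 142, 128]

# Each performance is expected to sell 17 additional tickets.
this_year = [tickets + 17 for tickets in last_year]

# The mean moves by exactly the added constant.
mean_shift = mean(this_year) - mean(last_year)

# The variance measures spread about the mean, so it is unaffected.
variance_last = pvariance(last_year)
variance_this = pvariance(this_year)

print(mean_shift, variance_last, variance_this)
```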
(a) Given that x̄ = 7, verify that ȳ = 16. [1]
(b) Given that q − p = 3, find the value of p and the value of q. [4]
17. [Maximum mark: 6] 24M.2.SL.TZ2.1
In a study, the mobile phone usage of a random sample of ten students was examined on a particular day.
The lengths of time, t hours, for which the ten students used their phones are listed below.
0.7, 1.2, 1.9, 4.0, 4.4, 4.5, 4.9, 5.7, 6.5, 11.7
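One way the ten times listed above could be summarised is sketched here. It assumes two conventions that the text does not state: quartiles are taken as the medians of the lower and upper halves of the sorted data, and a value counts as an outlier if it lies more than 1.5 × IQR beyond a quartile.

```python
from statistics import median

# The ten usage times in hours, as listed in the question.
times = [0.7, 1.2, 1.9, 4.0, 4.4, 4.5, 4.9, 5.7, 6.5, 11.7]

data = sorted(times)
half = len(data) // 2

# Assumed convention: quartiles are the medians of each half of the data.
q1 = median(data[:half])
q3 = median(data[-half:])
iqr = q3 - q1

# Assumed convention: outliers lie more than 1.5 * IQR beyond a quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [t for t in data if t < lower_fence or t > upper_fence]

print(median(data), q1, q3, iqr, outliers)
```

Under these assumptions the upper fence is about 11.4 hours, so only the largest value is flagged.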
The teacher finds that the equation of the regression line of y on x for these scores is y = 0.822x + 18.4.
Giovanni was absent for Test A and Paulo was absent for Test B.
The teacher uses the regression line of y on x to estimate the missing scores.
The teacher estimated his score on Test B to be 27 to the nearest integer using the following calculation:
y = 0.822(10) + 18.4 ≈ 27
(b) Give a reason why this method is not appropriate for Paulo. [1]
Giovanni scored 90 on Test B.
The teacher estimated his score on Test A to be 87 to the nearest integer using the following calculation:
90 = 0.822x + 18.4, so x = (90 − 18.4)/0.822 ≈ 87
(c.i) Give a reason why this method is not appropriate for Giovanni. [1]
(c.ii) Use an appropriate method to show that the estimated Test A score for Giovanni is 86 to the
nearest integer. [3]
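The difference between rearranging the line of y on x and using the line of x on y can be illustrated with a short sketch. The scores below are hypothetical, since the class data are not reproduced here, and `least_squares` is an illustrative helper rather than a calculator function.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical Test A (x) and Test B (y) scores, assumed for illustration.
test_a = [40, 55, 60, 72, 85, 90]
test_b = [50, 66, 62, 80, 85, 95]

a, b = least_squares(test_a, test_b)  # line of y on x: y = a*x + b
c, d = least_squares(test_b, test_a)  # line of x on y: x = c*y + d

y_known = 90

# Rearranging the y-on-x line (the approach questioned in part (c.i)).
x_by_rearranging = (y_known - b) / a

# Using the x-on-y line, which is fitted to predict x from y.
x_from_x_on_y = c * y_known + d

print(round(x_by_rearranging, 2), round(x_from_x_on_y, 2))
```

The two estimates agree only when the correlation is perfect, which is why the choice of regression line matters when estimating x from a known y.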
To test her claim, a randomly selected sample of rabbits was collected in each country.
The length of one ear of each rabbit was measured and the value recorded correct to the nearest millimetre (mm).
In the Australian sample, the median recorded value was 80 mm and the interquartile range was 11 mm.
The recorded values for the Spanish sample are shown in the following box and whisker diagram.
(a) Complete the following table for the recorded values of the lengths of the rabbits’ ears in each
sample.
[3]
(b) Justifying your answers, compare the distributions of the lengths of rabbits’ ears in Australia and
Spain using
Let L1 be the regression line of x on y. The equation of the line L1 can be written in the form x = ay + b.
(b) Let L2 be the regression line of y on x. The lines L1 and L2 pass through the same point with
coordinates (p , q).
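A sketch of the common point, using hypothetical data because the question's table is not reproduced here: both least-squares lines pass through the mean point (x̄, ȳ), so solving the two equations simultaneously should return the two means. The helper `slope_intercept` is an illustrative name.

```python
from statistics import mean


def slope_intercept(u, v):
    """Return (slope, intercept) of the least-squares line v = slope*u + intercept."""
    mean_u, mean_v = mean(u), mean(v)
    suv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    suu = sum((ui - mean_u) ** 2 for ui in u)
    slope = suv / suu
    return slope, mean_v - slope * mean_u


# Hypothetical data, assumed for illustration only.
x = [2, 4, 5, 7, 9, 12]
y = [3, 6, 4, 9, 10, 13]

a, b = slope_intercept(y, x)  # L1, x on y: x = a*y + b
m, k = slope_intercept(x, y)  # L2, y on x: y = m*x + k

# Substitute y = m*x + k into x = a*y + b and solve for x.
p = (a * k + b) / (1 - a * m)
q = m * p + k

print(p, q, mean(x), mean(y))
```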
(a) State the name for this type of sampling technique. [1]
(b.i) Show that 3 students will be selected from grade 12. [3]
(b.ii) Calculate the number of students in each grade in the sample. [2]
In order to select the 3 students from grade 12, the principal lists their names in alphabetical order and selects the
28th, 56th and 84th student on the list.
(c) State the name for this type of sampling technique. [1]
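The selection described above, taking names at a fixed interval from an ordered list, can be sketched as follows. The list of 84 placeholder names is an assumption made so that the 28th, 56th and 84th positions exist; the actual number of grade 12 students is not stated here, and `select_every` is an illustrative helper name.

```python
def select_every(ordered_items, interval):
    """Select the items at positions interval, 2*interval, ... (counting from 1)."""
    return ordered_items[interval - 1::interval]


# Hypothetical alphabetical list of 84 placeholder names.
grade_12 = [f"Student {i:02d}" for i in range(1, 85)]

# Taking every 28th name gives the 28th, 56th and 84th students.
selected = select_every(grade_12, 28)

print(selected)
```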
Once the principal has obtained the names of the 9 students in the random sample, she surveys each student to
find out how long they used social media the previous day and measures their self-esteem using the Rosenberg
scale. The Rosenberg scale is a number between 10 and 40, where a high number represents high self-esteem.
(d.i) Calculate Pearson’s product moment correlation coefficient, r. [2]
(d.ii) Interpret the meaning of the value of r in the context of the principal’s concerns. [1]
(d.iii) Explain why the value of r makes it appropriate to find the equation of a regression line. [1]
(e) Another student at the school, Jasmine, has a self-esteem value of 29.
By finding the equation of an appropriate regression line, estimate the time Jasmine spent on
social media the previous day. [4]
(a.i) Determine the value of Pearson’s product‐moment correlation coefficient, r, for these data. [2]
(a.ii) Interpret, in context, the value of r found in part (a) (i). [1]
The relationship between t and p can be modelled by the regression line of p on t with equation p = at + b .
A 50‐year‐old male patient enters the medical clinic for his appointment.
(c) Use the regression equation from part (b) to predict this patient’s systolic blood pressure. [2]
(d) A 16‐year‐old male patient enters the medical clinic for his appointment.
Explain why the regression equation from part (b) should not be used to predict this patient’s
systolic blood pressure. [1]
(b) Hence, show that a data set with only 5 numbers in it cannot have any outliers. [5]
(c) Give an example of a set of data with 7 numbers in it that does have an outlier. Justify this by
stating the interquartile range. [2]