
Basic Statistics Questions

Uploaded by

beautysarah1000
Copyright
© All Rights Reserved

Basic Statistics [180 marks]

1. [Maximum mark: 6] 20N.2.SL.TZ0.S_2


Lucy sells hot chocolate drinks at her snack bar and has noticed that she sells more hot chocolates on cooler days.
On six different days, she records the maximum daily temperature, T, measured in degrees centigrade, and the
number of hot chocolates sold, H. The results are shown in the following table.

The relationship between H and T can be modelled by the regression line with equation H = aT + b.

(a.i) Find the value of a and of b. [3]

(a.ii) Write down the correlation coefficient. [1]

(b) Using the regression equation, estimate the number of hot chocolates that Lucy will sell on a
day when the maximum temperature is 12°C. [2]
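
Parts (a) and (b) are normally answered with a graphic display calculator. As a rough sketch of what the calculator computes, the least-squares formulas can be written out as below. The data table is not reproduced in this copy, so the `temps` and `sold` lists are invented placeholders, and the function name `linear_regression` is an assumption made for illustration.

```python
from math import sqrt


def linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson's product-moment correlation coefficient
    return slope, intercept, r


# Hypothetical data, NOT the table from the question
temps = [10, 12, 14, 16, 18, 20]
sold = [40, 36, 33, 27, 24, 20]

a, b, r = linear_regression(temps, sold)
estimate_at_12 = a * 12 + b  # substitute T = 12 into H = aT + b
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}, H(12) = {estimate_at_12:.1f}")
```

With the real table substituted for the placeholder lists, the same three values should match the calculator output to the displayed precision.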

2. [Maximum mark: 6] 20N.1.SL.TZ0.T_3


Hafizah harvested 49 mangoes from her farm. The weights of the mangoes, w, in grams, are shown in the
following grouped frequency table.

(a) Write down the modal group for these data. [1]

(b) Use your graphic display calculator to find an estimate of the standard deviation of the weights
of mangoes from this harvest. [2]

(c) On the grid below, draw a histogram for the data in the table. [3]

3. [Maximum mark: 15] 20N.1.SL.TZ0.S_8


Each athlete on a running team recorded the distance (M miles) they ran in 30 minutes.

The median distance is 4 miles and the interquartile range is 1.1 miles.

This information is shown in the following box-and-whisker plot.

(a) Find the value of a. [2]

The distance in miles, M, can be converted to the distance in kilometres, K, using the formula $K = \frac{8}{5}M$.

(b) Write down the value of the median distance in kilometres (km). [1]

The variance of the distances run by the athletes is $\frac{16}{9}$ km$^2$.

The standard deviation of the distances is b miles.

(c) Find the value of b. [4]
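
Parts (b) and (c) both depend on how a linear change of units affects summary statistics: the median and the standard deviation scale by the conversion factor, while the variance scales by its square. The sketch below works this through with exact fractions, using the conversion K = (8/5)M and the variance of 16/9 km² stated in the question. The variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import isqrt

MILES_TO_KM = Fraction(8, 5)  # K = (8/5) M, as given in the question

# (b) The median scales with the data
median_miles = 4
median_km = MILES_TO_KM * median_miles

# (c) Standard deviation in km is the square root of the variance in km^2.
# 16 and 9 are perfect squares, so the root can be taken exactly.
variance_km2 = Fraction(16, 9)
sd_km = Fraction(isqrt(variance_km2.numerator), isqrt(variance_km2.denominator))

# Converting back to miles divides the standard deviation by the same factor
sd_miles = sd_km / MILES_TO_KM

print(median_km, sd_km, sd_miles)
```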

A total of 600 athletes from different teams compete in a 5 km race. The times the 600 athletes took to run the
5 km race are shown in the following cumulative frequency graph.
There were 400 athletes who took between 22 and m minutes to complete the 5 km race.
(d) Find m. [3]

(e) The first 150 athletes that completed the race won a prize.

Given that an athlete took between 22 and m minutes to complete the 5 km race, calculate
the probability that they won a prize. [5]

4. [Maximum mark: 6] 21M.2.SL.TZ2.1


At a café, the waiting time between ordering and receiving a cup of coffee is dependent upon the number of
customers who have already ordered their coffee and are waiting to receive it.

Sarah, a regular customer, visited the café on five consecutive days. The following table shows the number of
customers, x, ahead of Sarah who have already ordered and are waiting to receive their coffee and Sarah’s
waiting time, y minutes.
The relationship between x and y can be modelled by the regression line of y on x with equation y = ax + b.

(a.i) Find the value of a and the value of b. [2]

(a.ii) Write down the value of Pearson’s product-moment correlation coefficient, r. [1]

(b) Interpret, in context, the value of a found in part (a)(i). [1]

(c) On another day, Sarah visits the café to order a coffee. Seven customers have already ordered
their coffee and are waiting to receive it.

Use the result from part (a)(i) to estimate Sarah’s waiting time to receive her coffee. [2]

5. [Maximum mark: 7] 21M.2.SL.TZ1.2


The following table shows the data collected from an experiment.

The data is also represented on the following scatter diagram.

The relationship between x and y can be modelled by the regression line of y on x with equation y = ax + b,

where a, b ∈ R.

(a) Write down the value of a and the value of b. [2]

(b) Use this model to predict the value of y when x = 18. [2]

(c) Write down the value of $\bar{x}$ and the value of $\bar{y}$. [1]

(d) Draw the line of best fit on the scatter diagram. [2]

6. [Maximum mark: 14] 21M.1.SL.TZ2.7


A large school has students from Year 6 to Year 12.

A group of 80 students in Year 12 were randomly selected and surveyed to find out how many hours per week
they each spend doing homework. Their results are represented by the following cumulative frequency graph.

(a) Find the median number of hours per week these Year 12 students spend doing homework. [2]

(b) Given that 10% of these Year 12 students spend more than k hours per week doing homework,
find the value of k. [3]

This same information is represented by the following table.

(c) Find the value of p and the value of q. [4]

There are 320 students in Year 12 at this school.


(d) Estimate the number of Year 12 students that spend more than 15 hours each week doing
homework. [3]

(e.i) Explain why this sampling method might not provide an accurate representation of the amount
of time all of the students in the school spend doing homework. [1]

(e.ii) Suggest a more appropriate sampling method. [1]

7. [Maximum mark: 5] 21M.1.SL.TZ1.4


A research student weighed lizard eggs in grams and recorded the results. The following box and whisker
diagram shows a summary of the results where L and U are the lower and upper quartiles respectively.

The interquartile range is 20 grams and there are no outliers in the results.

(a) Find the minimum possible value of U. [3]

(b) Hence, find the minimum possible value of L. [2]

8. [Maximum mark: 5] 21N.2.SL.TZ0.1


In Lucy’s music academy, eight students took their piano diploma examination and achieved scores out of 150.
For her records, Lucy decided to record the average number of hours per week each student reported practising
in the weeks prior to their examination. These results are summarized in the table below.

(a) Find Pearson’s product-moment correlation coefficient, r, for these data. [2]

(b) The relationship between the variables can be modelled by the regression equation
D = ah + b. Write down the value of a and the value of b. [1]

(c) One of these eight students was disappointed with her result and wished she had practised
more. Based on the given data, determine how her score could have been expected to alter had
she practised an extra five hours per week. [2]

9. [Maximum mark: 6] 22M.2.SL.TZ2.5


A random sample of nine adults were selected to see whether sleeping well affected their reaction times to a
visual stimulus. Each adult’s reaction time was measured twice.

The first measurement for reaction time was taken on a morning after the adult had slept well. The second
measurement was taken on a morning after the same adult had not slept well.

The box and whisker diagrams for the reaction times, measured in seconds, are shown below.

Consider the box and whisker diagram representing the reaction times after sleeping well.

(a) State the median reaction time after sleeping well. [1]

(b) Verify that the measurement of 0.46 seconds is not an outlier. [3]

(c) State why it appears that the mean reaction time is greater than the median reaction time. [1]

(d) Now consider the two box and whisker diagrams.

Comment on whether these box and whisker diagrams provide any evidence that
might suggest that not sleeping well causes an increase in reaction time. [1]

10. [Maximum mark: 5] 22N.2.SL.TZ0.1


The following table shows the Mathematics test scores (x) and the Science test scores (y) for a group of eight
students.

The regression line of y on x for this data can be written in the form y = ax + b.
(a) Find the value of a and the value of b. [2]

(b) Write down the value of the Pearson’s product-moment correlation coefficient, r. [1]

(c) Use the equation of your regression line to predict the Science test score for a student who has a
score of 78 on the Mathematics test. Express your answer to the nearest integer. [2]

11. [Maximum mark: 7] 22M.1.SL.TZ1.3


A survey at a swimming pool is given to one adult in each family. The age of the adult, a years old, and of their
eldest child, c years old, are recorded.

The ages of the eldest child are summarized in the following box and whisker diagram.

(a) Find the largest value of c that would not be considered an outlier. [3]

The regression line of a on c is $a = \frac{7}{4}c + 20$. The regression line of c on a is $c = \frac{1}{2}a - 9$.

(b.i) One of the adults surveyed is 42 years old. Estimate the age of their eldest child. [2]

(b.ii) Find the mean age of all the adults surveyed. [2]
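
Part (b) rests on two facts: an estimate of c from a known a should use the line of c on a, and both regression lines pass through the mean point, so that point is where they cross. The following sketch applies both ideas with exact fractions. The function and variable names are assumptions introduced for illustration.

```python
from fractions import Fraction


def adult_age_from_child(c):
    """Regression line of a on c: a = (7/4)c + 20."""
    return Fraction(7, 4) * c + 20


def child_age_from_adult(a):
    """Regression line of c on a: c = (1/2)a - 9."""
    return Fraction(1, 2) * a - 9


# (b.i) The adult's age is known, so use the line of c on a
estimated_child_age = child_age_from_adult(42)

# (b.ii) Both lines pass through the mean point. Substituting
# c = a/2 - 9 into a = (7/4)c + 20 gives a(1 - 7/8) = 20 - 63/4.
mean_adult_age = (20 - Fraction(7, 4) * 9) / (1 - Fraction(7, 4) * Fraction(1, 2))
mean_child_age = child_age_from_adult(mean_adult_age)

print(estimated_child_age, mean_adult_age, mean_child_age)
```

As a check, feeding `mean_child_age` back into `adult_age_from_child` should return `mean_adult_age`, since the mean point lies on both lines.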

12. [Maximum mark: 4] 22M.2.SL.TZ1.2


The number of hours spent exercising each week by a group of students is shown in the following table.

The median is 4.5 hours.

(a) Find the value of x. [2]


(b) Find the standard deviation. [2]

13. [Maximum mark: 6] 23M.1.SL.TZ1.3


On a Monday at an amusement park, a sample of 40 visitors was randomly selected as they were leaving the
park. They were asked how many times that day they had been on a ride called The Dragon. This information is
summarized in the following frequency table.

It can be assumed that this sample is representative of all visitors to the park for the following day.

(a) For the following day, Tuesday, estimate

(a.i) the probability that a randomly selected visitor will ride The Dragon; [2]

(a.ii) the expected number of times a visitor will ride The Dragon. [2]

It is known that 1000 visitors will attend the amusement park on Tuesday. The Dragon can carry a maximum of 10
people each time it runs.

(b) Estimate the minimum number of times The Dragon must run to satisfy demand. [2]

14. [Maximum mark: 5] 24M.2.AHL.TZ2.2


Consider the following bivariate data set where $p, q \in \mathbb{Z}^+$.

The regression line of y on x has equation y = 2.1875x + 0.6875.

The regression line passes through the mean point $(\bar{x}, \bar{y})$.

(a) Given that $\bar{x} = 7$, verify that $\bar{y} = 16$. [1]

(b) Given that q − p = 3, find the value of p and the value of q. [4]
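
The verification in part (a) is a single substitution of the mean of x into the regression equation, since the line passes through the mean point. A minimal sketch of that arithmetic, using exact fractions to avoid rounding, is shown below. Part (b) is not attempted because the data table is not reproduced in this copy. Variable names are illustrative.

```python
from fractions import Fraction

# Coefficients of the regression line of y on x, as given in the question
slope = Fraction("2.1875")
intercept = Fraction("0.6875")

mean_x = 7
# The regression line passes through the mean point, so substitute the mean of x
mean_y = slope * mean_x + intercept

print(mean_y)
```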

15. [Maximum mark: 17] 23N.1.SL.TZ2.7


A ballet company performs The Nutcracker every year. Last year they gave a total of 60 performances at their theatre
which has a maximum capacity of 800. The number of tickets sold, n, at each performance is shown in the
following frequency table.

(a.i) Find the value of p. [1]

(a.ii) Write down the modal class. [1]

The following cumulative frequency diagram also displays these data.

(b) Use the cumulative frequency curve to estimate

(b.i) the median number of tickets sold. [1]

(b.ii) the number of performances where at least 80 % of the tickets were sold. [3]

After a performance, the company decides to conduct a survey to obtain feedback from the audience.
(c.i) State one disadvantage of the company surveying only the first 5 % of the audience as they
leave the theatre. [1]

(c.ii) Describe briefly how the company could collect feedback from 5 % of the audience using the
systematic sampling method. [2]

(c.iii) State the sampling method which should be used if the survey is to be representative of the
number of children and the number of adults in the audience. [1]

Last year 36 000 tickets were sold to The Nutcracker.

(d) The following box and whisker diagram displays the amount spent by the audience at the
souvenir shop when they attended the performance.

(d.i) Estimate the number of people who spent between $3 and $25. [2]

(d.ii) Half the audience spent less than $a. Estimate the value of a. [1]

(e) This year the company will again give 60 performances and expects to sell 17 additional tickets
for each performance.

(e.i) Calculate the mean number of tickets the company expects to sell this year for each
performance. [3]

(e.ii) State what effect, if any, this increase in ticket sales would have on the variance of the number of
tickets sold for each performance. [1]
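
Part (e) concerns adding a constant to every data value: the mean shifts by that constant and the variance is unchanged. The sketch below uses the totals stated in the question (36 000 tickets over 60 performances, 17 extra tickets each) for the mean. The short attendance list is hypothetical and serves only to illustrate that the variance does not move, since the frequency table is not reproduced here.

```python
from statistics import mean, pvariance

total_tickets_last_year = 36_000
performances = 60

# (e.i) Mean per performance last year, then shifted by 17
mean_last_year = total_tickets_last_year / performances
mean_this_year = mean_last_year + 17

# (e.ii) Hypothetical attendance figures, for illustration only
sample_last_year = [520, 580, 600, 640, 660]
sample_this_year = [n + 17 for n in sample_last_year]

print(mean_this_year)
print(pvariance(sample_last_year), pvariance(sample_this_year))
```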

16. [Maximum mark: 5] 24M.2.SL.TZ2.3


Consider the following bivariate data set where $p, q \in \mathbb{Z}^+$.

The regression line of y on x has equation y = 2.1875x + 0.6875.

The regression line passes through the mean point $(\bar{x}, \bar{y})$.

(a) Given that $\bar{x} = 7$, verify that $\bar{y} = 16$. [1]

(b) Given that q − p = 3, find the value of p and the value of q. [4]

17. [Maximum mark: 6] 24M.2.SL.TZ2.1


In a study, the mobile phone usage of a random sample of ten students was examined on a particular day.

The lengths of time, t hours, for which the ten students used their phones are listed below.

0.7 1.2 1.9 4.0 4.4 4.5 4.9 5.7 6.5 11.7

(a) For these data, find the

(a.i) median; [1]

(a.ii) interquartile range. [2]

An outlier is a value that is less than $Q_1 - 1.5 \times \mathrm{IQR}$ or greater than $Q_3 + 1.5 \times \mathrm{IQR}$.

(b) Show that 11.7 is an outlier. [3]
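
The outlier rule above can be checked directly on the ten listed values. The sketch below assumes the common convention that the quartiles are the medians of the lower and upper halves of the ordered data. Other quartile conventions exist and may give slightly different values, and the helper name `quartiles` is an assumption.

```python
from statistics import median

times = [0.7, 1.2, 1.9, 4.0, 4.4, 4.5, 4.9, 5.7, 6.5, 11.7]


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the ordered data.

    When the number of values is odd, the middle value is left out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


med = median(times)
q1, q3 = quartiles(times)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
is_outlier = 11.7 > upper_fence or 11.7 < lower_fence

print(f"median={med:.2f}, Q1={q1}, Q3={q3}, IQR={iqr:.1f}, upper fence={upper_fence:.1f}")
print("11.7 is an outlier:", is_outlier)
```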

18. [Maximum mark: 7] 24M.2.SL.TZ1.6


A class is given two tests, Test A and Test B. Each test is scored out of a total of 100 marks. The scores of the
students are shown in the following table.

Let x be the score on Test A and y be the score on Test B.

The teacher finds that the equation of the regression line of y on x for these scores is y = 0.822x + 18.4.

(a) Find the value of Pearson’s product-moment correlation coefficient, r. [2]

Giovanni was absent for Test A and Paulo was absent for Test B.

The teacher uses the regression line of y on x to estimate the missing scores.

Paulo scored 10 on Test A.

The teacher estimated his score on Test B to be 27 to the nearest integer using the following calculation:

y = 0.822(10) + 18.4 ≈ 27

(b) Give a reason why this method is not appropriate for Paulo. [1]
Giovanni scored 90 on Test B.

The teacher estimated his score on Test A to be 87 to the nearest integer using the following calculation:

$90 = 0.822x + 18.4$, so $x = \frac{90 - 18.4}{0.822} \approx 87$

(c.i) Give a reason why this method is not appropriate for Giovanni. [1]

(c.ii) Use an appropriate method to show that the estimated Test A score for Giovanni is 86 to the
nearest integer. [3]
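
Part (c) turns on the fact that the regression line of x on y is in general a different line from the rearranged line of y on x. The sketch below first reproduces the teacher's two calculations from the stated equation, then fits both lines to a small data set to show that the two estimates differ. The score lists are hypothetical, since the question's table is not reproduced in this copy, and `least_squares` is an illustrative helper name.

```python
def least_squares(us, vs):
    """Slope and intercept of the regression line of v on u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    slope = suv / suu
    return slope, mean_v - slope * mean_u


# The teacher's calculations, using the equation given in the question
paulo_estimate = round(0.822 * 10 + 18.4)
giovanni_inverted = round((90 - 18.4) / 0.822)

# Hypothetical scores, NOT the table from the question
test_a = [35, 48, 52, 60, 67, 74, 81, 93]
test_b = [50, 55, 62, 70, 68, 85, 80, 96]

a_yx, b_yx = least_squares(test_a, test_b)  # y on x
a_xy, b_xy = least_squares(test_b, test_a)  # x on y

inverted = (90 - b_yx) / a_yx  # rearranging y on x (not appropriate)
proper = a_xy * 90 + b_xy      # using x on y (appropriate)

print(paulo_estimate, giovanni_inverted)
print(f"inverted y-on-x: {inverted:.2f}, x-on-y: {proper:.2f}")
```

The two estimates coincide only when the correlation is perfect, which is why the line of x on y is the appropriate one for estimating a Test A score from a Test B score.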

19. [Maximum mark: 7] 24M.2.SL.TZ1.1


Janie claims that rabbits in Australia have longer ears than rabbits in Spain.

To test her claim, a randomly selected sample of rabbits was collected in each country.

The length of one ear of each rabbit was measured and the value recorded correct to the nearest millimetre (mm).

In the Australian sample, the median recorded value was 80 mm and the interquartile range was 11 mm.

The recorded values for the Spanish sample are shown in the following box and whisker diagram.

(a) Complete the following table for the recorded values of the lengths of the rabbits’ ears in each
sample. [3]

(b) Justifying your answers, compare the distributions of the lengths of rabbits’ ears in Australia and
Spain using

(b.i) the median; [2]

(b.ii) the interquartile range. [2]


20. [Maximum mark: 5] SPM.2.SL.TZ0.5


The following table shows the marks scored by seven students on two different mathematics tests.

Let L1 be the regression line of x on y. The equation of the line L1 can be written in the form x = ay + b.

(a) Find the value of a and the value of b. [2]

(b) Let L2 be the regression line of y on x. The lines L1 and L2 pass through the same point with
coordinates (p, q).

Find the value of p and the value of q. [3]

21. [Maximum mark: 15] EXM.2.SL.TZ0.1


The principal of a high school is concerned about the effect social media use might be having on the self-esteem
of her students. She decides to survey a random sample of 9 students to gather some data. She wants the number
of students in each grade in the sample to be, as far as possible, in the same proportion as the number of students
in each grade in the school.

(a) State the name for this type of sampling technique. [1]

The number of students in each grade in the school is shown in the following table.

(b.i) Show that 3 students will be selected from grade 12. [3]

(b.ii) Calculate the number of students in each grade in the sample. [2]

In order to select the 3 students from grade 12, the principal lists their names in alphabetical order and selects the
28th, 56th and 84th student on the list.

(c) State the name for this type of sampling technique. [1]
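
The selection rule described before part (c), taking the 28th, 56th and 84th names from an ordered list, amounts to picking every 28th item. A minimal sketch of that rule is given below. The roster of 84 placeholder names is an assumption, because the grade sizes are not reproduced in this copy, and `select_every_kth` is a hypothetical helper.

```python
def select_every_kth(items, interval):
    """Return every `interval`-th item, starting with the `interval`-th (1-based)."""
    return items[interval - 1::interval]


# Hypothetical alphabetical roster; the real grade 12 list is not given here
roster = [f"Student {i:03d}" for i in range(1, 85)]  # 84 placeholder names

chosen = select_every_kth(roster, 28)
print(chosen)
```

In practice the starting position is often chosen at random within the first interval. The fixed start at position 28 here simply mirrors the description in the question.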

Once the principal has obtained the names of the 9 students in the random sample, she surveys each student to
find out how long they used social media the previous day and measures their self-esteem using the Rosenberg
scale. The Rosenberg scale is a number between 10 and 40, where a high number represents high self-esteem.
(d.i) Calculate Pearson’s product moment correlation coefficient, r. [2]

(d.ii) Interpret the meaning of the value of r in the context of the principal’s concerns. [1]

(d.iii) Explain why the value of r makes it appropriate to find the equation of a regression line. [1]

(e) Another student at the school, Jasmine, has a self-esteem value of 29.

By finding the equation of an appropriate regression line, estimate the time Jasmine spent on
social media the previous day. [4]

22. [Maximum mark: 8] EXN.2.SL.TZ0.4


The following table shows the systolic blood pressures, p mmHg, and the ages, t years, of 6 male patients at a
medical clinic.

(a.i) Determine the value of Pearson’s product‐moment correlation coefficient, r, for these data. [2]

(a.ii) Interpret, in context, the value of r found in part (a) (i). [1]

The relationship between t and p can be modelled by the regression line of p on t with equation p = at + b.

(b) Find the equation of the regression line of p on t. [2]

A 50‐year‐old male patient enters the medical clinic for his appointment.

(c) Use the regression equation from part (b) to predict this patient’s systolic blood pressure. [2]

(d) A 16‐year‐old male patient enters the medical clinic for his appointment.

Explain why the regression equation from part (b) should not be used to predict this patient’s
systolic blood pressure. [1]

23. [Maximum mark: 9] EXM.1.SL.TZ0.2


A set of data comprises five numbers $x_1, x_2, x_3, x_4, x_5$ which have been placed in ascending order.

(a) Recalling definitions, such as the Lower Quartile is the $\frac{n+1}{4}$th piece of data with the data
placed in order, find an expression for the Interquartile Range. [2]

(b) Hence, show that a data set with only 5 numbers in it cannot have any outliers. [5]

(c) Give an example of a set of data with 7 numbers in it that does have an outlier, and justify this fact by
stating the Interquartile Range. [2]
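
The positional definition of the quartiles in part (a) can be tried out numerically. The sketch below assumes the upper quartile sits at position 3(n+1)/4, the usual companion to the stated lower-quartile rule, with linear interpolation for fractional positions. It also uses the 1.5 × IQR outlier rule. Both example data sets and all function names are illustrative assumptions, and the numerical check is not a substitute for the algebraic argument asked for in part (b).

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position using linear interpolation."""
    lower_index = int(position)
    fraction = position - lower_index
    lower = ordered[lower_index - 1]
    if fraction == 0:
        return lower
    upper = ordered[lower_index]
    return lower + fraction * (upper - lower)


def quartiles(data):
    """Q1 at position (n+1)/4 and Q3 at position 3(n+1)/4 of the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    return (value_at_position(ordered, (n + 1) / 4),
            value_at_position(ordered, 3 * (n + 1) / 4))


def outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]


five = [1, 2, 3, 4, 1000]       # an extreme fifth value still is not flagged
seven = [1, 2, 3, 4, 5, 6, 20]  # here the largest value is flagged

print(quartiles(five), outliers(five))
print(quartiles(seven), outliers(seven))
```

With five values, the upper quartile is the average of the two largest values, so making the largest value more extreme also stretches the interquartile range and the fence moves with it.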

24. [Maximum mark: 4] EXN.2.SL.TZ0.1


A data set consisting of 16 test scores has mean 14.5. One test score of 9 requires a second marking
and is removed from the data set.

Find the mean of the remaining 15 test scores. [4]
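
The calculation here goes through the total: recover the sum of all 16 scores from the mean, subtract the removed score, and divide by the new count. A short sketch of that arithmetic with exact fractions follows. Variable names are illustrative.

```python
from fractions import Fraction

original_count = 16
original_mean = Fraction("14.5")
removed_score = 9

# Total of all 16 scores, recovered from the mean
original_total = original_count * original_mean

# Remove one score and average over the remaining 15
new_mean = (original_total - removed_score) / (original_count - 1)

print(new_mean, float(new_mean))
```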

© International Baccalaureate Organization, 2024
