Bivariate Data Booklet 2024
2 | Correlation | Ex 5B p221 AM: 1, 2, 5, 7, 8acegik, 9aceg, 11 | 1 Day, 1 Week, 1 Month
3 | Regression Lines | Ex 5C p227 AM: 1a, 2ad, 3a, 4, 6, 11ac | 1 Day, 1 Week, 1 Month
4 | Interpolation and Extrapolation | Ex 5D p227 AM: 1, 3, 6, 9 | 1 Day, 1 Week, 1 Month
5 | Bivariate Data in Practical Contexts | NA – finish pages | 1 Day, 1 Week, 1 Month
Syllabus
S4 Bivariate Data Analysis
Content
Students:
• construct, interpret and analyse scatterplots for bivariate numerical data in practical contexts AAM
– demonstrate an awareness of issues of privacy and bias, ethics, and responsiveness to diverse groups and cultures when collecting and using data
– investigate using biometric data obtained by measuring the body or by accessing published data from sources including government organisations, and determine if any associations exist between identified variables
– identify the dependent and independent variables within bivariate datasets where appropriate
– describe and interpret a variety of bivariate datasets involving two numerical variables using real-world examples from the media or freely available from government or business datasets
– calculate and interpret Pearson’s correlation coefficient (𝑟) using technology to quantify the strength of a linear association of a sample (ACMGM054)
• model a linear relationship by fitting an appropriate line of best fit to a scatterplot and using it to describe and quantify associations AAM
– fit a line of best fit both by eye and by using technology to the data (ACMEM141, ACMEM142)
– fit a least-squares regression line to the data using technology
– interpret the intercept and gradient of the fitted line (ACMGM059)
• use the appropriate line of best fit, both found by eye and by applying the equation, to make predictions by either interpolation or extrapolation
– recognise the limitations of interpolation and extrapolation, and interpolate from plotted data to make predictions where appropriate (ACMGM062)
• solve problems that involve identifying, analysing and describing associations between two numerical variables AAM
• construct a bivariate scatterplot to identify patterns in the data that suggest the presence of an association (ACMGM052) AAM
• use bivariate scatterplots (constructing them when needed) to describe the patterns, features and associations of bivariate datasets, justifying any conclusions AAM
– describe bivariate datasets in terms of form (linear/non-linear) and, in the case of linear, the direction (positive/negative) and strength of any association (strong/moderate/weak)
S4 BIVARIATE DATA ANALYSIS
Scatterplots
Scatterplots (also known as scattergrams or scatter graphs) are used to graphically compare bivariate (two-variable) data.
When you draw a scatterplot, the aim is to investigate whether there is a relationship (correlation) between the two variables.
When the dots form a distinct pattern, there may be a relationship.
When the points in the scatterplot approximate a straight line, the relationship is called linear.
[Figure: example scatterplot, both axes scaled 2 to 8]
Where there is an independent variable, it must be graphed on the horizontal axis (and the dependent variable on the vertical axis).
“Correlation is not causation”. It is important to realise that although a correlation exists between two variables, it does not necessarily mean that one causes the other.
o http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation
[Figure: Cheese consumption per capita vs Deaths by being tangled in bedsheets, 2000-2009, US; vertical axis: Deaths (100 to 900); trend line y = 113.13x − 2977.3]
1. The scatterplot below shows the results of 66 students on two tests: Test A and Test B.
(a) One student was missing. They got 19 on Test A and 16 on Test B. Plot this student’s results on the axes above.
(b) Are the following statements true or false?
Statement | Veracity
2. The table shows the average temperature on each of seven days and the number of units of gas used to heat a house on these days.
3. Put the data shown into a scatterplot. Is the relationship linear or non-linear?
[Blank axes: Test 1 (horizontal) and Test 2 (vertical), each scaled 10 to 40]
4. GEN HSC12 Q29(a)
Tourists visit a park where steam erupts from a particular geyser. The brochure for the park has a graph of the data collected for this geyser over a period of time.
The graph shows the duration of an eruption and the time until the next eruption, timed from the end of one eruption to the beginning of the next.
(a) Tony sees an eruption that lasts 4 minutes. Based on the data in the graph, what is the minimum time that he can expect to wait for the next eruption? (1 mark)
(b) Julia saw two consecutive eruptions, one hour apart. Based on the data in the graph, what was the longest possible duration of the first eruption that she saw? (1 mark)
(c) What does the graph suggest about the relationship between the duration of an eruption and the time to the next eruption? (1 mark)
5. GEN HSC10 Q6
A survey of Year 7 students found a number of relationships with a high degree of correlation.
Which of the following relationships also demonstrates causality?
(A) Students’ height and the length of their arm span
(B) The size of students’ left feet and the size of their right feet
(C) Students’ test scores in Mathematics and their test scores in Music
(D) The number of hours students spent studying for a test and their results in that test
Correlation
The correlation coefficient (r) tells you how strong a linear relationship there may be between two variables; its value always lies between −1 and 1. In this course, we only interpret the correlation coefficient for linear relationships.
The closer r is to 1 (or −1), the stronger the relationship.
If the correlation coefficient is positive, then an increase in one variable tends to go with an increase in the other, i.e. the data points generally go up from left to right (positive gradient). If it is negative, an increase in one variable tends to go with a decrease in the other (negative gradient).
Correlation coefficient | Description
1 | Perfect positive correlation (the points all lie on a line with positive gradient)
Between −0.25 and −0.5 | Weak negative correlation (it is hard to place a regression line through the slightly stretched blob; negative gradient)
Between −0.5 and −0.75 | Moderate negative correlation (a thick line could cover some of the points, with others a bit either side; negative gradient)
Note:
To claim a relationship between the variables, look for a correlation that is moderate or stronger.
Website: http://www.stat.berkeley.edu/~stark/Java/Html/Correlation.htm
3. Std HSC20 Q12
For a set of bivariate data, Pearson’s correlation coefficient is −1.
Which graph could best represent this set of bivariate data?
4. GEN HSC07 Q9
Which of the following would be most likely to have a positive correlation?
(A) The population of a town and the number of schools in that town
(B) The price of petrol per litre and the number of litres of petrol sold
(C) The hours training for a marathon and the time taken to complete the marathon
(D) The number of dogs per household and the number of televisions per household
5. GEN HSC12 Q11
Which of the following relationships would most likely show a negative correlation?
(A) The population of a town and the number of hospitals in that town.
(B) The hours spent training for a race and the time taken to complete the race.
(C) The price per litre of petrol and the number of people riding bicycles to work.
(D) The number of pets per household and the number of computers per household.
6. Are the following likely to have positive, negative, or no correlation?
(a) Education and income
(b) Number of pets a person has and number of books a person has read
(h) How tall a person is and how fast they drive
7. GEN HSC08 Q12
A scatterplot is shown.
9. GEN HSC13 Q2
Which graph best shows data with a correlation closest to 0.3?
11. Which of the following scatterplots would have the greatest positive correlation coefficient?
Using the Calculator to Find the Correlation Coefficient
MODE, 2 (STAT), 2 (A+BX)
Input (both sets of) scores
AC
SHIFT, 1 (STAT), 5 (REG), 3 (r)
13. Find the correlation coefficient for the data given and describe the correlation between the variables.
(a)
x | y
1 | 3
3 | 5
5 | 7
7 | 9
(b)
Test 1 Mark | 35 | 21 | 33 | 26 | 26 | 28 | 20 | 34 | 28 | 26 | 28 | 23 | 27 | 32
Test 2 Mark | 30 | 18 | 37 | 28 | 32 | 33 | 13 | 28 | 35 | 22 | 28 | 19 | 32 | 25
Regression Lines
A regression line (trend line or line of best fit) is a straight line (for the linear relationships we will look at) that shows the general trend of bivariate (two-variable) data with a linear relationship.
Drawing Regression Lines by Eye
Move the edge of your ruler through the scatterplot until the number of dots, and their distances from the ruler, are roughly equal on both sides of the ruler. Then trace along the ruler’s edge.
[Figure: Homework Versus Test Result; vertical axis: Result (%), up to 100; horizontal axis: Homework (hours), up to 20]
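The ruler rule for drawing a line by eye (similar numbers of dots, at similar distances, on each side) can be checked numerically. This sketch is an illustration, not a standard routine: `balance_check` is a hypothetical helper that counts the points above and below a candidate line y = mx + c and adds up their signed vertical distances. The data are the homework and test-result values from Question 3 of the Interpolation and Extrapolation exercise; the two candidate lines are made up for the example.

```python
def balance_check(xs, ys, m, c):
    """Count points above/below y = m*x + c and total their signed vertical gaps."""
    above = below = 0
    signed_total = 0.0
    for x, y in zip(xs, ys):
        gap = y - (m * x + c)  # positive when the point lies above the line
        if gap > 0:
            above += 1
        elif gap < 0:
            below += 1
        signed_total += gap
    return above, below, signed_total


homework = [4, 12, 9, 14, 15, 6, 8, 18, 12]
test_result = [52, 66, 59, 87, 82, 41, 59, 93, 73]

# Two hypothetical lines "drawn by eye".
for m, c in ((3.5, 30), (3.5, 40)):
    print(f"y = {m}x + {c}:", balance_check(homework, test_result, m, c))
```

The first line balances (four points on each side, signed total close to 0); the second sits above every point, so the ruler would need to move down.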
1. Draw a line of best fit on the following.
2. Some students had their height and foot length measured and recorded. The results were graphed and a line of best fit was drawn by four different students as shown.
Which of the following shows the most suitable line of best fit?
Remember straight lines take the form y = mx + c, where m and c are replaced with numerals found by considering:
o m is the ___________________
i.e. m =
o c is the __________________
i.e. where it cuts the ____________
3. SP GEN HSC01 Q10
(A) $y = \frac{x}{4} + 3$ (B) $y = \frac{x}{4} - 3$
(C) $y = 4x - 12$ (D) $y = 4x + 3$
Using the Calculator for Regression Lines
MODE, 2 (STAT), 2 (A+BX)
AC
OR
o A = c (y-intercept)
o B = m (gradient)
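A and B in the calculator's A+BX mode are the intercept and gradient of the least-squares regression line. The sketch below reproduces that calculation with the standard formulas B = Sxy / Sxx and A = ȳ − B·x̄; `least_squares` is a hypothetical helper name, and the data are from Question 13(a).

```python
def least_squares(xs, ys):
    """Return (A, B) for the least-squares line y = A + Bx (A = c, B = m)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal, so the gradient is undefined")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # gradient m
    a = y_bar - b * x_bar    # y-intercept c: the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares([1, 3, 5, 7], [3, 5, 7, 9])
print(f"A = {a}, B = {b}, so y = {b}x + {a}")
```

Because A is computed from the means, every least-squares line passes through the point (x̄, ȳ).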
4. GEN HSC01 Q23(a)
The 11 people in Sam’s cricket team always bat in the same order.
Sam recorded the batting order and the average number of runs
scored by each player during the season.
Batting order | Average number of runs
1 | 16
2 | 10
3 | 11
4 | 8
5 | 7
6 | 4
7 | 4
8 | 5
9 | 3
10 | 1
11 | 1
5. GEN HSC02 Q26(c)
A class of 30 students sat for an algebra test and a geometry test. The results were displayed in a scatterplot, and a line of fit was drawn, as shown.
(c) What is the equation of the line of fit drawn? (2 marks)
6. GEN HSC09 Q28(b)
The height and mass of a child are measured and recorded over its first two years. This information is displayed in a scatter graph.
(a) Describe the correlation between the height and mass of this child, as shown in the graph. (1 mark)
(b) A line of best fit has been drawn on the graph. Find the equation of this line. (2 marks)
7. GEN HSC16 Q29(d) #Modified
Five students sat both a Physics and a Chemistry exam. Their results are shown in the table. The mean and standard deviation of each exam are also shown.
Interpolation and Extrapolation
We can use regression lines to make predictions for values within the range of our data set (interpolation) and beyond it (extrapolation).
1. Data collected on the age (a) and height (h) of 10- to 15-year-olds were used to create a scatterplot. A line of best fit to model the relationship between the age and height of students was then constructed as shown.
(a) Why does the value of the y-intercept have no meaning in this situation? (1 mark)
(b) Based on the line of best fit, what is the height of a typical 15-year-old? (1 mark)
2. GEN HSC06 Q27(b)
Each member of a group of males had his height and foot length measured and recorded. The results were graphed and a line of fit drawn.
(b) George is 10 cm taller than his brother Harry. Use the line of fit to estimate the difference in their foot lengths. (1 mark)
3. The homework habits and test results of nine Year 12 students are shown in the table below.
Homework (hours) | 4 | 12 | 9 | 14 | 15 | 6 | 8 | 18 | 12
Test Result | 52 | 66 | 59 | 87 | 82 | 41 | 59 | 93 | 73
(b) A comparison between the two sets of data is shown on the scatter plot below.
(d) Explain why this equation cannot work for someone who did 32 hours of homework.
4. Std HSC19 Q23
A set of bivariate data is collected by measuring the height and arm span of seven children. The graph shows a scatterplot of these measurements.
(b) Identify the direction and the strength of the linear association between height and arm span. (1 mark)
5. GEN HSC13 Q28(b)
Ahmed collected data on the age (a) and height (h) of males aged 11 to 16 years. He created a scatterplot of the data and constructed a line of best fit to model the relationship between the age and height of males.
(a) Determine the gradient of the line of best fit shown on the graph. (1 mark)
(b) Explain the meaning of the gradient in the context of the data. (1 mark)
(c) Determine the equation of the line of best fit shown on the graph. (2 marks)
(d) Use the line of best fit to predict the height of a typical 17-year-old male. (1 mark)
(e) Why would this model not be useful for predicting the height of a typical 45-year-old male? (1 mark)
6. GEN HSC15 Q28(e)
The shoe size and height of ten students were recorded.
(a) Complete the scatter plot AND draw a line of fit by eye. (2 marks)
(b) Use the line of fit to estimate the height difference between a student who wears a size 7.5 shoe and one who wears a size 9 shoe. (1 mark)
Bivariate Data in Practical Contexts
1. GEN HSC17 Q30d #Modified
In an investigation, students used different numbers of identical small solar panels to power model cars. The cars were then tested and their speed measured in km/h. The results are summarised in the table.
(b) Using the formula $m = r \times \frac{s_y}{s_x}$, calculate the correlation coefficient between the number of solar panels and the speed of a car. (2 marks)
2. List 3 considerations that need to be made regarding issues of privacy and bias, ethics, and responsiveness to diverse groups and cultures when collecting and using data.
4. A student claimed that as study time increases, test scores increase. After collecting and analysing some data, the student found the correlation coefficient, r, to be 0.83.
What does this correlation indicate about the relationship between time spent on study and test scores? (1 mark)
5. GEN HSC18 Q29d #Modified
Data for life expectancy (expected remaining years of life) for females at selected ages are given in the table.
(a) Use your calculator to find the equation of the least-squares line of best fit. Show the gradient correct to 3 decimal places and the y-intercept to 2 decimal places. (1 mark)
(b) A student uses the least-squares line of best fit from part (a) to estimate the life expectancy of her grandmother who is currently aged 87. Explain why this does NOT give a valid estimate. (1 mark)
(d) For males, the least-squares line of best fit relating life expectancy (y) and age (x) has the equation y = −0.972x + 80.44.
James is a male. He marries Sally who is aged 37. On their wedding day, they have the same life expectancy.
Calculate James’s age on their wedding day. Give your answer in years. (2 marks)
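Part (d) uses a line in reverse: given a life expectancy y, solve y = −0.972x + 80.44 for the age, x = (y − 80.44) / (−0.972). The sketch below shows only this rearrangement. Sally's life expectancy has to come from the female line in part (a), whose data are not reproduced here, so it is left as an input; the function names are hypothetical.

```python
def male_life_expectancy(age):
    """Least-squares line for males from part (d): y = -0.972x + 80.44."""
    return -0.972 * age + 80.44


def male_age_for(life_expectancy):
    """Solve y = -0.972x + 80.44 for x (the age)."""
    return (life_expectancy - 80.44) / -0.972


# Round trip: recover the age from a predicted life expectancy.
print(round(male_age_for(male_life_expectancy(40)), 6))
```

To answer part (d), pass `male_age_for` the life expectancy of a 37-year-old female from the part (a) line.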
6. Std HSC20 Q36 #Stoopidquestion
A cricket is an insect. The male cricket produces a chirping sound.
A scientist wants to explore the relationship between the temperature in degrees Celsius and the number of cricket chirps heard in a 15-second time interval.
Once a day for 20 days, the scientist collects data. Based on the 20 data points, the scientist provides the information below.
y = −10.6063 + bx,
Calculate the number of chirps expected in a 15-second interval when the temperature is 19° Celsius. Give your answer correct to the nearest whole number. (5 marks)
[Graph: Shoe size vs Forearm length (cm)]
SUMMARY