
Bivariate Data Booklet 2024

The document outlines a curriculum for Bivariate Data Analysis, detailing exercises, homework, and revision timelines for various topics such as scatterplots, correlation, and regression lines. It emphasizes the importance of understanding relationships between two numerical variables, constructing scatterplots, and interpreting correlation coefficients. Additionally, it highlights ethical considerations in data collection and the limitations of interpolation and extrapolation.

Uploaded by

Jeb807
Copyright © All Rights Reserved

12MS1 12MS-S4: Bivariate Data Analysis 2024

    Skill/Content                          Oxford Maths Standard Yr 12 Ch5                 Homework?   Revision?
1   Scatterplots                           Ex 5A p215 AM: 1abc, 2a, 3ad, 6                             1 Day / 1 Week / 1 Month
2   Correlation                            Ex 5B p221 AM: 1, 2, 5, 7, 8acegik, 9aceg, 11               1 Day / 1 Week / 1 Month
3   Regression Lines                       Ex 5C p227 AM: 1a, 2ad, 3a, 4, 6, 11ac                      1 Day / 1 Week / 1 Month
4   Interpolation and Extrapolation        Ex 5D p227 AM: 1, 3, 6, 9                                   1 Day / 1 Week / 1 Month
5   Bivariate Data in Practical Contexts   NA – finish pages                                           1 Day / 1 Week / 1 Month
    Review                                 Multiple Choice Questions p242
#   Mixed Revision                         Practice Examination Question p249

Syllabus
S4 Bivariate Data Analysis
Content
Students:
• construct a bivariate scatterplot to identify patterns in the data that suggest the presence of an association (ACMGM052) AAM
• use bivariate scatterplots (constructing them when needed) to describe the patterns, features and associations of bivariate datasets, justifying any conclusions AAM
  – describe bivariate datasets in terms of form (linear/non-linear) and, in the case of linear, the direction (positive/negative) and strength of any association (strong/moderate/weak)
  – identify the dependent and independent variables within bivariate datasets where appropriate
  – describe and interpret a variety of bivariate datasets involving two numerical variables using real-world examples from the media or freely available from government or business datasets
  – calculate and interpret Pearson’s correlation coefficient (𝑟) using technology to quantify the strength of a linear association of a sample (ACMGM054)
• model a linear relationship by fitting an appropriate line of best fit to a scatterplot and using it to describe and quantify associations AAM
  – fit a line of best fit both by eye and by using technology to the data (ACMEM141, ACMEM142)
  – fit a least-squares regression line to the data using technology
  – interpret the intercept and gradient of the fitted line (ACMGM059)
• use the appropriate line of best fit, both found by eye and by applying the equation, to make predictions by either interpolation or extrapolation
  – recognise the limitations of interpolation and extrapolation, and interpolate from plotted data to make predictions where appropriate (ACMGM062)
• solve problems that involve identifying, analysing and describing associations between two numerical variables AAM
• construct, interpret and analyse scatterplots for bivariate numerical data in practical contexts AAM
  – demonstrate an awareness of issues of privacy and bias, ethics, and responsiveness to diverse groups and cultures when collecting and using data
  – investigate using biometric data obtained by measuring the body or by accessing published data from sources including government organisations, and determine if any associations exist between identified variables

S4 BIVARIATE DATA ANALYSIS
Scatterplots

Scatterplots (also known as scattergrams or scatter graphs) are used to graphically compare bivariate (two variable) data.

When you draw out a scatterplot the aim is to investigate whether there is a relationship between the two variables (correlation).

When the dots form some sort of a distinct pattern, there may be a relationship.

When the points in the scatterplot approximate a line, the relationship would be called linear.

[Scatterplot: points approximating a straight line]

When the points in the scatterplot approximate a curve, the relationship is non-linear.

[Scatterplot: points approximating a curve]

Where there is an independent variable, it must be graphed on the horizontal axis (and the dependent on the vertical).

“Correlation is not causation”. It is important to realise that although there exists a correlation between two variables, it does not necessarily mean one causes the other.

o http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation

[Scatterplot: Cheese consumption per capita vs Deaths by being tangled in bedsheets, 2000-2009 in US. Horizontal axis: Cheese consumption (pounds per capita). Vertical axis: Deaths. Line of best fit: y = 113.13x - 2977.3, r = 0.9470]
1. The scatterplot below shows results of 66 students on two tests; Test A and Test B.

[Scatterplot: Test A and Test B results]

(a) One student was missing. They got 19 on Test A and 16 on Test B.
Plot this student’s results on the axes above.

(b) Are the following statements true or false?

Statement                                                                                    Veracity
The highest score on Test A was 30                                                           T/F
Two people were equal first on Test B                                                        T/F
One student scored 20 on Test A and 22 on Test B                                             T/F
Nobody scored the same mark on both tests                                                    T/F
Doing well on Test A causes you to do well on Test B                                         T/F
As a general rule, the better you did on Test A, the better you did on Test B                T/F
The lowest score on Test A is lower than the lowest score for Test B.                        T/F
The students found Test B harder                                                             T/F
The range of scores on Test B is 25.                                                         T/F
The student with the highest score on Test A also has the highest score on Test B.           T/F
The biggest difference between a student’s scores on the two tests is 5.                     T/F

2. The table shows the average temperature on each of seven days and the number of units of gas used to heat a house on these days.

Plot the missing points.

[Table and partly completed scatterplot: average temperature and units of gas used]

The first 5 points have been plotted for you.

Complete the scatterplot.

3. Put the data shown into a scatterplot. Is the relationship linear or non-linear?

Test 1 Mark   35 21 33 26 26 28 20 34 28 26 28 23 27 32
Test 2 Mark   30 18 37 28 32 33 13 28 35 22 28 19 32 25

[Blank axes: Test 1 on the horizontal axis and Test 2 on the vertical axis, each marked from 10 to 40]

4. GEN HSC12 Q29(a)
Tourists visit a park where steam erupts from a particular geyser.
The brochure for the park has a graph of the data collected for this geyser over a period of time.
The graph shows the duration of an eruption and the time until the next eruption, timed from the end of one eruption to the beginning of the next.

[Graph: duration of eruption and time until the next eruption]

(a) Tony sees an eruption that lasts 4 minutes. Based on the data in the graph, what is the minimum time that he can expect to wait for the next eruption? (1 mark)

(b) Julia saw two consecutive eruptions, one hour apart. Based on the data in the graph, what was the longest possible duration of the first eruption that she saw? (1 mark)

(c) What does the graph suggest about the relationship between the duration of an eruption and the time to the next eruption? (1 mark)

5. GEN HSC10 Q6
A survey of Year 7 students found a number of relationships with a high degree of correlation.
Which of the following relationships also demonstrates causality?

(A) Students’ height and the length of their arm span

(B) The size of students’ left feet and the size of their right feet

(C) Students’ test scores in Mathematics and their test scores in Music

(D) The number of hours students spent studying for a test and their results in that test

Correlation

The correlation coefficient (r) tells you how strong a relationship there may be between two variables. In this course, we only interpret the correlation coefficient for linear relationships.

The closer the number is to one (or negative one), the stronger the relationship.

If the correlation coefficient is positive, then an increase in one variable tends to go with an increase in the other, i.e. the data points generally rise from left to right (positive gradient).

If the correlation coefficient is negative, then an increase in one variable tends to go with a decrease in the other, i.e. the data points generally fall from left to right (negative gradient).

CORRELATION COEFFICIENT    DESCRIPTION
1                          Perfect positive correlation (the points are all in a line, positive gradient)
Between 0.75 and 1         Strong positive correlation (a thick line could cover most of the points, positive gradient)
Between 0.5 and 0.75       Moderate positive correlation (a thick line could cover some of the points, others a bit either side, positive gradient)
Between 0.25 and 0.5       Weak positive correlation (challenging to place regression line through slightly stretched blob, positive gradient)
Between –0.25 and 0.25     No correlation (just a blob, random spray of dots, like a shotgun blast)
Between –0.25 and –0.5     Weak negative correlation (challenging to place regression line through slightly stretched blob, negative gradient)
Between –0.5 and –0.75     Moderate negative correlation (a thick line could cover some of the points, others a bit either side, negative gradient)
Between –0.75 and –1       Strong negative correlation (a thick line could cover most of the points, negative gradient)
–1                         Perfect negative correlation (the points are all in a line, negative gradient)

Note:
In order to make any claim of a relationship between the variables, one should look for a moderate correlation or stronger.

The correlation coefficient is always between –1 and 1.

Website: http://www.stat.berkeley.edu/~stark/Java/Html/Correlation.htm

1. GEN HSC03 Q8
Which scatterplot shows a low (weak) positive correlation?

[Four scatterplots, labelled (A) to (D)]

2. GEN HSC11 Q8 / SP GEN HSC14 Q1
In which graph would the data have a correlation coefficient closest to –0.9?

[Four scatterplots, labelled (A) to (D)]
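The strength bands in the correlation table can also be written out as a short program. The following is a minimal Python sketch and is not part of the booklet: the function names `pearson_r` and `describe_correlation` and the sample data are hypothetical, the formula is the standard definition of Pearson's correlation coefficient, and because the table does not say which band a boundary value such as 0.5 belongs to, the sketch assumes it goes in the stronger band.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data (standard formula)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    r = sxy / math.sqrt(sxx * syy)
    # Guard against tiny floating-point overshoot past -1 or 1.
    return max(-1.0, min(1.0, r))


def describe_correlation(r):
    """Describe r using the table's bands (boundary handling is an assumption)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.25:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    if math.isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3), describe_correlation(r))
```

A value outside –1 to 1 raises an error in this sketch, which mirrors the note that the correlation coefficient always lies between –1 and 1.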

3. Std HSC20 Q12
For a set of bivariate data, Pearson’s correlation coefficient is –1.
Which graph could best represent this set of bivariate data?

[Four graphs, labelled (A) to (D)]

4. GEN HSC07 Q9
Which of the following would be most likely to have a positive correlation?

(A) The population of a town and the number of schools in that town

(B) The price of petrol per litre and the number of litres of petrol sold

(C) The hours training for a marathon and the time taken to complete the marathon

(D) The number of dogs per household and the number of televisions per household

5. GEN HSC12 Q11
Which of the following relationships would most likely show a negative correlation?

(A) The population of a town and the number of hospitals in that town.

(B) The hours spent training for a race and the time taken to complete the race.

(C) The price per litre of petrol and the number of people riding bicycles to work.

(D) The number of pets per household and the number of computers per household.
6. Are the following likely to have positive, negative, or no correlation?

(a) Education and income

(b) Number of pets a person has and number of books a person has read

(c) Number of days absent from school vs. maths average

(d) Test scores vs. shoe sizes

(e) Distance travelled vs. amount of petrol in the car

(f) Hours of studying vs. marks

(g) Hours in a shopping centre vs. amount of money spent

(h) How tall a person is and how fast they drive

(i) Temperature and number of people wearing jackets

7. GEN HSC08 Q12
A scatterplot is shown.

[Scatterplot: R and T]

Which of the following best describes the correlation between R and T?

(A) Positive              (B) Negative
(C) Positively skewed     (D) Negatively skewed

8. SP GEN HSC01 Q18
Which graph shows a high positive correlation?

[Four graphs, labelled (A) to (D)]

9. GEN HSC13 Q2
Which graph best shows data with a correlation closest to 0.3?

[Graphs not reproduced]

10. GEN HSC16 Q3
The graph shows a scatterplot for a set of data.

[Scatterplot not reproduced]

Which of the following is the best approximation for the correlation coefficient of this set of data?

(A) −1    (B) −0.3    (C) 0.3    (D) 1

11. Which of the following scatterplots would have the greatest positive correlation coefficient?

[Scatterplots not reproduced]

12. Gen HSC17 Q12
Which of the data sets graphed below has the largest positive correlation coefficient value?

[Graphs not reproduced]

Using The Calculator to Find the Correlation Coefficient

MODE , 2 (STAT), 2 (A+BX)

Input (both sets of) scores

AC

SHIFT , 1 (STAT), 5 (REG), 3 (r)

13. Find the correlation coefficient for the data given and describe the correlation between the variables.

(a)
x   y
1   3
3   5
5   7
7   9

(b)
Test 1 Mark   35 21 33 26 26 28 20 34 28 26 28 23 27 32
Test 2 Mark   30 18 37 28 32 33 13 28 35 22 28 19 32 25

Regression Lines

A regression line (trend-line or line of best fit) is a straight line (for the linear relationships we will look at) that attempts to show the general trend for bivariate (two variable) data with a linear relationship.

Drawing Regression Lines by Eye

Move the edge of your ruler through the scatterplot until you get it so that the number of dots and their distance from the ruler is roughly equal on both sides of the ruler. Now trace along the ruler’s edge.

[Scatterplot: Homework Versus Test Result. Horizontal axis: Homework (hours), marked 5 to 20. Vertical axis: Result (%), marked 20 to 100]

1. Draw a line of best fit on the following.

[Scatterplots not reproduced]

2. Some students had their height and foot lengths measured and recorded. The results were graphed and a line of best fit was drawn by four different students as shown.
Which of the following shows the most suitable line of best fit?

[Four graphs not reproduced]

Remember straight lines take the form y = mx + c where m and c are replaced with numerals, found by considering:

o m is the ___________________
i.e. m =

o c is the __________________
i.e. where it cuts the ____________

3. SP GEN HSC01 Q10

[Graph: a line of fit, l, drawn through plotted points]

A line of fit, l, is drawn through the points as shown.

What is the correct equation for line l?

(A) y = x/4 + 3      (B) y = x/4 − 3
(C) y = 4x − 12      (D) y = 4x + 3

Using the Calculator for Regression Lines

MODE , 2 (STAT), 2 (A+BX)

Input (both sets of) scores

AC

SHIFT , 1 (STAT), 5 (REG), 1 (A)

OR

SHIFT , 1 (STAT), 5 (REG), 2 (B)

NOTE to match mx + c to A + BX then:

o A = c (y-intercept)

o B = m (gradient)
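
The A and B values that the calculator reports can be mimicked in code. The following is a rough Python sketch rather than anything from the booklet: the function name `least_squares_line` and the sample data are hypothetical, and the formulas are the standard least-squares ones. The result is returned in the calculator's A + BX order, so A is the y-intercept (c) and B is the gradient (m).

```python
def least_squares_line(xs, ys):
    """Return (A, B) for the least-squares line y = A + B*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal, so no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # B: gradient (m in y = mx + c)
    a = mean_y - b * mean_x    # A: y-intercept (c in y = mx + c)
    return a, b


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y = {b:.1f}x + {a:.1f}")  # prints "y = 0.6x + 2.2"
```

Note that the order is swapped between the two forms: the calculator lists the intercept first (A + BX), while y = mx + c lists the gradient first.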
4. GEN HSC01 Q23(a)
The 11 people in Sam’s cricket team always bat in the same order.
Sam recorded the batting order and the average number of runs
scored by each player during the season.

Batting order   Average number of runs
1               16
2               10
3               11
4               8
5               7
6               4
7               4
8               5
9               3
10              1
11              1

(a) Display the data as a scatterplot on the graph paper provided below.
Make sure that you have labelled the axes. (2 marks)

(b) Draw a line of fit on your scatterplot on the graph paper provided above.
(No calculations are necessary.) (1 mark)

(c) Using your scatterplot, describe the correlation between the batting order and the average number of runs. (1 mark)

5. GEN HSC02 Q26(c)
A class of 30 students sat for an algebra test and a geometry test.
The results were displayed in a scatterplot, and a line of fit was drawn, as shown.

[Scatterplot with line of fit: algebra and geometry test results]

(a) How many students scored less than 30 on the algebra test? (1 mark)

(b) Calculate the gradient of the line of fit drawn. (1 mark)

(c) What is the equation of the line of fit drawn? (2 marks)

(d) Describe the correlation between geometry test results and algebra test results. (1 mark)

(e) Mitchell looked at the scatterplot and said:
‘In this class, all students who are near the top in algebra are also near the top in geometry’.
Explain why his statement is incorrect. (1 mark)

6. GEN HSC09 Q28(b)
The height and mass of a child are measured and recorded over its first two years.

This information is displayed in a scatter graph.

[Scatter graph with line of best fit: height and mass of the child]

(a) Describe the correlation between the height and mass of this child, as shown in the graph. (1 mark)

(b) A line of best fit has been drawn on the graph. Find the equation of this line. (2 marks)

7. GEN HSC16 Q29(d) #Modified
Five students sat both a Physics and a Chemistry exam. Their results are shown in the table. The mean and standard deviation of each exam are also shown.

[Table: Physics and Chemistry exam results, with the mean and standard deviation of each exam]

(a) Find the correlation coefficient, correct to three decimal places. (1 mark)

(b) Determine the equation of the line of best fit. (2 marks)

Interpolation and Extrapolation

We can use regression lines to make predictions about data points between (interpolation) and beyond (extrapolation) our data set.

When extrapolating, you must check that it actually makes sense to do so, and you certainly should not go a long way beyond your data points. Extrapolation should be viewed critically.

A line of best fit should not be used to make predictions about a population that is different from the population from which the sample was drawn.

There are two ways to make a prediction:

o Using the graph
Draw a line vertically (or horizontally) until you hit the trend line, then draw horizontally (vertically) to the axis.

o Using the equation
Substitute the known value into the equation and evaluate (or solve).
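
The difference between interpolation and extrapolation when using an equation can be sketched in Python. This is an illustration only and not part of the booklet: the function names `predict` and `x_for_y`, the line y = 2x + 1 and the x-values are all hypothetical. The sketch simply labels a prediction by whether x lies inside the range of the observed x-values.

```python
def predict(gradient, intercept, x, observed_xs):
    """Evaluate y = gradient*x + intercept and label the kind of prediction."""
    y = gradient * x + intercept
    if min(observed_xs) <= x <= max(observed_xs):
        kind = "interpolation"   # x is within the observed data
    else:
        kind = "extrapolation"   # x is beyond the observed data: treat with caution
    return y, kind


def x_for_y(gradient, intercept, y):
    """Solve y = gradient*x + intercept for x (the 'or solve' case)."""
    if gradient == 0:
        raise ValueError("a horizontal line cannot be solved for x")
    return (y - intercept) / gradient


# Hypothetical line and x-values, made up for illustration only.
observed_xs = [2, 4, 6, 8, 10]
print(predict(2, 1, 5, observed_xs))    # (11, 'interpolation')
print(predict(2, 1, 30, observed_xs))   # (61, 'extrapolation')
print(x_for_y(2, 1, 11))                # 5.0
```

The label only reports where x sits relative to the data. Whether an extrapolated value is sensible still has to be judged from the context.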

1. Data collected on the age (a) and height (h) of 10- to 15-year-olds were used to create a scatterplot. A line of best fit to model the relationship between the age and height of students was then constructed as shown.

[Scatterplot with line of best fit: age (a) and height (h)]

(a) Determine the equation of the line of best fit shown. (3 marks)

(b) Based on the line of best fit, what is the height of a typical 15‑year‑old? (1 mark)

(c) Why would this model NOT be useful for predicting the height of a typical 35-year-old? (1 mark)

2. GEN HSC06 Q27(b)
Each member of a group of males had his height and foot length measured and recorded. The results were graphed and a line of fit drawn.

[Graph with line of fit: height and foot length]

(a) Why does the value of the y-intercept have no meaning in this situation? (1 mark)

(b) George is 10 cm taller than his brother Harry. Use the line of fit to estimate the difference in their foot lengths. (1 mark)

(c) Sam calculated a correlation coefficient of −1.2 for the data.
Give TWO reasons why Sam must be incorrect. (2 marks)

3. The homework habits and test results of 9 Year 12 students are shown in the table below.

Homework      4  12  9  14  15  6  8  18  12
Test Result   52 66  59 87  82  41 59 93  73

(a) Find the equation of the line of best fit.
Round your answers to 1 decimal place where necessary.

(b) A comparison between the two sets of data is shown on the scatter plot below.

[Scatter plot: homework and test result]

Draw in the regression line found in part (a).

(c) If someone did 16 hours of homework, what test result would you expect them to get?

(d) Explain why this equation cannot work for someone who did 32 hours of homework.

4. Std HSC19 Q23
A set of bivariate data is collected by measuring the height and arm span of seven children. The graph shows a scatterplot of these measurements.

[Scatterplot: height and arm span of seven children]

(a) Calculate Pearson’s correlation coefficient for the data, correct to two decimal places. (1 mark)

(b) Identify the direction and the strength of the linear association between height and arm span. (1 mark)

(c) The equation of the least-squares regression line is shown.

Height = 0.866 × (arm span) + 23.7

A child has an arm span of 143 cm.
Calculate the predicted height for this child using the equation of the least-squares regression line. (1 mark)

5. GEN HSC13 Q28(b)
Ahmed collected data on the age (a) and height (h) of males aged 11 to 16 years. He created a scatterplot of the data and constructed a line of best fit to model the relationship between the age and height of males.

[Scatterplot with line of best fit: age (a) and height (h) of males]

(a) Determine the gradient of the line of best fit shown on the graph. (1 mark)

(b) Explain the meaning of the gradient in the context of the data. (1 mark)

(c) Determine the equation of the line of best fit shown on the graph. (2 marks)

(d) Use the line of best fit to predict the height of a typical 17-year-old male. (1 mark)

(e) Why would this model not be useful for predicting the height of a typical 45-year-old male? (1 mark)

6. GEN HSC15 Q28(e)
The shoe size and height of ten students were recorded.

[Table and partly completed scatter plot: shoe size and height]

(a) Complete the scatter plot AND draw a line of fit by eye. (2 marks)

(b) Use the line of fit to estimate the height difference between a student who wears a size 7.5 shoe and one who wears a size 9 shoe. (1 mark)

(c) A student calculated the correlation coefficient to be 1 for this set of data. Explain why this cannot be correct. (1 mark)

Bivariate Data in Practical Contexts

1. GEN HSC17 Q30d #Modified
In an investigation, students used different numbers of identical small solar panels to power model cars. The cars were then tested and their speed measured in km/h. The results are summarised in the table.

[Table: number of solar panels and speed]

The equation of the least-squares line of best fit, relating the speed and the number of solar panels, has been calculated to be
y = 2.125x + 2.0375.

(a) What would be the speed of a car powered by 5 solar panels, based on this equation? (1 mark)

(b) Using the formula m = r × (standard deviation of y) / (standard deviation of x), calculate the correlation coefficient between the number of solar panels and the speed of a car. (2 marks)

2. List 3 considerations that need to be made regarding issues of privacy and bias, ethics, and responsiveness to diverse groups and cultures when collecting and using data.

3. A statistical investigation is being planned.
The process of the statistical investigation involves several steps.
List the steps. (2 marks)

4. A student claimed that as study time increases, test scores increase.
After collecting and analysing some data, the student found the correlation coefficient, r, to be 0.83.
What does this correlation indicate about the relationship between time spent on study and test scores? (1 mark)

5. GEN HSC18 Q29d #Modified
Data for life expectancy (expected remaining years of life) for females at selected ages are given in the table.

[Table: life expectancy for females at selected ages]

(a) Use your calculator to find the equation of the least-squares line of best fit. Show the gradient correct to 3 decimal places and the y-intercept to 2 decimal places. (1 mark)

(b) A student uses the least-squares line of best fit from part (a) to estimate the life expectancy of her grandmother who is currently aged 87.
Explain why this does NOT give a valid estimate. (1 mark)

(c) Sally is a female aged 37.
Use the least-squares line of best fit for females to calculate her life expectancy. (1 mark)

(d) For males, the least-squares line of best fit relating life expectancy (y) and age (x) has the equation
y = −0.972x + 80.44.
James is a male. He marries Sally who is aged 37. On their wedding day, they have the same life expectancy.
Calculate James’s age on their wedding day. Give your answer in years. (2 marks)

6. Std HSC20 Q36 #Stoopidquestion
A cricket is an insect. The male cricket produces a chirping sound.

A scientist wants to explore the relationship between the temperature in degrees Celsius and the number of cricket chirps heard in a 15-second time interval.

Once a day for 20 days, the scientist collects data. Based on the 20 data points, the scientist provides the information below.

• A box-plot of the temperature data is shown.

[Box-plot of the temperature data]

• The mean temperature in the dataset is 0.525°C below the median temperature in the dataset.

• A total of 684 chirps was counted when collecting the 20 data points.

The scientist fits a least-squares regression line using the data (x, y), where x is the temperature in degrees Celsius and y is the number of chirps heard in a 15-second time interval. The equation of this line is

y = −10.6063 + bx,

where b is the slope of the regression line.

The least-squares regression line passes through the point (x̄, ȳ) where x̄ is the sample mean of the temperature data and ȳ is the sample mean of the chirp data.

Calculate the number of chirps expected in a 15-second interval when the temperature is 19° Celsius. Give your answer correct to the nearest whole number. (5 marks)
7. Complete the following table for 10 people in the class.

Shoe size
Forearm length (cm)

Hence, complete the scatterplot for this biometric data below.

SUMMARY
