
Chapter 2 Correlation Notes (AS FS2)

Chapter 2 discusses the quantification of correlation between two linked variables using correlation coefficients that range from -1 to +1. It introduces the Product Moment Correlation Coefficient (PMCC) for continuous data and Spearman's rank correlation for ranked data, explaining their applications and differences. The chapter also covers hypothesis testing for zero correlation, providing examples and formulas for calculating correlation coefficients.

Uploaded by

isfakfx
Copyright
© All Rights Reserved

FS2: Chapter 2, Correlation

In this chapter, we want to quantify how well correlated two linked variables are (bivariate data).

Our correlation coefficients will vary between −1 and +1, so that a value of −1 corresponds to perfect
negative correlation, and a value of +1 corresponds to perfect positive correlation, and a value of 0
corresponds to no correlation at all.

Note that the strength of correlation measures how closely the data fall on a straight line, with either positive or negative correlation. It does not indicate how steep the line is; the gradient 𝑏 of the regression line from Chapter 1 does that.

Ex 2A Product moment correlation coefficient, PMCC or 𝑟

Ex 2B Spearman’s rank correlation coefficient, 𝑟𝑠

Ex 2C Hypothesis testing

Exam Questions

Note: some of this content appears in AS Statistics, and A2 Statistics – we explore it more deeply here in FS2.
The PMCC was designed to measure the correlation for continuous data that comes
from a normal distribution.

Spearman’s rank is a special case of the PMCC where the data are converted to rankings
before calculating the coefficient.

Use PMCC, 𝒓:
• If data is continuous…
• … and taken from two normal distributions
• If correlation is linear

Use Spearman’s rank, 𝒓𝒔:
• If one or both of the data sets represents a ranking (things in order, not measured on a continuous scale)
• If one or both of the data sets is not from a normal distribution
• If correlation is non-linear
• If you want an easier calculation for an approximation of PMCC

So when you do Spearman’s rank, all the data is converted to rankings – meaning they
are assigned a number 1, 2, 3, etc. according to their preference/position.

You can give the highest or lowest number position 1, as long as you are consistent for
the question.
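PMCC, the formula

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

where

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \qquad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$
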
Note: 𝑟 is unaffected by linear coding of the form 𝑎𝑥𝑖 + 𝑏 for 𝑎 > 0

A value of 𝑟 close to 1 or −1 suggests that the points are strongly correlated and lie
very close to a straight line.
What would this suggest about the RSS? The residuals would be small, hence the RSS would also be small.

Would this support the use of a linear model? Yes, because the data would lie very close to a straight line.

From Chapter 1, $\text{RSS} = S_{yy}(1 - r^2)$. If $r = 1$ or $r = -1$, then $\text{RSS} = S_{yy}(1 - 1) = 0$.
PMCC, examples
The number of vehicles, 𝑥 millions, and the number of accidents, 𝑦 thousands, in 15
different countries were recorded. The following summary statistics were calculated and
a scatter graph of the data is given.
Σ𝑥 = 176.9   Σ𝑦 = 679   Σ𝑥² = 2576.47   Σ𝑦² = 39771   Σ𝑥𝑦 = 9915.3
a) Calculate the product moment correlation coefficient between 𝑥 and 𝑦.
b) With reference to your answer to a) and the scatter graph, comment on the
suitability of a linear regression model for these data.

If you’ve studied Stats Year 2, Regression and Correlation, you’ll have already used your calculator on raw data to calculate the PMCC, 𝑟, and 𝑎 and 𝑏. Here we will focus on using the summary statistics, rather than the calculator.

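a) With $n = 15$:

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 2576.47 - \frac{176.9^2}{15} = 490.2293\ldots$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 39771 - \frac{679^2}{15} = 9034.93\ldots$$

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 9915.3 - \frac{176.9 \times 679}{15} = 1907.626\ldots$$

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{1907.626\ldots}{\sqrt{490.2293\ldots \times 9034.93\ldots}} = 0.9064\ldots = 0.906 \text{ (3 sf)}$$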

b) Our PMCC is close to 1, and the points appear to lie close to a straight line on the scatter diagram, so a linear regression model is suitable.

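The calculation from summary statistics in part a) can be checked with a short script. This is only a sketch: the function name `pmcc_from_summary` and its parameter names are made up for illustration, and the inputs are the summary statistics quoted in the question for the 15 countries.

```python
from math import sqrt


def pmcc_from_summary(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (S_xx, S_yy, S_xy, r) from summary statistics (hypothetical helper)."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    r = s_xy / sqrt(s_xx * s_yy)
    return s_xx, s_yy, s_xy, r


# Summary statistics for the vehicles/accidents example (n = 15 countries).
s_xx, s_yy, s_xy, r = pmcc_from_summary(
    n=15, sum_x=176.9, sum_y=679, sum_x2=2576.47, sum_y2=39771, sum_xy=9915.3
)
print(round(r, 3))  # 0.906
```

The intermediate values `s_xx`, `s_yy` and `s_xy` are returned as well, so each line of the written working can be compared against them.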
Data are collected on the amount of supplement, 𝑑 grams per metre², given to a sample of 8 oat harvest fields and their oat-milk yield, 𝑚 litres per metre². The data were coded using $x = \frac{d}{2} - 6$ and $y = \frac{m}{20}$. The following summary statistics were obtained:

Σ𝑑² = 4592   𝑆𝑑𝑚 = 90.6   Σ𝑥 = 44   𝑆𝑦𝑦 = 0.05915

a) Use the formula for 𝑆𝑦𝑦 to show that 𝑆𝑚𝑚 = 23.66

b) Find the value of the product moment correlation coefficient between 𝑑 and 𝑚.
Note: 𝑟 is unaffected by linear coding of the form 𝑎𝑥𝑖 + 𝑏 for 𝑎 > 0

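a) Since $y = \frac{m}{20}$:

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = \sum \left(\frac{m}{20}\right)^2 - \frac{\left(\sum \frac{m}{20}\right)^2}{n} = \frac{1}{400}\left(\sum m^2 - \frac{(\sum m)^2}{n}\right) = \frac{S_{mm}}{400}$$

$$S_{mm} = 400 \times 0.05915 = 23.66$$

b) Since $x = \frac{d}{2} - 6$, we have $d = 2x + 12$, so

$$\sum d = \sum (2x + 12) = 2 \sum x + 8 \times 12 = 2 \times 44 + 8 \times 12 = 184$$

$$S_{dd} = \sum d^2 - \frac{(\sum d)^2}{n} = 4592 - \frac{184^2}{8} = 360$$

$$r = \frac{S_{dm}}{\sqrt{S_{dd} S_{mm}}} = \frac{90.6}{\sqrt{360 \times 23.66}} = 0.982 \text{ (3 sf)}$$
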
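The steps for undoing the coding in this example can be sketched in a few lines. The variable names are illustrative only; the numbers are the summary statistics given in the question, and the coding is taken to be x = d/2 − 6 and y = m/20.

```python
from math import sqrt

n = 8
sum_d2 = 4592    # sum of d^2 (given)
s_dm = 90.6      # S_dm (given)
sum_x = 44       # sum of the coded x values (given)
s_yy = 0.05915   # S_yy for the coded y values (given)

# y = m / 20, so S_yy = S_mm / 20^2 and therefore S_mm = 400 * S_yy.
s_mm = 20 ** 2 * s_yy

# x = d / 2 - 6 rearranges to d = 2x + 12, so sum(d) = 2 * sum(x) + 12 * n.
sum_d = 2 * sum_x + 12 * n
s_dd = sum_d2 - sum_d ** 2 / n

r = s_dm / sqrt(s_dd * s_mm)
print(round(s_mm, 2), sum_d, s_dd, round(r, 3))
```

Only `S_mm` and `S_dd` need decoding here because `S_dm` was already given in terms of the original variables.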

Ex 2A
Spearman’s Rank, the formulae

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where 𝑑 is the difference between ranks and 𝑛 is the number of pairs of observations.

Note that Spearman’s rank is a special case of the PMCC, so the PMCC
formula could still be used – the top formula is much quicker, however!
It should only be used if there are no tied rankings.

Of course, Spearman’s rank also varies from +1 to −1, but its meaning is about the
agreement of the rankings, not about a linear relationship on a graph.
If 𝑟𝑠 = 1, then the rankings are in perfect agreement (for example A, B, C against A, B, C).

If 𝑟𝑠 = −1, then the rankings are in exact reverse order (for example A, B, C against C, B, A).

If 𝑟𝑠 = 0, then there is no correlation between the rankings.
Example
During a dog show, two judges ranked eight dogs according to how cute they were. The
results are shown in the table below.
Find the Spearman’s rank correlation coefficient between the two judges and comment
on the result.

Dog 𝐴 𝐵 𝐶 𝐷 𝐸 𝐹 𝐺 𝐻
Judge 1 1 5 2 6 4 8 3 7
Judge 2 3 6 2 7 5 8 1 4

Judge 1   Judge 2   𝑑    𝑑²
1         3         −2   4
5         6         −1   1
2         2         0    0
6         7         −1   1
4         5         −1   1
8         8         0    0
3         1         2    4
7         4         3    9

$\sum d^2 = 20$

$$r_s = 1 - \frac{6 \times 20}{8(8^2 - 1)} = 0.762$$

As $r_s$ is positive and quite close to 1, there is a good sense of agreement between the judges.
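The dog show calculation can be reproduced with a small function. This is a sketch with a made-up function name; it uses the shortcut formula, so it assumes there are no tied ranks.

```python
def spearman_no_ties(ranks_a, ranks_b):
    """Return (sum of d^2, r_s) using r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)).

    Hypothetical helper: only valid when neither list contains tied ranks.
    """
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Rankings of dogs A to H from the table in the example.
judge_1 = [1, 5, 2, 6, 4, 8, 3, 7]
judge_2 = [3, 6, 2, 7, 5, 8, 1, 4]

sum_d2, r_s = spearman_no_ties(judge_1, judge_2)
print(sum_d2, round(r_s, 3))  # 20 0.762
```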
Tied rankings
Sometimes, you’ll have two pieces of data with the same value, so as a ranking
they will be tied.

To deal with this, we replace their rankings with the mean average of the rankings
they occupy:
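Data   200   350   350   400   700   800   800   800   1200
Rank   1     2.5   2.5   4     5     7     7     7     9

The two values of 350 occupy ranks 2 and 3, so each is given $\frac{2 + 3}{2} = 2.5$. The three values of 800 occupy ranks 6, 7 and 8, so each is given $\frac{6 + 7 + 8}{3} = 7$.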
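The rule of replacing tied rankings with the mean of the ranks they occupy can be written as a short routine. This is a minimal sketch (the name `average_ranks` is an assumption, and it ranks from the smallest value upwards), applied to the data row above.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upwards.

    Tied values are each given the mean of the ranks they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # The block occupies positions start+1 .. end+1; take their mean.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


data = [200, 350, 350, 400, 700, 800, 800, 800, 1200]
print(average_ranks(data))
# [1.0, 2.5, 2.5, 4.0, 5.0, 7.0, 7.0, 7.0, 9.0]
```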

If there are any tied rankings, then we must use the full PMCC formula
by calculating 𝑆𝑥𝑥, etc.

The other formula gives an approximation for 𝑟𝑠 if there are any tied
rankings, and in recent exam questions, they will not accept this method
as it gives an approximation.

If the question asks for the Spearman’s rank coefficient and there are tied ranks, you must use the full PMCC formula (unless it states otherwise).
Example
The marks of eight pupils in French and German tests are shown in the table below.
a) Use the formula $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$ to find an estimate for Spearman’s rank correlation coefficient, showing clearly how you deal with tied ranks. Give your answer to 2 decimal places.
b) Without recalculating the correlation coefficient, state how your answer to part a)
would change if:
i. Pupil H’s mark for German was changed to 38%.
ii. A ninth pupil was included who scored 95% in French and 89% in German.
c) The teacher collects extra data from other students in the class and finds that there
are now many tied ranks. Describe how she would now find a measure of the
correlation.

Pupil       𝐴      𝐵     𝐶    𝐷     𝐸     𝐹       𝐺     𝐻
French %    52     25    86   33    55    55      54    46
German %    40     48    65   57    40    39      63    34
F, rank     4      1     8    2     6.5   6.5     5     3
G, rank     3.5    5     8    6     3.5   2       7     1
𝑑           0.5    −4    0    −4    3     4.5     −2    2
𝑑²          0.25   16    0    16    9     20.25   4     4

a) $\sum d^2 = 69.5$, so

$$r_s = 1 - \frac{6 \times 69.5}{8(8^2 - 1)} = 0.1726\ldots = 0.17 \text{ (2 dp)}$$

(weak positive correlation)

To find Spearman’s rank (not an estimate), treat the ranks as 𝑥 and 𝑦: $\sum x = 36$, $\sum x^2 = 203.5$, $\sum y = 36$, $\sum y^2$, $\sum xy$, etc., then

$$r_s = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

b) i. It would not change, as the ranking would be the same: 38% is still the lowest.

ii. The ranking for the ninth pupil would be 9 in French and 9 in German, so the difference is 0, hence $\sum d^2$ is unchanged. But as 𝑛 has increased by 1, the denominator increases in $1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, hence the correlation coefficient would increase.

c) Too many tied ranks means the formula should be changed to the full PMCC formula. The tied ranks should be given the mean of the rankings.
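To see how far the shortcut estimate sits from the full PMCC calculation on the ranks, both can be computed for the French and German marks. This is a sketch only: the helper names are made up, marks are ranked from lowest (rank 1) upwards, and tied marks share the mean of their ranks.

```python
from math import sqrt


def mean_ranks(values):
    """Rank from lowest (1) upwards; tied values share the mean of their ranks."""
    return [
        sum(u < v for u in values) + (sum(u == v for u in values) + 1) / 2
        for v in values
    ]


def pmcc(xs, ys):
    """Product moment correlation coefficient via S_xx, S_yy and S_xy."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)


french = [52, 25, 86, 33, 55, 55, 54, 46]
german = [40, 48, 65, 57, 40, 39, 63, 34]

f_rank = mean_ranks(french)
g_rank = mean_ranks(german)

n = len(french)
sum_d2 = sum((f - g) ** 2 for f, g in zip(f_rank, g_rank))
estimate = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # shortcut formula (approximate with ties)
full = pmcc(f_rank, g_rank)                      # full PMCC formula on the ranks

print(sum_d2, round(estimate, 2), round(full, 3))  # 69.5 0.17 0.163
```

The two values differ slightly, which is why the shortcut formula only gives an estimate when there are tied ranks.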

Ex 2B
Hypothesis Testing for zero correlation
This is identical to hypothesis testing for zero correlation from normal maths. There we just used PMCC, but of course we could now use 𝑟𝑠 too. Video: Regression & Correlation 4 • Hypothesis Tests • Stats2 Ex1C (youtube.com)

We will use the critical values from the tables in the formula booklet:

Key points:
• 𝑟 is the correlation coefficient for the sample
• 𝜌 (rho) is the correlation coefficient for the population
• 𝐻0 is always that 𝜌 = 0
• 𝐻1 is either 𝜌 > 0 or 𝜌 < 0 (one-tailed test), or 𝜌 ≠ 0 (two-tailed test: halve the significance level)
Examples
A chemist observed 20 reactions, and recorded the mass of the reactant, 𝑥 grams, and the duration of the reaction, 𝑦 minutes.

𝑟 was calculated as 0.934 to 3 decimal places.

Test, at the 5% significance level, whether these results show evidence of any correlation between the mass of the reactant and the duration of the reaction.

𝐻0: 𝜌 = 0
𝐻1: 𝜌 ≠ 0
2-tailed test with 𝑛 = 20, so use 2.5% = 0.025 in each tail.
Critical value is 0.4438.

The observed sample value of 𝑟 is significant if it is closer to 1 or −1 than the critical value.

As 0.934 > 0.4438, we reject 𝐻0, suggesting there is correlation between the mass of the reactant and the duration of the reaction in the population.

The popularity of 16 subjects at a school was found by counting the number of Year 12s and the number of Year 13s who chose each subject and then ranking the subjects. The value of 𝑟𝑠 was calculated as 0.685 to 3 decimal places.

Test, at the 1% significance level, the assertion that Year 12’s and Year 13’s choices are positively correlated.

𝐻0: 𝜌 = 0
𝐻1: 𝜌 > 0
1-tailed test with 𝑛 = 16 and 𝛼 = 0.01.
Critical value is 0.5824.

As 0.685 > 0.5824, we can reject 𝐻0, meaning there is positive correlation between the Year 12’s and Year 13’s choices.
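The decision rule used in both examples can be summarised in code. This is a sketch under stated assumptions: the function name and the `alternative` labels are invented for illustration, and the critical value is assumed to have already been looked up in the formula booklet at the per-tail level (so for a two-tailed 5% test, the 0.025 column).

```python
def reject_zero_correlation(r, critical_value, alternative):
    """Return True if H0: rho = 0 is rejected.

    alternative is one of 'two-sided' (H1: rho != 0), 'greater' (H1: rho > 0)
    or 'less' (H1: rho < 0). critical_value is the positive table value.
    """
    if alternative == "two-sided":
        return abs(r) > critical_value
    if alternative == "greater":
        return r > critical_value
    if alternative == "less":
        return r < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Chemist example: n = 20, 5% two-tailed, table value 0.4438.
chemist = reject_zero_correlation(0.934, 0.4438, "two-sided")

# Subject choices example: n = 16, 1% one-tailed, table value 0.5824.
subjects = reject_zero_correlation(0.685, 0.5824, "greater")

print(chemist, subjects)  # True True
```

The same comparison works for both 𝑟 and 𝑟𝑠; only the table used to find the critical value changes.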

Ex 2C
Exam Questions AS 2018

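a) 𝐴 to 𝐹 are ranked for high jump (ℎ) and long jump (𝑙). There is a tied ranking, so we cannot use $1 - \frac{6 \sum d^2}{n(n^2 - 1)}$.

$\sum h = 21 \quad \sum h^2 = 90.5 \quad \sum l = 21 \quad \sum l^2 = 91 \quad \sum hl = 89$

$$r_s = \frac{S_{hl}}{\sqrt{S_{hh} S_{ll}}} = \frac{89 - \frac{21^2}{6}}{\sqrt{\left(90.5 - \frac{21^2}{6}\right)\left(91 - \frac{21^2}{6}\right)}} = 0.89864\ldots = 0.899 \text{ (3 sf)}$$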

b) 𝐻0: 𝜌 = 0
𝐻1: 𝜌 > 0
𝑛 = 6, 𝛼 = 5%, critical value is 0.8286

0.899 > 0.8286, hence reject 𝐻0.
There is positive correlation between high jump and long jump results.
c) 𝐻0: 𝜌 = 0
𝐻1: 𝜌 > 0
𝑛 = 6, 𝛼 = 5%, 𝑟 = 0.678, critical value is 0.7293

0.678 < 0.7293
Not significant, so accept 𝐻0: no evidence of positive correlation between high jump and long jump.

d) If the data comes from two normal distributions (bivariate normal distribution).

e) Although there is positive correlation of the ranks, it is possible that this is non-linear, hence not significant with PMCC.
A2 2020

a)
       𝐴   𝐵   𝐶   𝐷   𝐸   𝐹   𝐺   𝐻   𝐼
𝑀      3   4   1   8   2   5   7   9   6
𝐽      2   5   3   9   1   4   6   8   7
|𝑑|    1   1   2   1   1   1   1   1   1
𝑑²     1   1   4   1   1   1   1   1   1

$\sum d^2 = 12$, $n = 9$

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{9(9^2 - 1)} = 0.9$$

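b) Tied rankings for Dawn (two at 4.5 and two at 7.5), so must use $r_s = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ on the ranks.

$\sum x = 45 \quad \sum y = 45 \quad \sum x^2 = 285 \quad \sum y^2 = 284 \quad \sum xy = 176$

$$r_s = \frac{176 - \frac{45 \times 45}{9}}{\sqrt{\left(285 - \frac{45^2}{9}\right)\left(284 - \frac{45^2}{9}\right)}} = -0.82355\ldots = -0.824$$
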
c) Mary and Jahil appear to have given high scores for good features, whereas Dawn gave low scores for good features.

𝑀 and 𝐽 had strong positive correlation, and 𝐽 and 𝐷 had strong negative correlation, which would mean they all agree.
A2 2019

Known differences: 𝑑 = 1, 2, 1, 0, 0, 3, so 𝑑² = 1, 4, 1, 0, 0, 9.

a) 𝐻0: 𝜌𝑠 = 0
𝐻1: 𝜌𝑠 > 0
𝑛 = 9, 𝑟𝑠 = 0.85, 𝛼 = 5%, critical value is 0.6000

As 0.85 > 0.6000, there is evidence to reject 𝐻0: there is positive correlation between 100m sprint position and long jump position.


b) $1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 0.85$ with $n = 9$

$$1 - \frac{6 \sum d^2}{720} = 0.85 \implies \frac{6 \sum d^2}{720} = 1 - 0.85 \implies \sum d^2 = 18$$

The $d^2$ so far sum to 15, so the $d^2$ for 𝐵, 𝐶 and 𝐷 must sum to 3. This means each ranking must differ by 1.

As we are assigning positions 6, 7 and 8, and they can only differ by 1, 𝐷 must be 8.
So 𝐵 must be 7, and 𝐶 must be 6.
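The rearrangement in part b) can be checked numerically. This is a small sketch with a made-up function name; the value 15 for the known 𝑑² total is taken from the working above.

```python
def sum_d2_from_rs(r_s, n):
    """Rearrange r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) to give sum(d^2)."""
    return (1 - r_s) * n * (n ** 2 - 1) / 6


total = sum_d2_from_rs(0.85, 9)
known = 15  # d^2 total for the athletes whose positions are already given

# Round to the nearest integer to absorb floating-point error.
remaining = round(total) - known
print(round(total), remaining)  # 18 3
```

With three athletes left and a remaining total of 3, each of their 𝑑² values must be 1, which is the step the written argument relies on.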

c) $\sum d^2$ won't change, but 𝑛 decreases.

In $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$ the denominator decreases, and so $r_s$ also decreases.
