STB1003 Unit-3A
BA/BSc I Semester
Lecture notes by Haseeb Athar, Department of Statistics & O.R., A.M.U., Aligarh
Correlation
Correlation measures how two variables are related. The correlation coefficient quantifies
the strength and the direction of the linear relationship between two variables and is
usually denoted by $r$. The linear correlation coefficient is sometimes referred to as
the Pearson product moment correlation coefficient, in honour of its developer, Professor
Karl Pearson.
Pearson's correlation coefficient is used when:
The data are quantitative.
The distribution of the variables is normal.
The relationship between the two variables is linear.
All observations are independent.
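The mathematical formula for computing the Karl Pearson coefficient of correlation between two
variables X and Y is:
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} \]
where $\bar{X} = \frac{1}{n}\sum X$ and $\bar{Y} = \frac{1}{n}\sum Y$.
Alternate formula:
\[ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} \]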
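The following Python sketch is not part of the original notes. It illustrates two equivalent ways of computing $r$: the deviation form, $\sum (X-\bar X)(Y-\bar Y) / \sqrt{\sum (X-\bar X)^2 \sum (Y-\bar Y)^2}$, and the raw-sum form, $[n\sum XY - \sum X\sum Y] / \sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}$. The function names and the five data pairs are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r_deviation(x, y):
    """Pearson r computed from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

def pearson_r_sums(x, y):
    """Pearson r computed from raw sums (the alternate form)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r_deviation(x, y), pearson_r_sums(x, y))
```

Both functions should agree up to floating-point rounding. The raw-sum form is usually the more convenient one for hand calculation from a table of sums.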
Scatter Diagram or Scatter Plot
Remarks:
1. The value of $r$ is such that $-1 \le r \le 1$. The + and - signs are used for positive linear
correlations and negative linear correlations, respectively.
2. Positive Correlation: If two variables X and Y deviate in the same direction, that is, an
increase/decrease in one variable results in a corresponding increase/decrease in the other, the
correlation is said to be positive. For example, the height and weight of a group of persons.
3. Negative Correlation: If two variables X and Y deviate in opposite directions, that is, an
increase/decrease in one variable results in a corresponding decrease/increase in the other, the
correlation is said to be negative. For example, the pressure and volume of a gas.
4. No Correlation: The two variables X and Y are said to be uncorrelated if $r = 0$, that is, an
increase/decrease in one is accompanied by no systematic change in the other. For example, weight and IQ.
5. Perfect Correlation: The correlation is said to be perfect if a deviation in one variable
is followed by a corresponding proportional deviation in the other. The value $r = 1$ indicates
perfect positive correlation whereas $r = -1$ means perfect negative correlation.
6. The correlation coefficient r is a dimensionless quantity; that means it does not depend on
the units employed.
7. Note that
If $r = 0$: no association or correlation between the two variables.
If $0 < r < 0.25$ / $-0.25 < r < 0$: weak positive/negative correlation.
If $0.25 \le r < 0.75$ / $-0.75 < r \le -0.25$: intermediate positive/negative correlation.
If $0.75 \le r < 1$ / $-1 < r \le -0.75$: strong positive/negative correlation.
If $r = 1$ / $r = -1$: perfect positive/negative correlation.
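As an illustration, the verbal scale in Remark 7 can be written as a small lookup function. This is only a sketch: the function name `describe_correlation` is hypothetical, and placing the boundary values 0.25 and 0.75 in the higher band is an assumption, since a rule of thumb like this does not fix the endpoints.

```python
def describe_correlation(r):
    """Return the verbal label of Remark 7 for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:      # assumption: 0.75 itself counts as strong
        strength = "strong"
    elif size >= 0.25:      # assumption: 0.25 itself counts as intermediate
        strength = "intermediate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"

for value in (0.0, 0.1, -0.5, 0.76, -1.0):
    print(value, "->", describe_correlation(value))
```

Such labels are conventions for describing $r$, not statistical tests.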
Properties:
1. The correlation coefficient lies between $-1$ and $+1$. That is,
\[ -1 \le r \le 1. \]
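Proof.
Let $U = \dfrac{X - E(X)}{\sigma_X}$ and $V = \dfrac{Y - E(Y)}{\sigma_Y}$, then $E(U^2) = E(V^2) = 1$ and $E(UV) = r$, so that
\[ E\left[(U \pm V)^2\right] = E(U^2) + E(V^2) \pm 2E(UV) = 2(1 \pm r) \ge 0. \]
Therefore
\[ -1 \le r \le 1. \]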
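2. The correlation coefficient is independent of change of origin and scale.
Proof. Let $U = \dfrac{X - a}{h}$ and $V = \dfrac{Y - b}{k}$, where $a$, $b$, $h > 0$ and $k > 0$ are constants.
This implies $X = a + hU$, so that
\[ X - E(X) = h\,[U - E(U)] \quad \text{and} \quad \sigma_X = h\,\sigma_U. \]
Similarly
\[ Y - E(Y) = k\,[V - E(V)] \quad \text{and} \quad \sigma_Y = k\,\sigma_V. \]
Therefore
\[ r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{hk\,\operatorname{Cov}(U, V)}{hk\,\sigma_U \sigma_V} = r_{UV}. \]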
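3. If two variables are independent, then the correlation between them is zero, but the converse
is not true.
Proof. If X and Y are independent, then
\[ E(XY) = E(X)\,E(Y). \]
Thus
\[ \operatorname{Cov}(X, Y) = E(XY) - E(X)\,E(Y) = 0, \]
which implies
\[ r = 0. \]
Now, for the converse, let there be two variables $X$ and $Y = X^2$, where, for example, the distribution of $X$ is symmetric about zero so that $E(X) = E(X^3) = 0$. Then
\[ \operatorname{Cov}(X, Y) = E(X^3) - E(X)\,E(X^2) = 0, \]
which implies
\[ r = 0. \]
But here we may observe that X and Y are not independent: they are related by the relation $Y = X^2$.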
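A small numerical check can make the failure of the converse concrete. The sketch below is not from the original notes: it assumes, purely for illustration, that $X$ takes the values $-3, \dots, 3$ with equal weight and that $Y = X^2$, so $Y$ is completely determined by $X$ and yet the linear correlation vanishes. The helper name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Assumed illustrative values: X symmetric about zero, Y an exact function of X.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # zero up to rounding, although Y depends entirely on X
```

The coefficient $r$ only detects linear association, which is why a perfect but symmetric curved relationship can still give $r = 0$.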
Example 3.1: The weight (Wt.) and systolic blood pressure (SBP) of 10 patients are given below. Find
the product moment correlation coefficient between Wt. and SBP.
Solution:
Here
and
Therefore,
Example 3.2: A sample of 6 children was selected, and data about their age in weeks and weight
in kilograms were recorded as shown in the following table. It is required to find the
correlation between age and weight.
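Solution:
Serial no.   Age in weeks (X)   Weight in kg (Y)    XY     X²     Y²
1                  7                  12             84     49    144
2                  6                   8             48     36     64
3                  8                  12             96     64    144
4                  5                  10             50     25    100
5                  6                  11             66     36    121
6                  9                  13            117     81    169
Total             41                  66            461    291    742
Therefore,
\[ r = \frac{6(461) - (41)(66)}{\sqrt{\left[6(291) - 41^2\right]\left[6(742) - 66^2\right]}} = \frac{60}{\sqrt{65 \times 96}} \approx 0.76. \]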
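The arithmetic of Example 3.2 can be reproduced with a few lines of Python. This is a sketch for checking the column totals and the raw-sum formula, not part of the original notes; the variable names are hypothetical, while the data are the six (age, weight) pairs in the table above.

```python
from math import sqrt

age = [7, 6, 8, 5, 6, 9]          # X, age in weeks
weight = [12, 8, 12, 10, 11, 13]  # Y, weight in kilograms

n = len(age)
sum_x, sum_y = sum(age), sum(weight)
sum_xy = sum(a * w for a, w in zip(age, weight))
sum_x2 = sum(a * a for a in age)
sum_y2 = sum(w * w for w in weight)

# Raw-sum form of the Pearson coefficient.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 4))
```

The value is about 0.76, which Remark 7 would label a strong positive correlation between age and weight.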
Example 3.3: In the following table age of the students and their yoga habits are given.
Calculate correlation between age of the students and their yoga habits.
Age                    15    16    17    18    19    20
No. of students       250   200   150   120   100    80
Students doing yoga   200   150    90    48    30    12
Solution:
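Let X denote age and Y the percentage of students of that age who do yoga, with $U = X - \bar{X}$ and $V = Y - \bar{Y}$.
Age (X)   % of students (Y)   U = X - X̄   V = Y - Ȳ     U²      V²      UV
15              80              -2.5          30       6.25     900    -75
16              75              -1.5          25       2.25     625    -37.5
17              60              -0.5          10       0.25     100     -5
18              40               0.5         -10       0.25     100     -5
19              30               1.5         -20       2.25     400    -30
20              15               2.5         -35       6.25    1225    -87.5
Total: ΣX = 105, ΣY = 300, ΣU² = 17.5, ΣV² = 3350, ΣUV = -240
\[ \bar{X} = \frac{1}{n}\sum X = \frac{105}{6} = 17.5, \qquad \bar{Y} = \frac{1}{n}\sum Y = \frac{300}{6} = 50 \]
\[ r = \frac{\sum UV}{\sqrt{(\sum U^2)(\sum V^2)}} = \frac{-240}{\sqrt{17.5 \times 3350}} \approx -0.991 \]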
The above analysis shows that there is strong negative correlation between age of the students
and their Yoga habits.
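The steps of Example 3.3 (convert counts to percentages, take deviations from the means, then apply $r = \sum UV / \sqrt{\sum U^2 \sum V^2}$) can be checked with the short sketch below. It is not part of the original notes and the variable names are hypothetical; the data are those given in the example.

```python
from math import sqrt

ages = [15, 16, 17, 18, 19, 20]
students = [250, 200, 150, 120, 100, 80]
yoga = [200, 150, 90, 48, 30, 12]

# Y = percentage of students of each age who do yoga.
pct = [100 * d / s for d, s in zip(yoga, students)]

x_bar = sum(ages) / len(ages)
y_bar = sum(pct) / len(pct)
u = [a - x_bar for a in ages]   # U = X - mean(X)
v = [p - y_bar for p in pct]    # V = Y - mean(Y)

sum_uv = sum(a * b for a, b in zip(u, v))
sum_u2 = sum(a * a for a in u)
sum_v2 = sum(b * b for b in v)
r = sum_uv / sqrt(sum_u2 * sum_v2)
print(pct, sum_uv, sum_u2, sum_v2, round(r, 3))
```

Working with percentages rather than raw counts matters here, because the number of students differs from one age group to the next.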
Practice Exercises:
1. Calculate the Karl Pearson coefficient of correlation between X and Y for the following data:
X    1   2   3   4   5   7   8  10
Y    2   6   8  10  14  16  18  20
2. Anxiety Level and Test Score of 6 High School students are recorded as below:
Anxiety Level 10 8 2 1 5 6
Test Score 2 3 9 7 6 5
Arithmetic Means 25 18
Sum of Squares of Deviations from Means 136 138
5. A student performed calculations on the values of the variables X and Y and obtained the following
results:
$N = 30$, $\sum X = 120$, $\sum X^2 = 600$,
$\sum Y = 90$, $\sum Y^2 = 250$, $\sum XY = 356$.
On later verification it was found that the values of two pairs, (6, 8) and (10, 5), were
wrong and the correct values were (6, 10) and (8, 5). Calculate the correct correlation
coefficient.
Correlation in two way classified data
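Sometimes the number of observations is so large that the data are classified into a two-way
frequency distribution. In that case the correlation coefficient is computed as
\[ r = \frac{N\sum f d_x d_y - (\sum f d_x)(\sum f d_y)}{\sqrt{\left[N\sum f d_x^2 - (\sum f d_x)^2\right]\left[N\sum f d_y^2 - (\sum f d_y)^2\right]}} \]
where $N = \sum f$ is the total frequency, $d_x = (M_1 - A)/h$
and $d_y = (M_2 - B)/k$ are the step deviations of the class mid-values $M_1$ and $M_2$ from assumed origins $A$ and $B$, with class widths $h$ and $k$.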
                        Marks in Mathematics
Marks in Statistics    5-15   15-25   25-35   35-45
0-10                     1       1       -       -
10-20                    3       6       5       1
20-30                    1       8       9       2
30-40                    -       3       9       3
40-50                    -       -       4       4
Working table (rows: Marks in Statistics; columns: Marks in Mathematics)
                 C.I.    5-15   15-25   25-35   35-45
                 M1        10      20      30      40
                 dx        -1       0       1       2    Total
C.I.     M2      dy                                         f    fdy   fdy²   fdxdy
0-10      5      -2         1       1       -       -       2     -4      8       2
10-20    15      -1         3       6       5       1      15    -15     15      -4
20-30    25       0         1       8       9       2      20      0      0       0
30-40    35       1         -       3       9       3      15     15     15      15
40-50    45       2         -       -       4       4       8     16     32      24
Total    f                  5      18      27      10      60     12     70      37
         fdx               -5       0      27      20      42
         fdx²               5       0      27      40      72
         fdxdy              5       0      12      20      37
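Therefore, with $N = 60$, $\sum f d_x = 42$, $\sum f d_y = 12$, $\sum f d_x^2 = 72$, $\sum f d_y^2 = 70$ and $\sum f d_x d_y = 37$,
\[ r = \frac{60(37) - (42)(12)}{\sqrt{\left[60(72) - 42^2\right]\left[60(70) - 12^2\right]}} = \frac{1716}{\sqrt{2556 \times 4056}} \approx 0.533. \]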
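The totals of the working table and the resulting coefficient can be verified with the sketch below, which applies the step-deviation formula $r = [N\sum f d_x d_y - \sum f d_x \sum f d_y] / \sqrt{[N\sum f d_x^2 - (\sum f d_x)^2][N\sum f d_y^2 - (\sum f d_y)^2]}$ to the two-way frequency table. It is an illustration rather than part of the original notes; the variable names are hypothetical and empty cells (shown as "-") are entered as 0.

```python
from math import sqrt

# Rows: Statistics classes 0-10 ... 40-50; columns: Mathematics classes 5-15 ... 35-45.
freq = [
    [1, 1, 0, 0],
    [3, 6, 5, 1],
    [1, 8, 9, 2],
    [0, 3, 9, 3],
    [0, 0, 4, 4],
]
dx = [-1, 0, 1, 2]       # step deviations of the Mathematics mid-values
dy = [-2, -1, 0, 1, 2]   # step deviations of the Statistics mid-values

# One (frequency, dx, dy) triple per cell of the table.
cells = [(f, dx[j], dy[i]) for i, row in enumerate(freq) for j, f in enumerate(row)]

N = sum(f for f, _, _ in cells)
sum_fdx = sum(f * u for f, u, _ in cells)
sum_fdy = sum(f * v for f, _, v in cells)
sum_fdx2 = sum(f * u * u for f, u, _ in cells)
sum_fdy2 = sum(f * v * v for f, _, v in cells)
sum_fdxdy = sum(f * u * v for f, u, v in cells)

r = (N * sum_fdxdy - sum_fdx * sum_fdy) / sqrt(
    (N * sum_fdx2 - sum_fdx ** 2) * (N * sum_fdy2 - sum_fdy ** 2)
)
print(N, sum_fdx, sum_fdy, sum_fdx2, sum_fdy2, sum_fdxdy, round(r, 3))
```

Because $r$ is unaffected by a change of origin and scale, the step deviations give the same coefficient as the class mid-values would.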
Example 3.4: Five applicants for a job are rated by two experts, with the following results
Applicant A B C D E
Expert 1 4 1 3 2 5
Expert 2 3 2 5 1 4
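Solution:
Applicant   Expert 1 (R1)   Expert 2 (R2)   d = R1 - R2   d²
A                 4               3              1         1
B                 1               2             -1         1
C                 3               5             -2         4
D                 2               1              1         1
E                 5               4              1         1
Note that
$\sum d = 0$, $\sum d^2 = 8$ and $n = 5$, thus
\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{5 \times 24} = 0.6. \]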
Example 3.5: Marks of ten students in Mathematics and Statistics are given below.
Calculate Spearman coefficient of rank correlation.
Marks
Mathematics 56 75 45 71 62 64 58 80 76 61
Statistics 66 70 40 60 65 68 59 77 67 63
Solution:
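Mathematics            Statistics
Marks   Rank (R1)      Marks   Rank (R2)      d = R1 - R2    d²
56          9            66        5               4         16
75          3            70        2               1          1
45         10            40       10               0          0
71          4            60        8              -4         16
62          6            65        6               0          0
64          5            68        3               2          4
58          8            59        9              -1          1
80          1            77        1               0          0
76          2            67        4              -2          4
61          7            63        7               0          0
                                                   Σd² = 42
Thus, with $n = 10$,
\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 42}{10 \times 99} \approx 0.745. \]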
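The ranking and the sum of squared rank differences in Example 3.5 can be checked with the sketch below, which uses the usual no-ties formula $\rho = 1 - 6\sum d^2 / [n(n^2 - 1)]$. It is an illustration, not part of the original notes; the helper names are hypothetical and the ranking helper assumes that no two marks are equal.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation for data without ties."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

maths = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
stats = [66, 70, 40, 60, 65, 68, 59, 77, 67, 63]

sum_d2 = sum((a - b) ** 2
             for a, b in zip(ranks_descending(maths), ranks_descending(stats)))
rho = spearman_rho(maths, stats)
print(ranks_descending(maths), ranks_descending(stats), sum_d2, round(rho, 4))
```

Ranking in ascending rather than descending order would give the same $\rho$, provided both variables are ranked in the same direction.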
Example 3.6: Suppose that in Example 3.5 the marks of some students in Mathematics and of
some students in Statistics are the same, as listed below:
Marks
Mathematics 56 75 62 71 62 62 58 80 76 61
Statistics 66 70 40 60 66 68 59 77 67 63
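Mathematics            Statistics
Marks   Rank (R1)      Marks   Rank (R2)      d = R1 - R2    d²
56         10            66       5.5             4.5       20.25
75          3            70       2               1          1
62          6            40      10              -4         16
71          4            60       8              -4         16
62          6            66       5.5             0.5        0.25
62          6            68       3               3          9
58          9            59       9               0          0
80          1            77       1               0          0
76          2            67       4              -2          4
61          8            63       7               1          1
                                                   Σd² = 67.5
Here the mark 62 is repeated $m = 3$ times in Mathematics and the mark 66 is repeated $m = 2$ times in Statistics, thus the correction to be added to $\sum d^2$ is
\[ \sum \frac{m(m^2 - 1)}{12} = \frac{3(3^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} = 2 + 0.5 = 2.5. \]
Therefore,
\[ \rho = 1 - \frac{6\left[\sum d^2 + 2.5\right]}{n(n^2 - 1)} = 1 - \frac{6 \times 70}{10 \times 99} \approx 0.576. \]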
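When marks are tied, each tied value receives the average of the ranks it occupies. The sketch below reproduces the ranks of Example 3.6 and then applies a commonly used tie correction, which adds $m(m^2 - 1)/12$ to $\sum d^2$ for every group of $m$ equal values. That this is the correction intended in the notes is an assumption; the helper names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = order.index(v) + 1   # first rank occupied by this value
        count = order.count(v)       # number of times the value occurs
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over all groups of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

maths = [56, 75, 62, 71, 62, 62, 58, 80, 76, 61]
stats = [66, 70, 40, 60, 66, 68, 59, 77, 67, 63]

r1, r2 = average_ranks(maths), average_ranks(stats)
n = len(maths)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
correction = tie_correction(maths) + tie_correction(stats)
rho = 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))
print(r1, r2, sum_d2, correction, round(rho, 4))
```

An alternative that avoids any correction term is to compute the Pearson coefficient directly on the averaged ranks; the two approaches generally give close but not identical values.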
Partial Correlation
Partial correlation analysis involves studying the linear relationship between two variables
after excluding the effect of one or more independent factors. For example, suppose that
there are three variables, Sale $(X_1)$, Price $(X_2)$ and Production $(X_3)$, and one wants to find the
linear relationship between Sale $(X_1)$ and Price $(X_2)$ after controlling for Production $(X_3)$.
Coefficient of Partial Correlation
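The partial correlation coefficient between $X_1$ and $X_2$ after controlling the effect of $X_3$, usually
denoted by $r_{12.3}$, is given as
\[ r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \]
Similarly,
\[ r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} \]
and
\[ r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}} \]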
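The three first-order partial correlations share one pattern, $r_{ab.c} = (r_{ab} - r_{ac}\,r_{bc}) / \sqrt{(1 - r_{ac}^2)(1 - r_{bc}^2)}$, which is the form applied in the worked example later in these notes. The sketch below writes it once as a function. The function name `partial_corr` and the three input correlations are hypothetical values chosen only for illustration.

```python
from math import sqrt

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b after controlling for c."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical total correlations, not taken from the notes.
r12, r13, r23 = 0.6, 0.5, 0.4

r12_3 = partial_corr(r12, r13, r23)  # X1 and X2, controlling X3
r13_2 = partial_corr(r13, r12, r23)  # X1 and X3, controlling X2
r23_1 = partial_corr(r23, r12, r13)  # X2 and X3, controlling X1
print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```

The order of the arguments matters: the first is the correlation of interest and the other two are the correlations of each variable with the one being controlled.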
Multiple Correlation
Sometimes we find an interrelationship between many variables, and the value of one variable may
be influenced by many others. For example, the yield of crop per hectare, say $X_1$, depends
upon quality of seed $(X_2)$, fertility of soil $(X_3)$, fertilizer used $(X_4)$, irrigation facilities $(X_5)$,
weather condition $(X_6)$ and so on. When one is interested in studying the joint effect of a group of
variables upon a variable not included in that group, the study is that of multiple correlation
and multiple linear regression analysis.
Coefficient of Multiple Correlations
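Consider a trivariate distribution in which each of the variables $X_1$, $X_2$ and $X_3$ has $N$
observations.
The multiple correlation coefficient of $X_1$ on $X_2$ and $X_3$, usually denoted by $R_{1.23}$, is given as
\[ R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2} \]
This is also called the simple correlation between $X_1$ and the joint effect of $X_2$ and $X_3$ on $X_1$.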
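As a sketch, the multiple correlation coefficient can be computed from the three total correlations using $R_{1.23}^2 = (r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}) / (1 - r_{23}^2)$, the form used in the worked example later in these notes. The function name `multiple_corr` and the input values are hypothetical and serve only as an illustration.

```python
from math import sqrt

def multiple_corr(r12, r13, r23):
    """Multiple correlation R_1.23 of X1 on X2 and X3."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return sqrt(r_squared)

# Hypothetical total correlations, not taken from the notes.
r12, r13, r23 = 0.6, 0.5, 0.4
R = multiple_corr(r12, r13, r23)
print(round(R, 4))

# R_1.23 is at least as large as |r12| and |r13| for these values.
print(R >= abs(r12), R >= abs(r13))
```

Unlike a total or partial correlation, $R_{1.23}$ is taken as the non-negative square root, so it carries no sign.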
Properties of Multiple Correlation
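i) $0 \le R_{1.23} \le 1$.
ii) If $R_{1.23} = 1$, then association is perfect, and the regression residuals are zero.
iii) If $R_{1.23} = 0$, then all total and partial correlations involving $X_1$ are zero. So, $X_1$ is
totally uncorrelated with all other variables.
iv) $R_{1.23}$ is not less than any total correlation coefficient involving $X_1$, that is, $R_{1.23} \ge |r_{12}|$ and $R_{1.23} \ge |r_{13}|$.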
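Example 3.7: Express multiple correlation in terms of total and partial correlations, or show
that
\[ 1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2). \]
Solution: We have
\[ 1 - R_{1.23}^2 = 1 - \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2} = \frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}. \]
Also,
\[ r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}}, \]
which implies
\[ 1 - r_{13.2}^2 = \frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2\,r_{12}\,r_{13}\,r_{23}}{(1 - r_{12}^2)(1 - r_{23}^2)} = \frac{1 - R_{1.23}^2}{1 - r_{12}^2}. \]
Hence $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$.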
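(ii) We have
\[ 1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2). \]
Then, since $0 \le 1 - r_{13.2}^2 \le 1$,
\[ 1 - R_{1.23}^2 \le 1 - r_{12}^2. \]
Thus,
\[ R_{1.23}^2 \ge r_{12}^2. \]
Then, by the same argument with $X_2$ and $X_3$ interchanged, $R_{1.23}^2 \ge r_{13}^2$.
(iv) If $R_{1.23} = 0$, then
\[ (1 - r_{12}^2)(1 - r_{13.2}^2) = 1. \]
Now $0 \le 1 - r_{12}^2 \le 1$ and $0 \le 1 - r_{13.2}^2 \le 1$ implies that both factors equal 1, that is,
\[ r_{12} = 0 \quad \text{and} \quad r_{13.2} = 0. \]
Thus if $R_{1.23} = 0$, then
\[ r_{12} = 0 \quad \text{and} \quad r_{13} = 0. \]
That means $X_1$ is uncorrelated with $X_2$ and $X_3$.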
Example 3.9: From the data relating to the yield of dry bark $(X_1)$, height $(X_2)$ and girth $(X_3)$
for 18 cinchona plants, the following correlation coefficients were obtained
and
Find the partial correlation coefficient $r_{12.3}$ and the multiple correlation coefficient $R_{1.23}$.
Example 3.10: A study was conducted to assess the role of hours spent on revision and anxiety
level on exam score. The following data were recorded:
Student No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Score 62 58 52 55 75 82 38 55 48 68 62 62 72 58
Hours 40 31 35 26 51 48 25 37 30 44 32 40 61 35
Anxiety 40 65 34 91 46 52 48 61 34 74 54 61 26 13
Compute
i) The correlation between exam score and hours of revision after controlling the effect of
anxiety level.
ii) The correlation between exam scores and anxiety level after controlling the effect of
hours of revision.
iii) The joint effect of hours of revision and anxiety level on exam score.
Solution:
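Let $X_1$ denote the exam score, $X_2$ the hours of revision and $X_3$ the anxiety level. From the data, with $n = 14$:
$\sum X_1 = 847$, $\sum X_2 = 535$, $\sum X_3 = 699$,
$\sum X_1^2 = 52919$, $\sum X_2^2 = 21767$, $\sum X_3^2 = 40221$,
$\sum X_1 X_2 = 33592$, $\sum X_2 X_3 = 25955$, $\sum X_1 X_3 = 42336$.
\[ \operatorname{Corr}(X_1, X_2) = r_{12} = \frac{n\sum X_1 X_2 - \sum X_1 \sum X_2}{\sqrt{\left[n\sum X_1^2 - (\sum X_1)^2\right]\left[n\sum X_2^2 - (\sum X_2)^2\right]}} = \frac{14 \times 33592 - 847 \times 535}{\sqrt{\left[14 \times 52919 - (847)^2\right]\left[14 \times 21767 - (535)^2\right]}} = 0.8226 \]
\[ \operatorname{Corr}(X_1, X_3) = r_{13} = \frac{n\sum X_1 X_3 - \sum X_1 \sum X_3}{\sqrt{\left[n\sum X_1^2 - (\sum X_1)^2\right]\left[n\sum X_3^2 - (\sum X_3)^2\right]}} = \frac{14 \times 42336 - 847 \times 699}{\sqrt{\left[14 \times 52919 - (847)^2\right]\left[14 \times 40221 - (699)^2\right]}} = 0.0156 \]
\[ \operatorname{Corr}(X_2, X_3) = r_{23} = \frac{n\sum X_2 X_3 - \sum X_2 \sum X_3}{\sqrt{\left[n\sum X_2^2 - (\sum X_2)^2\right]\left[n\sum X_3^2 - (\sum X_3)^2\right]}} = \frac{14 \times 25955 - 535 \times 699}{\sqrt{\left[14 \times 21767 - (535)^2\right]\left[14 \times 40221 - (699)^2\right]}} = -0.2853 \]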
(i) The partial correlation between exam score and hours of revision after controlling the
effect of anxiety level is
\[ r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.8226 - (0.0156)(-0.2853)}{\sqrt{(1 - (0.0156)^2)(1 - (-0.2853)^2)}} = 0.863 \]
(ii) The partial correlation between exam score and anxiety level after controlling the effect
of hours of revision is
\[ r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} = \frac{0.0156 - (0.8226)(-0.2853)}{\sqrt{(1 - (0.8226)^2)(1 - (-0.2853)^2)}} = 0.459 \]
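(iii) The joint effect of hours of revision and anxiety level on exam score is measured by the multiple correlation coefficient:
\[ R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2} = \frac{0.6842}{0.9186} = 0.7448, \]
so that $R_{1.23} = \sqrt{0.7448} \approx 0.863$.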
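The whole of Example 3.10 can be recomputed from the raw data as a check on the hand calculation. The sketch below is not part of the original notes; the helper names `pearson_r` and `partial_corr` are hypothetical, and the formulas are the raw-sum Pearson coefficient and the first-order partial and multiple correlation formulas used in the solution above.

```python
from math import sqrt

score   = [62, 58, 52, 55, 75, 82, 38, 55, 48, 68, 62, 62, 72, 58]  # X1
hours   = [40, 31, 35, 26, 51, 48, 25, 37, 30, 44, 32, 40, 61, 35]  # X2
anxiety = [40, 65, 34, 91, 46, 52, 48, 61, 34, 74, 54, 61, 26, 13]  # X3

def pearson_r(x, y):
    """Pearson r from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b after controlling for c."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

r12 = pearson_r(score, hours)
r13 = pearson_r(score, anxiety)
r23 = pearson_r(hours, anxiety)

r12_3 = partial_corr(r12, r13, r23)   # part (i)
r13_2 = partial_corr(r13, r12, r23)   # part (ii)
R_sq = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)  # part (iii)
R = sqrt(R_sq)

print(round(r12, 4), round(r13, 4), round(r23, 4))
print(round(r12_3, 3), round(r13_2, 3), round(R_sq, 4), round(R, 3))
```

Small differences in the last digit relative to the hand calculation can arise because the code carries full precision, whereas the solution rounds each correlation to four decimals before substituting.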