Department of Statistics & O.R.

Aligarh Muslim University Aligarh

BA/BSc I Semester

Introduction to Statistics (STBMN 1003)

by

Dr. Haseeb Athar


Unit - 3
Correlation
• Simple Correlation
• Partial Correlation
• Multiple Correlation

2 Lecture notes by Haseeb Athar, Department of Statistics & O.R., A.M.U., Aligarh
Correlation

Correlation measures how variables are related. The correlation coefficient quantifies the
strength and the direction of the linear relationship between two variables and is
usually denoted by $r$. The linear correlation coefficient is sometimes referred to as
the Pearson product moment correlation coefficient in honour of its developer, Professor
Karl Pearson.
Pearson’s correlation coefficient is used when:
• The data are quantitative.
• The variables are normally distributed.
• The relationship between the two variables is linear.
• All observations are independent.
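The mathematical formula for computing the Karl Pearson coefficient of correlation between two
variables X and Y is:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, and

n is the number of pairs of data.

Alternate formula:

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}} $$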
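The two forms of Pearson's formula can be compared numerically. The following is a minimal sketch, not part of the original notes: the function names `pearson_r` and `pearson_r_raw` are illustrative, and the data are the ages and weights of Example 3.2 later in these notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from deviations about the means (definition formula)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pearson_r_raw(x, y):
    """Pearson r from raw sums (alternate formula)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Age (years) and weight (kg) of the six children in Example 3.2
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

r_def = pearson_r(age, weight)
r_raw = pearson_r_raw(age, weight)
print(round(r_def, 4), round(r_raw, 4))
```

Both functions should agree up to floating-point rounding; the raw-sum form avoids computing the means first, which is why it is convenient for hand calculation.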

Scatter Diagram or Scatter Plot

Remarks:
1. The value of r is such that $-1 \le r \le +1$. The + and – signs are used for positive linear
correlations and negative linear correlations, respectively.
2. Positive Correlation: If two variables X and Y deviate in the same direction, that is, an
increase/decrease in one variable is accompanied by a corresponding increase/decrease in the other, the
correlation is said to be positive. For example, the height and weight of a group of persons.
3. Negative Correlation: If two variables X and Y deviate in opposite directions, that is, an
increase/decrease in one variable is accompanied by a corresponding decrease/increase in the other, the
correlation is said to be negative. For example, the pressure and volume of a gas.
4. No Correlation: Two variables X and Y are said to be uncorrelated if $r = 0$, that is, an
increase/decrease in one is accompanied by no systematic change in the other. For example, weight and IQ.
5. Perfect Correlation: The correlation is said to be perfect if a deviation in one variable
is followed by a corresponding proportional deviation in the other. The value $r = +1$ indicates
perfect positive correlation whereas $r = -1$ means perfect negative correlation.
6. The correlation coefficient r is a dimensionless quantity; that is, it does not depend on
the units employed.
7. Note that
• If $r = 0$, there is no linear association or correlation between the two variables.
• If $0 < r < 0.25$ / $-0.25 < r < 0$: weak positive/negative correlation.
• If $0.25 \le r < 0.75$ / $-0.75 < r \le -0.25$: intermediate positive/negative correlation.
• If $0.75 \le r < 1$ / $-1 < r \le -0.75$: strong positive/negative correlation.
• If $r = 1$ / $r = -1$: perfect positive/negative correlation.
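The verbal scale in Remark 7 can be written as a small lookup. This is only a sketch: the function name `describe_correlation` is hypothetical, and assigning the boundary values 0.25 and 0.75 to the higher category is an assumption, since the notes do not make the boundaries explicit.

```python
def describe_correlation(r):
    """Return the verbal label of Remark 7 for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:      # assumption: 0.75 itself counts as strong
        strength = "strong"
    elif size >= 0.25:      # assumption: 0.25 itself counts as intermediate
        strength = "intermediate"
    else:
        strength = "weak"
    return f"{strength} {sign} correlation"

for value in (0.1, -0.5, 0.76, -1.0, 0.0):
    print(value, "->", describe_correlation(value))
```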
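Properties:
1. The correlation coefficient lies between $-1$ and $+1$. That is,
$$ -1 \le r \le 1. $$
Proof.
Let $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$, then

$$ r^2 = \frac{\left(\sum a_i b_i\right)^2}{\sum a_i^2 \, \sum b_i^2} $$

By the Schwarz inequality, we have

$$ \left(\sum a_i b_i\right)^2 \le \sum a_i^2 \, \sum b_i^2 $$

Therefore

$$ r^2 \le 1, \quad \text{that is,} \quad -1 \le r \le 1. $$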
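2. The correlation coefficient is independent of change of origin and scale.

Proof. Let

$$ U = \frac{X - a}{h}, \qquad V = \frac{Y - b}{k}, $$

where $a$, $b$, $h$ and $k$ are constants with $h > 0$ and $k > 0$.

Then

$$ X = a + hU, \qquad \bar{x} = a + h\bar{u}. $$

This implies

$$ x_i - \bar{x} = h(u_i - \bar{u}). $$

Similarly

$$ y_i - \bar{y} = k(v_i - \bar{v}). $$

Therefore

$$ r_{XY} = \frac{hk \sum (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{h^2 \sum (u_i - \bar{u})^2 \; k^2 \sum (v_i - \bar{v})^2}} = r_{UV}. $$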
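Property 2 can be checked numerically. The sketch below is an illustration only: the data are the weights and blood pressures of Example 3.1 later in these notes, and the constants `a`, `h`, `b`, `k` are hypothetical values chosen for the demonstration (with positive scale factors).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)

# Weight (kg) and SBP (mmHg) from Example 3.1
x = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
y = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]

# Hypothetical origin and scale constants (h > 0, k > 0)
a, h = 80, 5
b, k = 140, 10
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(round(r_xy, 4), round(r_uv, 4))
```

The two printed values should coincide, which is what makes step deviations a legitimate shortcut in hand computation.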

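3. If two variables are independent, then the correlation between them is zero, but the converse
is not true.
Proof. If X and Y are independent, then

$$ E(XY) = E(X)\,E(Y). $$

Thus

$$ \operatorname{Cov}(X, Y) = E(XY) - E(X)\,E(Y) = 0, $$
which implies

$$ r = 0. $$

Now, let there be two variables X and Y such that the correlation between X and Y is zero. Then
$\operatorname{Cov}(X, Y) = 0$, that is, $E(XY) = E(X)\,E(Y)$. This condition alone does not imply
that X and Y are independent, because r measures only the linear relationship between the variables.

Hence uncorrelated variables X and Y need not be independent.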
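The above can also be illustrated with the help of the following example. Let
X:   -3  -2  -1   1   2   3
Y:    9   4   1   1   4   9
XY: -27  -8  -1   1   8  27
Here $\sum X = 0$, $\sum Y = 28$ and $\sum XY = 0$, so

$$ \operatorname{Cov}(X, Y) = \frac{1}{n}\sum XY - \bar{X}\,\bar{Y} = 0, $$
which implies

$$ r = 0. $$

But here we may observe that X and Y are related by the relation $Y = X^2$.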
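The same illustration can be reproduced in a few lines. This is a minimal sketch using the raw-sum form of Pearson's formula on the six X values listed above; the variable names are illustrative.

```python
from math import sqrt

x = [-3, -2, -1, 1, 2, 3]
y = [xi ** 2 for xi in x]      # Y = X^2, an exact but nonlinear relation

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)
sum_y2 = sum(q * q for q in y)

# Raw-sum form of Pearson's r
numerator = n * sum_xy - sum_x * sum_y
r = numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(sum_x, sum_y, sum_xy, r)
```

The numerator vanishes because the positive and negative products cancel, even though Y is completely determined by X.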

Example 3.1: The weight (Wt.) and systolic blood pressure (SBP) of 10 patients are given below. Find
the product moment correlation coefficient between Wt. and SBP.

Wt. (kg)      67   69   85   83   74   81   97   92  114   85
SBP (mmHg)   120  125  140  160  130  180  150  140  200  130

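Solution: Let x denote Wt. and y denote SBP.

   x      y    $x-\bar{x}$   $(x-\bar{x})^2$   $y-\bar{y}$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
  67    120      -17.70          313.29          -27.50          756.25              486.75
  69    125      -15.70          246.49          -22.50          506.25              353.25
  85    140        0.30            0.09           -7.50           56.25               -2.25
  83    160       -1.70            2.89           12.50          156.25              -21.25
  74    130      -10.70          114.49          -17.50          306.25              187.25
  81    180       -3.70           13.69           32.50         1056.25             -120.25
  97    150       12.30          151.29            2.50            6.25               30.75
  92    140        7.30           53.29           -7.50           56.25              -54.75
 114    200       29.30          858.49           52.50         2756.25             1538.25
  85    130        0.30            0.09          -17.50          306.25               -5.25
 847   1475                     1754.10                         5962.50             2392.50   (totals)

Here
$\bar{x} = \frac{847}{10} = 84.7$ and $\bar{y} = \frac{1475}{10} = 147.5$.
Therefore,

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{2392.50}{\sqrt{1754.10 \times 5962.50}} \approx 0.74 $$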

Example 3.2: A sample of 6 children was selected, and data on their age in years and weight
in kilograms were recorded as shown in the following table. It is required to find the
correlation between age and weight.

S. No.   Age (years)   Weight (kg)
  1           7            12
  2           6             8
  3           8            12
  4           5            10
  5           6            11
  6           9            13

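Solution: Let x denote age and y denote weight.

S. No.   Age (years) x   Weight (kg) y    $xy$   $x^2$   $y^2$
  1            7               12           84     49     144
  2            6                8           48     36      64
  3            8               12           96     64     144
  4            5               10           50     25     100
  5            6               11           66     36     121
  6            9               13          117     81     169
Total         41               66          461    291     742

Therefore,

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}} = \frac{6(461) - (41)(66)}{\sqrt{[6(291) - (41)^2]\,[6(742) - (66)^2]}} = \frac{60}{\sqrt{65 \times 96}} \approx 0.76 $$

This indicates a strong positive correlation between age and weight.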

Example 3.3: In the following table, the age of the students and their yoga habits are given.
Calculate the correlation between the age of the students and their yoga habits.

Age                     15   16   17   18   19   20
No. of Students        250  200  150  120  100   80
Students doing Yoga    200  150   90   48   30   12

Solution: Let X denote age and Y the percentage of students doing yoga, with $U = X - \bar{X}$ and $V = Y - \bar{Y}$.

Age (X)   % of Students (Y)   $U = X-\bar{X}$   $V = Y-\bar{Y}$   $U^2$    $V^2$    $UV$
  15             80                -2.5               30            6.25     900     -75
  16             75                -1.5               25            2.25     625     -37.5
  17             60                -0.5               10            0.25     100      -5
  18             40                 0.5              -10            0.25     100      -5
  19             30                 1.5              -20            2.25     400     -30
  20             15                 2.5              -35            6.25    1225     -87.5
$\sum X = 105$, $\sum Y = 300$, $\sum U^2 = 17.5$, $\sum V^2 = 3350$, $\sum UV = -240$

$$ \bar{X} = \frac{1}{n}\sum X = \frac{105}{6} = 17.5 $$

$$ \bar{Y} = \frac{1}{n}\sum Y = \frac{300}{6} = 50 $$

$$ r = \frac{\sum UV}{\sqrt{(\sum U^2)(\sum V^2)}} = \frac{-240}{\sqrt{17.5 \times 3350}} \approx -0.991 $$

The above analysis shows that there is a strong negative correlation between the age of the students
and their yoga habits.
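The working of Example 3.3 can be retraced step by step. This is a sketch under the same reading of the problem as the solution table, namely that the yoga habit is measured as the percentage of students at each age who do yoga; the variable names are illustrative.

```python
from math import sqrt

ages = [15, 16, 17, 18, 19, 20]
students = [250, 200, 150, 120, 100, 80]
yoga = [200, 150, 90, 48, 30, 12]

# Percentage of students doing yoga at each age (the Y variable)
pct = [100 * y / s for y, s in zip(yoga, students)]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(pct) / n

# Deviations from the means
u = [x - x_bar for x in ages]
v = [y - y_bar for y in pct]

sum_uv = sum(p * q for p, q in zip(u, v))
sum_u2 = sum(p * p for p in u)
sum_v2 = sum(q * q for q in v)

r = sum_uv / sqrt(sum_u2 * sum_v2)
print(pct, x_bar, y_bar, sum_uv, sum_u2, sum_v2, round(r, 3))
```

Working with percentages rather than raw counts matters here, because the number of students differs from one age group to the next.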
Practice Exercises:
1. Calculate the Karl Pearson coefficient of correlation between X and Y for the following data:
X:   1   2   3   4   5   7   8  10
Y:   2   6   8  10  14  16  18  20

2. The anxiety level and test score of 6 high school students are recorded below:

Anxiety Level   10   8   2   1   5   6
Test Score       2   3   9   7   6   5

Find the correlation between anxiety level and test score.


3. Consider the following data
X:   1   2   3   4   5   7   8  10
Y:   2   6   8  10  14  16  18  20
i) Calculate the Karl Pearson coefficient of correlation between X and Y.
ii) If 4 is subtracted from each value of X and each value of Y is divided by 2, what
will be the correlation between the new series of X and Y?
4. From the following data, compute the coefficient of correlation between X and Y for 15 sets
of observations:

                                            X     Y
Arithmetic Means                           25    18
Sum of Squares of Deviations from Means   136   138

5. A student performed calculations on the values of the X and Y variables and the following results
were obtained:
$N = 30$, $\sum X = 120$, $\sum X^2 = 600$,
$\sum Y = 90$, $\sum Y^2 = 250$, $\sum XY = 356$.
On later verification, it was found that two pairs of values, (6, 8) and (10, 5), were
wrong and that the correct values were (6, 10) and (8, 5). Calculate the correct correlation
coefficient.

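Correlation in two-way classified data
Sometimes the number of observations is so large that the data are classified into a two-way
frequency distribution. In that case the correlation coefficient is computed as

$$ r = \frac{N\sum f\,d_x d_y - (\sum f\,d_x)(\sum f\,d_y)}{\sqrt{[N\sum f\,d_x^2 - (\sum f\,d_x)^2]\,[N\sum f\,d_y^2 - (\sum f\,d_y)^2]}} $$

where

$$ d_x = \frac{x - A}{h} $$

and

$$ d_y = \frac{y - B}{k}, $$

with x and y the mid-values of the class intervals, A and B the assumed origins, h and k the class
widths, and $N = \sum f$ the total frequency. For example, consider the following distribution of marks:

                             Marks in Mathematics
Marks in Statistics     5-15    15-25    25-35    35-45
0 – 10                    1       1        -        -
10 – 20                   3       6        5        1
20 – 30                   1       8        9        2
30 – 40                   -       3        9        3
40 – 50                   -       -        4        4

                             Marks in Mathematics
                 C.I.      5-15   15-25   25-35   35-45
                 M1          10      20      30      40
Marks in Statistics
                 dx          -1       0       1       2    Total
C.I.       M2    dy                                          f     fdy    fdy2    fdxdy
0 – 10      5    -2           1       1       -       -      2     -4       8        2
10 – 20    15    -1           3       6       5       1     15    -15      15       -4
20 – 30    25     0           1       8       9       2     20      0       0        0
30 – 40    35     1           -       3       9       3     15     15      15       15
40 – 50    45     2           -       -       4       4      8     16      32       24
Total      f                  5      18      27      10     60     12      70       37
           fdx               -5       0      27      20     42
           fdx2               5       0      27      40     72
           fdxdy              5       0      12      20     37

Therefore,

$$ r = \frac{60(37) - (42)(12)}{\sqrt{[60(72) - (42)^2]\,[60(70) - (12)^2]}} = \frac{1716}{\sqrt{2556 \times 4056}} \approx 0.533 $$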
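The totals in the two-way table can be recomputed from the cell frequencies. The following sketch assumes step deviations taken from the origins A = 20 and B = 25 with class width 10 in both directions, which is what the dx and dy rows of the table imply; the variable names are illustrative.

```python
from math import sqrt

# Mid-values of the class intervals
x_mid = [10, 20, 30, 40]        # Marks in Mathematics (columns)
y_mid = [5, 15, 25, 35, 45]     # Marks in Statistics (rows)

# Cell frequencies: one row per Statistics class, one column per Mathematics class
f = [
    [1, 1, 0, 0],
    [3, 6, 5, 1],
    [1, 8, 9, 2],
    [0, 3, 9, 3],
    [0, 0, 4, 4],
]

A, h = 20, 10   # assumed origin and class width for Mathematics
B, k = 25, 10   # assumed origin and class width for Statistics
dx = [(x - A) / h for x in x_mid]
dy = [(y - B) / k for y in y_mid]

cells = [(f[i][j], dx[j], dy[i]) for i in range(len(dy)) for j in range(len(dx))]
N = sum(fij for fij, _, _ in cells)
sum_fdx = sum(fij * a for fij, a, _ in cells)
sum_fdy = sum(fij * b for fij, _, b in cells)
sum_fdx2 = sum(fij * a * a for fij, a, _ in cells)
sum_fdy2 = sum(fij * b * b for fij, _, b in cells)
sum_fdxdy = sum(fij * a * b for fij, a, b in cells)

r = (N * sum_fdxdy - sum_fdx * sum_fdy) / sqrt(
    (N * sum_fdx2 - sum_fdx ** 2) * (N * sum_fdy2 - sum_fdy ** 2)
)
print(N, sum_fdx, sum_fdy, sum_fdx2, sum_fdy2, sum_fdxdy, round(r, 3))
```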

Spearman's rank correlation coefficient


Spearman's rank correlation coefficient, named after Charles Spearman, is a non-parametric
measure of correlation. Unlike the Pearson product-moment correlation coefficient, it does
not require the assumption that the relationship between the variables is linear, nor does it
require the variables to be measured on interval scales; it can be used for variables measured
at the ordinal level.

Steps to calculate Spearman’s rank correlation coefficient


• Assign ranks to the values of each variable. Ranking can be in descending or ascending
order. However, both data sets should use the same ordering.
• For each pair of values $(x_i, y_i)$, calculate the difference of ranks, that is
$d_i = R(x_i) - R(y_i)$.

• Calculate Spearman's rank order correlation coefficient as follows:

$$ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} $$

where $n$ is the number of observations in each series.

Example 3.4: Five applicants for a job are rated by two experts, with the following results:
Applicant   A   B   C   D   E
Expert 1    4   1   3   2   5
Expert 2    3   2   5   1   4

Calculate the coefficient of rank correlation between the opinions of the two experts.

Solution:

Applicant   Expert 1   Expert 2   $d_i$   $d_i^2$
A               4          3         1       1
B               1          2        -1       1
C               3          5        -2       4
D               2          1         1       1
E               5          4         1       1
Total                                0       8

Note that
$n = 5$, $\sum d_i = 0$ and $\sum d_i^2 = 8$, thus

$$ r_s = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 1 - \frac{48}{120} = 0.6 $$
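The calculation for Example 3.4 can be checked with a short function. This is a minimal sketch for the case where ranks are already given and there are no ties; the function name `spearman_from_ranks` is illustrative.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks given by the two experts to applicants A, B, C, D, E
expert_1 = [4, 1, 3, 2, 5]
expert_2 = [3, 2, 5, 1, 4]

d = [a - b for a, b in zip(expert_1, expert_2)]
r_s = spearman_from_ranks(expert_1, expert_2)
print(d, sum(d), round(r_s, 4))
```

The rank differences always sum to zero, which is a quick check that the ranks were assigned correctly.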

Example 3.5: The marks of ten students in Mathematics and Statistics are given below.
Calculate the Spearman coefficient of rank correlation.

Marks
Mathematics   56  75  45  71  62  64  58  80  76  61
Statistics    66  70  40  60  65  68  59  77  67  63

Solution:
  Mathematics        Statistics
Marks    Rank      Marks    Rank     $d_i$   $d_i^2$
  56       9         66       5        4       16
  75       3         70       2        1        1
  45      10         40      10        0        0
  71       4         60       8       -4       16
  62       6         65       6        0        0
  64       5         68       3        2        4
  58       8         59       9       -1        1
  80       1         77       1        0        0
  76       2         67       4       -2        4
  61       7         63       7        0        0
                                $\sum d_i^2$ = 42

Therefore,

$$ r_s = 1 - \frac{6 \times 42}{10(10^2 - 1)} = 1 - \frac{252}{990} \approx 0.745 $$

Tied Ranks or Repeated Ranks:


Sometimes, when ranking data, two or more values are the same. When this
happens, we assign each of them the mean of the ranks they would otherwise occupy. These are
called tied ranks. To do this, we rank the tied values as if they were not tied, add
up the ranks they would have received, and divide the total by the number of tied values.
Since Spearman’s formula is based on the assumption that different individuals have different
ranks, a correction is necessary in the case of tied ranks.
The correction factor is given by

$$ cf = \frac{m(m^2 - 1)}{12} $$

where $m$ is the number of observations tied to a particular rank.


This cf is added to $\sum d_i^2$ once for every tie.

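Example 3.6: Suppose that in Example 3.5 the marks of some students in Mathematics and
of some students in Statistics are the same, as listed below:

Marks
Mathematics   56  75  62  71  62  62  58  80  76  61
Statistics    66  70  40  60  66  68  59  77  67  63

Then the coefficient of rank correlation is computed as:

  Mathematics        Statistics
Marks    Rank      Marks    Rank     $d_i$   $d_i^2$
  56      10         66      5.5      4.5     20.25
  75       3         70       2        1        1
  62       6         40      10       -4       16
  71       4         60       8       -4       16
  62       6         66      5.5      0.5      0.25
  62       6         68       3        3        9
  58       9         59       9        0        0
  80       1         77       1        0        0
  76       2         67       4       -2        4
  61       8         63       7        1        1
                                $\sum d_i^2$ = 67.5

Here $m = 3$ for the Mathematics marks (62 occurs three times) and $m = 2$ for the Statistics
marks (66 occurs twice), thus

$$ cf = \frac{3(3^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} = 2 + 0.5 = 2.5 $$

Therefore,

$$ r_s = 1 - \frac{6\left(\sum d_i^2 + cf\right)}{n(n^2 - 1)} = 1 - \frac{6(67.5 + 2.5)}{10(10^2 - 1)} = 1 - \frac{420}{990} \approx 0.576 $$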
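The ranking with ties and the correction factor can be sketched in code. The helper names below (`average_ranks`, `tie_correction`, `spearman_with_ties`) are illustrative, and the sketch follows the approach of these notes, adding $m(m^2 - 1)/12$ to $\sum d_i^2$ for each group of tied values, with rank 1 given to the largest mark.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order (largest = 1); tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(x, y):
    """Tie-corrected Spearman coefficient as used in Example 3.6."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1))

maths = [56, 75, 62, 71, 62, 62, 58, 80, 76, 61]
stats = [66, 70, 40, 60, 66, 68, 59, 77, 67, 63]

rank_m = average_ranks(maths)
rank_s = average_ranks(stats)
cf_total = tie_correction(maths) + tie_correction(stats)
r_s = spearman_with_ties(maths, stats)
print(rank_m, rank_s, cf_total, round(r_s, 4))
```

With no ties the correction is zero and the function reduces to the ordinary Spearman formula, so it can also be applied to the marks of Example 3.5.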

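Partial Correlation
Partial correlation analysis involves studying the linear relationship between two variables
after excluding the effect of one or more other variables. For example, suppose that
there are three variables, Sale $(X_1)$, Price $(X_2)$ and Production $(X_3)$, and one wants to find the
linear relationship between Sale $(X_1)$ and Price $(X_2)$ after controlling for Production $(X_3)$.
Coefficient of Partial Correlation
The partial correlation coefficient between $X_1$ and $X_2$ after controlling the effect of $X_3$, usually
denoted by $r_{12.3}$, is given as

$$ r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} $$

Similarly,

$$ r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} $$

and

$$ r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}} $$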
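A single function covers all three partial correlation coefficients, since they differ only in which variable is held fixed. The following is a sketch: the function name `partial_corr` is illustrative and the three input correlations are hypothetical values chosen only to exercise the formula.

```python
from math import sqrt

def partial_corr(r_ab, r_ac, r_bc):
    """r_{ab.c}: correlation between a and b after controlling for c."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical zero-order correlations, for illustration only
r12, r13, r23 = 0.6, 0.5, 0.4

r12_3 = partial_corr(r12, r13, r23)   # X1 and X2, controlling X3
r13_2 = partial_corr(r13, r12, r23)   # X1 and X3, controlling X2
r23_1 = partial_corr(r23, r12, r13)   # X2 and X3, controlling X1
print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```

The argument order follows the subscripts: the first argument is the correlation of interest and the other two are the correlations of each variable with the controlled one.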

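Multiple Correlation
Sometimes we find an interrelationship between many variables, and the value of one variable may
be influenced by many others. For example, the yield of a crop per hectare, say $(X_1)$, depends
upon the quality of seed $(X_2)$, fertility of soil $(X_3)$, fertilizer used $(X_4)$, irrigation facilities $(X_5)$,
weather condition $(X_6)$ and so on. When one is interested in studying the joint effect of a group of
variables upon a variable not included in that group, the study is that of multiple correlation
and multiple linear regression analysis.
Coefficient of Multiple Correlation
Consider a trivariate distribution in which each of the variables $X_1$, $X_2$ and $X_3$ has $N$
observations.
The multiple correlation coefficient of $X_1$ on $X_2$ and $X_3$, usually denoted by $R_{1.23}$, is given as

$$ R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}} $$

This is also called the simple correlation between $X_1$ and the joint effect of $X_2$ and $X_3$ on $X_1$.

Similarly, the multiple correlation coefficient of $X_2$ on $X_1$ and $X_3$ is given as

$$ R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{13}^2}} $$

and the multiple correlation coefficient of $X_3$ on $X_1$ and $X_2$ is

$$ R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{12}^2}} $$

where $r_{ij}$ is the correlation between $X_i$ and $X_j$.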
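The multiple correlation coefficient of one variable on two others can be sketched in the same way. The function name `multiple_corr_sq` is illustrative and the input correlations are hypothetical values used only to show the computation.

```python
from math import sqrt

def multiple_corr_sq(r_ab, r_ac, r_bc):
    """Squared multiple correlation of a on b and c (R^2_{a.bc})."""
    return (r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc ** 2)

# Hypothetical zero-order correlations, for illustration only
r12, r13, r23 = 0.6, 0.5, 0.4

R2_1_23 = multiple_corr_sq(r12, r13, r23)   # X1 on X2 and X3
R2_2_13 = multiple_corr_sq(r12, r23, r13)   # X2 on X1 and X3
R2_3_12 = multiple_corr_sq(r13, r23, r12)   # X3 on X1 and X2
R_1_23 = sqrt(R2_1_23)
print(round(R2_1_23, 4), round(R_1_23, 4), round(R2_2_13, 4), round(R2_3_12, 4))
```

For these inputs $R_{1.23}$ comes out larger than both $|r_{12}|$ and $|r_{13}|$, in line with the general fact that adding a predictor cannot reduce the multiple correlation.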

Properties of Multiple Correlation
i) $0 \le R_{1.23} \le 1$.
ii) If $R_{1.23} = 1$, then the association is perfect, and the regression residuals are zero.
iii) If $R_{1.23} = 0$, then all total and partial correlations involving $X_1$ are zero. So, $X_1$ is
totally uncorrelated with all other variables.
iv) $R_{1.23}$ is not less than any total correlation coefficient involving $X_1$.

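Example 3.7: Express the multiple correlation in terms of total and partial correlations, or show
that

$$ 1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2) $$

Solution: We have

$$ 1 - R_{1.23}^2 = 1 - \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2} = \frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2} $$

Also,

$$ 1 - r_{13.2}^2 = 1 - \frac{(r_{13} - r_{12} r_{23})^2}{(1 - r_{12}^2)(1 - r_{23}^2)} = \frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23}}{(1 - r_{12}^2)(1 - r_{23}^2)} $$

Multiplying the second expression by $(1 - r_{12}^2)$ gives the first.

Hence the result.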
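The identity $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$ can be spot-checked numerically. This is a sketch only: the helper names are illustrative, the first triple of correlations is hypothetical, and the second is the set of rounded values obtained in Example 3.10 later in these notes.

```python
from math import sqrt, isclose

def r13_2(r12, r13, r23):
    """Partial correlation of X1 and X3 after controlling X2."""
    return (r13 - r12 * r23) / sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

def R2_1_23(r12, r13, r23):
    """Squared multiple correlation of X1 on X2 and X3."""
    return (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)

triples = [
    (0.6, 0.5, 0.4),                # hypothetical values
    (0.8226, 0.0156, -0.2853),      # values from Example 3.10
]

checks = []
for r12, r13, r23 in triples:
    lhs = 1 - R2_1_23(r12, r13, r23)
    rhs = (1 - r12 ** 2) * (1 - r13_2(r12, r13, r23) ** 2)
    checks.append(isclose(lhs, rhs, rel_tol=1e-9))
    print(round(lhs, 6), round(rhs, 6))
```

A numerical check of this kind does not replace the algebraic proof, but it is a quick way to catch a sign or subscript error when reproducing the derivation.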


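Example 3.8: If $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$, then show that
i) $R_{1.23} \ge r_{12}$
ii) $R_{1.23}^2 = r_{12}^2 + r_{13}^2$, if $r_{23} = 0$
iii) $1 - R_{1.23}^2 = \frac{(1 - \rho)(1 + 2\rho)}{1 + \rho}$, provided all the coefficients of zero order are equal to $\rho$.
iv) If $R_{1.23} = 0$, $X_1$ is uncorrelated with any other variable, i.e. $r_{12} = r_{13} = 0$.
Solution:
(i) Since $0 \le r_{13.2}^2 \le 1$, then

$$ 1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2) \le 1 - r_{12}^2 $$

which implies

$$ R_{1.23}^2 \ge r_{12}^2, \quad \text{that is,} \quad R_{1.23} \ge |r_{12}| \ge r_{12}. $$

(ii) We have, for $r_{23} = 0$,

$$ r_{13.2} = \frac{r_{13}}{\sqrt{1 - r_{12}^2}} $$

Then

$$ 1 - R_{1.23}^2 = (1 - r_{12}^2)\left(1 - \frac{r_{13}^2}{1 - r_{12}^2}\right) = 1 - r_{12}^2 - r_{13}^2, \quad \text{so that} \quad R_{1.23}^2 = r_{12}^2 + r_{13}^2. $$

(iii) We have given that

$$ r_{12} = r_{13} = r_{23} = \rho $$

Thus,

$$ r_{13.2} = \frac{\rho - \rho^2}{\sqrt{(1 - \rho^2)(1 - \rho^2)}} = \frac{\rho(1 - \rho)}{1 - \rho^2} = \frac{\rho}{1 + \rho} $$

Then

$$ 1 - R_{1.23}^2 = (1 - \rho^2)\left(1 - \frac{\rho^2}{(1 + \rho)^2}\right) = \frac{(1 - \rho^2)(1 + 2\rho)}{(1 + \rho)^2} = \frac{(1 - \rho)(1 + 2\rho)}{1 + \rho} $$

(iv) If $R_{1.23} = 0$, then

$$ (1 - r_{12}^2)(1 - r_{13.2}^2) = 1 \qquad (*) $$

Since $0 \le r_{12}^2 \le 1$ and $0 \le r_{13.2}^2 \le 1$, (*) will hold iff $r_{12} = 0$ and $r_{13.2} = 0$.

Now $r_{13.2} = 0$ implies

$$ r_{13} = r_{12} r_{23} = 0 $$

Thus if $R_{1.23} = 0$, then $r_{12} = r_{13} = 0$.
That means $X_1$ is uncorrelated with $X_2$ and $X_3$.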
Example 3.9: From the data relating to the yield of dry bark $(X_1)$, height $(X_2)$ and girth $(X_3)$
for 18 cinchona plants, the following correlation coefficients were obtained
and
Find the partial correlation coefficient $r_{12.3}$ and the multiple correlation coefficient $R_{1.23}$.

Example 3.10: A study was conducted to examine the role of hours spent on revision and anxiety
level in exam score. The following data were recorded:
Student No.    1   2   3   4   5   6   7   8   9  10  11  12  13  14
Score         62  58  52  55  75  82  38  55  48  68  62  62  72  58
Hours         40  31  35  26  51  48  25  37  30  44  32  40  61  35
Anxiety       40  65  34  91  46  52  48  61  34  74  54  61  26  13

Compute
i) The correlation between exam score and hours of revision after controlling the effect of
anxiety level.
ii) The correlation between exam score and anxiety level after controlling the effect of
hours of revision.
iii) The joint effect of hours of revision and anxiety level on exam score.

Solution:
Score     Hours     Anxiety
$(X_1)$   $(X_2)$   $(X_3)$   $X_1^2$   $X_2^2$   $X_3^2$   $X_1 X_2$   $X_2 X_3$   $X_1 X_3$
  62        40        40       3844      1600      1600       2480        1600        2480
  58        31        65       3364       961      4225       1798        2015        3770
  52        35        34       2704      1225      1156       1820        1190        1768
  55        26        91       3025       676      8281       1430        2366        5005
  75        51        46       5625      2601      2116       3825        2346        3450
  82        48        52       6724      2304      2704       3936        2496        4264
  38        25        48       1444       625      2304        950        1200        1824
  55        37        61       3025      1369      3721       2035        2257        3355
  48        30        34       2304       900      1156       1440        1020        1632
  68        44        74       4624      1936      5476       2992        3256        5032
  62        32        54       3844      1024      2916       1984        1728        3348
  62        40        61       3844      1600      3721       2480        2440        3782
  72        61        26       5184      3721       676       4392        1586        1872
  58        35        13       3364      1225       169       2030         455         754
 847       535       699      52919     21767     40221      33592       25955       42336

First we shall compute the following simple correlations:

$$ \operatorname{Corr}(X_1, X_2) = r_{12} = \frac{n\sum X_1 X_2 - (\sum X_1)(\sum X_2)}{\sqrt{[n\sum X_1^2 - (\sum X_1)^2]\,[n\sum X_2^2 - (\sum X_2)^2]}} = \frac{14 \times 33592 - 847 \times 535}{\sqrt{[14 \times 52919 - (847)^2]\,[14 \times 21767 - (535)^2]}} = 0.8226 $$

$$ \operatorname{Corr}(X_1, X_3) = r_{13} = \frac{n\sum X_1 X_3 - (\sum X_1)(\sum X_3)}{\sqrt{[n\sum X_1^2 - (\sum X_1)^2]\,[n\sum X_3^2 - (\sum X_3)^2]}} = \frac{14 \times 42336 - 847 \times 699}{\sqrt{[14 \times 52919 - (847)^2]\,[14 \times 40221 - (699)^2]}} = 0.0156 $$

$$ \operatorname{Corr}(X_2, X_3) = r_{23} = \frac{n\sum X_2 X_3 - (\sum X_2)(\sum X_3)}{\sqrt{[n\sum X_2^2 - (\sum X_2)^2]\,[n\sum X_3^2 - (\sum X_3)^2]}} = \frac{14 \times 25955 - 535 \times 699}{\sqrt{[14 \times 21767 - (535)^2]\,[14 \times 40221 - (699)^2]}} = -0.2853 $$

(i) The partial correlation between exam score and hours of revision after controlling the
effect of anxiety level is

$$ r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.8226 - 0.0156 \times (-0.2853)}{\sqrt{(1 - (0.0156)^2)(1 - (-0.2853)^2)}} = 0.863 $$

(ii) The partial correlation between exam score and anxiety level after controlling the effect
of hours of revision is

$$ r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} = \frac{0.0156 - 0.8226 \times (-0.2853)}{\sqrt{(1 - (0.8226)^2)(1 - (-0.2853)^2)}} = 0.459 $$

(iii) The joint effect of hours of revision and anxiety level on exam score is

$$ R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2} = \frac{(0.8226)^2 + (0.0156)^2 - 2 \times 0.8226 \times 0.0156 \times (-0.2853)}{1 - (-0.2853)^2} = \frac{0.6842}{0.9186} = 0.7448 $$
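The whole of Example 3.10 can be retraced from the raw data. This is a sketch with illustrative helper names (`corr`, `partial`); it carries full floating-point precision rather than the four-decimal values used in the hand calculation, so later digits may differ slightly from the figures above.

```python
from math import sqrt

score = [62, 58, 52, 55, 75, 82, 38, 55, 48, 68, 62, 62, 72, 58]
hours = [40, 31, 35, 26, 51, 48, 25, 37, 30, 44, 32, 40, 61, 35]
anxiety = [40, 65, 34, 91, 46, 52, 48, 61, 34, 74, 54, 61, 26, 13]

def corr(x, y):
    """Pearson r from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def partial(r_ab, r_ac, r_bc):
    """r_{ab.c}: correlation between a and b after controlling for c."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Zero-order correlations
r12 = corr(score, hours)
r13 = corr(score, anxiety)
r23 = corr(hours, anxiety)

# (i) score and hours, controlling anxiety; (ii) score and anxiety, controlling hours
r12_3 = partial(r12, r13, r23)
r13_2 = partial(r13, r12, r23)

# (iii) squared multiple correlation of score on hours and anxiety
R2_1_23 = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)

print(round(r12, 4), round(r13, 4), round(r23, 4))
print(round(r12_3, 3), round(r13_2, 3), round(R2_1_23, 4))
```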
