
9correlation and Regression

The document discusses correlation and regression. Correlation measures the relationship between two variables, while regression allows estimating one variable based on another. Methods for measuring correlation include scatter plots, Pearson's correlation coefficient, and Spearman's rank correlation coefficient. Examples are also provided to demonstrate calculating and interpreting these statistics.

Uploaded by yadavaryan2004cc
Copyright © All Rights Reserved

Correlation and regression:

Correlation:
Sometimes two continuous characteristics or variables are measured on the same person, such as height and weight, temperature and pulse rate, or age and weight. The relationship or association between two continuous (quantitative) variables is called correlation.
Correlation coefficient:
The measure of the degree of association between two variables is known as the correlation coefficient. It is generally denoted by r.

NOTE: The correlation coefficient lies between -1 and +1, i.e. -1 ≤ r ≤ +1.
Types of correlation:
1. Perfect positive correlation:
The two variables X and Y are fully correlated: all points lie exactly on a straight line with positive slope, so Y increases by a fixed amount for every unit increase in X (i.e. r = +1).
It is very rare in nature, but height and weight, and age and height up to a certain age, are examples that come close to it.
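2. Perfect negative correlation:
All points lie exactly on a straight line with negative slope: as one variable increases, the other decreases by a fixed amount (i.e. r = -1).
It is also very rare in nature; examples often given, which come close to it, are pressure and volume of a gas, and temperature and lipid content of the body.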
3. Moderately positive correlation:
In this case, 0 < r < 1.
e.g. poor nutrition and death rate in pregnancy, fertility rate and overcrowding etc.
4. Moderately negative correlation:
In this case, -1 < r < 0.
e.g. economic condition and cases of TB, income
and mortality rate etc.

5. Absolutely no correlation:
In this case, r = 0, indicating that no linear relationship exists between the two variables.
e.g. body weight and I.Q.
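To make these cases concrete, here is a minimal Python sketch using only the standard library. The helper name `pearson_r` and the toy data are illustrative assumptions, not taken from the notes. It shows that any exact straight-line relation Y = a + bX gives r = +1 when b > 0 and r = -1 when b < 0, that a curved inverse relation (Y = 12/X) only approaches -1, and that r = 0 can hide a relationship that is not linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]

r_pos = pearson_r(x, [10 + 3 * v for v in x])     # exact line, positive slope
r_neg = pearson_r(x, [20 - 2 * v for v in x])     # exact line, negative slope
r_inv = pearson_r(x, [12 / v for v in x])         # inverse proportion: a curve, not a line
r_zero = pearson_r(x, [(v - 3) ** 2 for v in x])  # U-shaped: related, but not linearly

print(r_pos, r_neg, r_inv, r_zero)
```

Up to floating-point rounding, r_pos and r_neg come out as +1 and -1, r_inv is about -0.90, and r_zero is 0 even though Y is completely determined by X.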
Methods to study correlation:
1. Scatter diagram
2. Karl Pearson's correlation coefficient (covariance method)
3. Rank method
1. Scatter diagram:
• Graphical method.
• One of the two variables, say X, is taken along the horizontal axis and the other, say Y, along the vertical axis; each pair of observations is plotted as a point.
• It shows the direction of correlation but does not give a numerical measure of its degree.
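A scatter diagram is normally drawn with plotting software. As a dependency-free sketch, the hypothetical helper `ascii_scatter` below maps each (X, Y) pair onto a character grid, with X along the horizontal axis and Y along the vertical axis. Applied to the age and blood pressure data from the worked example later in this section, the points drift upward from left to right. This shows the direction of the relationship but, as noted above, not its degree.

```python
def ascii_scatter(xs, ys, width=20, height=10):
    """Return the rows of a text scatter diagram (X horizontal, Y vertical)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is printed at the top, so flip Y
    rows = ["|" + "".join(cells) for cells in grid]  # "|" marks the vertical (Y) axis
    rows.append("+" + "-" * width)                   # horizontal (X) axis
    return rows

age = [30, 50, 40, 35, 55, 60, 43, 58, 65, 70]
bp = [122, 150, 122, 120, 140, 142, 150, 130, 145, 160]
for line in ascii_scatter(age, bp):
    print(line)
```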
2. Karl Pearson's correlation coefficient:
• It is a mathematical method to measure the degree of linear relationship between two continuous (quantitative) variables.
• It is denoted by r.
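Let X and Y be the two variables. Then
$$ r = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} $$
where Cov(X, Y) is the covariance between X and Y:
$$ \operatorname{Cov}(X,Y) = \frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y}) $$
Hence
$$ r = \frac{\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\frac{1}{n}\sum (X - \bar{X})^2 \cdot \frac{1}{n}\sum (Y - \bar{Y})^2}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} $$
Also,
$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$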
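The deviation form and the raw-sum form of r are algebraically equivalent. A minimal Python sketch follows; the function names and the five-point data set are illustrative assumptions. It computes r both ways and checks that they agree.

```python
import math

def r_deviation(xs, ys):
    """r = sum((X - mean X)(Y - mean Y)) / sqrt(sum((X - mean X)^2) * sum((Y - mean Y)^2))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_raw_sums(xs, ys):
    """r = (n*sum(XY) - sum(X)*sum(Y)) / sqrt([n*sum(X^2) - sum(X)^2][n*sum(Y^2) - sum(Y)^2])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_dev, r_raw = r_deviation(x, y), r_raw_sums(x, y)
print(r_dev, r_raw)  # both about 0.7746
```

The raw-sum form needs no means, which suits hand calculation from a table of Σx, Σy, Σxy, Σx² and Σy². In floating-point code, the deviation form is usually safer for large values because it avoids subtracting two large, nearly equal numbers.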
Coefficient of determination (r²):
It is the square of the correlation coefficient and always lies between 0 and 1. It gives the proportion of the variation in the dependent variable (Y) that is explained by its linear relationship with the independent variable (X).
For example, if r = 0.8, then r² = 0.64. This means that 64% of the variation in Y can be explained by the variation in X.
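The "explained variation" reading of r² can be checked numerically. The sketch below uses an illustrative helper name and toy data (assumptions, not from the notes). It fits the least-squares line of Y on X and compares r² with 1 − (residual sum of squares)/(total sum of squares of Y). For a straight-line fit the two coincide.

```python
import math

def r2_two_ways(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # total variation in Y
    b = sxy / sxx                         # least-squares slope
    a = my - b * mx                       # least-squares intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained
    r = sxy / math.sqrt(sxx * syy)
    return r ** 2, 1 - ss_res / syy

r_squared, explained_share = r2_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r_squared, explained_share)  # both 0.6, up to rounding
```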
Test of significance of r:
The test statistic is
$$ t = \frac{r - \rho}{SE(r)} $$
where SE(r), the standard error of r, is
$$ SE(r) = \sqrt{\frac{1 - r^2}{n - 2}} $$
r = sample correlation coefficient,
ρ = population correlation coefficient (the test is used for H₀: ρ = 0, in which case t = r/SE(r)),
n - 2 = degrees of freedom of t.
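A minimal sketch of the test statistic for H₀: ρ = 0 follows. The function name `t_for_r` is a hypothetical illustration, and the inputs r = 0.72, n = 10 come from the worked example below. The function only returns t, SE(r) and the degrees of freedom. Deciding significance still needs a critical value from t-tables or, if SciPy is available, `scipy.stats.t.ppf(0.975, 8)`, which gives about 2.306 for a two-tailed 5% test.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    se = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
    return r / se, se, n - 2

t, se, df = t_for_r(0.72, 10)
print(f"SE(r) = {se:.3f}, t = {t:.3f}, df = {df}")
```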
Q. The following table shows the age (years) and blood pressure (mm Hg) of 10 subjects.

Age (years)    30   50   40   35   55   60   43   58   65   70
BP (mm Hg)    122  150  122  120  140  142  150  130  145  160

Find the correlation coefficient between the two variables and interpret the result. Also find the coefficient of determination, and test the significance of the population correlation.
Calculation of correlation coefficient :
SN Age(x) BP(y) xy x² y²
1 30 122 3660 900 14884
2 50 150 7500 2500 22500
3 40 122 4880 1600 14884
4 35 120 4200 1225 14400
5 55 140 7700 3025 19600
6 60 142 8520 3600 20164
7 43 150 6450 1849 22500
8 58 130 7540 3364 16900
9 65 145 9425 4225 21025
10 70 160 11200 4900 25600
Total 506 1381 71075 27188 192457
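Now the correlation coefficient (r) is
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{10 \times 71075 - 506 \times 1381}{\sqrt{(10 \times 27188 - 506^2)(10 \times 192457 - 1381^2)}} = \frac{11964}{\sqrt{15844 \times 17409}} \approx 0.72 $$

This means that there is a high degree of positive relationship between age and BP.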

Coefficient of determination (r²) = (0.72)² ≈ 0.52 = 52%

This means that about 52% of the variation in BP can be explained by the variation in age.
Test of significance of r:
H₀: ρ = 0, there is no relationship between age and BP, vs.
H₁: ρ ≠ 0, there is a relationship between age and BP.
Test statistic:
$$ t = \frac{r}{SE(r)}, \qquad SE(r) = \sqrt{\frac{1 - r^2}{n - 2}} $$
Here
$$ SE(r) = \sqrt{\frac{1 - 0.72^2}{10 - 2}} = \sqrt{\frac{0.4816}{8}} \approx 0.245, \qquad t = \frac{0.72}{0.245} \approx 2.94 $$
The tabulated (two-tailed) value of t at the 5% level of significance with 8 degrees of freedom is 2.306. Since the calculated value of t (2.94) is greater than the tabulated value (2.306), we reject the null hypothesis.
Conclusion: There is a significant relationship between age and BP.
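The whole worked example can be reproduced from the raw data. In this sketch, the critical value 2.306 is copied from the t-table value quoted above rather than computed. Everything else follows the formulas in this section.

```python
import math

age = [30, 50, 40, 35, 55, 60, 43, 58, 65, 70]
bp = [122, 150, 122, 120, 140, 142, 150, 130, 145, 160]

n = len(age)
sum_x, sum_y = sum(age), sum(bp)
sum_xy = sum(x * y for x, y in zip(age, bp))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in bp)

# Raw-sum form of Pearson's r
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2
se_r = math.sqrt((1 - r_squared) / (n - 2))  # standard error of r
t = r / se_r                                  # test statistic for H0: rho = 0

T_CRITICAL = 2.306  # two-tailed 5% value for 8 df, taken from the t-table value in the text
reject_h0 = abs(t) > T_CRITICAL

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, SE = {se_r:.4f}, t = {t:.3f}, reject H0: {reject_h0}")
```

With the unrounded r (about 0.7204), t is still about 2.94, so the conclusion is unchanged.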
3. Spearman's Rank Correlation Coefficient (rank correlation):
Karl Pearson's correlation coefficient is especially useful when the data are quantitative. For qualitative data that can be ranked, the rank correlation coefficient is used instead. The degree of relationship between two variables with respect to their respective ranks is known as the "rank correlation coefficient". It also lies between -1 and +1.
Spearman's rank correlation coefficient:
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
where d = difference between the two ranks of each pair,
n = number of pairs of observations.
Methods of calculating rank correlation coefficient:
1. When ranks are given
2. When ranks are not given
3. When ranks are tied
When actual ranks are given:
a. Find the difference of ranks (R₁ − R₂) and denote it by d.
b. Square each d to get d², and obtain Σd².
c. Use the formula
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Note: Σd should always be zero. If Σd is not equal to zero, there is an error either in the ranking or in finding d.


Q. A firm wanted to employ some accountants. Six candidates appeared for an aptitude test and were assigned the following ranks by two examiners, X and Y:

Candidates   A  B  C  D  E  F
Rank by X    1  3  2  5  4  6
Rank by Y    2  1  3  6  4  5
Candidates   Rank by X (R₁)   Rank by Y (R₂)   d = R₁ − R₂   d²
A            1                2                -1            1
B            3                1                 2            4
C            2                3                -1            1
D            5                6                -1            1
E            4                4                 0            0
F            6                5                 1            1
n = 6                                          Σd = 0        Σd² = 8
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{6 \times (36 - 1)} = 1 - \frac{48}{6 \times 35} = 1 - \frac{48}{210} \approx 0.77 $$
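The steps for given ranks can be sketched as follows. The function name `spearman_from_ranks` is a hypothetical illustration, and it assumes untied ranks. It also applies the notes' check that Σd must be zero before computing R.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    n = len(ranks_1)
    d = [a - b for a, b in zip(ranks_1, ranks_2)]
    if sum(d) != 0:
        # Check from the notes: sum of d must be zero, otherwise the ranks or d are wrong.
        raise ValueError("sum of d is not zero; check the ranking")
    sum_d2 = sum(di * di for di in d)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

rank_x = [1, 3, 2, 5, 4, 6]  # candidates A..F, examiner X
rank_y = [2, 1, 3, 6, 4, 5]  # candidates A..F, examiner Y
R, sum_d2 = spearman_from_ranks(rank_x, rank_y)
print(sum_d2, round(R, 2))  # 8 0.77
```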
When ranks are not given:
This method is used for quantitative data when ranks are not given. The actual data are first converted into ranks, in either ascending or descending order (the same order must be used for both variables). Rank 1 is given to the highest (or lowest) value, rank 2 to the second highest (or second lowest) value, and so on for all the data. The rank correlation coefficient is then
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
Q. The marks obtained by 9 students in Biostatistics and Research Methodology are as follows:

Marks in Biostatistics (X)          35  23  47  17  10  43   9   6  28
Marks in Research Methodology (Y)   30  33  45  23   8  49  12   4  31

Compute their ranks in the two subjects and the correlation coefficient of ranks.
Solution: We assign rank 1 to the highest marks in both series.
Calculation of rank correlation coefficient
X    Y    R₁ = Rank of X   R₂ = Rank of Y   d = R₁ − R₂   d²
35   30   3                5                -2            4
23   33   5                3                 2            4
47   45   1                2                -1            1
17   23   6                6                 0            0
10    8   7                8                -1            1
43   49   2                1                 1            1
 9   12   8                7                 1            1
 6    4   9                9                 0            0
28   31   4                4                 0            0
Total                                       Σd = 0        Σd² = 12
Here, n = 9 and Σd² = 12.
We know,
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{9(81 - 1)} = 1 - \frac{72}{9 \times 80} = 1 - \frac{72}{720} = \frac{9}{10} = 0.9 $$

This indicates a high positive correlation between the marks in Biostatistics and Research Methodology.
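Ranking raw data and applying the formula can be sketched as below. The helper names are hypothetical. Rank 1 goes to the highest value, as in the solution above. The sketch assumes there are no tied values and refuses tied data rather than guessing a tie rule. If SciPy is available, `scipy.stats.spearmanr(bio, research)` should give the same coefficient for untied data such as this.

```python
def ranks_descending(values):
    """Rank 1 for the highest value, 2 for the next, and so on (assumes no ties)."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    r1, r2 = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), r1, r2

bio = [35, 23, 47, 17, 10, 43, 9, 6, 28]
research = [30, 33, 45, 23, 8, 49, 12, 4, 31]
R, rank_bio, rank_research = spearman(bio, research)
print(rank_bio, rank_research, round(R, 2))
```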
Regression analysis:
Correlation gives the degree of association between two variables, whereas regression is the estimation or prediction of one variable on the basis of another.
e.g. estimation of height when weight is known.
This is possible when the two variables are linearly correlated. Here height, which is to be estimated, is the dependent variable and weight is the independent variable.
• The equation of the regression line of Y on X is given by
Y = a + bX
where a is the Y-intercept (constant) and b is the slope.
• The equation of the regression line of X on Y is given by
X = a + bY
where a is the X-intercept (constant) and b is the slope.
Equivalently, in terms of deviations from the means:
• Regression line of Y on X:
$$ (Y - \bar{Y}) = b_{yx}(X - \bar{X}) $$
• Regression line of X on Y:
$$ (X - \bar{X}) = b_{xy}(Y - \bar{Y}) $$
where X̄ and Ȳ are the arithmetic means of the X and Y series,
b_xy = regression coefficient of X on Y,
b_yx = regression coefficient of Y on X.
Regression coefficients
The regression coefficient of Y on X is denoted by b_yx and the regression coefficient of X on Y by b_xy. They are found by any of the following formulas:

1) If the correlation coefficient has already been calculated, the regression coefficients are derived as
$$ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = r \times \frac{\text{SD of X series}}{\text{SD of Y series}} $$
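2) If the means have already been calculated, the regression coefficients are derived by the least squares method, using deviations from the means:
$$ b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} $$
$$ b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} $$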
3) If the means are not to be calculated, a simple direct method is used:
$$ b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} $$
$$ b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - \left(\sum Y\right)^2} $$
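The three routes to b_yx and b_xy should give the same numbers. This sketch applies all three to the age and BP data from the correlation example above. Variable names are illustrative. The SD divisor (n or n − 1) cancels in the ratio σy/σx, so method 1 gives the same answer with either choice.

```python
import math

age = [30, 50, 40, 35, 55, 60, 43, 58, 65, 70]
bp = [122, 150, 122, 120, 140, 142, 150, 130, 145, 160]
n = len(age)

# Method 2: deviations from the means (least squares)
mx, my = sum(age) / n, sum(bp) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(age, bp))
sxx = sum((x - mx) ** 2 for x in age)
syy = sum((y - my) ** 2 for y in bp)
b_yx_dev, b_xy_dev = sxy / sxx, sxy / syy

# Method 3: raw sums, no means needed
sx, sy = sum(age), sum(bp)
sxy_raw = sum(x * y for x, y in zip(age, bp))
b_yx_raw = (n * sxy_raw - sx * sy) / (n * sum(x * x for x in age) - sx ** 2)
b_xy_raw = (n * sxy_raw - sx * sy) / (n * sum(y * y for y in bp) - sy ** 2)

# Method 1: from r and the standard deviations (population SDs, divisor n)
sd_x, sd_y = math.sqrt(sxx / n), math.sqrt(syy / n)
r = sxy / math.sqrt(sxx * syy)
b_yx_r, b_xy_r = r * sd_y / sd_x, r * sd_x / sd_y

print(round(b_yx_dev, 4), round(b_xy_dev, 4))
```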
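Properties:
1. The correlation coefficient is the geometric mean of the two regression coefficients, taking their common sign:
$$ r = \pm\sqrt{b_{yx}\, b_{xy}} $$
2. The product of the two regression coefficients must be less than or equal to 1, since it equals r²:
$$ b_{yx}\, b_{xy} \le 1 $$
3. Both regression coefficients must have the same sign, which is also the sign of r.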
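The three properties can be checked on any data set. This sketch uses small illustrative data with a negative relationship (an assumption, not from the notes) so that the sign rule in property 1 is visible. The helper name `regression_coefficients` is hypothetical.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) using deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)

# Illustrative data with a negative relationship
b_yx, b_xy, r = regression_coefficients([1, 2, 3, 4, 5], [10, 8, 7, 4, 1])

product = b_yx * b_xy                                # property 2: equals r^2, so <= 1
r_from_gm = math.copysign(math.sqrt(product), b_yx)  # property 1: signed geometric mean
same_sign = (b_yx > 0) == (b_xy > 0)                 # property 3
print(b_yx, b_xy, r, r_from_gm)
```

Here b_yx = -2.2 and b_xy = -0.44. Their product is 0.968 = r², and r is about -0.984, the negative square root.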
Q. The following results were obtained for the height and weight of 1000 students:
Ȳ = 170 cm; X̄ = 60 kg; r = 0.6; σy = 6.5 cm; σx = 5 kg.
Anil weighs 45 kg. Sunil is 165 cm tall. Estimate the height of Anil from his weight and the weight of Sunil from his height.
Solution: Here, height = Y and weight = X.
Ȳ = 170 cm; X̄ = 60 kg; r = 0.6; σy = 6.5 cm; σx = 5 kg.
a. The regression equation of Y on X is
$$ Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}(X - \bar{X}) $$
$$ Y - 170 = 0.6 \times \frac{6.5}{5}(X - 60) $$
$$ Y = 170 + 0.78(X - 60) = 170 + 0.78X - 46.8 $$
Therefore, Y = 0.78X + 123.2
When Anil's weight X = 45 kg, his height will be
Y = 0.78 × 45 + 123.2 = 35.1 + 123.2 = 158.3 cm.
Therefore, the required height of Anil is 158.3 cm.
b. The regression equation of X on Y is
$$ X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) $$
$$ X - 60 = 0.6 \times \frac{5}{6.5}(Y - 170) $$
$$ X - 60 = 0.46(Y - 170) $$
$$ X = 0.46Y - 78.2 + 60 $$
Therefore, X = 0.46Y - 18.2
When Sunil's height Y = 165 cm, his weight will be
X = 0.46 × 165 - 18.2 = 75.9 - 18.2 = 57.7 kg.
Therefore, the required weight of Sunil is 57.7 kg.
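Both estimates can be computed directly from the summary figures. The function names below are hypothetical illustrations. The code keeps r·σx/σy unrounded (about 0.4615 rather than 0.46), which moves Sunil's estimate only in the second decimal.

```python
def estimate_y_from_x(x, mean_x, mean_y, r, sd_x, sd_y):
    """Regression of Y on X: Y = mean_y + r * (sd_y / sd_x) * (x - mean_x)."""
    return mean_y + r * sd_y / sd_x * (x - mean_x)

def estimate_x_from_y(y, mean_x, mean_y, r, sd_x, sd_y):
    """Regression of X on Y: X = mean_x + r * (sd_x / sd_y) * (y - mean_y)."""
    return mean_x + r * sd_x / sd_y * (y - mean_y)

# Summary figures from the example: X = weight (kg), Y = height (cm)
stats = dict(mean_x=60, mean_y=170, r=0.6, sd_x=5, sd_y=6.5)

anil_height = estimate_y_from_x(45, **stats)
sunil_weight = estimate_x_from_y(165, **stats)
print(f"Anil: {anil_height:.1f} cm, Sunil: {sunil_weight:.1f} kg")  # Anil: 158.3 cm, Sunil: 57.7 kg
```

The two regression lines are different lines unless r = ±1. Solving Y = 0.78X + 123.2 for X does not give the X-on-Y line, which is why a separate equation is used to estimate Sunil's weight.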
Q. The table below gives the age (months) of children (X) and their weight (kg) (Y). Fit the regression model and estimate the weight of a child whose age is 10.

X (age)      2  3  10  5  1  7  8  11  12  9  3  5  4  8  7  9
Y (weight)   4  4  10  5  3  8  9   9  10  7  5  4  6  7  8  10
Ans: Calculation of regression coefficient:

X    Y    X − X̄   Y − Ȳ   (Y − Ȳ)²   (X − X̄)²   (X − X̄)(Y − Ȳ)
2    4    -4.5    -2.81     7.8961    20.25      12.645
3    4    -3.5    -2.81     7.8961    12.25       9.835
10   10    3.5     3.19    10.1761    12.25      11.165
5    5    -1.5    -1.81     3.2761     2.25       2.715
1    3    -5.5    -3.81    14.5161    30.25      20.955
7    8     0.5     1.19     1.4161     0.25       0.595
8    9     1.5     2.19     4.7961     2.25       3.285
11   9     4.5     2.19     4.7961    20.25       9.855
12   10    5.5     3.19    10.1761    30.25      17.545
9    7     2.5     0.19     0.0361     6.25       0.475
3    5    -3.5    -1.81     3.2761    12.25       6.335
5    4    -1.5    -2.81     7.8961     2.25       4.215
4    6    -2.5    -0.81     0.6561     6.25       2.025
8    7     1.5     0.19     0.0361     2.25       0.285
7    8     0.5     1.19     1.4161     0.25       0.595
9    10    2.5     3.19    10.1761     6.25       7.975
Total: ΣX = 104, ΣY = 109, Σ(Y − Ȳ)² = 88.4376, Σ(X − X̄)² = 166, Σ(X − X̄)(Y − Ȳ) = 110.5
Here, X̄ = 104/16 = 6.5 and Ȳ = 109/16 ≈ 6.81.
The regression coefficient (b) is
$$ b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} $$
Substituting the values,
$$ b = \frac{110.5}{166} \approx 0.665 $$
Since the point (X̄, Ȳ) lies on the regression line Y = a + bX, we have
$$ \bar{Y} = a + b\bar{X} $$
$$ a = \bar{Y} - b\bar{X} = 6.81 - 0.665 \times 6.5 \approx 2.49 $$
So the fitted simple linear regression equation is
$$ \hat{Y} = 2.49 + 0.665X $$
Given X = 10, we can estimate Y from the fitted equation:
$$ \hat{Y} = 2.49 + 0.665 \times 10 = 2.49 + 6.65 = 9.14 $$
Hence the estimated weight of a child aged 10 months is about 9.14 kg.
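The same least-squares fit can be sketched in a few lines of Python. The function name `fit_y_on_x` is a hypothetical illustration. Unrounded, b ≈ 0.6657 and a ≈ 2.486, and the prediction at X = 10 is still about 9.14 kg.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line Y = a + bX using deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx  # the line passes through (mean X, mean Y)
    return a, b

age = [2, 3, 10, 5, 1, 7, 8, 11, 12, 9, 3, 5, 4, 8, 7, 9]
weight = [4, 4, 10, 5, 3, 8, 9, 9, 10, 7, 5, 4, 6, 7, 8, 10]
a, b = fit_y_on_x(age, weight)
predicted = a + b * 10
print(f"a = {a:.3f}, b = {b:.4f}, weight at age 10 = {predicted:.2f} kg")
```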
