9. Correlation and Regression
Correlation:
Sometimes two continuous characteristics or
variables are measured in the same person,
such as height and weight, temperature and
pulse rate, or age and weight. The relationship
or association between two continuous
(quantitative) variables is called correlation.
Correlation coefficient:
The measure of the degree of association between
two variables is known as the correlation coefficient.
It is generally denoted by ‘r’.
5. Absolutely no correlation:
In this case, r = 0, indicating that no linear relationship
exists between the two variables,
e.g. body weight and I.Q.
Methods to study correlation:
1. Scatter diagram
2. Karl Pearson's correlation coefficient or covariance method
3. Rank method
1. Scatter diagram:
• It is a graphical method.
• In this method, one of the two variables, say
X, is taken along the horizontal axis and the
other, say Y, is taken along the vertical axis.
• It shows the direction of correlation but
does not give the degree of correlation.
2. Karl Pearson's correlation coefficient:
• It is the mathematical method used to
measure the degree of relationship
between two continuous (quantitative)
variables.
• It is denoted by r.
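Let X and Y be the two variables, then,
$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
Where, Cov (X,Y) = covariance between X and Y.
$$r = \frac{\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\frac{1}{n}\sum (X - \bar{X})^2 \cdot \frac{1}{n}\sum (Y - \bar{Y})^2}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}}$$
Also,
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$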
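The deviation form and the raw-sum form of r above are algebraically the same quantity. The following is a minimal sketch of both in Python; the function names and the small data set are illustrative assumptions, not taken from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pearson_r_direct(xs, ys):
    """The same coefficient from raw sums, without computing the means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical illustrative data (not from the text).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4), round(pearson_r_direct(x, y), 4))
```

Both functions divide by zero if either variable is constant, since r is undefined when a variable has no variation.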
Coefficient of determination ($r^2$):
It is the square of the coefficient of correlation. It
always lies between 0 and 1. It gives the proportion
of the variation in the dependent variable (Y) that is
explained by the independent variable (X).
For example, if r = 0.8 then $r^2$ = 0.64, which means that
64% of the variation in the Y variable can be
explained by the variation in the X variable.
Test of significance of r:
The test statistic is,
$$t = \frac{r - \rho}{SE(r)}$$
Where, SE(r) = the standard error of r = $\sqrt{\dfrac{1 - r^2}{n - 2}}$
Age   30   50   40   35   55   60   43   58   65   70
BP   122  150  122  120  140  142  150  130  145  160
r = 0.72
Here $r^2$ = 0.52, which means that about 52% of the variation in BP is explained by age.
Test of significance of r:
H₀: ρ = 0, there is no relationship between age and BP, vs.
H₁: ρ ≠ 0, there is a relationship between age and BP.
Test statistic:
$t = \dfrac{r - \rho}{SE(r)}$, where SE(r) = the standard error of r = $\sqrt{\dfrac{1 - r^2}{n - 2}}$
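The figures for the age and BP example can be rechecked from the listed data. The sketch below uses the raw-sum formula for r and the standard error formula for the t statistic; the variable names are illustrative assumptions.

```python
from math import sqrt

# Age and BP values as listed in the example.
age = [30, 50, 40, 35, 55, 60, 43, 58, 65, 70]
bp = [122, 150, 122, 120, 140, 142, 150, 130, 145, 160]

n = len(age)
sum_x, sum_y = sum(age), sum(bp)
sum_xy = sum(x * y for x, y in zip(age, bp))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in bp)

# Raw-sum form of Pearson's r.
r = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2

# Standard error of r and the t statistic, with rho = 0 under H0.
se_r = sqrt((1 - r_squared) / (n - 2))
t_stat = (r - 0) / se_r

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
print(f"SE(r) = {se_r:.3f}, t = {t_stat:.2f} on {n - 2} degrees of freedom")
```

The computed t (about 2.94) is then compared with the tabulated t value for n − 2 = 8 degrees of freedom, which is 2.306 at the 5% level for a two-sided test.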
Candidates   A   B   C   D   E   F
Rank by X    1   3   2   5   4   6
Rank by Y    2   1   3   6   4   5
Candidates   Rank by X ($R_1$)   Rank by Y ($R_2$)   $d = R_1 - R_2$   $d^2$
A            1                   2                   -1                1
B            3                   1                    2                4
C            2                   3                   -1                1
D            5                   6                   -1                1
E            4                   4                    0                0
F            6                   5                    1                1
n = 6                                                $\sum d = 0$      $\sum d^2 = 8$
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{6 \times (36 - 1)} = 1 - \frac{48}{6 \times 35} = 0.77$$
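The rank correlation for the six candidates can be reproduced with a few lines of Python. This is a minimal sketch; the function name `spearman_from_ranks` is an illustrative assumption.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """Rank correlation R = 1 - 6*sum(d^2) / (n*(n^2 - 1)) for untied ranks."""
    n = len(ranks_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Ranks of candidates A to F as given in the example.
rank_x = [1, 3, 2, 5, 4, 6]
rank_y = [2, 1, 3, 6, 4, 5]

R = spearman_from_ranks(rank_x, rank_y)
print(round(R, 2))  # 0.77
```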
When ranks are not given:
This method can also be used for quantitative data when
ranks are not given. In such cases the actual data are
converted into ranks, either in ascending order or in
descending order. We give rank 1 to the highest (or lowest)
value, rank 2 to the second highest (or second lowest)
value, and so on until every value has a rank. The same
ordering must be used for both variables.
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
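Converting raw values to ranks and then applying the formula can be sketched as follows. The helper names and the data are hypothetical, and the sketch assumes there are no tied values, because ties call for average ranks and a corrected formula.

```python
def to_ranks(values, descending=True):
    """Give rank 1 to the highest value (or the lowest if descending=False)."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks and a corrected formula")
    ordered = sorted(values, reverse=descending)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return [position[value] for value in values]


def rank_correlation(xs, ys):
    """Rank both series the same way, then apply R = 1 - 6*sum(d^2)/(n*(n^2-1))."""
    ranks_x, ranks_y = to_ranks(xs), to_ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical illustrative data (not from the text).
x_values = [10, 20, 30, 40, 50]
y_values = [12, 25, 22, 41, 60]

print(to_ranks(x_values))                   # [5, 4, 3, 2, 1]
print(to_ranks(y_values))                   # [5, 3, 4, 2, 1]
print(rank_correlation(x_values, y_values))  # 0.9
```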
Q. The marks obtained by 9 students in Bio Statistics and Research
Methodology are as follows:
Marks in Bio Statistics (X)          35  23  47  17  10  43   9   6  28
Marks in Research Methodology (Y)    30  33  45  23   8  49  12   4   3
$$(X - \bar{X}) = b_{xy}\,(Y - \bar{Y})$$
Where, $\bar{X}$ and $\bar{Y}$ are the arithmetic means of the X and Y series.
$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$$
3. If means are not to be calculated, then a simple and direct
method is used,
$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$$
Properties:
1. The correlation coefficient is the geometric mean of the
regression coefficients,
i.e. $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$, where r takes the common sign of the two coefficients.
2. The product of the two regression coefficients must be less
than or equal to 1,
i.e. $b_{yx} \cdot b_{xy} \le 1$
3. Both regression coefficients must have the same sign.
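These three properties follow in one line when the regression coefficients are written in their standard-deviation form, $b_{yx} = r\,\sigma_y/\sigma_x$ and $b_{xy} = r\,\sigma_x/\sigma_y$. This is a standard identity, taken here as an assumption rather than derived:

```latex
\[
b_{yx} \cdot b_{xy}
  = \left( r\,\frac{\sigma_y}{\sigma_x} \right)
    \left( r\,\frac{\sigma_x}{\sigma_y} \right)
  = r^2 \le 1,
\qquad \text{so} \qquad
r = \pm\sqrt{b_{yx} \cdot b_{xy}} .
\]
```

Because $\sigma_x$ and $\sigma_y$ are positive, each regression coefficient carries the sign of r, which is why the two coefficients always share the same sign.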
The following results were obtained for the height and weight of 1000
students:
$\bar{Y}$ = 170 cm; $\bar{X}$ = 60 kg; r = 0.6; $\sigma_y$ = 6.5 cm; $\sigma_x$ = 5 kg. Anil
weighs 45 kg. Sunil is 165 cm tall. Estimate the height of Anil
from his weight and the weight of Sunil from his height.
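Solution: Here, Height = Y and Weight = X
$\bar{Y}$ = 170 cm; $\bar{X}$ = 60 kg; r = 0.6; $\sigma_y$ = 6.5 cm; $\sigma_x$ = 5 kg.
a. The regression equation of Y on X is,
$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$
$$\text{Or, } Y - 170 = 0.6 \times \frac{6.5}{5}\,(X - 60)$$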
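Both estimates asked for in this example follow from the two regression lines. The sketch below carries the arithmetic through, assuming the regression of X on Y has the mirrored form X − X̄ = r (σx/σy)(Y − Ȳ); the function names are illustrative.

```python
# Summary statistics given in the example (Y = height in cm, X = weight in kg).
mean_y, mean_x = 170.0, 60.0
r = 0.6
sd_y, sd_x = 6.5, 5.0

b_yx = r * sd_y / sd_x  # slope of the regression of Y on X
b_xy = r * sd_x / sd_y  # slope of the regression of X on Y (assumed mirrored form)


def height_from_weight(weight_kg):
    """Regression of Y on X: Y = mean_y + b_yx * (X - mean_x)."""
    return mean_y + b_yx * (weight_kg - mean_x)


def weight_from_height(height_cm):
    """Regression of X on Y: X = mean_x + b_xy * (Y - mean_y)."""
    return mean_x + b_xy * (height_cm - mean_y)


print(round(height_from_weight(45), 1))   # Anil's estimated height: 158.3 cm
print(round(weight_from_height(165), 2))  # Sunil's estimated weight: 57.69 kg
```

Each estimate uses the line that has the unknown quantity as its dependent variable, so the two lines are not interchangeable.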
X   2   3  10   5   1   7   8  11  12   9   3   5   4   8   7   9
Y   4   4  10   5   3   8   9   9  10   7   5   4   6   7   8  10
$\hat{Y} = 2.49 + 0.665 \times 10$
= 9.14
Hence the estimated weight of the baby at
10 months of age is 9.14 kg.
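The fitted line can be recomputed from the listed X and Y values with the direct formula for $b_{yx}$. In this sketch the intercept is taken as a = Ȳ − b_yx X̄, which assumes the line passes through the point of means; the variable names are illustrative.

```python
# X and Y values as listed in the example.
x = [2, 3, 10, 5, 1, 7, 8, 11, 12, 9, 3, 5, 4, 8, 7, 9]
y = [4, 4, 10, 5, 3, 8, 9, 9, 10, 7, 5, 4, 6, 7, 8, 10]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Direct formula for the regression coefficient of Y on X.
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept chosen so that the line passes through (mean of X, mean of Y).
a = sum_y / n - b_yx * sum_x / n


def predict(x_new):
    """Estimated Y for a given X from the fitted line Y = a + b_yx * X."""
    return a + b_yx * x_new


print(f"Y = {a:.2f} + {b_yx:.3f} X")
print(f"Estimate at X = 10: {predict(10):.2f}")
```

With the listed data the intercept and slope come out near 2.49 and 0.666, and the estimate at X = 10 is about 9.14.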