biostat lecture note 3
Correlation analysis is the statistical tool used to study the closeness of the relationship
between two or more variables. The variables are said to be correlated when a change in
one variable is accompanied by a change in another variable.
Karl Pearson's Coefficient of Correlation
Karl Pearson's method, popularly known as the Pearsonian coefficient of correlation, is the
measure of correlation most widely applied in practice. The Pearsonian coefficient of
correlation is represented by the symbol r.
The correlation r falls between -1 and 1. Values of r near 0 indicate a very weak linear
relationship. The strength of the linear relationship increases as r moves away from 0 toward
either -1 or 1. Values of r close to -1 or 1 indicate that the points lie close to a straight line. The
extreme values r = −1 and r = 1 occur only in the case of a perfect linear relationship, when the
points in a scatterplot lie exactly along a straight line.
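The behaviour of r at its extremes can be checked numerically. The sketch below is illustrative only: the function name `pearson_r` and the small data sets are made up for this purpose and are not part of the lecture data. It computes r from deviations about the means for points lying exactly on a rising line, exactly on a falling line, and with no linear trend.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up values, for illustration only
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points exactly on a rising straight line
falling = [10 - 3 * x for x in xs]   # points exactly on a falling straight line
scattered = [2, 5, 1, 5, 2]          # no linear trend

print(round(pearson_r(xs, rising), 4))
print(round(pearson_r(xs, falling), 4))
print(round(pearson_r(xs, scattered), 4))
```

The three printed values should be close to 1, -1 and 0 respectively, matching the interpretation given above.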
Example: Determine the correlation coefficient for the height and weight of 10 persons, given
as follows.
Height (cm) Weight (kg)
158 48
162 57
163 57
170 60
154 45
167 55
177 62
170 65
179 70
179 68
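$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$
Or
$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}} $$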
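As a check on the hand calculation, the height and weight example can be worked through in a few lines of Python. This is a minimal sketch using the deviation form of the formula; the variable names are chosen here for illustration.

```python
import math

# Height and weight data from the example table
heights_cm = [158, 162, 163, 170, 154, 167, 177, 170, 179, 179]
weights_kg = [48, 57, 57, 60, 45, 55, 62, 65, 70, 68]

n = len(heights_cm)
mean_h = sum(heights_cm) / n
mean_w = sum(weights_kg) / n

# Sums of products and squares of deviations about the means
sxy = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights_cm, weights_kg))
sxx = sum((h - mean_h) ** 2 for h in heights_cm)
syy = sum((w - mean_w) ** 2 for w in weights_kg)

r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.4f}")
```

The result is approximately 0.93, a strong positive linear relationship, and its square agrees with the R² = 0.8671 reported for the same data in the regression section.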
Example: The following data give the head and body weights of 10 insects (Drosophila
melanogaster).
Head weight (mg): 20 22 25 27 31 32 35 38 39 40
Body weight (mg): 60 64 72 80 84 86 92 96 97 102
Example: Determine the correlation coefficient between body mass index (BMI) and cholesterol
level for 10 individuals.
BMI (kg/m²) Cholesterol (mg/dL)
12 148
16 130
25 165
18 155
26 180
25 187
27 200
32 210
31 198
35 220
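The BMI and cholesterol example can be checked with the raw-sums (computational) form of Pearson's formula, which avoids computing the means first. The sketch below assumes the standard form with ∑XY, ∑X², ∑Y² and N; the variable names are illustrative.

```python
import math

# BMI and cholesterol data from the example table
bmi = [12, 16, 25, 18, 26, 25, 27, 32, 31, 35]
chol = [148, 130, 165, 155, 180, 187, 200, 210, 198, 220]

n = len(bmi)
sum_x = sum(bmi)
sum_y = sum(chol)
sum_xy = sum(x * y for x, y in zip(bmi, chol))
sum_x2 = sum(x * x for x in bmi)
sum_y2 = sum(y * y for y in chol)

# Computational form: corrected sums of products and squares
numerator = sum_xy - sum_x * sum_y / n
denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))

r = numerator / denominator
print(f"r = {r:.4f}")
```

For these data r comes out at roughly 0.93, indicating a strong positive linear relationship between BMI and cholesterol level in this sample.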
Simple regression
Linear regression is a statistical technique used to predict (forecast) the value of one variable
from a known, related variable. The relationship between the two variables is described using the
equation of a straight line, y = a + bx, where:
a = y-intercept (i.e. the value of y when x = 0); it is the point at which the regression line
crosses the y-axis.
b = slope of the regression line (regression coefficient); it gives the amount and direction of
change in y for each one-unit change in x.
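Regression equations: Regression equations are algebraic expressions of the regression lines. As
there are two regression lines, there are two regression equations: the equation of y on x
describes the variation in y for given changes in x, and the equation of x on y describes the
variation in x for given changes in y.
R = Pearson correlation coefficient, the square root of R². R² provides a good estimate of the
overall fit of the regression line.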
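The link between the two regression lines and R can be illustrated with the height and weight data from the correlation example. The sketch below assumes the usual least-squares slopes (the names `b_yx` and `b_xy` are introduced here for illustration) and shows that their product equals R².

```python
import math

# Height (x) and weight (y) data from the correlation example
x = [158, 162, 163, 170, 154, 167, 177, 170, 179, 179]
y = [48, 57, 57, 60, 45, 55, 62, 65, 70, 68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b_yx = sxy / sxx   # slope of the regression of y on x
b_xy = sxy / syy   # slope of the regression of x on y
r = sxy / math.sqrt(sxx * syy)

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"b_yx * b_xy = {b_yx * b_xy:.4f}, r^2 = {r ** 2:.4f}")
```

The two slopes differ, which is why the two regression lines are distinct, but their product always equals R², so both lines coincide only when the correlation is perfect.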
The regression equation is of the form y = a + bx. It is found using the height and weight
values tabulated below, with height as x and weight as y. The value of b is calculated first, and
the value of a is then obtained using the value found for b.
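The notes do not spell out the formulas used for this calculation. Assuming the standard least-squares estimates, with N the number of (x, y) pairs, they can be written as:

```latex
b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}},
\qquad
a = \bar{y} - b\,\bar{x}
```

Here the numerator of b is the same corrected sum of products that appears in the correlation formula, and a follows from the fact that the fitted line passes through the point of means.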
Calculation of b.
Height (cm) Weight (kg)
158 48
162 57
163 57
170 60
154 45
167 55
177 62
170 65
179 70
179 68
Solution: y = 0.8604x - 85.754; R² = 0.8671
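The solution can be reproduced with a short Python sketch. It assumes the standard least-squares estimates (slope from the sums of deviations, intercept from the means); the variable names and the 170 cm prediction are illustrative and not part of the original notes.

```python
# Height (x) and weight (y) data from the table
heights_cm = [158, 162, 163, 170, 154, 167, 177, 170, 179, 179]
weights_kg = [48, 57, 57, 60, 45, 55, 62, 65, 70, 68]

n = len(heights_cm)
mean_x = sum(heights_cm) / n
mean_y = sum(weights_kg) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights_cm, weights_kg))
sxx = sum((x - mean_x) ** 2 for x in heights_cm)
syy = sum((y - mean_y) ** 2 for y in weights_kg)

b = sxy / sxx                 # slope, calculated first
a = mean_y - b * mean_x       # intercept, obtained from b
r_squared = sxy ** 2 / (sxx * syy)

print(f"y = {b:.4f}x + ({a:.3f}); R^2 = {r_squared:.4f}")

# Illustrative use of the fitted line: predicted weight for a height of 170 cm
predicted = a + b * 170
print(f"predicted weight at 170 cm: {predicted:.1f} kg")
```

The fitted slope, intercept and R² match the values stated in the solution, and the fitted line predicts a weight of roughly 60.5 kg for a person 170 cm tall.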
The correlation coefficient measures the "degree of relationship" between two variables, say X
and Y, whereas regression analysis studies the "nature of relationship" between them.
The correlation coefficient does not indicate a cause-and-effect relationship between the
variables, i.e. it cannot be said with certainty that one variable is the cause and the other is
the effect. Regression analysis, in contrast, treats one variable as independent and the other as
dependent and describes how the dependent variable changes with the independent one, although a
fitted regression line by itself does not prove causation.