biostat lecture note 3


Department of Biology

Umaru Musa Yar’adua University, Katsina

BIO2226 BIOSTATISTICS 30/10/2024

Correlation and Simple regression


Correlation
The scatterplot provides a visual impression of the nature of the relation between the x and y
values in a bivariate data set. In many cases, the points appear to band around a straight line.
Our visual impression of the closeness of the scatter to a linear relation can be quantified by
calculating a numerical measure called the sample correlation coefficient. The sample
correlation coefficient, denoted by r (or in some cases r_xy), is a measure of the strength of the
linear relation between the x and y variables.

Correlation analysis is the statistical tool used to study the closeness of the relationship
between two or more variables. The variables are said to be correlated when the movement of
one variable is accompanied by the movement of another variable.

Karl Pearson's Coefficient of Correlation
Karl Pearson's method, popularly known as the Pearsonian coefficient of correlation, is the
most widely applied method for measuring correlation in practice. The Pearsonian coefficient
of correlation is represented by the symbol r.
The correlation r falls between -1 and 1. Values of r near 0 indicate a very weak linear
relationship. The strength of the linear relationship increases as r moves away from 0 toward
either -1 or 1. Values of r close to -1 or 1 indicate that the points lie close to a straight line. The
extreme values r = −1 and r = 1 occur only in the case of a perfect linear relationship, when the
points in a scatterplot lie exactly along a straight line.
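
The behaviour of r at its extremes can be checked on small made-up data sets. The sketch below is only an illustration: the helper name `pearson_r` and the toy numbers are assumptions introduced here, and the function uses the usual deviation-from-the-mean form of the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient via deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                              # toy x values (hypothetical)
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # points on a falling line
r_none = pearson_r(xs, [2, 5, 1, 5, 2])           # symmetric pattern, no linear trend

print(round(r_up, 6), round(r_down, 6), round(r_none, 6))
```

The first two values come out at the extremes 1 and -1 because the points lie exactly on a line, while the third is 0 because the rises and falls in y cancel out across x.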

Example: Determine the correlation coefficient for the height and weight of the 10 persons
listed below.

Height (cm)    Weight (kg)
158            48
162            57
163            57
170            60
154            45
167            55
177            62
170            65
179            70
179            68
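
\[
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}
\]

Or, in terms of raw sums, where N is the number of (X, Y) pairs:

\[
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}
\]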
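
Applied to the height and weight data of the 10 persons, the two forms of the formula can be checked against each other. The following is a minimal sketch in plain Python; the variable names are chosen here for illustration and are not part of the lecture note.

```python
import math

heights = [158, 162, 163, 170, 154, 167, 177, 170, 179, 179]  # x, cm
weights = [48, 57, 57, 60, 45, 55, 62, 65, 70, 68]            # y, kg
n = len(heights)

# Deviation form: sums of products and squares of deviations from the means.
mean_x = sum(heights) / n
mean_y = sum(weights) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)
r_deviation = sxy / math.sqrt(sxx * syy)

# Raw-sum form: only the column totals are needed.
sum_x, sum_y = sum(heights), sum(weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x * x for x in heights)
sum_y2 = sum(y * y for y in weights)
r_raw = (sum_xy - sum_x * sum_y / n) / math.sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
)

print(f"r (deviation form) = {r_deviation:.4f}")
print(f"r (raw-sum form)   = {r_raw:.4f}")
```

Both forms give r of about 0.93 for these data, a strong positive linear relationship between height and weight.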

Example: The following data are the head and body weights of 10 insects (Drosophila
melanogaster).

Head weight (mg)    20  22  25  27  31  32  35  38  39  40
Body weight (mg)    60  64  72  80  84  86  92  96  97  102

Find r between head and body weights (answer: r = 0.99).
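
The stated answer can be checked by hand with the raw-sum formula. Taking X as head weight and Y as body weight (a labelling choice made here), the column totals are N = 10, ∑X = 309, ∑Y = 833, ∑XY = 26655, ∑X² = 10013 and ∑Y² = 71225, so:

\[
r = \frac{26655 - \dfrac{309 \times 833}{10}}{\sqrt{\left[10013 - \dfrac{309^2}{10}\right]\left[71225 - \dfrac{833^2}{10}\right]}}
  = \frac{915.3}{\sqrt{464.9 \times 1836.1}}
  \approx \frac{915.3}{923.9}
  \approx 0.99
\]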

Determine the correlation coefficient between body mass index (BMI) and cholesterol level of
10 individuals.

BMI (kg/m²)    Cholesterol (mg/dl)
12             148
16             130
25             165
18             155
26             180
25             187
27             200
32             210
31             198
35             220
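
One way to organise this calculation is the working table used in hand computation, with one row per individual and columns X, Y, XY, X² and Y². The sketch below builds that table and applies the raw-sum formula; the variable names and the printed layout are assumptions made for illustration.

```python
import math

bmi = [12, 16, 25, 18, 26, 25, 27, 32, 31, 35]                    # X, kg/m^2
cholesterol = [148, 130, 165, 155, 180, 187, 200, 210, 198, 220]  # Y, mg/dl

# One row per individual: X, Y, XY, X^2, Y^2 (the usual hand-calculation layout).
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(bmi, cholesterol)]
print(f"{'X':>4} {'Y':>5} {'XY':>6} {'X^2':>6} {'Y^2':>7}")
for row in rows:
    print("{:>4} {:>5} {:>6} {:>6} {:>7}".format(*row))

# Column totals feed the raw-sum formula for r.
n = len(rows)
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
r_bmi = (sum_xy - sum_x * sum_y / n) / math.sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
)
print(f"totals: {sum_x} {sum_y} {sum_xy} {sum_x2} {sum_y2}")
print(f"r = {r_bmi:.3f}")
```

Running it prints the working table, the column totals and the resulting r, which can be compared with a hand calculation.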

Simple regression
Linear regression is a statistical technique used to predict (forecast) the value of one variable
from a known, related variable. The relationship between the two variables is described using
the equation of a straight line, y = a + bx. Here a is the y-intercept (i.e. the value of y when
x = 0); it is the point at which the regression line crosses the y-axis. b is the slope of the
regression line (the regression coefficient); it gives the amount and direction of change in y
for each one-unit change in x.

Regression equations: Regression equations are algebraic expressions of the regression lines.
As there are two regression lines, there are two regression equations: the equation of y on x
describes the variation in the values of y for given changes in x, and the equation of x on y
describes the variation in the values of x for given changes in y.
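
To make the two regression lines concrete, the sketch below computes both slopes for the height and weight data from the correlation example. It assumes the standard least-squares slope formulas (sum of cross-products divided by the sum of squares of the predictor); the names `b_yx` and `b_xy` are labels chosen here.

```python
heights = [158, 162, 163, 170, 154, 167, 177, 170, 179, 179]  # x, cm
weights = [48, 57, 57, 60, 45, 55, 62, 65, 70, 68]            # y, kg
n = len(heights)

mean_x = sum(heights) / n
mean_y = sum(weights) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)

b_yx = sxy / sxx                    # slope of y on x: change in weight per cm
b_xy = sxy / syy                    # slope of x on y: change in height per kg
r_squared = sxy ** 2 / (sxx * syy)  # square of the correlation coefficient

print(f"slope of y on x = {b_yx:.4f}")
print(f"slope of x on y = {b_xy:.4f}")
print(f"product of slopes = {b_yx * b_xy:.4f}, r^2 = {r_squared:.4f}")
```

The product of the two slopes equals r², which ties the pair of regression lines back to the correlation coefficient.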

R is the Pearson correlation coefficient, i.e. the square root of R². R² (the proportion of the
variation in y explained by x) provides a good estimate of the overall fit of the regression
model.

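The regression equation is in the form y = a + bx. It is found using the values in the data
table. The value of b is calculated first, then the value of a is obtained using the value
obtained for b.

Calculation of b (standard least-squares formula):

\[
b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
\]

a can then be calculated from \(\bar{y} = a + b\bar{x}\), i.e. \(a = \bar{y} - b\bar{x}\), where
\(\bar{x}\) and \(\bar{y}\) are the respective means of x and y.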


Example: Determine the regression equation for the height and weight of the 10 persons listed
below.

Height (cm)    Weight (kg)
158            48
162            57
163            57
170            60
154            45
167            55
177            62
170            65
179            70
179            68
Solution: y = 0.8604x - 85.754; R² = 0.8671
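
The solution can be reproduced with a short script. This is a sketch under the assumption that the line is fitted by ordinary least squares, with the slope b computed first and the intercept a obtained from the means; `fit_line` is a helper name introduced here, and the 165 cm height used for prediction is a hypothetical value.

```python
heights = [158, 162, 163, 170, 154, 167, 177, 170, 179, 179]  # x, cm
weights = [48, 57, 57, 60, 45, 55, 62, 65, 70, 68]            # y, kg

def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx            # slope is calculated first
    a = mean_y - b * mean_x  # intercept then follows from the means
    return a, b, sxy ** 2 / (sxx * syy)

a, b, r_squared = fit_line(heights, weights)
print(f"y = {b:.4f}x {a:+.3f}; R^2 = {r_squared:.4f}")

# Predicted weight (kg) for a hypothetical person 165 cm tall.
predicted = a + b * 165
print(f"predicted weight at 165 cm: {predicted:.1f} kg")
```

The fitted slope, intercept and R² agree with the solution to the quoted precision, and the fitted line predicts a weight of about 56.2 kg at a height of 165 cm.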

The correlation coefficient measures the "degree of relationship" between two variables, say X
and Y, whereas regression analysis studies the "nature of relationship" between the variables.
The correlation coefficient does not indicate a cause-and-effect relationship between the
variables, i.e. it cannot be said with certainty that one variable is the cause and the other is the
effect. Regression analysis, in contrast, treats one variable as dependent and the other as
independent, and describes how the dependent variable changes with the independent one. On
its own, however, a fitted regression line does not prove that the relationship is causal.
