
Correlation and Regression

This document provides guidance for a practical lesson on the topic of linear correlation and linear regression. It begins with an overview of key questions that will be addressed, including defining correlation, different types of correlation, correlation coefficients, Pearson correlation, calculating correlation by hand and with software, regression analysis, regression equations, regression lines, and linear vs. nonlinear regression. It then provides more detailed explanations and examples of correlation, correlation coefficients, the Pearson correlation formula, positive and negative correlation, simple/partial/multiple correlations, linear and nonlinear correlations, and interpreting correlation values. The document concludes with a practical example of calculating a correlation coefficient between height and weight variables.

Uploaded by

MY LIFE MY WORDS
Copyright © All Rights Reserved

NJSC «Medical University Astana»    MRL– 07.2.9-17
Department of “Biostatistics, bioinformatics and information technologies”    Page 1 of 13

METHODICAL RECOMMENDATION FOR PRACTICAL LESSON

On the discipline «Basics of Biostatistics»

Theme: Linear correlation. Linear regression.

Compiled by: Zhunissova U.M.



1. Theme: Linear correlation. Linear regression.

2. The main questions of the theme:

1. What is correlation? A simple definition.
2. Types of correlation, with examples.
3. What is a correlation coefficient? What is the degree of correlation?
4. What is Pearson correlation? Formula.
5. Degree of correlation.
6. How to calculate a correlation coefficient:
   • by hand;
   • in Excel;
   • in SPSS.
7. Regression analysis.
8. Regression equation.
9. Regression line.
10. Linear or nonlinear regression.

Correlation analysis

Correlation is used to test relationships between quantitative variables or categorical variables (each with a coefficient suited to that type of data). In other words, it is a measure of how strongly two variables are related. The study of how variables are correlated is called correlation analysis.

Some examples of data that have a high correlation:

• Your caloric intake and your weight.
• Your eye color and your relatives’ eye colors.
• The amount of time you study and your GPA.

Some examples of data that have a low correlation (or none at all):
• Your sexual preference and the type of cereal you eat.
• A dog’s name and the type of dog biscuit it prefers.
• The cost of a car wash and how long it takes to buy a soda inside the station.

A correlation coefficient is a way to put a number on the relationship. Correlation coefficients take values between −1 and 1. A value of 0 means there is no linear relationship between the variables, while −1 or 1 means that there is a perfect negative or perfect positive correlation (negative or positive here refers to the direction of the graph the relationship produces: falling or rising).

Graphs showing a correlation of -1, 0 and +1

Correlation coefficients are used in statistics to measure how strong the relationship between two variables is. There are several types of correlation coefficient. Pearson’s correlation coefficient (also called Pearson’s r) is the one most commonly used for linear relationships and in linear regression.

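One of the most commonly used formulas in statistics is Pearson’s correlation coefficient formula:

$$r_{xy} = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$

where N is the number of pairs of values. The formula has:
• the shared variability of the X and Y variables on the top;
• the individual variability of the X and Y variables on the bottom.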
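The shared-over-individual structure is easy to see in code. Below is a minimal sketch (standard library only) using the deviation form r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], which is algebraically equivalent to the sums form used in the practical part. The function name `pearson_r` and the three small data sets are hypothetical; they reproduce the values +1, −1 and 0 discussed in this section. The sketch does not guard against a constant variable (zero spread), for which r is undefined.

```python
import math

# Illustrative sketch: the function name and the data sets are hypothetical.


def pearson_r(x, y):
    """Pearson's r for two equal-length sequences (deviation form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Top of the formula: shared variability of x and y.
    shared = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Bottom of the formula: individual variability of x and of y.
    spread_x = sum((xi - mean_x) ** 2 for xi in x)
    spread_y = sum((yi - mean_y) ** 2 for yi in y)
    return shared / math.sqrt(spread_x * spread_y)


x = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises with x
perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
no_linear_trend = pearson_r(x, [3, 1, 4, 1, 3])    # values with no linear pattern
print(perfect_positive, perfect_negative, no_linear_trend)
```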

1. Positive and Negative Correlation: Whether the correlation between the variables is positive or negative depends on the direction of change. The correlation is positive when both variables move in the same direction, i.e. when one variable increases, the other on average also increases, and when one decreases, the other also decreases. The correlation is negative when the variables move in opposite directions, i.e. when one variable increases, the other decreases, and vice versa.

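2. Simple, Partial and Multiple Correlations: Whether the correlation is simple, partial or multiple depends on the number of variables studied. The correlation is simple when only two variables are studied. The correlation is either multiple or partial when three or more variables are studied. It is multiple when the relationship of one variable with two or more other variables taken together is studied simultaneously. It is partial when more than two variables are involved but only the relationship between two of them is examined, with the effect of the other variable(s) held constant.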
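The handout gives no formulas for these coefficients. Purely as an illustration, the sketch below uses the standard textbook formulas for the first-order partial correlation of x and y with z held constant, and for the multiple correlation of z with x and y together, both computed from the three simple (pairwise) coefficients. The function names and the input values are hypothetical.

```python
import math

# Illustrative sketch: function names and input coefficients are hypothetical.


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with the effect of z held constant (first-order partial r)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_r(r_xy, r_xz, r_yz):
    """Multiple correlation R of z with x and y taken together."""
    r_squared = (r_xz ** 2 + r_yz ** 2 - 2 * r_xz * r_yz * r_xy) / (1 - r_xy ** 2)
    return math.sqrt(r_squared)


# Hypothetical simple correlations between three variables x, y and z.
r_xy, r_xz, r_yz = 0.5, 0.5, 0.5
print(partial_r(r_xy, r_xz, r_yz))    # part of r_xy that is not explained by z
print(multiple_r(r_xy, r_xz, r_yz))
```

If z is unrelated to both x and y (r_xz = r_yz = 0), the partial correlation reduces to the simple correlation r_xy.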

3. Linear and Non-Linear (Curvilinear) Correlation: Whether the correlation between the variables is linear or non-linear depends on whether the ratio of change between them is constant. The correlation is linear when the ratio of the amount of change in one variable to the amount of change in the other tends to be constant. For example, for the values of the two variables given below, the ratio of change between the variables is the same (Y changes by 20 every time X changes by 10):

X: 10 20 30 40 50
Y: 20 40 60 80 100

The correlation is called non-linear or curvilinear when the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
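The constant-ratio test can be checked mechanically. This minimal sketch computes the ratio of the change in Y to the change in X between consecutive points, first for the X and Y values given in the example and then for a hypothetical curvilinear series Y = X²/10 (an assumption, not from the handout). The function name is hypothetical.

```python
# Illustrative sketch: the function name and the curvilinear series are hypothetical.


def change_ratios(x, y):
    """Ratio of the change in y to the change in x between consecutive points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


x = [10, 20, 30, 40, 50]
y_linear = [20, 40, 60, 80, 100]       # values from the example in the text
y_curved = [xi ** 2 / 10 for xi in x]  # 10, 40, 90, 160, 250

linear_ratios = change_ratios(x, y_linear)
curved_ratios = change_ratios(x, y_curved)
print(linear_ratios)   # [2.0, 2.0, 2.0, 2.0]  constant ratio: linear
print(curved_ratios)   # [3.0, 5.0, 7.0, 9.0]  changing ratio: curvilinear
```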

The absolute value of the correlation coefficient gives the strength of the relationship: the larger the absolute value, the stronger the relationship. For example, r = −0.75 indicates a stronger relationship than r = 0.65, because |−0.75| = 0.75 > 0.65.
Degree of correlation (−1 ≤ r ≤ 1)

Degree of the correlation relationship      r value (positive)    r value (negative)
No relationship [zero order correlation]    0
No or negligible relationship               +0.01 to +0.19        −0.01 to −0.19
Weak relationship                           +0.20 to +0.29        −0.20 to −0.29
Moderate relationship                       +0.30 to +0.39        −0.30 to −0.39
Strong relationship                         +0.40 to +0.69        −0.40 to −0.69
Very strong relationship                    +0.70 or higher       −0.70 or lower
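The table can be turned into a small lookup function. This is a minimal sketch under two assumptions: the band edges are exactly those printed in the table, and a value falling between two printed bands (such as 0.195) is assigned to the lower band. The function name is hypothetical, and other textbooks use different cut-offs.

```python
# Illustrative sketch: the function name and the gap-handling rule are assumptions.


def degree_of_correlation(r):
    """Return (strength label, direction) for r using the bands of the lesson's table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    strength = abs(r)  # strength depends only on the absolute value
    if strength == 0:
        label = "No relationship"
    elif strength < 0.20:
        label = "No or negligible relationship"
    elif strength < 0.30:
        label = "Weak relationship"
    elif strength < 0.40:
        label = "Moderate relationship"
    elif strength < 0.70:
        label = "Strong relationship"
    else:
        label = "Very strong relationship"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return label, direction


print(degree_of_correlation(0.93))    # ('Very strong relationship', 'positive')
print(degree_of_correlation(-0.75))   # ('Very strong relationship', 'negative')
print(degree_of_correlation(0.65))    # ('Strong relationship', 'positive')
```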


Regression Analysis

Regression analysis expresses the relationship between variables as an equation (the regression line), which is used to estimate the value of one variable from given values of another. Since regression also describes the relationship between two or more variables, what is the difference between regression and correlation? There are two important points of difference between correlation and regression.
These are:

• The correlation coefficient measures the “degree of relationship” between variables, say X and Y, whereas regression analysis studies the “nature of the relationship” between the variables.

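• The correlation coefficient does not indicate a cause-and-effect relationship between the variables, i.e. it cannot be said with certainty that one variable is the cause and the other is the effect. Regression analysis, by contrast, explicitly treats one variable as the cause (independent variable) and the other as the effect (dependent variable); this direction is, however, an assumption built into the model, and a fitted regression does not by itself prove causation.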

The regression equation is the algebraic expression of the regression lines. It is used to predict the values of the dependent variable from given values of the independent variable.
Regression Equation of Y on X: This is used to describe the variation in the value of Y for given changes in the values of X. It can be expressed as follows:

$$Y_e = a + bX$$

where $Y_e$ is the estimated value of the dependent variable, X is the independent variable, and a and b are two constants (parameters) that determine the position of the line. The parameter a is the Y-intercept and gives the level of the fitted line, i.e. the value of $Y_e$ at X = 0 (how far the line lies above or below the origin). The parameter b is the slope of the line, i.e. the change in the value of Y for a one-unit change in X.
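The meaning of a and b can be checked with a tiny sketch. It assumes the coefficients a = −5.348 and b = 0.621 from the height/weight example in the practical part of this lesson; evaluating the line at X = 0 is done only to show what a means (for heights, X = 0 is far outside the data). The function name is hypothetical.

```python
# Illustrative sketch: the function name is hypothetical; a and b come from the practical part.


def predict(x, a, b):
    """Estimated value Y_e = a + b * X on the regression line of Y on X."""
    return a + b * x


a, b = -5.348, 0.621

print(predict(0.0, a, b))                            # at X = 0 the line passes through a
print(predict(141.0, a, b) - predict(140.0, a, b))   # one unit more of X changes Y_e by b
```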

The regression line is the line that best fits the data, i.e. the line for which the overall distance from the points (variable values) plotted on a graph to the line is the smallest. More precisely, the regression line is the line that minimizes the sum of the squared deviations of the observed values from the predicted values (the least-squares criterion).
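Here is a minimal sketch of the least-squares criterion, assuming the usual closed-form solution b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and a = ȳ − b·x̄ (algebraically equivalent to the formulas used in the practical part). The data and function names are hypothetical; the last lines check that moving the line away from the fitted position increases the sum of squared deviations.

```python
# Illustrative sketch: function names and data are hypothetical.


def fit_line(x, y):
    """Least-squares intercept a and slope b of the regression line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_deviations(x, y, a, b):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
best = sum_squared_deviations(x, y, a, b)

# Shifting or tilting the fitted line makes the sum of squared deviations larger.
shifted = sum_squared_deviations(x, y, a + 0.1, b)
tilted = sum_squared_deviations(x, y, a, b - 0.05)
print(a, b, best, shifted, tilted)
```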

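Regression line of Y on X: This gives the most probable values of Y for given values of X. Correspondingly, the regression line of X on Y gives the most probable values of X for given values of Y.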

The degree of correlation between the variables can be judged from how close the two regression lines are: the nearer the regression lines are to each other, the higher the degree of correlation, and the farther apart they are, the lower the degree of correlation.
The correlation is either perfect positive or perfect negative when the two regression lines coincide, i.e. only one line exists. If the variables are independent, the correlation is zero and the two regression lines are at right angles, i.e. parallel to the X axis and the Y axis respectively.
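The link between the two regression lines and r can be checked numerically. In this sketch (hypothetical data and function name), b_yx is the slope of the line of Y on X and b_xy is the slope of the line of X on Y; their product equals r². Both lines always pass through the point of means (x̄, ȳ), and drawn in the same (X, Y) plane the X-on-Y line has slope 1/b_xy, so the lines coincide exactly when r = ±1.

```python
# Illustrative sketch: the function name and data are hypothetical.


def regression_slopes(x, y):
    """Slopes of the regression line of Y on X (b_yx) and of X on Y (b_xy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sxx, sxy / syy


x = [1, 2, 3, 4, 5]

# Imperfect correlation: the two lines differ, and b_yx * b_xy = r^2 < 1.
b_yx, b_xy = regression_slopes(x, [2, 4, 5, 4, 5])
print(b_yx, 1 / b_xy, b_yx * b_xy)

# Perfect correlation (y = 3x + 1): both lines have the same slope, so they coincide.
p_yx, p_xy = regression_slopes(x, [3 * xi + 1 for xi in x])
print(p_yx, 1 / p_xy)
```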

Technically, in regression analysis, the independent variable is usually called the predictor variable and the dependent variable is called the criterion variable. Regression analysis can produce linear or nonlinear graphs. In a linear regression, the relationship between the variables is described by a straight line; nonlinear regressions produce curved lines.
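As a minimal sketch of the difference, the code compares a straight line and a curve fitted to hypothetical data that follow y = x². The curve is obtained by regressing y on x², one simple way to get a curved regression line, chosen here only for illustration; the handout does not prescribe a method. Function names are hypothetical.

```python
# Illustrative sketch: function names, data and the curve-fitting approach are assumptions.


def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sse(x, y, model):
    """Sum of squared deviations of y from the model's predictions."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]  # data lying on a curve

a1, b1 = fit_line(x, y)                      # straight line: y = a1 + b1*x
a2, b2 = fit_line([xi ** 2 for xi in x], y)  # curve: y = a2 + b2*x**2

sse_line = sse(x, y, lambda v: a1 + b1 * v)
sse_curve = sse(x, y, lambda v: a2 + b2 * v ** 2)
print(round(sse_line, 2), round(sse_curve, 2))  # 37.33 0.0
```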
Practical part
Correlation Example
Let's assume that we want to look at the relationship between two variables, height
and weight.

H0: There is no relationship between height (x variable) and weight (y variable).


H1: There is a relationship between height (x variable) and weight (y variable).

Here's the data for the 15 cases:

Person   Height, x   Weight, y
1        130.1       75.2
2        132.1       76.2
3        134.1       77.2
4        136.1       79.2
5        138.1       81.2
6        140.1       83.2
7        142.1       85.2
8        144.1       82.2
9        146.1       87.2
10       148.1       88.2
11       150.1       81.2
12       152.1       90.2
13       154.1       91.2
14       156.1       92.2
15       158.1       93.2

Now we're ready to compute the correlation value. The formula for the correlation coefficient is:

$$r_{xy} = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$

We use the symbol $r_{xy}$ to stand for the correlation. It can be shown mathematically that r always lies between −1.0 and +1.0. If the correlation is negative, we have a negative relationship; if it's positive, the relationship is positive.
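Before working through the hand calculation, the value can be cross-checked in Python. This minimal sketch uses only the standard library and the deviation form of Pearson's formula; in Python 3.10 and later, `statistics.correlation(height, weight)` should return the same value. The variable names are illustrative.

```python
import math

# Heights (x) and weights (y) of the 15 cases in the data table.
height = [130.1, 132.1, 134.1, 136.1, 138.1, 140.1, 142.1, 144.1,
          146.1, 148.1, 150.1, 152.1, 154.1, 156.1, 158.1]
weight = [75.2, 76.2, 77.2, 79.2, 81.2, 83.2, 85.2, 82.2,
          87.2, 88.2, 81.2, 90.2, 91.2, 92.2, 93.2]

n = len(height)
mean_x = sum(height) / n
mean_y = sum(weight) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(height, weight))
sxx = sum((x - mean_x) ** 2 for x in height)
syy = sum((y - mean_y) ** 2 for y in weight)
r_xy = sxy / math.sqrt(sxx * syy)
print(round(r_xy, 4))  # 0.9319
```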
Here's the original data with the other necessary columns (ŷᵢ is the weight predicted by the regression equation obtained below):

Person   Height (x)   Weight (y)   x·y       x·x      y·y       ŷᵢ
1        130.1        75.2         9783.5    16926    5655.04   75.5
2        132.1        76.2         10066     17450    5806.44   76.7
3        134.1        77.2         10353     17983    5959.84   78.0
4        136.1        79.2         10779     18523    6272.64   79.2
5        138.1        81.2         11214     19072    6593.44   80.5
6        140.1        83.2         11656     19628    6922.24   81.7
7        142.1        85.2         12107     20192    7259.04   83.0
8        144.1        82.2         11845     20765    6756.84   84.2
9        146.1        87.2         12740     21345    7603.84   85.4
10       148.1        88.2         13062     21934    7779.24   86.7
11       150.1        81.2         12188     22530    6593.44   87.9
12       152.1        90.2         13719     23134    8136.04   89.2
13       154.1        91.2         14054     23747    8317.44   90.4
14       156.1        92.2         14392     24367    8500.84   91.7
15       158.1        93.2         14735     24996    8686.24   92.9
Sum      2161.5       1263         182694    312592   106843
Average  144.1        84.2                                        r = 0.93

N = 15

$\sum xy = 182694$; $\sum x = 2161.5$; $\sum y = 1263$; $\sum x^2 = 312592$; $\sum y^2 = 106843$.

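$$r_{xy} = \frac{15 \cdot 182694 - 2161.5 \cdot 1263}{\sqrt{\left(15 \cdot 312592 - 2161.5^2\right)\left(15 \cdot 106843 - 1263^2\right)}} \approx 0.93$$

(With the rounded sums shown here the quotient is 0.9312; carrying the unrounded sums gives 0.93193. Either way, r ≈ 0.93.)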
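The substitution can be reproduced in Python. This sketch (function name hypothetical) evaluates the same sums formula twice: once with the sums rounded as printed in the table and once with sums computed from the raw data, which shows that rounding the sums changes r slightly (about 0.9312 versus 0.9319).

```python
import math

# Illustrative sketch: the function name is hypothetical.


def r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from the column sums, as in the hand calculation."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


height = [130.1, 132.1, 134.1, 136.1, 138.1, 140.1, 142.1, 144.1,
          146.1, 148.1, 150.1, 152.1, 154.1, 156.1, 158.1]
weight = [75.2, 76.2, 77.2, 79.2, 81.2, 83.2, 85.2, 82.2,
          87.2, 88.2, 81.2, 90.2, 91.2, 92.2, 93.2]

# Sums rounded as printed in the table.
r_rounded = r_from_sums(15, 2161.5, 1263, 182694, 312592, 106843)

# Sums computed from the raw data without rounding.
r_exact = r_from_sums(
    len(height),
    sum(height),
    sum(weight),
    sum(x * y for x, y in zip(height, weight)),
    sum(x * x for x in height),
    sum(y * y for y in weight),
)
print(round(r_rounded, 4), round(r_exact, 4))  # 0.9312 0.9319
```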

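The standard error of the correlation coefficient, $m_r$, is:

$$m_r = \sqrt{\frac{1 - r_{xy}^2}{n - 2}} = \sqrt{\frac{1 - 0.93^2}{15 - 2}} = \sqrt{0.0104} \approx 0.102$$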

The t statistic for the correlation coefficient is:

$$t_r = \frac{r_{xy}}{m_r} = \frac{0.93}{0.102} = 9.12$$
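A short sketch of these two formulas (function name hypothetical). It also shows that $t_r$ is sensitive to rounding: with r = 0.93 it is about 9.12, while carrying r = 0.93193 gives about 9.27. Both values are far above 3.

```python
import math

# Illustrative sketch: the function name is hypothetical.


def standard_error_r(r, n):
    """m_r = sqrt((1 - r^2) / (n - 2)), the standard error of the correlation coefficient."""
    return math.sqrt((1 - r ** 2) / (n - 2))


n = 15
m_r = standard_error_r(0.93, n)            # rounded r, as in the hand calculation
t_r = 0.93 / m_r                           # same as r * sqrt(n - 2) / sqrt(1 - r^2)
m_r_full = standard_error_r(0.93193, n)    # r carried to five decimals
t_r_full = 0.93193 / m_r_full
print(round(m_r, 3), round(t_r, 2))             # 0.102 9.12
print(round(m_r_full, 3), round(t_r_full, 2))   # 0.101 9.27
```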

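The critical value of r for significance level α and k = n − 2 degrees of freedom is:

$$r_{cr} = r_{cr}(\alpha, k) = r_{cr}(\alpha, n - 2) = r_{cr}(0.05, 13) = 0.514$$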

Note: the critical value is taken from the table of critical values of Pearson’s correlation coefficient.


Then compare $r_{xy}$ and $r_{cr}$:

If $|r_{xy}| < r_{cr}$, we accept H0 (strictly, we fail to reject it). If $|r_{xy}| \ge r_{cr}$, we reject H0 and accept H1.

Conclusion: $r_{xy}$ (0.93) > $r_{cr}$ (0.514), which means H1 is accepted: the correlation coefficient for our fifteen cases is 0.93, which shows a very strong positive relationship between height and weight.
Since $r_{xy}$ (0.93) > $3 \cdot m_r$ (0.306) and $t_r$ (9.12) > 3, the correlation coefficient is reliable.
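Instead of reading the table, the critical value can be derived from the t distribution, since $r_{cr} = t_{cr}/\sqrt{t_{cr}^2 + k}$ with k = n − 2 degrees of freedom. The sketch below assumes the two-tailed t-table value $t_{cr}$ = 2.160 for α = 0.05 and k = 13 (this value is not given in the handout) and then applies the lesson's decision and reliability rules. The function name is hypothetical.

```python
import math

# Illustrative sketch: the function name and the t-table value are assumptions.


def critical_r(t_crit, df):
    """Convert a critical t value into a critical Pearson r: r_cr = t / sqrt(t^2 + df)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n = 15
df = n - 2
t_crit = 2.160  # assumed two-tailed t-table value for alpha = 0.05 and df = 13
r_cr = critical_r(t_crit, df)

r_xy = 0.93
m_r = math.sqrt((1 - r_xy ** 2) / df)
t_r = r_xy / m_r

reject_h0 = abs(r_xy) >= r_cr                # decision rule of the lesson
reliable = abs(r_xy) > 3 * m_r and t_r > 3   # reliability rule of the lesson
print(round(r_cr, 3), reject_h0, reliable)   # 0.514 True True
```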
Next, we calculate the parameters a and b of the regression equation:

$$\bar{x} = \frac{\sum x}{N} = \frac{2161.5}{15} = 144.1, \qquad \bar{y} = \frac{\sum y}{N} = \frac{1263}{15} = 84.2$$

$$b = \frac{\sum xy - N \cdot \bar{x} \cdot \bar{y}}{\sum x^2 - N \cdot \bar{x}^2} = \frac{182694 - 15 \cdot 144.1 \cdot 84.2}{312592 - 15 \cdot 144.1^2} = 0.621$$

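$$a = \bar{y} - b \cdot \bar{x} = 84.2 - 0.62143 \cdot 144.1 = -5.348$$

(Here b is carried as 0.62143, its value from the unrounded data, to avoid rounding error.)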

Then write the regression equation using a and b (ŷ denotes the predicted weight):

$$\hat{y} = b \cdot x + a = 0.621 \cdot x - 5.348$$

Calculate ŷ for each x and complete the seventh column of the table (using the unrounded slope b ≈ 0.62143):

$$\hat{y}_1 = 0.62143 \cdot 130.1 - 5.348 = 75.5$$
$$\dots$$
$$\hat{y}_{15} = 0.62143 \cdot 158.1 - 5.348 = 92.9$$

Construct the empirical line from the points (x, y) and the theoretical (regression) line from the points (x, ŷ).
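The points for both lines can be generated with a short sketch like the one below (standard library only; plotting is left to whatever charting tool is available, for example an Excel scatter chart). The residuals are the vertical gaps between the empirical points and the theoretical line; for a least-squares line they sum to zero. Variable names are illustrative.

```python
# Heights (x) and weights (y) of the 15 cases.
height = [130.1, 132.1, 134.1, 136.1, 138.1, 140.1, 142.1, 144.1,
          146.1, 148.1, 150.1, 152.1, 154.1, 156.1, 158.1]
weight = [75.2, 76.2, 77.2, 79.2, 81.2, 83.2, 85.2, 82.2,
          87.2, 88.2, 81.2, 90.2, 91.2, 92.2, 93.2]

n = len(height)
mean_x = sum(height) / n
mean_y = sum(weight) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(height, weight)) / sum(
    (x - mean_x) ** 2 for x in height
)
a = mean_y - b * mean_x

# Theoretical (regression) line: points (x, y_hat); empirical line: points (x, y).
y_hat = [a + b * x for x in height]
residuals = [y - yh for y, yh in zip(weight, y_hat)]  # vertical gaps between the lines

for x, y, yh in zip(height, weight, y_hat):
    print(f"{x:6.1f} {y:5.1f} {yh:5.1f}")
```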


Pearson correlation table of critical values
1. CORRELATION COEFFICIENT IN EXCEL:
Step 1: Type your data into two columns in Excel. For example, type your “X” data into column B, your “Y” data into column C, and the case numbers into column A.

Step 2: Select any empty cell.

Step 3: Click the function button on the ribbon.

Step 4: Type “correlation” into the ‘Search for a function’ box.

Step 5: Click “Go.” CORREL will be highlighted.

Step 6: Click “OK.”

Step 7: Type the location of your data into the “Array 1” and “Array 2” boxes. For this example, type “B2:B16” into the Array 1 box and then type “C2:C16” into the Array 2 box.

Step 8: Click “OK.” The result will appear in the cell you selected in Step 2. For this particular data set, the correlation coefficient (r) is 0.9319.
2. CORRELATION COEFFICIENT IN EXCEL (BY DATA ANALYSIS TOOL):
Step 1: Type your data into two columns in Excel. For example, type your “X” data into column B, your “Y” data into column C, and the case numbers into column A.
Step 2: Click the Data tab on the ribbon, go to Data Analysis, then choose Correlation in the list that appears. (Data Analysis is available only when the Analysis ToolPak add-in is enabled.)

Step 3: Click “OK.”

Step 4: In the “Input Range” box, type the range address covering both variables; for this example, type “B2:C16”. Then, for the “Output Range”, select any empty cell; for this example, type “E2”.

Step 5: Click “OK.”

Step 6: The result will appear. For this particular data set, the correlation coefficient (r) is 0.9319.
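For readers working outside Excel, the following sketch builds the same kind of lower-triangular correlation matrix that the Data Analysis tool prints (layout approximated; function names are hypothetical). With more than two columns it gives the correlation of every pair of variables.

```python
import math

# Illustrative sketch: function names are hypothetical.


def pearson_r(x, y):
    """Pearson's r for two equal-length sequences (deviation form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Lower-triangular matrix of pairwise r values, rounded to 4 decimals."""
    names = list(columns)
    rows = []
    for i, name in enumerate(names):
        rows.append([round(pearson_r(columns[name], columns[other]), 4)
                     for other in names[: i + 1]])
    return names, rows


data = {
    "Height": [130.1, 132.1, 134.1, 136.1, 138.1, 140.1, 142.1, 144.1,
               146.1, 148.1, 150.1, 152.1, 154.1, 156.1, 158.1],
    "Weight": [75.2, 76.2, 77.2, 79.2, 81.2, 83.2, 85.2, 82.2,
               87.2, 88.2, 81.2, 90.2, 91.2, 92.2, 93.2],
}
names, rows = correlation_matrix(data)
for name, row in zip(names, rows):
    print(name, row)
# Height [1.0]
# Weight [0.9319, 1.0]
```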
