Correlation and Regression
Department of “Biostatistics, Bioinformatics and Information Technologies”
Some examples of data that have a low correlation (or none at all):
Your sexual preference and the type of cereal you eat.
A dog’s name and the type of dog biscuit they prefer.
The cost of a car wash and how long it takes to buy a soda inside the station.
X: 10 20 30 40 50
Y: 20 40 60 80 100
The absolute value of the correlation coefficient gives the strength of the
relationship, and its sign gives the direction. The larger the absolute value, the
stronger the relationship. For example, |-0.75| = 0.75, so r = -0.75 indicates a
stronger relationship than r = 0.65.
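The strength comparison above can be sketched in a few lines of Python. This is only an illustration: the helper names `pearson_r` and `stronger` are made up for this example, and the data are the X and Y values listed earlier, where Y is exactly twice X.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def stronger(r1, r2):
    """Return whichever coefficient has the larger absolute value."""
    return r1 if abs(r1) > abs(r2) else r2

x = [10, 20, 30, 40, 50]
y = [20, 40, 60, 80, 100]

r = pearson_r(x, y)            # perfect positive linear relationship
print(r)
print(stronger(-0.75, 0.65))   # the sign is ignored when comparing strength
```

Because Y rises by exactly 2 for every unit of X, the coefficient comes out as 1 (up to floating-point rounding), and the comparison picks -0.75 over 0.65.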
Degree of correlation (-1 ≤ r ≤ 1)
Table columns: r value | Degree of correlation (Positive, Negative)
The degree to which the variables are correlated with each other is reflected in the
regression line. Regression also describes the relationship between two or more
variables, so what is the difference between regression and correlation? There are two
important points of difference between correlation and regression.
These are:
In the regression equation Ye = a + bX, Ye is the dependent variable, X is the
independent variable, and a and b are the two unknown constants that determine the
position of the line. The parameter “a” is the Y-intercept and gives the level of the
fitted line, i.e. how far above or below the origin the line crosses the Y axis. The
parameter “b” is the slope of the line, i.e. the change in the value of Y for one unit
of change in X.
The regression line is the line that best fits the data, such that the overall distance
from the line to the points (variable values) plotted on a graph is the smallest. In other
words, the regression line is the line that minimizes the sum of the squared deviations
between the predicted and the observed values.
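A minimal sketch of the least-squares idea described above, assuming the standard closed-form estimates b = Sxy / Sxx and a = mean(y) - b * mean(x). The function names `fit_line` and `sum_squared_error` and the five data points are invented for illustration and do not come from this handout.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in y per unit change in x
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b

def sum_squared_error(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_error(xs, ys, a, b)
# Any other line should give a larger sum of squared deviations
shifted = sum_squared_error(xs, ys, a + 0.5, b)
tilted = sum_squared_error(xs, ys, a, b + 0.1)
print(a, b, best, shifted, tilted)
```

Moving the intercept or the slope away from the fitted values increases the sum of squared deviations, which is what "best fit" means here.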
Regression line of Y on X: This gives the most probable values of Y from the
given values of X.
The correlation between the variables depends on the distance between the two
regression lines: the nearer the regression lines are to each other, the higher the
degree of correlation, and the farther apart they are, the lower the degree of
correlation.
The correlation is said to be either perfect positive or perfect negative when the two
regression lines coincide, i.e. only one line exists. If the variables are independent,
the correlation is zero and the lines of regression are at right angles to each other,
i.e. one is parallel to the X axis and the other is parallel to the Y axis.
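The link between the two regression lines and the correlation coefficient can be checked numerically. The sketch below relies on the standard identity that the product of the two regression slopes (Y on X and X on Y) equals r squared; the function name `regression_slopes` and the second data set are assumptions made for illustration.

```python
def regression_slopes(xs, ys):
    """Return (b_yx, b_xy): slope of Y on X and slope of X on Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy

# Perfect relationship (the X and Y values listed earlier): the lines coincide
b_yx, b_xy = regression_slopes([10, 20, 30, 40, 50], [20, 40, 60, 80, 100])
r_squared_perfect = b_yx * b_xy      # product of slopes equals r squared

# Made-up scattered data: the lines diverge and r squared drops below 1
c_yx, c_xy = regression_slopes([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared_scattered = c_yx * c_xy

print(r_squared_perfect, r_squared_scattered)
```

When the product of the slopes is 1, the line of X on Y, drawn in the same axes, has slope 1 / b_xy = b_yx, so both lines are the same line. As the product shrinks toward 0, the lines open up toward a right angle.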
Table columns (N = 15): Person | Height (x) | Weight (y) | x*y | x*x | y*y | $\bar{y}_i$
$$m_r = \sqrt{\frac{1 - r_{xy}^2}{n - 2}} = \sqrt{\frac{1 - 0.93^2}{15 - 2}} = \sqrt{0.0104} = 0.102$$

$$t_r = \frac{r_{xy}}{m_r} = \frac{0.93}{0.102} = 9.12$$
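The two calculations above can be reproduced with a short script. This is a sketch that uses the values from the text (r = 0.93, n = 15); the function names `correlation_error` and `t_statistic` are illustrative only.

```python
import math

def correlation_error(r, n):
    """Standard error of the correlation coefficient: sqrt((1 - r^2) / (n - 2))."""
    return math.sqrt((1 - r ** 2) / (n - 2))

def t_statistic(r, n):
    """Ratio of the correlation coefficient to its standard error."""
    return r / correlation_error(r, n)

r_xy = 0.93   # correlation coefficient from the text
n = 15        # number of cases from the text

m_r = correlation_error(r_xy, n)
t_r = t_statistic(r_xy, n)
print(round(m_r, 3), round(t_r, 2))
```

Rounded to the precision used in the text, the script gives the same error and t value as the hand calculation.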
Conclusion: $r_{xy}$ (0.93) > $r_{cr}$ (0.514), which means H1 is accepted. The
correlation coefficient for our fifteen cases is 0.93, which shows a very strong positive
relationship between height and weight.
In addition, $r_{xy}$ (0.93) > 3 · $m_r$ (0.306) and $t_r$ (9.12) > 3, which means the
correlation coefficient is reliable.
Then we calculate the parameters a and b of the regression equation. The means of x and y are:

$$\bar{x} = \frac{\sum x}{N} = \frac{2161.5}{15} = 144.1 \qquad \bar{y} = \frac{\sum y}{N} = \frac{1263}{15} = 84.2$$
Then form the regression equation using a and b:

$$\bar{y} = b \cdot x + a = 0.621 \cdot x - 5.348$$

Calculate $\bar{y}$ for each x and complete the seventh column:

$$\bar{y}_1 = 0.621 \cdot 130.1 - 5.348 \approx 75.5$$
$$\vdots$$
$$\bar{y}_{15} = 0.621 \cdot 158.1 - 5.348 \approx 92.9$$
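The seventh column can be filled in programmatically. The sketch below uses only the numbers quoted in the text (the sums, N, and the rounded coefficients a and b); the function name `predict_weight` is an assumption, and the full height column is not reproduced here because it is not listed in this section.

```python
N = 15
SUM_X = 2161.5   # sum of heights, from the text
SUM_Y = 1263     # sum of weights, from the text

mean_x = SUM_X / N
mean_y = SUM_Y / N

B = 0.621        # slope b, as rounded in the text
A = -5.348       # intercept a, as rounded in the text

def predict_weight(height):
    """Predicted weight from the fitted line: b * x + a."""
    return B * height + A

first = predict_weight(130.1)   # height of person 1, from the text
last = predict_weight(158.1)    # height of person 15, from the text
print(mean_x, mean_y, first, last)
```

With the rounded coefficients the predictions come out a few hundredths below the 75.5 and 92.9 quoted above. A likely explanation, which is an assumption here, is that the original table was computed with unrounded values of a and b.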
Step 3: Click the function button on the ribbon.
Step 7: Type the location of your data into the “Array 1” and “Array 2” boxes. For
this example, type “B2:B16” into the Array 1 box and then type “C2:C16” into the
Array 2 box.
Step 6: The result will appear. For this particular data set, the correlation coefficient (r) is
0.9319.