(Mathe) Simple Linear Regression and Correlation
[Figure: scatter diagram; y-axis: Shilling Sales (y)]
[Figure: scatter diagram; y-axis: End of year evaluation (y)]
Calculation (x), %:     23 56 74 29 82 45 36 51 60 55 52 88 95
Interpretation (y), %:  16 38 65 39 32 51 11 19 47 54 43 50 60
What is to be done?
•Plot a scatter diagram
•Write the least squares regression line (the estimated regression equation)
•Use the estimated regression equation to estimate student 14’s interpretation test result, given that he scored 72% in the calculation test.
[Figure: scatter diagram "IFM student's marks in %"; x-axis: Calculation Marks; y-axis: Interpretation Marks]
Regression line…….
Data information
Calculation (x)   Interpretation (y)   xy      x²
23                16                   368     529
29                39                   1131    841
36                11                   396     1296
45                51                   2295    2025
51                19                   969     2601
52                43                   2236    2704
55                54                   2970    3025
56                38                   2128    3136
60                47                   2820    3600
74                65                   4810    5476
82                32                   2624    6724
88                50                   4400    7744
95                60                   5700    9025
Total: 746        525                  32847   48726
From the table above…….
$n = 13$ students, $\sum y = 525$, $\sum x = 746$, $\sum xy = 32847$, $\sum x^2 = 48726$
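The least squares steps can be sketched in Python as follows. This is a minimal illustration, assuming the usual y on x formulas $b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $a = \frac{\sum y - b\sum x}{n}$; the function name `least_squares_line` is a hypothetical helper, not something defined in the lecture.

```python
def least_squares_line(x, y):
    """Return (a, b) for the fitted y-on-x line y = a + b*x."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope and intercept from the normal equations (assumed standard form).
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Marks of the 13 students, in the order listed in the data.
calculation = [23, 56, 74, 29, 82, 45, 36, 51, 60, 55, 52, 88, 95]
interpretation = [16, 38, 65, 39, 32, 51, 11, 19, 47, 54, 43, 50, 60]

a, b = least_squares_line(calculation, interpretation)
estimate_72 = a + b * 72  # student 14 scored 72% in the calculation test

print(f"y = {a:.2f} + {b:.4f}x")
print(f"Estimated interpretation mark for x = 72: {estimate_72:.1f}")
```

With these 13 pairs the sketch gives roughly y = 14.00 + 0.46x, so a calculation mark of 72% corresponds to an estimated interpretation mark of about 47%.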
Height 61 62 65 65 65 67 67 69
[Figure: scatter diagram; x-axis: height in inches; y-axis: weight in Kg]
Solution…..
The appropriate regression line is the y on x line (weight on height), which has the form $y = a + bx$.
[Figure: x-axis: Height (inches); y-axis: Weight (Kg)]
Recap…..
•Defined linear regression
•Why simple and Linear?
•Functional relationship versus statistical relationship
•Scatter diagram
•Methods of obtaining a regression line (semi-average, least squares
regression and inspection)
•Steps for plotting a regression line using Least Squares method
END OF TODAY’S LECTURE
WEEK SEVEN
TOPIC TWO
Correlation
•Definition
•Correlation coefficient
•Methods of obtaining correlation
a) Product moment correlation coefficient (r)
b) Spearman’s rank correlation coefficient (ρ)
•Comparison of Methods of obtaining correlation
•Coefficient of determination
Definition
•Correlation is a measure of the STRENGTH of the relationship between two variables.
•The correlation is positive when high values of one variable are associated with high values of the other, and negative when high values of one variable are associated with low values of the other.
•In other words, positive correlation occurs when both variables tend to increase or decrease together, and negative correlation occurs when one variable tends to decrease as the other increases.
Important……..
Note that correlation is useful for discovering possible associations between variables. However, it does not establish a causal relationship between them.
Definition……
•Correlation values range between negative one (-1) and positive one (+1).
•Values close to zero (0) show weak correlation.
•A value of zero indicates no correlation.
•A value of positive one (+1) shows perfect positive correlation.
•Values close to positive one indicate a high level of positive correlation.
•A value of negative one (-1) shows perfect negative correlation.
•Values close to negative one indicate a high level of negative correlation.
Correlation coefficient
•The correlation coefficient is a specific measure of the strength of the linear correlation between two variables (bivariate data). It is denoted by the small letter ‘r’, where r is a number between -1 and +1 inclusive.
•The positive (+) and negative (-) signs indicate positive linear correlation and negative linear correlation respectively.
•Mathematically, this is written as $-1 \le r \le +1$. As stated above, $r = 0$ when there is no correlation.
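To make the range of r concrete, the sketch below computes r for three small made-up data sets (illustrative assumptions, not lecture data), using the deviations-from-the-mean form of the product moment formula. `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Product moment correlation via deviations from the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [3, 5, 7, 9, 11])    # y rises steadily with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls steadily as x rises
r_none = pearson_r(x, [2, 1, 0, 1, 2])   # no overall linear trend

print(r_up, r_down, r_none)
```

The three values come out at (approximately) +1, -1 and 0, matching perfect positive correlation, perfect negative correlation and no linear correlation.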
Perfect correlation
Correlation coefficients,
examples
Methods of obtaining correlation
(Product moment correlation
coefficient (r))
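$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$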
[Figure: scatter diagram; x-axis: Time after normal breathing in seconds; y-axis: Time after hyperventilating in min]
Solution…..
2nd step: we need to find r.
x      y      xy      x²      y²
56     87     4872    3136    7569
56     91     5096    3136    8281
65     85     5525    4225    7225
65     91     5915    4225    8281
50     75     3750    2500    5625
25     28     700     625     784
87     122    10614   7569    14884
44     66     2904    1936    4356
35     58     2030    1225    3364
50     70     3500    2500    4900
Total: 533    773     44906   31077   65269
Solution…..
$n = 10$, $\sum x = 533$, $\sum y = 773$, $\sum xy = 44906$, $\sum x^2 = 31077$, $\sum y^2 = 65269$
Solution……
The result above indicates the strength of the relationship between the two variables. It can be concluded that there is a very strong positive correlation (r = 0.97) between time after hyperventilating and time after normal breathing.
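The calculation of r can be checked with a short Python sketch. It assumes the usual computational form of the product moment formula and recomputes every product and square directly from the x and y columns of the table. `product_moment_r` is a hypothetical helper name.

```python
from math import sqrt


def product_moment_r(x, y):
    """Product moment correlation coefficient from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# x and y columns of the table (10 observations).
x = [56, 56, 65, 65, 50, 25, 87, 44, 35, 50]
y = [87, 91, 85, 91, 75, 28, 122, 66, 58, 70]

r = product_moment_r(x, y)
print(f"r = {r:.2f}")
```

Under these assumptions r comes out at about 0.966, which rounds to the 0.97 quoted above.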
Methods of obtaining correlation
(Spearman’s rank correlation
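coefficient (ρ))
•Based on the ranks of the values of the independent and dependent variables
•Suitable when one or both variables are ordinal or skewed
•Given by: $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, where d is the difference between the two ranks of each pair and n is the number of pairs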
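As a rough illustration, the sketch below applies a rank-based calculation to the calculation and interpretation marks of the 13 students from the regression example. It assumes the usual no-ties formula based on squared rank differences, which fits here because no mark is repeated within either test. `ranks` and `spearman_rho` are hypothetical helper names.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1)), no ties assumed."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


calculation = [23, 56, 74, 29, 82, 45, 36, 51, 60, 55, 52, 88, 95]
interpretation = [16, 38, 65, 39, 32, 51, 11, 19, 47, 54, 43, 50, 60]

rho = spearman_rho(calculation, interpretation)
print(f"rho = {rho:.3f}")
```

For these marks the squared rank differences sum to 156, giving a rank correlation of about 0.57. Data with tied values would need average ranks, which this sketch does not handle.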
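Coefficient of determination
Using the correlation coefficient obtained above (r = 0.97): $r^2 = 0.97^2 = 0.9409$, so about 94% of the variation in y is explained by its linear relationship with x.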