SOCI1005 - Correlation and Regression
Outline
• Correlation
• Scatterplot
• Coefficient of determination
• Regression
Correlation
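• The correlation coefficient r is calculated as:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$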
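The formula for r can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `pearson_r` and the sample data are hypothetical, chosen so the points lie exactly on a line.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient computed from the column sums in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if either variable is constant; r is undefined then.
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data on the line y = 2x, so r should come out as 1.
r_demo = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r_demo)
```

Reversing the direction of the relationship (y falling as x rises) would give a value near -1 instead.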
Coefficient of Determination
• The coefficient of determination (COD) is an alternative way of assessing
the extent of the relationship between two variables (that is, how closely
related they are).
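As a small sketch, assuming the standard result for two-variable linear regression that the COD equals the square of the correlation coefficient (the slide does not state this identity), the calculation looks like this. The value r = -0.8 is purely illustrative.

```python
def coefficient_of_determination(r):
    """Square of the correlation coefficient (assumes simple linear regression)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2

# Illustrative value only: a strong negative correlation.
cod = coefficient_of_determination(-0.8)
print(round(cod, 2))  # 0.64
```

Under this assumption, a COD of 0.64 would be read as 64% of the variation in one variable being accounted for by the other. The sign of r is lost when squaring.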
Worked Example
Step 1: Make a chart. Use the given data, and add three
more columns: xy, x², and y².
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY      X²      Y²
1         43        99
2         21        65
3         25        79
4         42        75
5         57        87
6         59        81
Worked Example
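Step 2: Multiply x and y together and put the results in
the xy column.
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY      X²      Y²
1         43        99                  4257
2         21        65                  1365
3         25        79                  1975
4         42        75                  3150
5         57        87                  4959
6         59        81                  4779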
Worked Example
Step 3: Take the square of the numbers in the x column
and put the results in the x² column.
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY      X²      Y²
1         43        99                  4257    1849
2         21        65                  1365    441
3         25        79                  1975    625
4         42        75                  3150    1764
5         57        87                  4959    3249
6         59        81                  4779    3481
Worked Example
Step 4: Take the square of the numbers in the y column and
put the results in the y² column.
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY      X²      Y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Worked Example
Step 5: Add up all of the numbers in each column and
place the results at the bottom of the column.
SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY      X²      Y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Total     247       486                 20485   11409   40022
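The column-building steps can be checked with a short Python sketch. The data are the six subjects from the worked example; the variable names (`ages`, `glucose`, `totals`) are just illustrative choices.

```python
# Data from the worked example: age (x) and glucose level (y) for six subjects.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

# Build the xy, x squared and y squared columns row by row.
xy = [x * y for x, y in zip(ages, glucose)]
x_sq = [x ** 2 for x in ages]
y_sq = [y ** 2 for y in glucose]

# Add up each column to get the totals row.
totals = {
    "x": sum(ages),
    "y": sum(glucose),
    "xy": sum(xy),
    "x2": sum(x_sq),
    "y2": sum(y_sq),
}
print(totals)
# {'x': 247, 'y': 486, 'xy': 20485, 'x2': 11409, 'y2': 40022}
```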
Worked Example
Step 6: Use the formula to calculate the correlation coefficient
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
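Plugging the column totals from Step 5 into the formula can be sketched as below. The slides do not state the final value of r, so the printed result is simply what this arithmetic produces from those totals (n = 6 subjects).

```python
import math

# Column totals from Step 5 of the worked example.
n = 6
sum_x, sum_y = 247, 486
sum_xy, sum_x2, sum_y2 = 20485, 11409, 40022

numerator = n * sum_xy - sum_x * sum_y    # 122910 - 120042 = 2868
x_term = n * sum_x2 - sum_x ** 2          # 68454 - 61009 = 7445
y_term = n * sum_y2 - sum_y ** 2          # 240132 - 236196 = 3936
r = numerator / math.sqrt(x_term * y_term)

print(round(r, 4))  # 0.5298
```

A value of about 0.53 indicates a moderate positive relationship between age and glucose level in this sample.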
• Recall that correlation tells us how strongly two variables are related.
y = a + bx
• y is the dependent variable and x is the
independent variable.
• a and b are the least-squares (or regression)
coefficients; 'a' is the intercept and 'b' is the
slope.
Least Squares Regression
$$ a = \bar{y} - b\bar{x} $$
• The intercept is the point on the vertical axis where
the regression line crosses the axis. It is the
predicted value for the dependent variable when
the independent variable has a value of zero.
$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$
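The slope and intercept formulas can be sketched as a small Python function. The name `fit_line` and the sample points are hypothetical; the points are chosen to lie exactly on y = 1 + 2x so the result is easy to verify by eye.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - (Sx)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(y) - b * mean(x)
    a = sum_y / n - b * (sum_x / n)
    return a, b

# Hypothetical points lying exactly on y = 1 + 2x.
a_demo, b_demo = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a_demo, b_demo)  # 1.0 2.0
```

Computing b first matters here, because the intercept formula depends on it.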
Worked Example
Step 1: Find 'b'
$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$
Hours of exercise/week (x)   Average hours of sleep needed/night (y)   xy       x²
4                            8.6                                       34.4     16
5.2                          8.1                                       42.12    27.04
2                            9                                         18       4
3.4                          8.5                                       28.9     11.56
8                            7.4                                       59.2     64
10                           6.8                                       68       100
1.5                          9.4                                       14.1     2.25
6                            7.7                                       46.2     36
Total: 40.1                  65.5                                      310.92   260.85
y = 9.64 + (-0.29)x OR
y = 9.64 - 0.29x
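The coefficients in this equation can be reproduced from the column totals of the worked example (n = 8 observations). This is a sketch of the arithmetic only; the variable names are illustrative.

```python
# Column totals from the exercise/sleep worked example.
n = 8
sum_x, sum_y = 40.1, 65.5
sum_xy, sum_x2 = 310.92, 260.85

# Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - (Sx)^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: a = mean(y) - b * mean(x)
a = sum_y / n - b * (sum_x / n)

print(f"y = {a:.2f} + ({b:.2f})x")  # y = 9.64 + (-0.29)x
```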
Interpretation:
•
Example
A random sample of six drivers insured with a company and
having similar auto insurance policies was selected. The
following table lists their driving experiences (in years) and
monthly auto insurance premiums.
• A car rental company charges $40 a day and 20 cents per mile for renting a
car. Let y be the total rental charges (in dollars) for one day and x be the
miles driven. The equation for the relationship between x and y is
y = 40 + 0.20x
• How much will a person pay who rents a car for one day and drives 100
miles?
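The rental equation can be evaluated directly. In this sketch the function name `rental_charge` is hypothetical, and the figures ($40 per day, $0.20 per mile, 100 miles) come from the example.

```python
def rental_charge(miles, daily_rate=40.0, per_mile=0.20):
    """Total one-day rental charge in dollars: y = 40 + 0.20x."""
    return daily_rate + per_mile * miles

charge = rental_charge(100)
print(f"${charge:.2f}")  # $60.00
```

Here the intercept (40) is the fixed daily charge paid even at zero miles, and the slope (0.20) is the extra cost of each additional mile.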
Example