Correlation & Regression
EPIDEMIOLOGY (L)
LABORATORY
Lab Guide 03
Learning Outcomes
• Relate inferential statistical procedures for correlation and regression to data characteristics and sampling procedures
• Use appropriate inferential statistical procedures for correlation and regression for a given data set
• Compute key inferential statistical measures for correlation and regression
• Write narrative reports of the results of inferential statistical analyses for correlation and regression
INVESTIGATING RELATIONSHIPS BETWEEN VARIABLES
Graphical approach
• Qualitative variables: comparative bar graph
• Quantitative variables: scatter diagram (scatter plot)
  • Direct/positive relationship: points rise from left to right.
  • Inverse/negative relationship: points fall from left to right.
• Correlation (quantitative variables): correlation coefficient (Pearson/Spearman). Example: relationship between study hours and exam scores.
• Prediction (quantitative/qualitative variables): regression analysis. Example: predicting blood pressure from age and weight.
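Correlation analysis
• Used when the objective is to measure the strength and direction of the linear relationship between two or more quantitative variables
• Parametric: Pearson product-moment correlation coefficient (r)
• Non-parametric: Spearman rank correlation coefficient (rₛ), which measures the strength of a monotonic relationship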
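To make the parametric/non-parametric distinction concrete, here is a minimal Python sketch using only the standard library. The helper names `pearson_r`, `ranks`, and `spearman_rs` are hypothetical, and the data are made up (y = x³), not taken from this guide. Spearman's coefficient is computed here as Pearson's r applied to the ranks, with tied values given their average rank.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation from sums of deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y = x**3 rises monotonically but not along a straight line
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print("Pearson r: ", round(pearson_r(x, y), 3))
print("Spearman rs:", round(spearman_rs(x, y), 3))
```

Because y = x³ is perfectly monotonic but curved, the Spearman coefficient is exactly 1 while the Pearson coefficient falls somewhat below 1.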
[Figure: Scatter diagram of patients' SBP, recumbent vs. standing position; x-axis: SBP standing position (mmHg)]
• Strength: strong; the plot shows a strong positive linear relationship.
Test of hypothesis
• H0: ρ = 0 (There is no correlation between the SBP of patients
in the recumbent and standing positions.)
• H1: ρ ≠ 0 (There is a correlation between the SBP of patients in
the recumbent and standing positions.)
• α = 0.05
• Test statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with df = n − 2 = 10 − 2 = 8
• Critical region: t ≥ 2.306 or t ≤ −2.306
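• Statistical decision: The null hypothesis is rejected since the computed t statistic (7.82) falls within the critical region (t ≥ 2.306).
• Conclusion: At α = 0.05, there is a significant correlation between the SBP of patients in the recumbent and standing positions.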
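The test statistic and two-tailed decision rule for H0: ρ = 0 can be sketched in Python as follows. The function names are hypothetical, the critical value 2.306 is the tabled two-tailed value for α = 0.05 and df = 8 used in this guide, and r = 0.5 is an arbitrary illustrative value, not the SBP result.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with df = n - 2.

    Undefined when r is exactly +1 or -1 (division by zero).
    """
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def reject_h0(t, t_critical):
    """Two-tailed decision: reject H0 if t >= t_critical or t <= -t_critical."""
    return t >= t_critical or t <= -t_critical

T_CRIT_DF8 = 2.306  # two-tailed, alpha = 0.05, df = 8 (from a t table)

t = correlation_t(0.5, 10)  # hypothetical r = 0.5 with n = 10
print(round(t, 3), reject_h0(t, T_CRIT_DF8))
```

With r = 0.5 and n = 10, t ≈ 1.633, which lies outside the critical region, so H0 would not be rejected; a value such as t = 7.82 would lead to rejection.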
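No.  Hours of physical activity  Cholesterol level
1    2                           220
2    5                           210
3    3                           215
4    7                           180
5    1                           240
6    6                           185
7    4                           210
8    3                           225
9    2                           230
10   5                           200

[Figure: Scatter diagram of cholesterol level vs. hours of physical activity, showing a strong negative linear relationship]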
Test of hypothesis
• Null Hypothesis (H₀): There is no significant relationship
between the number of hours of physical activity and
cholesterol levels (ρ = 0).
• Alternative Hypothesis (H₁): There is a significant relationship
between the number of hours of physical activity and
cholesterol levels (ρ ≠ 0).
• α = 0.05
• Test statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with df = n − 2 = 10 − 2 = 8
• Critical region: t ≥ 2.306 or t ≤ −2.306
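• Statistical decision: The null hypothesis is rejected since the computed t statistic (≈ −9.56) falls within the critical region (t ≤ −2.306).
• Conclusion: At α = 0.05, there is a significant relationship between the number of hours of physical activity and cholesterol levels.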
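As a check on the decision above, the sketch below computes r and t from the ten pairs in the physical activity table, using the deviation-sum form of Pearson's r and the t statistic with df = n − 2. It is plain Python with no libraries; the variable names are illustrative.

```python
from math import sqrt

# Data from the physical activity table
hours = [2, 5, 3, 7, 1, 6, 4, 3, 2, 5]
cholesterol = [220, 210, 215, 180, 240, 185, 210, 225, 230, 200]

n = len(hours)
mx = sum(hours) / n
my = sum(cholesterol) / n

# Sums of cross-products and squares of deviations about the means
sxy = sum((x - mx) * (y - my) for x, y in zip(hours, cholesterol))
sxx = sum((x - mx) ** 2 for x in hours)
syy = sum((y - my) ** 2 for y in cholesterol)

r = sxy / sqrt(sxx * syy)                  # Pearson correlation coefficient
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)     # test statistic, df = n - 2 = 8
print(f"r = {r:.4f}, t = {t:.2f}, reject H0: {t <= -2.306 or t >= 2.306}")
```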
[Figure: Scatter diagram showing a non-linear relationship]
Test of hypothesis
• Null Hypothesis (H0): There is no significant correlation
between age and performance score (ρ=0).
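• Alternative Hypothesis (H1): There is a significant correlation between age and performance score (ρ ≠ 0).
• α = 0.05
• Test statistic: Spearman rank correlation coefficient, $r_s = 1 - \dfrac{6\sum d^2}{n(n^2-1)}$, where d is the difference between the paired ranks
• There is a strong, monotonic, inverse relationship between age and performance score.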
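The guide's age and performance data are not reproduced here, so the following sketch applies the shortcut Spearman formula to hypothetical, tie-free ages and scores. The function name `spearman_shortcut` is illustrative. The shortcut formula is exact only when neither variable has tied values; with ties, averaged ranks and Pearson's r on those ranks are used instead.

```python
def spearman_shortcut(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); exact only when there are no ties."""
    n = len(x)
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("shortcut formula assumes no tied values")
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}  # smallest value gets rank 1
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: performance tends to fall as age rises
age = [20, 25, 30, 35, 40, 45]
score = [95, 90, 60, 70, 58, 20]
rs = spearman_shortcut(age, score)
print(round(rs, 3))
```

Here Σd² = 68 and n = 6, so rₛ = 1 − 408/210 ≈ −0.943, a strong inverse monotonic relationship.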
Method of least squares
• Under this method, the best line among all possible lines that can be fitted to the data is the one that gives the minimum sum of the squared vertical deviations of the data points from the corresponding values on the line.
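General formula for the prediction equation in simple linear regression analysis
$\hat{Y} = b_0 + b_1X$, where b₁ carries the sign of the relationship (positive for direct, negative for inverse)
General formula for slope
$b_1 = \dfrac{n\sum XY - \sum X\sum Y}{n\sum X^2 - \left(\sum X\right)^2}$
General formula for intercept
$b_0 = \bar{Y} - b_1\bar{X}$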
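The slope and intercept formulas translate directly into code. A minimal sketch, assuming the raw-sum (computational) form of the slope formula and b₀ = ȳ − b₁x̄ for the intercept; the function name `slope_intercept` and the test points (lying exactly on y = 3 + 2x) are hypothetical.

```python
def slope_intercept(x, y):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    # Slope: (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of y minus slope times mean of x
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

b0, b1 = slope_intercept([0, 1, 2, 3], [3, 5, 7, 9])
print(b0, b1)
```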
Sample
• A researcher is studying the relationship between the number of hours studied and the test scores achieved by students. The researcher wants to predict the test score based on the number of hours studied.

Student  Hours studied (x)  Test score (y)
1        1                  50
2        2                  55
3        3                  60
4        4                  65
5        5                  70
Regression result
• Intercept (b₀): 45
  predicted value of Y when X = 0
• Slope (b₁): 5
  change in Y for each one-unit increase in X
• Regression equation: ŷ = 45 + 5x
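A quick check of these coefficients in plain Python, using the five observations from the sample table and the least-squares formulas written with deviations from the means (variable names are illustrative):

```python
hours = [1, 2, 3, 4, 5]          # x: hours studied
scores = [50, 55, 60, 65, 70]    # y: test score

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)

b1 = sxy / sxx              # slope
b0 = mean_y - b1 * mean_x   # intercept
print(f"y-hat = {b0:g} + {b1:g}x")
```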
R-squared (R²)
• Measures how well the model fits the data (the proportion of the variation in the dependent variable explained by the model)
• R² ranges from 0 to 1; a value closer to 1 indicates a better fit.
• In this example, R² = 1: the model perfectly explains the variation in the dependent variable.
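R² can be computed as 1 − SSE/SST, the share of the total variation in y that the fitted line accounts for. A minimal Python sketch using the sample data and the coefficients from the regression result above:

```python
hours = [1, 2, 3, 4, 5]
scores = [50, 55, 60, 65, 70]
b0, b1 = 45, 5  # coefficients from the regression result

predicted = [b0 + b1 * x for x in hours]
mean_y = sum(scores) / len(scores)
sse = sum((y - p) ** 2 for y, p in zip(scores, predicted))  # unexplained variation
sst = sum((y - mean_y) ** 2 for y in scores)                # total variation
r_squared = 1 - sse / sst
print(r_squared)
```

Every observed score equals its predicted score, so SSE = 0 and R² = 1.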
Standard Error
• Indicates the precision of the estimated regression coefficients (intercept and slope)
• In this example, SE = 0, because every data point lies exactly on the regression line.
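The coefficient standard errors can be computed from the residual standard error s = √(SSE/(n − 2)). The sketch below uses the textbook simple-regression formulas SE(b₁) = s/√Sxx and SE(b₀) = s·√(1/n + x̄²/Sxx); the function name is hypothetical, and the second data set is made up to show a non-zero case.

```python
from math import sqrt

def coefficient_standard_errors(x, y, b0, b1):
    """Standard errors of the intercept and slope in simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = sqrt(sse / (n - 2))               # residual standard error, df = n - 2
    se_b1 = s / sqrt(sxx)
    se_b0 = s * sqrt(1 / n + mean_x ** 2 / sxx)
    return se_b0, se_b1

# Sample data from this guide: every point lies on y = 45 + 5x
exact_fit = coefficient_standard_errors([1, 2, 3, 4, 5], [50, 55, 60, 65, 70], 45, 5)
# Hypothetical data scattered around its least-squares line y = 1.8 + 0.8x
noisy_fit = coefficient_standard_errors([1, 2, 3, 4, 5], [2, 4, 5, 4, 6], 1.8, 0.8)
print(exact_fit)
print(noisy_fit)
```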
Prediction
• If a student studies for 6 hours, what is the predicted test score?
• ŷ = 45 + 5(6) = 75
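The prediction step is the fitted equation evaluated at a new x. A tiny sketch with a hypothetical helper `predict_score`, defaulting to the coefficients from the regression result:

```python
def predict_score(hours_studied, b0=45, b1=5):
    """Predicted test score from the fitted line y-hat = b0 + b1 * x."""
    return b0 + b1 * hours_studied

print(predict_score(6))
```

Six hours lies just outside the observed range of 1 to 5 hours, so this prediction is a mild extrapolation; predictions far outside the observed range are less reliable.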
1 55 1 30 1 1 0 1
2 45 0 28 0 0 1 0
3 60 1 32 1 1 0 1
4 40 0 22 0 0 1 0
5 50 1 35 1 0 1 1
6 35 0 25 0 1 1 0
7 48 1 33 1 0 0 1
8 52 0 29 0 1 0 0
9 44 1 31 1 1 1 1
10 60 0 36 1 1 0 1