14 - Regression and Correlation
REGRESSION AND CORRELATION TEST
BASIC BIOSTATISTICS
INTRODUCTION – REGRESSION TEST
What Is Regression Analysis?
A statistical method used to examine the relationship between a dependent variable (also known as the
outcome or response variable) and one or more independent variables (also known as predictors or
explanatory variables)
Darlington, Richard B., and Andrew F. Hayes. "Regression analysis and linear models." New York, NY: Guilford (2017)
Regression analysis examines the relationship between a quantitative response variable, Y , and one or
more explanatory variables, X1; ... ; Xk . Regression analysis traces the conditional distribution of Y —or
some aspect of this distribution, such as its mean—as a function of the Xs.
Fox, John. Applied regression analysis and generalized linear models. Sage publications, 2015.
Why do we need to run regression?
(Diagram: Exposure 1 (E1), Exposure 2 (E2), Exposure 3 (E3), and Exposure 4 (E4) each influence the Outcome (Y).)
The outcome might not result from just one single exposure, as other exposures
may interact and influence each other, either enhancing or diminishing their
effect on the outcome.
Key Purposes of Regression Analysis
Prediction
Using the model to predict values of the dependent variable based on known values of
the independent variables.
Estimation
Estimating the effect of the independent variables on the dependent variable.
Explanation
Understanding the strength and direction of relationships between variables (X1 – X2).
Control
Adjusting for confounding variables to isolate the effect of the variables of interest.
Hypothesis Testing
Testing hypotheses about the relationships between variables.
Type of regression
Scatter plot:
X-axis → Independent Variable
Y-axis → Dependent Variable
What is “Linear”?
• Remember this:
• Y = mX + B?
Example: A slope (m) of 2 means that every 1-unit change in X yields a 2-unit change in Y.
(Figure: a straight line annotated with its slope m and intercept B.)
PREDICTION or ESTIMATION
$E(y_i \mid x_i) = \alpha + \beta x_i$
$E(y_i \mid x_i)$ : The expected value of the dependent variable (y) given the value of $x_i$.
$\alpha$ : The constant term (intercept).
$\beta$ : The slope coefficient.
$x_i$ : The independent variable or predictor.
Assumptions (or the fine print)
$$\alpha = \bar{y} - \beta\bar{x} = \frac{\sum y}{n} - \beta\,\frac{\sum x}{n}$$
https://milnepublishing.geneseo.edu/natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/
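Subject   $x_i$   $x_i - \bar{x}$   $(x_i - \bar{x})^2$   $y_i$   $y_i - \bar{y}$   $(x_i - \bar{x})(y_i - \bar{y})$
1         $x_1$   $x_1 - \bar{x}$   $(x_1 - \bar{x})^2$   $y_1$   $y_1 - \bar{y}$   $(x_1 - \bar{x})(y_1 - \bar{y})$
2         $x_2$   $x_2 - \bar{x}$   $(x_2 - \bar{x})^2$   $y_2$   $y_2 - \bar{y}$   $(x_2 - \bar{x})(y_2 - \bar{y})$
3         $x_3$   $x_3 - \bar{x}$   $(x_3 - \bar{x})^2$   $y_3$   $y_3 - \bar{y}$   $(x_3 - \bar{x})(y_3 - \bar{y})$
4         $x_4$   $x_4 - \bar{x}$   $(x_4 - \bar{x})^2$   $y_4$   $y_4 - \bar{y}$   $(x_4 - \bar{x})(y_4 - \bar{y})$
5         $x_5$   $x_5 - \bar{x}$   $(x_5 - \bar{x})^2$   $y_5$   $y_5 - \bar{y}$   $(x_5 - \bar{x})(y_5 - \bar{y})$
Column totals: $\sum (x_i - \bar{x})^2$ and $\sum (x_i - \bar{x})(y_i - \bar{y})$

$$\beta = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \frac{\dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$$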
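Subject   X       X²       Y       Y²       XY
1         $X_1$   $X_1^2$  $Y_1$   $Y_1^2$  $X_1 Y_1$
.         .       .        .       .        .
.         .       .        .       .        .
n         $X_n$   $X_n^2$  $Y_n$   $Y_n^2$  $X_n Y_n$
Sum       $\sum X$ = …   $\sum X^2$ = …   $\sum Y$ = …   $\sum Y^2$ = …   $\sum XY$ = …

$$\beta = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$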
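The two slope formulas (deviation form and raw-sum form) are algebraically the same. The short derivation below is a sketch of why, using only the definitions $\bar{x} = \sum x_i / n$ and $\bar{y} = \sum y_i / n$; the $(n - 1)$ terms cancel in the ratio, so they are omitted.

```latex
\begin{aligned}
\sum (x_i - \bar{x})(y_i - \bar{y})
  &= \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\,\bar{x}\,\bar{y} \\
  &= \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}, \\[4pt]
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\,\bar{x}^2 \\
  &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \\[4pt]
\beta
  &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
   = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}
          {\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}.
\end{aligned}
```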
EXAMPLE
Subject   Age (years)   Treatment duration (in days)
1 20 5
2 30 6
3 25 5
4 35 7
5 40 8
What is the relationship between age and treatment duration?
EXAMPLE – Answer 01
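Subject   $x_i$   $x_i - \bar{x}$   $(x_i - \bar{x})^2$   $y_i$   $y_i - \bar{y}$   $(x_i - \bar{x})(y_i - \bar{y})$
1         20      −10               100                   5       −1.2              12
2         30      0                 0                     6       −0.2              0
3         25      −5                25                    5       −1.2              6
4         35      5                 25                    7       0.8               4
5         40      10                100                   8       1.8               18
Sum: $n = 5$, $\bar{x} = 30$, $\sum (x_i - \bar{x})^2 = 250$, $\bar{y} = 6.2$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 40$

$$\beta = \frac{40 / (5 - 1)}{250 / (5 - 1)} = \frac{40}{250} = \frac{8}{50} = 0.16$$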
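The deviation-score calculation in Answer 01 can be checked with a few lines of Python. This is a minimal sketch using only the five data points from the EXAMPLE table; the variable names (`ss_x`, `sp_xy`) are chosen here for illustration and do not come from the slides.

```python
# Data from the EXAMPLE table: age (years) and treatment duration (days).
ages = [20, 30, 25, 35, 40]
durations = [5, 6, 5, 7, 8]

n = len(ages)
x_bar = sum(ages) / n        # mean age
y_bar = sum(durations) / n   # mean treatment duration

# Sum of squared deviations of x, and sum of cross-products of deviations.
ss_x = sum((x - x_bar) ** 2 for x in ages)
sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, durations))

# The (n - 1) terms cancel; they are kept to mirror Cov(x, y) / Var(x).
beta = (sp_xy / (n - 1)) / (ss_x / (n - 1))

print(f"sum of squares = {ss_x:.1f}, sum of products = {sp_xy:.1f}, slope = {beta:.2f}")
```

The printed sums should match the column totals in the table (250 and 40), and the slope should round to 0.16.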
EXAMPLE – Answer 02
Subject   X     X²     Y    Y²    XY
1         20    400    5    25    100
2         30    900    6    36    180
3         25    625    5    25    125
4         35    1225   7    49    245
5         40    1600   8    64    320
Sum       $\sum X = 150$   $\sum X^2 = 4750$   $\sum Y = 31$   $\sum Y^2 = 199$   $\sum XY = 970$

$$\beta = \frac{970 - \dfrac{150 \times 31}{5}}{4750 - \dfrac{150^2}{5}} = \frac{40}{250} = 0.16$$
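The raw-sum route in Answer 02 needs only the column totals, which makes it convenient for hand calculation. The sketch below recomputes those totals and the slope from the same five data points; the variable names are illustrative assumptions.

```python
# Data from the EXAMPLE table: age (years) and treatment duration (days).
ages = [20, 30, 25, 35, 40]
durations = [5, 6, 5, 7, 8]

n = len(ages)
sum_x = sum(ages)                                       # sum of X
sum_y = sum(durations)                                  # sum of Y
sum_x2 = sum(x * x for x in ages)                       # sum of X squared
sum_xy = sum(x * y for x, y in zip(ages, durations))    # sum of X times Y

numerator = sum_xy - sum_x * sum_y / n
denominator = sum_x2 - sum_x ** 2 / n
beta_from_sums = numerator / denominator

print(f"numerator = {numerator:.1f}, denominator = {denominator:.1f}, "
      f"slope = {beta_from_sums:.2f}")
```

The numerator and denominator should come out as 40 and 250, the same values as the deviation-score totals in Answer 01.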
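Answer
$\alpha = 6.2 - 0.16 \times 30 = 1.4$
$y = 1.4 + 0.16\,x_i$
Predict for those who were 10 and 70 years old! What is the expected duration of treatment?
(Figure: the fitted line $y = 1.4 + 0.16\,x_i$, with Age on the horizontal axis and Duration of treatment (days) on the vertical axis. The intercept is 1.4; the line gives 3.0 days at age 10 and 12.6 days at age 70.)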
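The intercept and the two requested predictions can be reproduced as follows. This is a sketch under the simple linear model from the slides; `fit_line` and `predict` are hypothetical helper names introduced here, not functions from any library.

```python
def fit_line(xs, ys):
    """Least-squares intercept (alpha) and slope (beta) for y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
    alpha = y_bar - beta * x_bar
    return alpha, beta


def predict(alpha, beta, x):
    """Expected value of y at a given x under the fitted line."""
    return alpha + beta * x


# Data from the EXAMPLE table: age (years) and treatment duration (days).
ages = [20, 30, 25, 35, 40]
durations = [5, 6, 5, 7, 8]

alpha, beta = fit_line(ages, durations)
print(f"y = {alpha:.1f} + {beta:.2f} * x")
for age in (10, 70):
    print(f"age {age}: expected duration = {predict(alpha, beta, age):.1f} days")
```

Both ages lie outside the observed range of 20 to 40 years, so these predictions are extrapolations and assume the same straight-line relationship continues to hold there.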
CORRELATION
Recall: Covariance
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
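As a quick illustration of the sample covariance formula, the sketch below applies it to the age and treatment-duration data from the regression example. `sample_covariance` is a hypothetical helper written for this note; the covariance of a variable with itself is its sample variance.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the (n - 1) denominator."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Data from the regression EXAMPLE: age (years) and treatment duration (days).
ages = [20, 30, 25, 35, 40]
durations = [5, 6, 5, 7, 8]

cov_age_duration = sample_covariance(ages, durations)   # 40 / 4
var_age = sample_covariance(ages, ages)                 # 250 / 4

print(f"cov(age, duration) = {cov_age_duration:.2f}")
print(f"var(age) = {var_age:.2f}")
print(f"slope = cov / var = {cov_age_duration / var_age:.2f}")
```

Dividing the covariance by the variance of age recovers the regression slope of 0.16.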
Interpreting Covariance
cov(X,Y) > 0 → X and Y are positively correlated
cov(X,Y) < 0 → X and Y are inversely correlated
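cov(X,Y) = 0 → X and Y are uncorrelated (no linear relationship); independent variables have zero covariance, but zero covariance alone does not imply independence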
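A note of caution on the zero case: covariance only detects linear association. The sketch below uses made-up data (not from the slides) in which Y is completely determined by X through Y = X², yet the sample covariance is exactly zero.

```python
# Hypothetical illustration data: y depends perfectly on x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # [4, 1, 0, 1, 4]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample covariance with the (n - 1) denominator.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

print(f"cov(x, y) = {cov_xy}")
```

The positive and negative cross-products cancel because the pattern is symmetric around the mean of X, so a covariance of zero here says nothing about the strong curved relationship.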
Correlation
Measures the relative strength of the linear relationship between two
variables
▪ Unit-less
▪ Ranges between –1 and 1
▪ The closer to –1, the stronger the negative linear relationship
▪ The closer to 1, the stronger the positive linear relationship
▪ The closer to 0, the weaker any linear relationship
Scatter Plots of Data with Various Correlation Coefficients
(Figure: six scatter plots of Y against X, with r = −1, r = −0.6, r = 0, r = +1, r = +0.3, and r = 0.)
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Linear Correlation
(Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships.)
Linear Correlation
(Figure: scatter plots of Y against X contrasting strong relationships with weak relationships.)
Linear Correlation
(Figure: scatter plot against X showing no relationship.)
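Correlation coefficient
$$r = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x) \times \mathrm{var}(y)}}$$
Pearson’s Correlation Coefficient is standardized covariance (unitless):
$$r = \frac{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1} \times \dfrac{\sum (y_i - \bar{y})^2}{n - 1}}}$$
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}}$$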
INTERPRETING COEFFICIENT OF CORRELATION
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}} = \frac{40}{\sqrt{250 \times 6.8}} = 0.9701425$$
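The coefficient for the age and treatment-duration data can be checked with a short script. This is a sketch, with `pearson_r` as a hypothetical helper name; it also converts the durations from days to hours (an illustrative rescaling, not part of the slides) to show that r is unitless.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-products over sqrt of the two sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


# Data from the regression EXAMPLE: age (years) and treatment duration (days).
ages = [20, 30, 25, 35, 40]
durations = [5, 6, 5, 7, 8]

r_days = pearson_r(ages, durations)

# Same durations expressed in hours: the units change, r should not.
durations_hours = [d * 24 for d in durations]
r_hours = pearson_r(ages, durations_hours)

print(f"r (days)  = {r_days:.7f}")
print(f"r (hours) = {r_hours:.7f}")
```

A value this close to 1 indicates a strong positive linear relationship between age and treatment duration in these five subjects.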