14 - Regression and Correlation

The document provides an overview of regression analysis, explaining its purpose in examining relationships between dependent and independent variables. It outlines various types of regression, including linear and logistic, and discusses key concepts such as prediction, estimation, and hypothesis testing. Additionally, it covers assumptions of linear regression and introduces correlation as a measure of the strength of relationships between variables.

Uploaded by

kumpulansoal29

LINEAR REGRESSION AND CORRELATION TEST
BASIC BIOSTATISTICS
INTRODUCTION – REGRESSION TEST
What Is Regression Analysis?

A statistical method used to examine the relationship between a dependent variable (also known as the
outcome or response variable) and one or more independent variables (also known as predictors or
explanatory variables)
Darlington, Richard B., and Andrew F. Hayes. "Regression analysis and linear models." New York, NY: Guilford (2017)

Regression analysis examines the relationship between a quantitative response variable, Y , and one or
more explanatory variables, X1; ... ; Xk . Regression analysis traces the conditional distribution of Y —or
some aspect of this distribution, such as its mean—as a function of the Xs.
Fox, John. Applied regression analysis and generalized linear models. Sage publications, 2015.
Why do we need to run regression?

[Figure: four exposures, EXPOSURE 1 (E1) to EXPOSURE 4 (E4), each influencing the OUTCOME (Y).]

The outcome might not result from just one single exposure, as other exposures may interact and influence each other, either enhancing or diminishing their effect on the outcome.
Key Purposes of Regression Analysis
Prediction
Using the model to predict values of the dependent variable based on known values of
the independent variables.
Estimation
Estimating the effect of the independent variables on the dependent variable.
Explanation
Understanding the strength and direction of relationships between variables (X1 – X2).
Control
Adjusting for confounding variables to isolate the effect of the variables of interest.
Hypothesis Testing
Testing hypotheses about the relationships between variables.
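Types of regression

Type of regression | Outcome | Distribution | Model
Linear | Numeric | Normal (Gaussian) | $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$
Logistic | Dichotomous | Binomial | $\log\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$
Multinomial | Multi categories | Multinomial | $\log\left(\frac{P(Y=j)}{P(Y=J)}\right) = \beta_{0j} + \beta_{1j} X_1 + \dots + \beta_{pj} X_p$, where $J$ is the reference category
Poisson | Numeric or dichotomous | Poisson | $\log(\lambda) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$
Cox (Proportional Hazards) | Time-to-event data (time: numeric; event: binary) | - | $h(t) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)$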
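To make the logistic row concrete, here is a minimal Python sketch of the log-odds link and its inverse. The coefficients `beta0` and `beta1` and the value of `x` are hypothetical and chosen only for illustration; they are not estimates from any data in these slides.

```python
import math


def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, for illustration only.
beta0, beta1 = -4.0, 0.1
x = 40.0

eta = beta0 + beta1 * x   # linear predictor: beta0 + beta1 * x
p = inverse_logit(eta)    # probability implied by the logistic model

print(f"linear predictor = {eta:.2f}, P = {p:.3f}")
```

The model is linear on the log-odds scale, so the probability itself always stays between 0 and 1.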
LINEAR REGRESSION
Examples:
Is there a relationship between age and systolic blood pressure?
Is there a relationship between family income and students' Grade Point
Average (GPA)?
Is there a relationship between patient age and length of hospital stay?

Scatter plot:
X-axis → Independent Variable
Y-axis → Dependent Variable
What is “Linear”?
• Remember this:
• Y=mX+B?

Example:
A slope (m) of 2 means that every 1-unit change in X yields a 2-unit change in Y.

[Figure: a straight line with slope m and intercept B.]
PREDICTION or ESTIMATION

If you know something about X, this knowledge helps you predict something about Y.
(Sound familiar?…sound like conditional probabilities?)
Regression equation

$$E(y_i \mid x_i) = \alpha + \beta x_i$$

$E(y_i \mid x_i)$ : The expected value of the dependent variable (y) given the value of $x_i$.
$\alpha$ : The constant term (intercept).
$\beta$ : The slope coefficient.
$x_i$ : The independent variable or predictor.
Assumptions (or the fine print)

Linear regression assumes that…


1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the
same (homogeneity of variances)
4. The observations are independent
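Assumptions 1 and 3 are commonly checked by looking at residuals (observed Y minus fitted Y) against X. The following minimal sketch only computes the residuals; the data and the line coefficients `alpha` and `beta` are hypothetical, not taken from the slides.

```python
# Hypothetical fitted line and data, for illustration only.
alpha, beta = 0.0, 2.0
x = [1.0, 2.0, 3.0, 4.0]
y = [2.5, 3.5, 6.5, 7.5]

# Residual = observed y minus the value predicted by the line.
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

for xi, ri in zip(x, residuals):
    print(f"x = {xi}: residual = {ri:+.1f}")
```

Residuals that curve systematically with X suggest the relationship is not linear, and residuals that fan out as X grows suggest unequal variances.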
Regression formulas
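$$y = \alpha + \beta x$$

$$\beta = \frac{Cov(x, y)}{Var(x)} = \frac{\dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}}{\dfrac{\sum (x_i - \bar{x})^2}{n-1}} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$

$$\alpha = \bar{y} - \beta\bar{x} = \frac{\sum y}{n} - \beta\frac{\sum x}{n}$$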

https://milnepublishing.geneseo.edu/natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/
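As a minimal sketch of the formulas above, the function below computes the slope as Cov(x, y) / Var(x) and the intercept from the two means. The function name `fit_simple_linear` and the data are hypothetical; the points are chosen to lie exactly on y = 1 + 2x so the result is easy to check by eye.

```python
def fit_simple_linear(x, y):
    """Return (alpha, beta) for y = alpha + beta * x, with beta = Cov(x, y) / Var(x)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    beta = cov_xy / var_x
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

alpha, beta = fit_simple_linear(x, y)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

The (n - 1) terms cancel in the ratio, which is why the slope can also be written using only sums of deviations.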
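Subject | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $y_i$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$
1 | $x_1$ | $x_1 - \bar{x}$ | $(x_1 - \bar{x})^2$ | $y_1$ | $y_1 - \bar{y}$ | $(x_1 - \bar{x})(y_1 - \bar{y})$
2 | $x_2$ | $x_2 - \bar{x}$ | $(x_2 - \bar{x})^2$ | $y_2$ | $y_2 - \bar{y}$ | $(x_2 - \bar{x})(y_2 - \bar{y})$
3 | $x_3$ | $x_3 - \bar{x}$ | $(x_3 - \bar{x})^2$ | $y_3$ | $y_3 - \bar{y}$ | $(x_3 - \bar{x})(y_3 - \bar{y})$
4 | $x_4$ | $x_4 - \bar{x}$ | $(x_4 - \bar{x})^2$ | $y_4$ | $y_4 - \bar{y}$ | $(x_4 - \bar{x})(y_4 - \bar{y})$
5 | $x_5$ | $x_5 - \bar{x}$ | $(x_5 - \bar{x})^2$ | $y_5$ | $y_5 - \bar{y}$ | $(x_5 - \bar{x})(y_5 - \bar{y})$
Total | | | $\sum (x_i - \bar{x})^2$ | | | $\sum (x_i - \bar{x})(y_i - \bar{y})$

$$\beta = \frac{Cov(x, y)}{Var(x)} = \frac{\dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}}{\dfrac{\sum (x_i - \bar{x})^2}{n-1}}$$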
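Subject | X | X² | Y | Y² | X·Y
1 | X1 | X1² | Y1 | Y1² | X1·Y1
. | X. | X.² | Y. | Y.² | X.·Y.
. | X. | X.² | Y. | Y.² | X.·Y.
n | Xn | Xn² | Yn | Yn² | Xn·Yn
Total | Σ(X) = … | Σ(X²) = … | Σ(Y) = … | Σ(Y²) = … | Σ(XY) = …

$$\beta = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$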
EXAMPLE
Subject | Age | Treatment duration (in days)
1 | 20 | 5
2 | 30 | 6
3 | 25 | 5
4 | 35 | 7
5 | 40 | 8
What is the relationship between age and treatment duration?
EXAMPLE – Answer 01
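Subject | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $y_i$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$
1 | 20 | −10 | 100 | 5 | −1.2 | 12
2 | 30 | 0 | 0 | 6 | −0.2 | 0
3 | 25 | −5 | 25 | 5 | −1.2 | 6
4 | 35 | 5 | 25 | 7 | 0.8 | 4
5 | 40 | 10 | 100 | 8 | 1.8 | 18
$n = 5$ | $\bar{x} = 30$ | | 250 | $\bar{y} = 6.2$ | | 40

$$\beta = \frac{40/(5-1)}{250/(5-1)} = \frac{40}{250} = \frac{8}{50} = 0.16$$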
EXAMPLE – Answer 02
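Subject | X | X² | Y | Y² | X·Y
1 | 20 | 400 | 5 | 25 | 100
2 | 30 | 900 | 6 | 36 | 180
3 | 25 | 625 | 5 | 25 | 125
4 | 35 | 1225 | 7 | 49 | 245
5 | 40 | 1600 | 8 | 64 | 320
Total | Σ(X) = 150 | Σ(X²) = 4750 | Σ(Y) = 31 | Σ(Y²) = 199 | Σ(XY) = 970

$$\beta = \frac{970 - \dfrac{150 \times 31}{5}}{4750 - \dfrac{150^2}{5}} = \frac{40}{250} = 0.16$$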
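The column totals and the slope in this worked example can be re-derived with a few lines of Python. This is only a check of the arithmetic on the five subjects from the example; the variable names are illustrative.

```python
# Data from the worked example: age (X) and treatment duration in days (Y).
ages = [20, 30, 25, 35, 40]
durations = [5, 6, 5, 7, 8]
n = len(ages)

# Column totals of the raw-sums table.
sum_x = sum(ages)
sum_y = sum(durations)
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in durations)
sum_xy = sum(x * y for x, y in zip(ages, durations))

# Slope from the raw-sums form of the formula.
beta = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)
print(f"beta = {beta:.2f}")
```

The raw-sums form avoids computing each deviation from the mean, which is convenient for hand calculation.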
Answer
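$$\alpha = 6.2 - 0.16 \times 30 = 1.4$$

$$y = 1.4 + 0.16x_i$$

Predict the expected duration of treatment for those who are 10 and 70 years old.

[Figure: fitted line $y = 1.4 + 0.16x_i$, with Age on the X-axis and Duration of treatment (days) on the Y-axis. The line has intercept 1.4 and gives 3.0 days at age 10 and 12.6 days at age 70.]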
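The two predictions can be computed directly from the fitted equation. In this short sketch the function name `predict_duration` is hypothetical. Note that ages 10 and 70 both lie outside the observed range of 20 to 40, so these values are extrapolations of the line.

```python
# Coefficients from the worked example: y = 1.4 + 0.16 * x
alpha, beta = 1.4, 0.16


def predict_duration(age):
    """Expected treatment duration (days) for a given age under the fitted line."""
    return alpha + beta * age


predictions = {age: predict_duration(age) for age in (10, 70)}

for age, days in predictions.items():
    print(f"age {age}: expected duration = {days:.1f} days")
```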
CORRELATION
Recall: Covariance
$$cov(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

Interpreting Covariance
cov(X,Y) > 0 → X and Y are positively correlated
cov(X,Y) < 0 → X and Y are inversely correlated
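cov(X,Y) = 0 → X and Y are uncorrelated (no linear relationship); independent variables have zero covariance, but zero covariance alone does not guarantee independence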
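A small Python sketch of the sample covariance helps with reading its sign. The data are hypothetical. The second case uses y = x², where Y is completely determined by X and yet the covariance is zero, because covariance only captures the linear part of a relationship.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data: Y rises with X, so the covariance is positive.
cov_linear = sample_covariance([1, 2, 3], [2, 4, 6])

# Hypothetical data: Y = X squared, symmetric around X = 0.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]
cov_quadratic = sample_covariance(x, y)

print(cov_linear, cov_quadratic)
```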
Correlation
Measures the relative strength of the linear relationship between two
variables
▪ Unit-less
▪ Ranges between –1 and 1
▪ The closer to –1, the stronger the negative linear relationship
▪ The closer to 1, the stronger the positive linear relationship
▪ The closer to 0, the weaker any linear relationship
Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with r = −1, r = −0.6, r = 0, r = +1, r = +0.3, and r = 0.]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Linear Correlation

[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships, strong relationships with weak relationships, and a final panel showing no relationship.]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall


Correlation coefficient

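$$r = \frac{covariance(x, y)}{\sqrt{var(x) \times var(y)}}$$

Pearson’s Correlation Coefficient is standardized covariance (unitless):

$$r = \frac{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1} \times \dfrac{\sum (y_i - \bar{y})^2}{n-1}}}$$

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}}$$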
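The unit-less property can be illustrated with a short Python sketch of Pearson's r. The height and weight values are hypothetical. Converting height from centimetres to metres changes the covariance but leaves r unchanged.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations over the root of the product of squared deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 75, 86]

r_cm = pearson_r(height_cm, weight_kg)

# Same heights expressed in metres: r should not change.
height_m = [h / 100 for h in height_cm]
r_m = pearson_r(height_m, weight_kg)

print(f"r (cm) = {r_cm:.4f}, r (m) = {r_m:.4f}")
```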
INTERPRETING COEFFICIENT OF CORRELATION

Strength of the Relationship (Colton), based on the magnitude of 𝑟 regardless of sign:


𝑟 = 0.00–0.25 : No relationship/weak relationship
𝑟 = 0.26–0.50 : Moderate relationship
𝑟 = 0.51–0.75 : Strong relationship
𝑟 = 0.76–1.00 : Very strong/perfect relationship

Correlation does not always imply causality.


A weak correlation does not necessarily mean there is no relationship.
A strong correlation does not always indicate a linear relationship.
COEFFICIENT OF DETERMINATION (r²)
• The coefficient of determination (r²) can be interpreted as the proportion of variation in the Y variable that can be explained by the X variable.
• If 100% of the variation in Y can be explained by X, then X fully accounts for the changes in the value of Y in the fitted linear model, and X can be said to be a determinant of the Y variable.
• If the coefficient r = 1, then r² = 1 (100%). This means every point lies exactly on the regression line, so any change in the value of X corresponds to a fixed, predictable change in the value of Y.
EXAMPLE
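Subject | $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $y_i$ | $y_i - \bar{y}$ | $(y_i - \bar{y})^2$ | $(x_i - \bar{x})(y_i - \bar{y})$
1 | 20 | −10 | 100 | 5 | −1.2 | 1.44 | 12
2 | 30 | 0 | 0 | 6 | −0.2 | 0.04 | 0
3 | 25 | −5 | 25 | 5 | −1.2 | 1.44 | 6
4 | 35 | 5 | 25 | 7 | 0.8 | 0.64 | 4
5 | 40 | 10 | 100 | 8 | 1.8 | 3.24 | 18
$n = 5$ | $\bar{x} = 30$ | | 250 | $\bar{y} = 6.2$ | | 6.8 | 40

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \times \sum (y_i - \bar{y})^2}} = \frac{40}{\sqrt{250 \times 6.8}} = 0.9701425$$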

$$r^2 = 0.9701425^2 = 0.9412 = 94.12\%$$

A value of 𝑟 = 0.97 indicates a very strong relationship between the variables. The coefficient of determination (𝑟²) is 0.9412 or 94.12%, meaning that 94.12% of the variation in treatment duration can be explained by patient age, while the remaining 5.88% is explained by other factors outside the model.
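The correlation example can be checked numerically. The sketch below recomputes r from the five subjects and then obtains r² in two ways: by squaring the unrounded r, and as one minus the ratio of residual variation to total variation around the fitted line. Variable names are illustrative. Using the unrounded r gives r² ≈ 0.9412, whereas squaring the rounded 0.97 gives 0.9409.

```python
import math

# Data from the worked example: age (X) and treatment duration in days (Y).
ages = [20, 30, 25, 35, 40]
durations = [5, 6, 5, 7, 8]
n = len(ages)

x_bar = sum(ages) / n
y_bar = sum(durations) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, durations))  # 40
sxx = sum((x - x_bar) ** 2 for x in ages)                              # 250
syy = sum((y - y_bar) ** 2 for y in durations)                         # 6.8

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Same quantity from the fitted line: 1 - (residual variation / total variation).
beta = sxy / sxx
alpha = y_bar - beta * x_bar
sse = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(ages, durations))
r_squared_from_fit = 1 - sse / syy

print(f"r = {r:.7f}, r^2 = {r_squared:.4f}, 1 - SSE/SST = {r_squared_from_fit:.4f}")
```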
PRACTICE
Review next week.
Practice questions will be posted via Emas2.
