
Relation between variables:

Correlation & Regression

October 2024

François Petit, Raphaël Porcher


Correlation Linear Regression Final recap

Overview
1 Correlation
Introduction
Correlation coefficients
Example
Special cases
Inference
2 Linear Regression
Introduction
Simple linear regression
Inference
Diagnostics
Multiple regression
Inference
Example
3 Final recap


Outline

1 Correlation
Introduction
Correlation coefficients
Example
Special cases
Inference

2 Linear Regression

3 Final recap


Introduction

Association between two continuous variables

• One sample, two measurements per subject


• Series of pairs of measurements (X1 , Y1 ), (X2 , Y2 ), . . . , (Xn , Yn )
• Xi and Yi are continuous variables

Question we are interested in: is there a link between X and Y ?


Introduction

Some examples

• Height and weight in children


• Percent fat mass and age
• Percent fat mass and weight
• Biological parameter and time after administration of a drug
• Biological parameter and dose of a drug administered


Introduction

Two different situations (in theory...)

Let’s consider two seemingly similar situations:

• Situation 1: X and Y are two random variables


− Correlation

• Situation 2: Y is a random variable but X can be controlled by the


experimenter
− Regression analysis (later today...)

• ...though, in practice, there is almost no difference


Correlation coefficients

Pearson’s correlation coefficient

• Correlation coefficient ρ
• Quantifies the amount of (linear) association between X and Y
• ρ = ±1 if the scatterplot of Y by X shows aligned points
• ρ = 0: no linear association

Keep in mind (you will see these two notations):


• Parameter: ρ (rho)
• Estimate: r


Correlation coefficients

Correlations in practice


Correlation coefficients

Correlations in practice
• ρ = 0 does not necessarily mean no association
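This point can be checked numerically. The deck’s own code is in R, but here is a small Python sketch (assuming NumPy is available; the data are an arbitrary symmetric grid, not from the slides) in which Y depends on X perfectly yet the linear correlation is essentially zero:

```python
import numpy as np

# X symmetric around 0, Y a deterministic (but non-linear) function of X
x = np.linspace(-3, 3, 101)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(r)  # essentially 0, despite Y being fully determined by X
```

Pearson’s r only measures linear association, so the parabola is invisible to it.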


Correlation coefficients

Correlations in practice
• Correlation depends on range (be careful!)


Correlation coefficients

Correlations in practice

• If X and/or Y are measured with error, the estimated correlation will be lower than the real one


Correlation coefficients

What should we check for in these analyses?

• Needs several assumptions


• The (Xi , Yi ) are mutually independent
• (X , Y ) is bivariate normally distributed:
− for each fixed value X = x, Y follows a normal distribution
− for each fixed value Y = y, X follows a normal distribution
− → not easy to ascertain in practice
• Usually, one checks that:
− Y is normally distributed with constant variance for all values of X
− The relationship between X and Y is roughly linear


Correlation coefficients

Interpretation: rule of thumb

(figure: rule-of-thumb scale for interpreting the magnitude of r; clearly, the same holds for negative values)


Example

Do it yourself: Birth weight example


library(MASS)
data("birthwt")
plot(bwt~lwt, data=birthwt, xlab="Maternal weight (lbs)", ylab="Newborn weight (g)")


Example

Do it yourself: Birth weight example

cor.test(birthwt$lwt, birthwt$bwt)

Pearson’s product-moment correlation

data: birthwt$lwt and birthwt$bwt


t = 2.5848, df = 187, p-value = 0.0105
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.04417405 0.31998094
sample estimates:
cor
0.1857333


Special cases

Non-Gaussian variables

• Pearson’s correlation coefficient can always be computed


• But is sensitive to extreme values (and outliers)
• The test and confidence interval assume the binormal distribution
• The test is rather robust but sometimes the assumption is not
reasonable
• Alternative: Spearman’s rank correlation coefficient


Special cases

Spearman’s correlation coefficient

• In practice, compute Pearson’s correlation on the ranks of X and Y instead of the actual values
• Used for variables with a skewed distribution
• Less sensitive to outliers
• Similar interpretation as Pearson’s coefficient
• The tests of ρ = 0 or confidence intervals are also similar
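The rank equivalence can be verified directly. A Python sketch (assuming NumPy and SciPy; the data here are arbitrary skewed values, not from the slides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(size=50)                  # skewed variable
y = x ** 2 + rng.normal(scale=0.5, size=50)   # monotone, non-linear link

# Spearman's rho is just Pearson's r computed on the ranks
rho_manual = np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]
rho_scipy, p = stats.spearmanr(x, y)
print(rho_manual, rho_scipy)  # identical values
```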


Special cases

Spearman’s correlation coefficient

cor.test(birthwt$lwt, birthwt$bwt, method="spearman")


cor.test(birthwt$lwt, birthwt$bwt, method="spearman", exact=TRUE)
?cor.test

Spearman’s rank correlation rho


data: birthwt$lwt and birthwt$bwt
S = 845136, p-value = 0.0005535
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.2488882


Special cases

Pearson’s product-moment

cor.test(rank(birthwt$lwt), rank(birthwt$bwt), method="pearson")

Pearson’s product-moment correlation

data: rank(birthwt$lwt) and rank(birthwt$bwt)


t = 3.5141, df = 187, p-value = 0.0005535
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.110068 0.378184
sample estimates:
cor
0.2488882


Special cases

Important caveats

• Association does not imply causation (no causal link)


• Be careful when estimating (and testing) many 2-by-2 correlation
coefficients (multiplicity)
• Be careful with the correlation of two variables measured at different
times
• Does not provide an adequate measure of agreement
• Do not correlate a change score with its baseline value


Special cases

Correlation vs. Causation

Have a look here:


http://tylervigen.com/spurious-correlations


Special cases

Correlation vs. Causation

In the "mozzarella cheese" example:


• Correlation coefficient was ρ = 0.96
• Clearly, no causal link really exists!
• → be careful with what is called "spurious association"

Spurious correlation occurs when two variables are associated due to the
presence of some other unobserved factor (think of which unobserved
factor may operate in the mozzarella example)


Inference

Time for some maths

Some basic reminders:
• Distribution of a random vector (X , Y )
• Density of (X , Y ):
f (x, y) dx dy = P(x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy)
• Marginal densities:
fx (x)dx = P(x ≤ X ≤ x + dx) and
fy (y )dy = P(y ≤ Y ≤ y + dy )
• Covariance:
Cov (X , Y ) = E[(X − E(X ))(Y − E(Y ))] = E(XY ) − E(X )E(Y )


Inference

Formal definition of the correlation coefficient

ρ(X , Y ) = Cov(X , Y ) / √(Var(X ) · Var(Y )) = ρ(Y , X )

Some properties
• −1 ≤ ρ ≤ 1
• if X and Y are independent → ρ = 0
• conversely, ρ = 0 with (X , Y ) jointly Gaussian ⇒ independence


Inference

Inference (estimation)

ρ = Cov(X , Y ) / (σx σy)

is estimated by

r = sxy / (sx sy)

i.e.

r = Σ(xi − mx)(yi − my) / √( Σ(xi − mx)² · Σ(yi − my)² )

or

r = (Σ xi yi − n mx my) / ((n − 1) sx sy)
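As a sanity check, the two expressions for r give the same value. A Python sketch (assuming NumPy), using the ten (X, Y) pairs from the small worked example that appears later in the deck:

```python
import numpy as np

x = np.array([23, 25, 36, 42, 50, 60, 68, 80, 85, 95], dtype=float)
y = np.array([15, 35, 30, 50, 50, 45, 52, 70, 75, 80], dtype=float)
n, mx, my = len(x), x.mean(), y.mean()
sx, sy = x.std(ddof=1), y.std(ddof=1)  # sample standard deviations

# first form: centred cross-products
r1 = np.sum((x - mx) * (y - my)) / np.sqrt(np.sum((x - mx)**2) * np.sum((y - my)**2))
# second form: sum of products minus n*mx*my
r2 = (np.sum(x * y) - n * mx * my) / ((n - 1) * sx * sy)

print(r1, r2)  # both ~0.938, equal to np.corrcoef(x, y)[0, 1]
```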


Inference

Probability distribution of r

• r can always be computed

• even if X and Y are Gaussian, r is not normally distributed

• but Z = ½ ln((1 + r)/(1 − r)) is approximately Gaussian
− with mean ½ ln((1 + ρ)/(1 − ρ))
− and variance 1/(n − 3)


Inference

Confidence interval

1 Construct a CI for z as [z1 ; z2 ]
− with z1 = z − zα/2 /√(n − 3) and z2 = z + zα/2 /√(n − 3)

2 Back-transform to obtain a CI for ρ

CI1−α (ρ) = [ (e^(2z1) − 1)/(e^(2z1) + 1) ; (e^(2z2) − 1)/(e^(2z2) + 1) ]

...Luckily, R does it for us in cor.test
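To see what cor.test is doing under the hood, here is a Python sketch (assuming NumPy and SciPy) that rebuilds the 95% CI from the birthwt values reported earlier (r = 0.1857, n = 189):

```python
import numpy as np
from scipy import stats

r, n = 0.1857333, 189                      # from the cor.test output above

z = 0.5 * np.log((1 + r) / (1 - r))        # Fisher's z transform
half = stats.norm.ppf(0.975) / np.sqrt(n - 3)
z1, z2 = z - half, z + half

# back-transform: rho = (e^(2z) - 1) / (e^(2z) + 1), i.e. tanh(z)
lo, hi = np.tanh(z1), np.tanh(z2)
print(lo, hi)  # ~0.0442 and ~0.3200, matching cor.test
```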


Inference

Statistical testing

Suppose we want to test H0 : ρ = 0 vs. H1 : ρ ̸= 0


• We assume X and Y are Gaussian
• We use the test statistic

tc = r √((n − 2)/(1 − r²)) ∼H0 t(n−2)

− which follows a Student distribution with n − 2 degrees of freedom under H0

Then, as usual, use p-values or critical values to draw conclusions about the significance of ρ
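Plugging in the birthwt values reported earlier (r = 0.1857, n = 189) reproduces the cor.test result. A Python sketch (assuming NumPy and SciPy):

```python
import numpy as np
from scipy import stats

r, n = 0.1857333, 189
t = r * np.sqrt((n - 2) / (1 - r ** 2))   # test statistic under H0: rho = 0
p = 2 * stats.t.sf(abs(t), df=n - 2)      # two-sided p-value
print(t, p)  # t ~2.585, p ~0.0105, as in the cor.test output
```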


Inference

Recap

• Pearson’s correlation coefficient captures linear association between


two continuous variables (e.g. height and weight)
− it cannot be applied if variables are discrete
− tests linear relation
• Spearman’s rank coefficient can be used as an alternative
− can be used for ordinal variables
− detects monotonic (possibly non-linear) relations
− less sensitive to outliers
• Important: correlation does not imply causation!


Outline

1 Correlation

2 Linear Regression
Introduction
Simple linear regression
Inference
Diagnostics
Multiple regression
Inference
Example

3 Final recap


Introduction

Association between two quantitative variables

Back to our initial question..

• One sample, two measurements on each subject

• Series of pairs of measurements (X1 , Y1 ), (X2 , Y2 ), . . . , (Xn , Yn )

• Xi and Yi are quantitative

• Question (again): is there a link between X and Y ?


Introduction

Two situations

• X and Y are both random variables

• Y is random, but X can be controlled by the experimenter


Simple linear regression

Regression

• The correlation coefficient does not allow predicting a value of Y from a value of X

• It also makes little sense when X is not random (e.g. fixed measurement times in an experiment)

• Objective: predict Y using X

− No symmetry between X and Y anymore

− Y is the dependent variable and X the independent variable or predictor


Simple linear regression

Linear regression model

• To estimate E(Y |X = x) as a function of x

• General model: E(Y |X = x) = f (x), i.e. Y = f (X ) + ϵ, where ϵ is a mean-zero random variable representing the model error

• Linear model = the simplest model

Y = α + βX + ϵ
→ Estimate (and test) α and β
− β is the slope (or coefficient) of the regression line
− α is the intercept


Inference

Inference

What is the rationale behind model inference?


Via inference, a model connects known to unknown data

(diagram: model and data linked by data generation in one direction and inference in the other)


Inference

Least-squares line

• Find the line that minimizes the distance between observations and
predictions

(scatterplot: data points and a candidate line, with the vertical distances from points to the line)

• Ordinary least squares: minimize E = Σ(yi − α − βxi)² (the sum of squared residuals)
(a ‘good’ line is one that minimizes E)


Inference

Solution

Point estimates

β̂ = sxy / sx² = Σ(xi − mx)(yi − my) / Σ(xi − mx)²

α̂ = my − β̂ mx

Variances

Var(β̂) = sβ² = ((sy/sx)² − β̂²) / (n − 2)

Var(α̂) = sα² = sβ² × (Σ xi²) / n
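The closed-form estimates can be checked against a library fit. A Python sketch (assuming NumPy), using the ten (X, Y) pairs of the small example on the next slides:

```python
import numpy as np

x = np.array([23, 25, 36, 42, 50, 60, 68, 80, 85, 95], dtype=float)
y = np.array([15, 35, 30, 50, 50, 45, 52, 70, 75, 80], dtype=float)
mx, my = x.mean(), y.mean()

beta = np.sum((x - mx) * (y - my)) / np.sum((x - mx) ** 2)  # slope
alpha = my - beta * mx                                      # intercept

b_np, a_np = np.polyfit(x, y, 1)  # NumPy's own least-squares line
print(beta, alpha)                # ~0.761 and ~7.27, same as polyfit
```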


Inference

Least-squares line (again)


Inference

Interpretation of coefficients

• α: the mean value of y when x = 0

• β: the average change in y associated with a one-unit increase in x


Inference

Test for the slope

• H0 : β = β0 vs H1 : β ̸= β0

• Special case β0 = 0 → test of a linear relation between X and Y

• Test statistic

tb = (β̂ − β0) / sβ ∼H0 t(n−2)
• Two-sided test
− compute β̂ and tb , then compare tb to tα/2,n−2
− if |tb | < tα/2,n−2 , do not reject H0 :
the slope is not significantly different from β0
− if |tb | ≥ tα/2,n−2 , reject H0
and conclude that the slope differs from β0


Inference

Test for the intercept

• H0 : α = α0 vs H1 : α ̸= α0

• Special case α0 = 0 → the line goes through the origin (0; 0)

• Test statistic

ta = (α̂ − α0) / sα ∼H0 t(n−2)

• If H0 : α = 0 is not rejected, it is possible to fit a new model Y = βX + ϵ (thus constraining α = 0)


Inference

Small example

X 23 25 36 42 50 60 68 80 85 95
Y 15 35 30 50 50 45 52 70 75 80

(scatterplot of Y against X for these ten pairs)


Inference

Estimates

• β̂ = (32711 − 10 × 56.4 × 50.2) / (9 × 25.342²) = 0.761

• α̂ = 50.2 − 0.761 × 56.4 = 7.27

• sβ = √((422.62/642.04 − 0.761²)/8) = 0.0993

• sα = 0.0993 × √(37588/10) = 6.09

• tb = 0.761/0.0993 = 7.664 ≥ t0.025,8 = 2.306 (p = 5.94 × 10⁻⁵)
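These hand computations can be reproduced step by step. A Python sketch (assuming NumPy) recovering the same numbers from the raw data:

```python
import numpy as np

x = np.array([23, 25, 36, 42, 50, 60, 68, 80, 85, 95], dtype=float)
y = np.array([15, 35, 30, 50, 50, 45, 52, 70, 75, 80], dtype=float)
n = len(x)
sx2, sy2 = x.var(ddof=1), y.var(ddof=1)  # 642.04 and 422.62

beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

s_beta = np.sqrt((sy2 / sx2 - beta ** 2) / (n - 2))  # SE of the slope
s_alpha = s_beta * np.sqrt(np.sum(x ** 2) / n)       # SE of the intercept
t_b = beta / s_beta                                  # test of beta = 0

print(s_beta, s_alpha, t_b)  # ~0.0993, ~6.09, ~7.664
```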


Inference

Confidence intervals

• For the slope

CI(1−α) (β) = β̂ ± tα/2,n−2 · sβ

• Similar for the intercept

• For the Y value predicted at X = x0 :

ŷ0 = α̂ + β̂ x0

Var(ŷ0) = s² × [ 1/n + (x0 − mx)² / Σ(xi − mx)² ] = s²y0

where s² = Σ(yi − α̂ − β̂xi)² / (n − 2) estimates the residuals’ variance, thus

CI(1−α) (y0) = ŷ0 ± tα/2,n−2 · sy0


Inference

Prediction intervals

• CI(1−α) (y0 ) is the CI of the prediction (average value) of Y at X = x0

• We could also look at the interval where the values of Y should lie
when X = x0

⇒ Much larger interval

s²pred = s² × ( 1 + 1/n + (x0 − mx)² / Σ(xi − mx)² )

IPred(1−α) (y0) = ŷ0 ± tα/2,n−2 · spred


• The PI takes into account both the uncertainty in the estimates and the random variation due to sampling (the variability of a single data point)
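The two interval widths can be computed side by side. A Python sketch (assuming NumPy) on the same small dataset, at a hypothetical new value x0 = 70 (chosen here purely for illustration):

```python
import numpy as np

x = np.array([23, 25, 36, 42, 50, 60, 68, 80, 85, 95], dtype=float)
y = np.array([15, 35, 30, 50, 50, 45, 52, 70, 75, 80], dtype=float)
n, mx = len(x), x.mean()

beta = np.sum((x - mx) * (y - y.mean())) / np.sum((x - mx) ** 2)
alpha = y.mean() - beta * mx
s2 = np.sum((y - alpha - beta * x) ** 2) / (n - 2)  # residual variance

x0 = 70.0                                           # hypothetical new X
y0 = alpha + beta * x0
leverage = 1 / n + (x0 - mx) ** 2 / np.sum((x - mx) ** 2)
se_mean = np.sqrt(s2 * leverage)        # half-width factor for the CI
se_pred = np.sqrt(s2 * (1 + leverage))  # half-width factor for the PI

print(y0, se_mean, se_pred)  # se_pred is always larger than se_mean
```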


Inference

Example

(figure: two panels of the same fitted line, left with the 95% confidence interval, right with the visibly wider 95% prediction interval)


Inference

Example: cystic fibrosis (mucoviscidose)

• 25 patients with cystic fibrosis

• Y: maximal expiration pressure (PEmax)

• X: weight


Inference

Result

(scatterplot of PEmax against weight (kg), with the fitted regression line)

α̂ = 63.5 (SE 12.7)
β̂ = 1.19 (SE 0.30)
t = 3.94, df = 23, p = 0.0006


Inference

With R

• lm(y ~ x)

• Preceding example
x <- c(23, 25, 36, 42, 50, 60, 68, 80, 85, 95)
y <- c(15, 35, 30, 50, 50, 45, 52, 70, 75, 80)
summary(lm(y ~ x))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.27142    6.08872   1.194    0.267
x            0.76114    0.09931   7.664 5.94e-05 ***

• birthwt example
lm(bwt ~ lwt, data = birthwt)


Inference

Summary output from R

The summary output shows the following main components:

1 Call: the function call used to fit the regression model

2 Residuals: the distribution of the residuals (which by definition have mean zero, so the median should not be far from zero, and min and max should be approximately equal in absolute value)

3 Coefficients: the regression coefficients with their statistical significance (predictors significantly associated with the outcome y are marked by stars)

4 Residual standard error (RSE), R-squared (R²) and the F-statistic: metrics used to check how well the model fits the data


Inference

birthwt example

• lm(bwt ~ lwt, data = birthwt)


> summary(lm(bwt ~ lwt, data=birthwt))

Call:
lm(formula = bwt ~ lwt, data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-2192.12 -497.97 -3.84 508.32 2075.60

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2369.624 228.493 10.371 <2e-16 ***
lwt 4.429 1.713 2.585 0.0105 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 718.4 on 187 degrees of freedom


Multiple R-squared: 0.0345, Adjusted R-squared: 0.02933
F-statistic: 6.681 on 1 and 187 DF, p-value: 0.0105


Diagnostics

Assumptions for tests and intervals

Assumptions :

1 Y is normally distributed for each value of X

2 The variance of Y is the same for each value of X

3 The relationship between Y and X is linear

• Transformations of X (log, exp, X²) are possible to ensure linearity


Diagnostics

Illustration


Diagnostics

Verifying assumptions

1 Plot the observations!

2 Plot the residuals ei = yi − ŷi


− Histogram
− Q-Q plot
− Residuals vs X
− Residuals vs predicted Y

3 Independence of residuals is more difficult to check (not seen here)

4 There exists a formal test for linearity (not seen here)


Diagnostics

Residuals

• Called "ordinary" residuals

ei = yi − ŷi = yi − (α̂ + β̂xi )

• Difference between observed and predicted values

• Standardized and studentized residuals are sometimes used instead


Diagnostics

Residuals vs predicted

• Residuals should have mean zero

• And the same variance whatever the predicted value

• Can detect nonlinear relationships

• May indicate the need for transformation of X


Diagnostics

Cystic fibrosis example

(figure: residuals against predicted PEmax, scattered around zero with no obvious pattern)


Diagnostics

Q-Q (quantile-quantile) plot

• Normal qq-plot: allows to check the normality assumption

(figure: Normal Q-Q plot of the residuals — Sample Quantiles against Theoretical Quantiles, points close to a straight line)

A Q-Q plot is a scatterplot of two sets of quantiles against each other (here, against those of a standard Normal). If both come from the same distribution, we see a roughly straight line


Diagnostics

With R: birthwt data

• f1 <- lm(bwt ~ lwt, data=birthwt)

• Residuals vs fitted
plot(predict(f1), resid(f1), xlab="Predicted newborn weight (g)", ylab="Residuals (g)")

• Q-Q plot
qqnorm(resid(f1))
qqline(resid(f1))


Diagnostics

Correlation vs regression

• Both are linked mathematically:

r = β̂ · sx/sy ⇔ β̂ = r · sy/sx

• The test for ρ = 0 is exactly the same as the one for β = 0

• But underlying assumptions can be somewhat different . . .


− Are X and Y two random variables?
− Are (X , Y ) normally distributed or is it simply Y for any value of X ?
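The identity is easy to verify numerically. A Python sketch (assuming NumPy), again on the ten-pair example:

```python
import numpy as np

x = np.array([23, 25, 36, 42, 50, 60, 68, 80, 85, 95], dtype=float)
y = np.array([15, 35, 30, 50, 50, 45, 52, 70, 75, 80], dtype=float)

r = np.corrcoef(x, y)[0, 1]
beta = np.polyfit(x, y, 1)[0]
sx, sy = x.std(ddof=1), y.std(ddof=1)

print(beta, r * sy / sx)  # identical: beta-hat = r * sy / sx
```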


Diagnostics

Be careful with extrapolation!

(figure: two scatterplots of the same data; the left panel shows only the narrow range X ≈ 18–24, the right panel the full range X ≈ 5–25 — a line fitted on the narrow range extrapolates poorly)


Diagnostics

With a binary covariate

• Since we make no assumption on the distribution of X , why not use a binary variable?

• Same model: Y = α + βX + ϵ

• With X ∈ {0, 1}

• β represents the increase in E(Y |X = x) when x changes from 0 to 1

• β = E(Y |X = 1) − E(Y |X = 0): this is the mean difference!
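That the slope equals the difference of group means can be seen on simulated data. A Python sketch (assuming NumPy; the data are synthetic, with a true mean difference of 2):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.repeat([0.0, 1.0], 20)             # binary covariate: two groups of 20
y = 5.0 + 2.0 * x + rng.normal(size=40)   # true difference in means = 2

beta = np.polyfit(x, y, 1)[0]             # slope of the regression line
md = y[x == 1].mean() - y[x == 0].mean()  # observed mean difference

print(beta, md)  # the two values coincide
```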


Diagnostics

Link with a t-test

• β̂ is exactly the observed mean difference (MD)

• The test of β = 0 is the same as the t-test (the original Student version, not Welch’s)

• More generally, when X is categorical, the model is called ANOVA


Diagnostics

Illustration with birthwt

> summary(lm(formula = bwt ~ ui, data = birthwt))


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3030.70 55.25 54.852 < 2e-16 ***
ui -581.27 143.55 -4.049 7.52e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 701.1 on 187 degrees of freedom


Multiple R-squared: 0.08061, Adjusted R-squared: 0.0757
F-statistic: 16.4 on 1 and 187 DF, p-value: 7.518e-05

> t.test(bwt ~ ui, data = birthwt, var.equal=T)


Two Sample t-test
data: bwt by ui
t = 4.0493, df = 187, p-value = 7.518e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
298.0892 864.4574


Multiple regression

Multiple linear regression model

• Extension of the (simple) linear regression model

• We still want to estimate E(Y |X = x) as a function of x, but X is now a vector (X1 , X2 , . . . , Xp )

• So we relate Y to several covariates, or characteristics: e.g. age, weight, smoking status, . . .

• Model: Y = β0 + β1 X1 + β2 X2 + . . . + βp Xp + ϵ

• Same assumptions as before, plus additivity of each covariate’s contribution to the linear predictor
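The least-squares machinery carries over directly. A Python sketch (assuming NumPy; the data are simulated with known coefficients, so the fit can be checked):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.1, size=n)

# design matrix with a column of ones for the intercept beta0
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # close to the true coefficients (1, 2, -3)
```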


Inference

Inference

• Still based on least-squares

• Closed-form solutions by matrix inversion (not given here)

• At the end, we still get point estimates β̂k ’s and their variance / SE


Example

birthwt example with 1 variable

• lm(bwt ~ lwt, data = birthwt)


> summary(lm(bwt ~ lwt, data=birthwt))

Call:
lm(formula = bwt ~ lwt, data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-2192.12 -497.97 -3.84 508.32 2075.60

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2369.624 228.493 10.371 <2e-16 ***
lwt 4.429 1.713 2.585 0.0105 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 718.4 on 187 degrees of freedom


Multiple R-squared: 0.0345, Adjusted R-squared: 0.02933
F-statistic: 6.681 on 1 and 187 DF, p-value: 0.0105


Example

Interpretation of coefficients

• β0 (intercept): the mean outcome y when all of the predictors take the value 0

• βk : for a given predictor, the average effect on the outcome y of a one-unit increase in that predictor, holding all other predictors fixed


Example

birthwt example with 2 variables

• lm(bwt ~ age + lwt, data = birthwt)


Call:
lm(formula = bwt ~ age + lwt, data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-2233.11 -499.33 9.44 520.48 1897.84

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2214.412 299.311 7.398 4.59e-12 ***
age 8.089 10.063 0.804 0.4225
lwt 4.177 1.744 2.395 0.0176 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 719.1 on 186 degrees of freedom


Multiple R-squared: 0.03784, Adjusted R-squared: 0.02749
F-statistic: 3.657 on 2 and 186 DF, p-value: 0.02767


Example

birthwt example with 6 variables


• lm(bwt ~ age + lwt + smoke + ht + ui + I(ptl > 0),
data = birthwt)
Call:
lm(formula = bwt ~ age + lwt + smoke + ht + ui + I(ptl > 0),
data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-1696.93 -481.80 -19.06 447.69 1702.05

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2515.511 289.843 8.679 2.22e-15 ***
age 6.190 9.467 0.654 0.514002
lwt 4.015 1.692 2.373 0.018685 *
smoke -206.773 101.398 -2.039 0.042874 *
ht -623.449 206.135 -3.024 0.002851 **
ui -500.843 141.189 -3.547 0.000495 ***
I(ptl > 0)TRUE -260.002 139.369 -1.866 0.063711 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 666.9 on 182 degrees of freedom


Multiple R-squared: 0.1902, Adjusted R-squared: 0.1635
F-statistic: 7.126 on 6 and 182 DF, p-value: 7.775e-07


Outline

1 Correlation

2 Linear Regression

3 Final recap


Summing Up

Main topic covered today: association between variables

1 Correlation (two continuous variables)


2 Regression (Y continuous, X quantitative)
− Simple linear regression
− Multiple linear regression

The case of a binary Y (logistic regression) will be covered in the next lesson.

