Relation between variables:

Correlation & Regression

October 2024

François Petit, Raphaël Porcher

Correlation Linear Regression Final recap

Overview
1 Correlation
Introduction
Correlation coefficients
Example
Special cases
Inference
2 Linear Regression
Introduction
Simple linear regression
Inference
Diagnostics
Multiple regression
Inference
Example
3 Final recap


Outline

1 Correlation
Introduction
Correlation coefficients
Example
Special cases
Inference

2 Linear Regression

3 Final recap


Introduction

Association between two continuous variables

• One sample, two measurements per subject

• Series of pairs of measurements (X1 , Y1 ), (X2 , Y2 ), . . . , (Xn , Yn )
• Xi and Yi are continuous variables

Question we are interested in: is there a link between X and Y ?


Introduction

Some examples

• Height and weight in children

• Percent fat mass and age
• Percent fat mass and weight
• Biological parameter and time after administration of a drug
• Biological parameter and dose of a drug administered


Introduction

Two different situations (in theory..)

Let’s consider two similar situations:

• Situation 1: X and Y are two random variables

− Correlation

• Situation 2: Y is a random variable but X can be controlled by the experimenter
− Regression analysis (later today)

• In practice, though, there is almost no difference


Correlation coefficients

Pearson’s correlation coefficient

• Correlation coefficient ρ
• Quantifies the amount of (linear) association between X and Y
• ρ = ±1 if the points of the scatterplot of Y against X lie exactly on a straight line
• ρ = 0: no linear association

Keep in mind (you will see these two notations):

• Parameter: ρ (rho)
• Estimate: r


Correlation coefficients

Correlations in practice


Correlation coefficients

Correlations in practice
• ρ = 0 does not necessarily mean no association
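
A minimal sketch of this point, assuming a made-up symmetric parabola (the vectors `x` and `y` are illustrative and not part of the course data): Y is completely determined by X, yet Pearson's r is zero because the dependence is not linear.

```r
# Hypothetical data: y is a deterministic, non-linear function of x
x <- seq(-1, 1, length.out = 101)   # values symmetric around 0
y <- x^2                            # perfect (quadratic) dependence

r <- cor(x, y)                      # Pearson's r: zero up to rounding error
print(round(r, 10))
```

Plotting `y` against `x` makes the association obvious, which is one reason to always look at the scatterplot before interpreting r.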


Correlation coefficients

Correlations in practice
• Correlation depends on range (be careful!)
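
The effect of range can be illustrated with a small simulation. This is only a sketch on simulated data (the seed, sample size, noise level and the window [4, 6] are arbitrary assumptions): the same linear relation gives a much lower r when X is observed over a narrow range.

```r
set.seed(1)                              # arbitrary seed, for reproducibility
x <- seq(0, 10, length.out = 200)
y <- x + rnorm(200, sd = 2)              # simulated linear relation plus noise

r_full <- cor(x, y)                      # correlation over the whole range of x
keep <- x >= 4 & x <= 6                  # same data, narrow window of x
r_restricted <- cor(x[keep], y[keep])    # correlation within the window only

c(full = r_full, restricted = r_restricted)
```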


Correlation coefficients

Correlations in practice

• If X and/or Y are measured with error, the estimated correlation will tend to be lower (in absolute value) than the true one
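
A short simulation sketch of this attenuation, under assumed values (standard normal X, noise standard deviations of 0.5 and 1, n = 1000, none of which come from the slides): adding independent measurement error to both variables pulls the observed correlation towards zero.

```r
set.seed(2)                                # arbitrary seed
n <- 1000
x_true <- rnorm(n)
y_true <- x_true + rnorm(n, sd = 0.5)      # theoretical correlation about 0.89

x_obs <- x_true + rnorm(n, sd = 1)         # X measured with error
y_obs <- y_true + rnorm(n, sd = 1)         # Y measured with error

c(true = cor(x_true, y_true), observed = cor(x_obs, y_obs))
```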


Correlation coefficients

What should we check for in these analyses?

• Needs several assumptions

• The (Xi , Yi ) are mutually independent
• (X , Y ) is normally distributed:
− ∀X = x, Y follows a normal distribution
− ∀Y = y , X follows a normal distribution
− → not easy to ascertain in practice
• Usually, one checks that:
− Y is normally distributed with constant variance for all values of X
− The relationship between X and Y is roughly linear


Correlation coefficients

Interpretation: rule of thumb

Clearly, the same holds for negative values


Example

Do it yourself: Birth weight example

library(MASS)
data("birthwt")
plot(bwt~lwt, data=birthwt, xlab="Maternal weight (lbs)", ylab="Newborn weight (g)")


Example

Do it yourself: Birth weight example

cor.test(birthwt$lwt, birthwt$bwt)

Pearson’s product-moment correlation

data: birthwt$lwt and birthwt$bwt

t = 2.5848, df = 187, p-value = 0.0105
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.04417405 0.31998094
sample estimates:
cor
0.1857333


Special cases

Non-Gaussian variables

• Pearson’s correlation coefficient can always be computed

• But is sensitive to extreme values (and outliers)
• The test and confidence interval assume the binormal distribution
• The test is rather robust but sometimes the assumption is not
reasonable
• Alternative: Spearman’s rank correlation coefficient


Special cases

Spearman’s correlation coefficient

• In practice, compute Pearson’s correlation on the ranks of X and Y

instead of actual values
• Used for variables with a skewed distribution
• Less sensitive to outliers
• Similar interpretation as Pearson’s coefficient
• The tests of ρ = 0 or confidence intervals are also similar


Special cases

Spearman’s correlation coefficient

cor.test(birthwt$lwt, birthwt$bwt, method="spearman")

cor.test(birthwt$lwt, birthwt$bwt, method="spearman", exact=TRUE)
?cor.test

Spearman’s rank correlation rho

data: birthwt$lwt and birthwt$bwt
S = 845136, p-value = 0.0005535
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.2488882


Special cases

Pearson’s product-moment

cor.test(rank(birthwt$lwt), rank(birthwt$bwt), method="pearson")

Pearson’s product-moment correlation

data: rank(birthwt$lwt) and rank(birthwt$bwt)

t = 3.5141, df = 187, p-value = 0.0005535
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.110068 0.378184
sample estimates:
cor
0.2488882


Special cases

Important caveats

• Association does not imply causation (no causal link)

• Be careful when estimating (and testing) many 2-by-2 correlation
coefficients (multiplicity)
• Be careful with the correlation of two variables measured at different
times
• Does not provide an adequate measure of agreement
• Do not correlate a change to the baseline value


Special cases

Correlation vs. Causation

Have a look here:

http://tylervigen.com/spurious-correlations


Special cases

Correlation vs. Causation

In the "mozzarella cheese" example:

• Correlation coefficient was ρ = 0.96
• Clearly, no causal link really exists!
• → beware of so-called "spurious associations"
⇓
Spurious correlation occurs when two variables are associated only because of some other, unobserved factor (think about which unobserved factor may be at work in the mozzarella example)


Inference

Time for some maths

Basic reminders:
• Distribution of a random vector (X , Y )
• Density of (X , Y ):
f (x, y )dxdy = P(x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy )
• Marginal densities:
fx (x)dx = P(x ≤ X ≤ x + dx) and
fy (y )dy = P(y ≤ Y ≤ y + dy )
• Covariance:
Cov (X , Y ) = E[(X − E(X ))(Y − E(Y ))] = E(XY ) − E(X )E(Y )
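
For readers who want the intermediate step, the second equality in the covariance formula follows by expanding the product and using linearity of the expectation (E(X) and E(Y) are constants):

```latex
\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - E(X))(Y - E(Y))] \\
                   &= E[XY - X\,E(Y) - Y\,E(X) + E(X)E(Y)] \\
                   &= E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y) \\
                   &= E(XY) - E(X)E(Y)
\end{aligned}
```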


Inference

Formal definition of the correlation coefficient

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \rho(Y, X)$$

Some properties
• −1 ≤ ρ ≤ 1
• if X and Y are independent → ρ = 0
• ρ = 0 and (X , Y ) jointly Gaussian ⇒ independence


Inference

Inference (estimation)

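$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}$$
is estimated by
$$r = \frac{s_{xy}}{s_x s_y}$$
i.e.
$$r = \frac{\sum (x_i - m_x)(y_i - m_y)}{\sqrt{\sum (x_i - m_x)^2 \sum (y_i - m_y)^2}}$$
or
$$r = \frac{\sum x_i y_i - n\, m_x m_y}{(n - 1)\, s_x s_y}$$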
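
As a quick check, the three expressions for r can be computed by hand on the birthwt data used earlier and compared with `cor()`. This is a sketch, not part of the original slides; the names `r1`, `r2`, `r3` are just labels for the three formulas.

```r
library(MASS)                       # provides the birthwt data
data("birthwt")
x <- as.numeric(birthwt$lwt)        # maternal weight (lbs)
y <- as.numeric(birthwt$bwt)        # newborn weight (g)
n <- length(x)
mx <- mean(x); my <- mean(y)

# r = s_xy / (s_x s_y), with s_xy the sample covariance
r1 <- cov(x, y) / (sd(x) * sd(y))
# r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
r2 <- sum((x - mx) * (y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
# r = (sum(x y) - n mx my) / ((n - 1) s_x s_y)
r3 <- (sum(x * y) - n * mx * my) / ((n - 1) * sd(x) * sd(y))

c(r1 = r1, r2 = r2, r3 = r3, cor = cor(x, y))
```

All four values should agree up to rounding error.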


Inference

Probability distribution of r

• r can always be computed

• if X and Y are Gaussian, r is not normally distributed
• but $Z = \frac{1}{2} \ln \frac{1 + r}{1 - r}$ is approximately Gaussian
− with mean $\frac{1}{2} \ln \frac{1 + \rho}{1 - \rho}$
− and variance $\frac{1}{n - 3}$


Inference

Confidence interval

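1 Construct a CI for z as [z1 ; z2 ]
− with $z_1 = z - \frac{z_{\alpha/2}}{\sqrt{n - 3}}$ and $z_2 = z + \frac{z_{\alpha/2}}{\sqrt{n - 3}}$
2 Construct a CI for ρ
$$CI_{1-\alpha}(\rho) = \left[ \frac{e^{2 z_1} - 1}{e^{2 z_1} + 1} \; ; \; \frac{e^{2 z_2} - 1}{e^{2 z_2} + 1} \right]$$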

...Luckily, R does it for us in cor.test
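
To see what `cor.test` does, the two steps can be reproduced by hand. A minimal sketch on the birthwt data, assuming a 95% level (`alpha`, `zq` and `ci` are illustrative names):

```r
library(MASS)
data("birthwt")
r <- cor(birthwt$lwt, birthwt$bwt)
n <- nrow(birthwt)
alpha <- 0.05

z  <- 0.5 * log((1 + r) / (1 - r))         # Fisher transform of r
zq <- qnorm(1 - alpha / 2)                 # normal quantile z_{alpha/2}
z1 <- z - zq / sqrt(n - 3)                 # step 1: CI on the z scale
z2 <- z + zq / sqrt(n - 3)

# Step 2: back-transform the bounds to the correlation scale
ci <- c((exp(2 * z1) - 1) / (exp(2 * z1) + 1),
        (exp(2 * z2) - 1) / (exp(2 * z2) + 1))
ci
```

The result can be compared with the 95 percent confidence interval printed by `cor.test(birthwt$lwt, birthwt$bwt)`.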


Inference

Statistical testing

Suppose we want to test H0 : ρ = 0 vs. H1 : ρ ≠ 0

• We assume X and Y are Gaussian
• We use:
$$t_c = r \sqrt{\frac{n - 2}{1 - r^2}} \sim_{H_0} t(n - 2)$$
− which follows a Student distribution with n − 2 degrees of freedom under H0

Then, as usual, use p-values or critical values to draw conclusions about the significance of ρ
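
The test statistic is easy to compute directly. A small sketch on the birthwt data (variable names are illustrative), which can be compared with the `cor.test` output shown earlier:

```r
library(MASS)
data("birthwt")
r <- cor(birthwt$lwt, birthwt$bwt)
n <- nrow(birthwt)

t_c <- r * sqrt((n - 2) / (1 - r^2))       # test statistic under H0: rho = 0
p_value <- 2 * pt(-abs(t_c), df = n - 2)   # two-sided p-value, Student t(n - 2)

c(t = t_c, df = n - 2, p = p_value)
```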


Inference

Recap

• Pearson’s correlation coefficient captures linear association between

two continuous variables (e.g. height and weight)
− it cannot be applied if variables are discrete
− tests linear relation
• Spearman’s rank coefficient can be used as an alternative
− can be used for ordinal variables
− captures monotonic relations, including non-linear ones
− less sensitive to outliers
• Important: correlation does not imply causation!


Outline

1 Correlation

2 Linear Regression
Introduction
Simple linear regression
Inference
Diagnostics
Multiple regression
Inference
Example

3 Final recap


Introduction

Association between two quantitative variables

Back to our initial question..

• One sample, two measurements on each subject

• Series of pairs of measurements (X1 , Y1 ), (X2 , Y2 ), . . . , (Xn , Yn )

• Xi and Yi are quantitative

• Question (again): is there a link between X and Y ?


Introduction

Two situations

• X and Y are both random variables

• Y is random, but X can be controlled by the experimenter


Simple linear regression

Regression

• The correlation coefficient does not allow predicting a value for Y

using a value of X

• It also makes little sense when X is not random (e.g. fixed measurement times in an experiment)

• Objective: predict Y using X

− No symmetry between X and Y anymore

− Y is the dependent variable and X the independent variable or predictor


Simple linear regression

Linear regression model

• To estimate E(Y |X = x) as a function of x

• General model: E(Y |X = x) = f (x) or Y = f (X ) + ϵ

ϵ is a mean zero random variable representing the model error

• Linear model = the simplest model

Y = α + βX + ϵ
→ Estimate (and test) α and β
− β is the slope (or coefficient) of the regression line
− α is the intercept


Inference

What is the rationale behind model inference?

Via inference, a model connects known to unknown data

[Figure: diagram relating data generation and inference]


Inference

Least-squares line

• Find the line that minimizes the distance between observations and
predictions

[Figure: scatterplot of Y against X illustrating the least-squares line]

• Ordinary least squares: minimize
$$E = \sum_i (y_i - \alpha - \beta x_i)^2$$
(the sum of the squared residuals; a ‘good’ line is one that minimizes E)


Inference

Solution

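Point estimates
$$\hat\beta = \frac{s_{xy}}{s_x^2} = \frac{\sum (x_i - m_x)(y_i - m_y)}{\sum (x_i - m_x)^2}$$
$$\hat\alpha = m_y - \hat\beta\, m_x$$

Variances
$$\widehat{\mathrm{Var}}(\hat\beta) = s_\beta^2 = \frac{\frac{s_y^2}{s_x^2} - \hat\beta^2}{n - 2}$$
$$\widehat{\mathrm{Var}}(\hat\alpha) = s_\alpha^2 = \frac{s_\beta^2 \sum_{i=1}^{n} x_i^2}{n}$$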


Inference

Least-squares line (again)


Inference

Interpretation of coefficients

• α: the mean value of y when x = 0

• β: the average change in y for a one-unit increase in the predictor x


Inference

Test for the slope

• H0 : β = β0 vs H1 : β ≠ β0

• Special case β0 = 0 → test of a linear relation between X and Y

• Test statistic
$$t_b = \frac{\hat\beta - \beta_0}{s_\beta} \sim_{H_0} t(n - 2)$$

• Two-sided test
− compute β̂ and tb to be compared to tα/2,n−2
− if |tb | < tα/2,n−2 , do not reject H0
the slope is not significantly different from β0
− if |tb | ≥ tα/2,n−2 , reject H0
conclude that the slope is not β0


Inference

Test for the intercept

• H0 : α = α0 vs H1 : α ≠ α0

• Special case α0 = 0 → the line goes through the origin (0; 0)

• Test statistic
$$t_a = \frac{\hat\alpha - \alpha_0}{s_\alpha} \sim_{H_0} t(n - 2)$$

• If H0 : α = 0 is not rejected, it is possible to fit a new model Y = βX + ϵ (thus constraining α = 0)


Inference

Small example

X 23 25 36 42 50 60 68 80 85 95
Y 15 35 30 50 50 45 52 70 75 80

[Figure: scatterplot of Y against X for the data above]


Inference

Estimates

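• $\hat\beta = \frac{32711 - 10 \times 56.4 \times 50.2}{9 \times 25.34^2} = 0.761$

• $\hat\alpha = 50.2 - 0.761 \times 56.4 = 7.27$

• $s_\beta = \sqrt{\frac{\frac{422.62}{642.04} - 0.761^2}{8}} = 0.0993$

• $s_\alpha = 0.0993 \times \sqrt{\frac{37588}{10}} = 6.09$

• $t_b = \frac{0.761}{0.0993} = 7.664 \ge t_{0.025,8} = 2.306$ ($p = 5.94 \times 10^{-5}$)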
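
These hand calculations can be reproduced in a few lines. A sketch using the ten (X, Y) pairs of the small example, with illustrative variable names, applying the point-estimate and variance formulas of the previous slides:

```r
x <- c(23, 25, 36, 42, 50, 60, 68, 80, 85, 95)
y <- c(15, 35, 30, 50, 50, 45, 52, 70, 75, 80)
n <- length(x)

beta_hat  <- cov(x, y) / var(x)                    # s_xy / s_x^2
alpha_hat <- mean(y) - beta_hat * mean(x)          # m_y - beta_hat * m_x
s_beta  <- sqrt((var(y) / var(x) - beta_hat^2) / (n - 2))
s_alpha <- s_beta * sqrt(sum(x^2) / n)
t_b <- beta_hat / s_beta                           # test of beta = 0
t_crit <- qt(0.975, df = n - 2)                    # critical value t_{0.025, 8}

round(c(beta = beta_hat, alpha = alpha_hat, s_beta = s_beta,
        s_alpha = s_alpha, t_b = t_b, t_crit = t_crit), 4)
```

The estimates and standard errors should match those reported by `summary(lm(y ~ x))`.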


Inference

Confidence intervals

• For the slope

CI(1−α) (β) = β̂ ± tα/2,n−2 sβ

• Similar for the intercept

• For the Y value predicted at X = x0

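$$\hat y_0 = \hat\alpha + \hat\beta x_0$$
$$\widehat{\mathrm{Var}}(\hat y_0) = s^2 \times \left[ \frac{1}{n} + \frac{(x_0 - m_x)^2}{\sum (x_i - m_x)^2} \right] = s_{y_0}^2$$
where $s^2 = \frac{\sum (y_i - \hat\alpha - \hat\beta x_i)^2}{n - 2}$ estimates the residuals’ variance
thus
$$CI_{(1-\alpha)}(y_0) = \hat y_0 \pm t_{\alpha/2,n-2}\, s_{y_0}$$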


Inference

Prediction intervals

• CI(1−α) (y0 ) is the CI of the prediction (average value) of Y at X = x0

• We could also look at the interval where the values of Y should lie
when X = x0

⇒ Much larger interval

$$s_{pred}^2 = s^2 \times \left[ 1 + \frac{1}{n} + \frac{(x_0 - m_x)^2}{\sum (x_i - m_x)^2} \right]$$

$$IPred_{(1-\alpha)}(y_0) = \hat y_0 \pm t_{\alpha/2,n-2}\, s_{pred}$$

• The PI takes into account both the uncertainty in the estimates and the random variation due to sampling (variability of a single data point)
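
In R, both intervals are available through `predict()`. The sketch below uses the ten-point example and an arbitrary illustrative value x0 = 70 (an assumption, not from the slides), and also recomputes the prediction interval from the formula for comparison.

```r
x <- c(23, 25, 36, 42, 50, 60, 68, 80, 85, 95)
y <- c(15, 35, 30, 50, 50, 45, 52, 70, 75, 80)
fit <- lm(y ~ x)
x0 <- 70                                   # arbitrary value of X, for illustration
new <- data.frame(x = x0)

conf_int <- predict(fit, newdata = new, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new, interval = "prediction", level = 0.95)

# Prediction interval by hand, from the formula for s_pred
n <- length(x)
s2 <- sum(resid(fit)^2) / (n - 2)          # residual variance estimate
s_pred <- sqrt(s2 * (1 + 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)))
y0 <- sum(coef(fit) * c(1, x0))            # alpha_hat + beta_hat * x0
manual_pi <- y0 + c(-1, 1) * qt(0.975, df = n - 2) * s_pred

rbind(confidence = conf_int[1, ], prediction = pred_int[1, ])
```

The prediction interval is wider than the confidence interval at the same x0 because of the extra "1 +" term in the variance.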


Inference

Example

[Figure: two panels, "95% confidence interval" and "95% prediction interval", each plotting y against x with the fitted line and the corresponding interval]


Inference

Example: cystic fibrosis (mucoviscidose)

• 25 patients with cystic fibrosis

• Y: maximal expiration pressure (PEmax)

• X: weight


Inference

Result

[Figure: scatterplot of PEmax against Weight (kg) with the fitted regression line]

α̂ = 63.5 (12.7)
β̂ = 1.19 (0.30)
t = 3.94, df = 23, p = 0.0006


Inference

With R

• lm(y ~ x)

• Preceding example
x <- c(23, 25, 36, 42, 50, 60, 68, 80, 85, 95)
y <- c(15, 35, 30, 50, 50, 45, 52, 70, 75, 80)
summary(lm(y ~ x))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.27142 6.08872 1.194 0.267
x 0.76114 0.09931 7.664 5.94e-05 ***

• birthwt example
lm(bwt ~ lwt, data = birthwt)


Inference

Summary output from R

The summary output shows the following main components:

1 Call: the function call used to fit the regression model

2 Residuals: the distribution of the residuals (by definition they have mean zero, so the median should not be far from zero, and the min and max should be approximately equal in absolute value)
3 Coefficients: the regression coefficients with their statistical significance (predictors significantly associated with the outcome y are marked by stars)
4 Residual standard error (RSE), R-squared (R2) and the F-statistic: metrics used to check how well the model fits the data


Inference

birthwt example

• lm(bwt ~ lwt, data = birthwt)

> summary(lm(bwt ~ lwt, data=birthwt))

Call:
lm(formula = bwt ~ lwt, data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-2192.12 -497.97 -3.84 508.32 2075.60

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2369.624 228.493 10.371 <2e-16 ***
lwt 4.429 1.713 2.585 0.0105 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 718.4 on 187 degrees of freedom

Multiple R-squared: 0.0345, Adjusted R-squared: 0.02933
F-statistic: 6.681 on 1 and 187 DF, p-value: 0.0105


Inference

Example: cystic fibrosis (mucoviscidose)

• 25 patients with cystic fibrosis

• Y: maximal expiration pressure (PEmax)

• X: weight


Diagnostics

Assumptions for tests and intervals

Assumptions:

1 Y is normally distributed for each value of X

2 The variance of Y is the same for each value of X

3 The relationship between Y and X is linear

• Transformations of X (log, exp, X²) are possible to ensure linearity


Diagnostics

Illustration


Diagnostics

Verifying assumptions

1 Plot the observations!

2 Plot the residuals ei = yi − ŷi

− Histogram
− Q-Q plot
− Residuals vs X
− Residuals vs predicted Y

3 Independence of residuals is more difficult to check (not seen here)

4 There exists a formal test for linearity (not seen here)


Diagnostics

Residuals

• Called "ordinary" residuals

ei = yi − ŷi = yi − (α̂ + β̂xi )

• Difference between observed and predicted values

• Standardized and studentized residuals are sometimes used instead
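
In R, these variants are available from a fitted `lm` object. A brief sketch on the birthwt model (the object names are illustrative); `rstandard()` and `rstudent()` are the base R functions for standardized and studentized residuals.

```r
library(MASS)
data("birthwt")
f1 <- lm(bwt ~ lwt, data = birthwt)

e      <- resid(f1)       # ordinary residuals y_i - yhat_i
e_std  <- rstandard(f1)   # standardized (internally studentized) residuals
e_stud <- rstudent(f1)    # externally studentized residuals

head(cbind(ordinary = e, standardized = e_std, studentized = e_stud))
```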


Diagnostics

Residuals vs predicted

• Residuals should have mean zero

• And the same variance whatever the predicted value

• Can detect nonlinear relationships

• May indicate the need for transformation of X


Diagnostics

Cystic fibrosis example

[Figure: residuals plotted against predicted PEmax]


Diagnostics

Q-Q (quantile-quantile) plot

• Normal Q-Q plot: used to check the normality assumption

[Figure: Normal Q-Q plot, sample quantiles against theoretical quantiles]

A Q-Q plot is a scatterplot of two sets of quantiles against each other (here, the sample quantiles against those of a standard Normal). If both come from the same distribution, the points lie roughly on a straight line


Diagnostics

With R: birthwt data

• f1 <- lm(bwt ~ lwt, data=birthwt)

• Residuals vs fitted
plot(predict(f1), resid(f1), xlab="Predicted newborn weight (g)", ylab="Residuals (g)")

• Q-Q plot
qqnorm(resid(f1))
qqline(resid(f1))


Diagnostics

Correlation vs regression

• Both are linked mathematically

$$r = \hat\beta\, \frac{s_x}{s_y} \Leftrightarrow \hat\beta = r\, \frac{s_y}{s_x}$$

• The test for ρ = 0 is exactly the same as the one for β = 0

• But underlying assumptions can be somewhat different . . .

− Are X and Y two random variables?
− Are (X , Y ) normally distributed or is it simply Y for any value of X ?
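
Both links can be verified numerically. A minimal sketch on the birthwt data (names are illustrative), checking the relation between r and the slope and the equality of the two t statistics:

```r
library(MASS)
data("birthwt")
fit <- lm(bwt ~ lwt, data = birthwt)
beta_hat <- unname(coef(fit)["lwt"])
r <- cor(birthwt$lwt, birthwt$bwt)

# r = beta_hat * s_x / s_y
r_from_slope <- beta_hat * sd(birthwt$lwt) / sd(birthwt$bwt)

# t statistic for beta = 0 and t statistic for rho = 0
t_slope <- coef(summary(fit))["lwt", "t value"]
t_cor   <- unname(cor.test(birthwt$lwt, birthwt$bwt)$statistic)

c(r = r, r_from_slope = r_from_slope, t_slope = t_slope, t_cor = t_cor)
```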


Diagnostics

Be careful with extrapolation!

[Figure: two scatterplots of Y against X; the first covers X from about 18 to 24, the second covers X from about 5 to 25]


Diagnostics

With a binary covariate

• Since we make no assumption about the distribution of X , why not a binary variable?

• Same model: Y = α + βX + ϵ

• With X ∈ {0, 1}

• β represents the increase in E(Y |X = x) when x changes from 0 to 1

• β = E(Y |X = 1) − E(Y |X = 0): this is the mean difference!

63 / 73
Correlation Linear Regression Final recap

Diagnostics

Link with a t-test

• β̂ is exactly the observed mean difference (MD)

• The test of β = 0 is the same as the t-test (the original one, not Welch’s)

• More generally, when X is categorical, the model is called ANOVA


Diagnostics

Illustration with birthwt

> summary(lm(formula = bwt ~ ui, data = birthwt))

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3030.70 55.25 54.852 < 2e-16 ***
ui -581.27 143.55 -4.049 7.52e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 701.1 on 187 degrees of freedom

Multiple R-squared: 0.08061, Adjusted R-squared: 0.0757
F-statistic: 16.4 on 1 and 187 DF, p-value: 7.518e-05

> t.test(bwt ~ ui, data = birthwt, var.equal=TRUE)

Two Sample t-test
data: bwt by ui
t = 4.0493, df = 187, p-value = 7.518e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
298.0892 864.4574
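
The equality between the slope and the mean difference can also be checked directly. A short sketch (the name `md` is illustrative); note that `t.test` reports the difference as group 0 minus group 1, hence the opposite sign of its t statistic.

```r
library(MASS)
data("birthwt")
fit <- lm(bwt ~ ui, data = birthwt)

# Mean birth weight with uterine irritability (ui = 1) minus without (ui = 0)
md <- mean(birthwt$bwt[birthwt$ui == 1]) - mean(birthwt$bwt[birthwt$ui == 0])

c(slope = unname(coef(fit)["ui"]), mean_difference = md)
```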


Multiple regression

Multiple linear regression model

• Extension of the (simple) linear regression model

• We still want to estimate E(Y |X = x) as a function of x, but X is now a

vector (X1 , X2 , . . . , Xp )

• So we relate Y to several covariates, or characteristics: e.g. age,

weight, smoking status, . . .

• Model: Y = β0 + β1 X1 + β2 X2 + . . . + βp Xp + ϵ

• Same assumptions as before, plus additive contributions of the covariates to the linear predictor


Inference

• Still based on least-squares

• Closed-form solutions by matrix inversion (not given here)

• At the end, we still get point estimates β̂k ’s and their variance / SE
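
For the curious, the closed-form least-squares solution solves the normal equations (X'X) β = X'y, where X is the design matrix including a column of ones. The sketch below is an illustration on the birthwt data with two covariates, not the slides' own derivation, and compares the result with `lm()`.

```r
library(MASS)
data("birthwt")

X <- model.matrix(~ age + lwt, data = birthwt)   # design matrix with intercept column
y <- as.numeric(birthwt$bwt)

# Solve the normal equations (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

fit <- lm(bwt ~ age + lwt, data = birthwt)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
```

In practice `lm()` relies on a QR decomposition rather than an explicit inversion, which is numerically more stable, but the estimates agree.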

67 / 73
Correlation Linear Regression Final recap

Example

birthwt example with 1 variable

• lm(bwt ~ lwt, data = birthwt)

> summary(lm(bwt ~ lwt, data=birthwt))

Call:
lm(formula = bwt ~ lwt, data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-2192.12 -497.97 -3.84 508.32 2075.60

Residual standard error: 718.4 on 187 degrees of freedom

Multiple R-squared: 0.0345, Adjusted R-squared: 0.02933
F-statistic: 6.681 on 1 and 187 DF, p-value: 0.0105


Example

Interpretation of coefficients

• α (written β0 in the multiple regression model): the mean of the outcome y when all of the predictors take the value 0

• β: for a given predictor, the average change in the outcome y for a one-unit increase in that predictor, holding all other predictors fixed


Example

birthwt example with 2 variables

• lm(bwt ~ age + lwt, data = birthwt)

Call:
lm(formula = bwt ~ age + lwt, data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-2233.11 -499.33 9.44 520.48 1897.84

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2214.412 299.311 7.398 4.59e-12 ***
age 8.089 10.063 0.804 0.4225
lwt 4.177 1.744 2.395 0.0176 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 719.1 on 186 degrees of freedom

Multiple R-squared: 0.03784, Adjusted R-squared: 0.02749
F-statistic: 3.657 on 2 and 186 DF, p-value: 0.02767


Example

birthwt example with 6 variables

• lm(bwt ~ age + lwt + smoke + ht + ui + I(ptl > 0),
data = birthwt)
Call:
lm(formula = bwt ~ age + lwt + smoke + ht + ui + I(ptl > 0),
data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-1696.93 -481.80 -19.06 447.69 1702.05

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2515.511 289.843 8.679 2.22e-15 ***
age 6.190 9.467 0.654 0.514002
lwt 4.015 1.692 2.373 0.018685 *
smoke -206.773 101.398 -2.039 0.042874 *
ht -623.449 206.135 -3.024 0.002851 **
ui -500.843 141.189 -3.547 0.000495 ***
I(ptl > 0)TRUE -260.002 139.369 -1.866 0.063711 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 666.9 on 182 degrees of freedom

Multiple R-squared: 0.1902, Adjusted R-squared: 0.1635
F-statistic: 7.126 on 6 and 182 DF, p-value: 7.775e-07


Outline

1 Correlation

2 Linear Regression

3 Final recap


Summing Up

Main topic covered today: association between variables

1 Correlation (two continuous variables)

2 Regression (Y continuous, X quantitative)
− Simple linear regression
− Multiple linear regression

The case of a binary Y (logistic regression) will be covered in the next lesson.
