Lec2 ASE

The document discusses linear regression models. It explains that a linear regression model specifies a dependent variable (y) as a linear function of one or more independent variables (x). The slope coefficient (β1) measures the average change in the dependent variable (y) from a one-unit change in the independent variable (x). While nonlinear relationships can be approximated by linear models, this introduces approximation errors. The model also includes a stochastic error term (ε) to account for uncertainty. Observations from a random sample are used to estimate the coefficients.

Linear Regression

Karim Nchare

African School of Economics

November 2020
Functional relations

- Quantitative characteristics of the world are usually entangled in functional relations.
- A regression or model specifies an explained variable as a function of an explanatory variable:

  y = f(x)

- y is the regressand, response variable, explained variable, dependent variable, or outcome.
- x is the regressor, predictor variable, explanatory variable, independent variable, or control variable.
(figure: example of a quadratic regression)
Rate of change

∆x = x1 − x0 and ∆y = y1 − y0 = f(x1) − f(x0)

- The rate of change measures how y responds to changes in x:

  ∆y/∆x = (f(x1) − f(x0))/(x1 − x0) = (f(x0 + ∆x) − f(x0))/∆x

- It depends both on the initial point and the magnitude of the change.
Linear Model

- A model is linear if it can be written as:

  y = β0 + β1x

- This means that the graph of the regression is a (straight) line.
Slope coefficient

- The slope of a linear model equals β1, independently of x0 and ∆x:

  ∆y/∆x = (y1 − y0)/(x1 − x0)
        = ((β0 + β1x1) − (β0 + β1x0))/(x1 − x0)
        = β1(x1 − x0)/(x1 − x0)
        = β1
The linearity assumption

- The linearity assumption is less restrictive than it appears.
- The following model is clearly nonlinear:

  y = log(γ0 x^γ1)

- After some relabelling:

  β0 = log(γ0)
  β1 = γ1
  z = log(x)

- We obtain the linear model:

  y = log(γ0) + γ1 log(x) = β0 + β1 z
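As a numerical sanity check, here is a minimal Python sketch (not from the slides; it assumes numpy is available, and the values γ0 = 2 and γ1 = 0.5 are made up for illustration) showing that an ordinary linear fit on z = log(x) recovers both parameters of the nonlinear model:

```python
import numpy as np

gamma0, gamma1 = 2.0, 0.5          # made-up "true" parameters
x = np.linspace(1.0, 10.0, 50)
y = np.log(gamma0 * x**gamma1)     # the nonlinear model y = log(gamma0 * x^gamma1)

z = np.log(x)                      # relabelled regressor z = log(x)
beta1, beta0 = np.polyfit(z, y, 1) # linear fit y = beta0 + beta1 * z

print(beta0, np.log(gamma0))       # beta0 recovers log(gamma0)
print(beta1, gamma1)               # beta1 recovers gamma1
```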


Approximating nonlinear models

- Suppose that the true relationship between x and y is given by

  y = f(x)

- We can always abstract from potential nonlinearities and use a linear model:

  ỹ = β0 + β1x ≈ y = f(x)

- If f is not linear, then the approximation will be inexact and there will be approximation errors:

  ε = y − ỹ

(figure: approximating a nonlinear model with a linear one)
Multivariate regressions

- The value of the response variable may be a function of many regressors:

  y = f(x1, x2, ..., xk)

- We can still have linear models:

  y = β0 + β1x1 + β2x2 + · · · + βkxk

- In this case, each coefficient βj still measures the change in y from a one-unit change in xj, holding every other variable constant:

  ∆y/∆xj = βj

- For multivariate regressions, linearity assumes separability.
Unobserved variables

- We may not know or observe all the variables which affect y:

  y = β0 + β1x1 + β2x2 + · · · + βkxk

- If β2x2 + · · · + βkxk is unobserved, we can still approximate y with the variables that we do observe:

  ỹ = β0 + β1x1

- As before, this approximation is inexact and has an approximation error:

  ε = y − ỹ = β2x2 + · · · + βkxk
Stochastic regression

- Most of the time there is uncertainty because (at least):
  - we are not certain about the linearity of the regression
  - we cannot list all the relevant regressors
  - we may have measurement error issues
- Uncertainty is captured by a stochastic error term ε:

  y = β0 + β1x + ε

- β0 + β1x is called the deterministic component of the model.
- ε is called the random component of the model.
Stochastic regression

- Assume that the error has zero mean conditional on x.
- Then the deterministic component corresponds to the mean of y conditional on x:

  E(y|x) = E(β0 + β1x + ε | x) = β0 + β1x

- The slope coefficient then measures the average per-unit effect of a change in x on the average value of y conditional on x:

  E(y|x1) − E(y|x0) = β1(x1 − x0)
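A small simulation can illustrate this property. The sketch below is not part of the lecture; the coefficient values and the discrete design for x are arbitrary choices. It checks that the sample mean of y at each value of x tracks the deterministic component β0 + β1x:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 2.0                             # illustrative coefficients
x = rng.integers(0, 5, size=100_000).astype(float)  # a few distinct x values
eps = rng.normal(0.0, 1.0, size=x.size)             # E(eps | x) = 0 by construction
y = beta0 + beta1 * x + eps

for v in np.unique(x):
    # sample conditional mean of y vs. the deterministic component
    print(v, y[x == v].mean(), beta0 + beta1 * v)
```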


Random Sample

- We are usually interested in different observations, coming from:
  - cross-sectional data – different sources
  - time series – a single source at different times
  - panel data – different time series from different sources
- We assume that the data come from a random sample {xi, yi, εi}.
- xi and yi are observed but εi is not, and we have a collection of equations:

  yi = β0 + β1xi + εi

- In the case of a multivariate regression:

  yi = β0 + β1x1i + · · · + βkxki + εi
Predictions and Residuals

- Suppose that we have estimates β̂0 and β̂1; the estimated model is then:

  ŷ = β̂0 + β̂1x

- Given an estimated model, for each realization of xi the predicted value of yi is:

  ŷi = β̂0 + β̂1xi

- The corresponding residual is:

  ei = yi − ŷi

- Notice we cannot guarantee that ei = εi unless we know β0 and β1.
(figures: a linear regression – random sample; the estimated model; errors vs. residuals)
Example: Height and Weight model

- Contest game:
  - If you guess the weight of a participant within 10 lb of the actual weight, you get paid $2.
  - Otherwise you pay him or her $3.
- You could use height (observable) to estimate the weight:

  WEIGHTi = β0 + β1 HEIGHTi + εi

- Given estimated coefficients β̂0 = 103.4 and β̂1 = 6.38, you can make predictions:

  WEIGHT̂i = 103.4 + 6.38 HEIGHTi
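A sketch of how these predictions would be used in the contest, hard-coding the slide's estimates; the participant heights and weights below are invented purely for illustration:

```python
def predict_weight(height):
    # slide estimates: beta0-hat = 103.4, beta1-hat = 6.38
    return 103.4 + 6.38 * height

def payoff(actual, guess):
    # win $2 if the guess is within 10 lb, otherwise pay $3
    return 2.0 if abs(actual - guess) <= 10.0 else -3.0

# (height, actual weight) pairs -- hypothetical participants
participants = [(10.0, 165.0), (12.0, 185.0), (8.0, 140.0)]
for height, weight in participants:
    guess = predict_weight(height)
    print(height, round(guess, 1), payoff(weight, guess))
```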
(figures: height and weight – predictions, observations, residuals)
Estimating linear models

- Begin from a dataset coming from a random sample {xi, yi}.
- We assume that x and y are related by a model:

  yi = β0 + β1xi + εi

- We do not observe εi or the true coefficients β0 and β1.
- Our objective now is to generate estimates β̂0 and β̂1 of these coefficients to obtain an estimated model:

  ŷi = β̂0 + β̂1xi
(figures: linear regression – data generating process; realized random sample; the best/closest linear model)
The best linear model

- Two uses for the estimated model:
  - Prediction: given {xi, yi}, what is the predicted value ŷ for a new value of x?
  - Policy: given {xi, yi}, what is the average change in y associated with a change in x:

    ∆ŷ = β̂1∆x ≈ β1∆x = ∆y

- Better predictions when yi ≈ ŷi, i.e. when the residuals are small.
- Policy implications only make sense if we establish causality.
- Better policy implications when β̂1 ≈ β1, i.e. when e ≈ ε.
Ordinary least squares

Given a dataset, the ordinary least squares (OLS) estimates of β0 and β1 are the numbers β̂0 and β̂1 which minimize the sum of squared residuals:

  SSR = Σ (yi − β̂0 − β̂1xi)²   (sum over i = 1, ..., n)

The OLS estimated model is ŷi = β̂0 + β̂1xi.

- We wish to have small residuals, where small means in magnitude, not sign:

  ei = yi − ŷi = yi − β̂0 − β̂1xi
(figures: OLS – random samples and the corresponding estimated models)
Computing OLS

- When β1 = 0, we know that β̂0 = ȳ. Why?
- Now suppose instead that we know β0 = 0, i.e. yi = β1xi + εi. In this case we obtain:

  β̂1 = Σ xiyi / Σ xi²

- In the general case, the OLS estimates are given by:

  β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
  β̂0 = ȳ − β̂1x̄

- Notice that β̂1 looks like a sample analogue of cov(x, y)/var(x).
- The OLS estimates guarantee that Σ êi = 0.
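A minimal numpy implementation of these closed-form estimates, run on simulated data (the "true" coefficients 105.22 and 6.38 are borrowed from the height–weight example purely to make the output recognizable):

```python
import numpy as np

def ols(x, y):
    # closed-form OLS estimates from the slide
    xbar, ybar = x.mean(), y.mean()
    b1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
    b0 = ybar - b1 * xbar
    return b0, b1

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 200)
y = 105.22 + 6.38 * x + rng.normal(0.0, 5.0, 200)  # simulated sample

b0, b1 = ols(x, y)
residuals = y - (b0 + b1 * x)
print(b0, b1)           # close to 105.22 and 6.38
print(residuals.sum())  # essentially zero, as the slide claims
```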
Example: height and weight – Computing OLS

  β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² = 590.2/92.55 ≈ 6.38
  β̂0 = ȳ − β̂1x̄ = 169 − 6.38 × 10 ≈ 105.22
  ŷi = 105.22 + 6.38xi
(figures: further examples – geography of trade; military service and income; income vs. fecundity; public debt vs. growth)
The need for an intercept

- Most of the time we will be interested in β1 rather than β0.
- One could simply estimate

  yi = β1xi + εi

- But if β0 ≠ 0, we may get bad estimates.
Multivariate regressions

- The analysis extends to multivariate models:

  yi = β0 + β1x1i + · · · + βkxki + εi

- The interpretation is slightly different: β̂k indicates the response to changes in xk holding the other regressors constant.
- OLS is defined in the same way: by minimizing the SSR.
- The formulas require linear algebra.
- OLS is never done by hand: we use computers.
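In practice, "using computers" can be as simple as the sketch below, which builds a design matrix with an intercept column and calls numpy's least-squares solver; the data-generating coefficients are arbitrary choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)  # arbitrary coefficients

X = np.column_stack([np.ones(n), x1, x2])        # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None) # minimizes the SSR
print(beta_hat)                                  # approximately [1, 2, -3]
```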
Example: Financial Aid

- Response variable: FINAIDi – grant per year to applicant i
- Regressors:
  - PARENTi – feasible contribution from parents
  - HSRANKi – GPA rank in high school
  - GENDERi – gender dummy (1 if male, 0 if female)

(figures: financial aid dataset)
Example: financial aid, OLS

Estimated OLS model (ignoring GENDER and HSRANK):

  FINAID̂i = 15897 − 0.34 PARENTi

Estimated OLS model (ignoring GENDER):

  FINAID̂i = 8927 − 0.36 PARENTi + 87.4 HSRANKi
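Using the second estimated model for prediction might look like this; the applicant's PARENT and HSRANK values are hypothetical:

```python
def predicted_finaid(parent, hsrank):
    # slide estimates, ignoring GENDER
    return 8927 - 0.36 * parent + 87.4 * hsrank

# hypothetical applicant: parents can contribute $10,000, high-school rank 90
print(predicted_finaid(parent=10_000, hsrank=90))  # 8927 - 3600 + 7866 = 13193.0
```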
Interaction terms

- If the effect of x1 on y depends on the value of x2, include an interaction term x1x2 in the regression:

  y = β0 + β1x1 + β2x2 + β3x1x2 + ε

- The average effect of a one-unit change in x1 on y is then given by β1 + β3x2:

  E(y|x1′, x2) − E(y|x1, x2) = (x1′ − x1)(β1 + β3x2)
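A short sketch of this logic (all coefficient values invented for illustration): the effect of a one-unit change in x1 computed from the regression function matches β1 + β3x2 at each value of x2:

```python
def expected_y(x1, x2, b0=1.0, b1=2.0, b2=0.5, b3=-0.25):
    # E(y | x1, x2) for the interaction model, with invented coefficients
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

for x2 in (0.0, 2.0, 4.0):
    effect = expected_y(1.0, x2) - expected_y(0.0, x2)  # one-unit change in x1
    print(x2, effect, 2.0 + (-0.25) * x2)               # equals b1 + b3 * x2
```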
(figures: Anscombe's quartet – data, scatterplots, and estimated models)
Evaluating an estimated model

- Is the equation supported by sound theory/common sense?
- How well does the estimated model fit the data?
- Is the dataset reasonably large and accurate?
- Is OLS the best estimator to be used?
- Do the estimated coefficients correspond to prior expectations?
- Are all the important variables included?
- In case we want to do policy: are the estimated parameters structural?
Explained variation

- Regressions are used to explain y.
- In particular, we wish to explain why and when yi differs from E(y).
- The variation in y can be decomposed as:

  yi − E(y) = β0 + β1xi + εi − β0 − β1E(x)
            = β1(xi − E(x)) + εi

  where β1(xi − E(x)) is the explained part and εi the unexplained part.
- In sample terms, using ȳ and x̄ in place of E(y) and E(x):

  yi − ȳ = β1(xi − x̄) + εi

- One way to evaluate estimated models is to measure the proportion of the variance of y that we are able to explain.
(figures: variance decomposition examples)
Variance decomposition

  SST = Σ (yi − ȳ)² = Σ (yi − ŷi + ŷi − ȳ)²
      = Σ (yi − ŷi)² + 2 Σ (yi − ŷi)(ŷi − ȳ) + Σ (ŷi − ȳ)²
      = Σ (yi − ŷi)² + Σ (ŷi − ȳ)²
      = Sum of Squared Residuals + Sum of Squares Explained
      = SSR + SSE

The cross term drops out because OLS residuals sum to zero and are uncorrelated with the fitted values.
Goodness of fit: R²

- We have decomposed the total variation (SST) into the explained variation (SSE) and the unexplained or residual variation (SSR).
- R² measures how much of the variation of y can be explained by the variation of x according to the estimated model:

  R² = SSE/SST = (SST − SSR)/SST = 1 − SSR/SST

- The higher the R², the closer the model is to the data, and since 0 ≤ SSR ≤ SST we know that 0 ≤ R² ≤ 1.
- It does not measure:
  - how linear/tight the relation between x and y is (correlation)
  - the inclination of the estimated line (the slope coefficient)
  - the strength of the causal relation between x and y
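The decomposition and the equivalent R² formulas are easy to verify numerically. The following sketch re-derives the OLS fit on simulated data (all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 300)
y = 2.0 + 1.5 * x + rng.normal(0.0, 2.0, 300)  # arbitrary coefficients

# OLS fit (closed form, as derived earlier)
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()
ssr = ((y - y_hat) ** 2).sum()         # residual (unexplained) variation
sse = ((y_hat - y.mean()) ** 2).sum()  # explained variation

print(np.isclose(sst, ssr + sse))  # True: the cross term vanishes under OLS
print(sse / sst, 1.0 - ssr / sst)  # the two R^2 expressions agree
```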
(figures: R² examples – height and weight, computing OLS)
Adding more regressors

- Adding a regressor always decreases the SSR, and therefore always increases R², even if y is independent of it. Why?
- More variables thus mechanically improve the R², even though each added parameter uses up a degree of freedom.
- The adjusted R² controls for this bias:

  R̄² = 1 − (SSR/(n − K)) / (SST/(n − 1))

  where n is the sample size and K is the number of parameters.
- R̄² = R² when K = 1, and R̄² ≈ R² when n is very large.
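A sketch of the adjusted R² penalty (assuming numpy; all data simulated, and K counting every estimated parameter including the intercept): adding a pure-noise regressor raises R² slightly, while the adjusted R² typically falls.

```python
import numpy as np

def r2_stats(X, y):
    # R^2 and adjusted R^2; K counts all estimated parameters
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = ((y - X @ beta) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    n, K = X.shape
    return 1 - ssr / sst, 1 - (ssr / (n - K)) / (sst / (n - 1))

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
noise = rng.normal(size=n)  # a regressor unrelated to y

X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, noise])
print(r2_stats(X1, y))  # (R^2, adjusted R^2) without the junk regressor
print(r2_stats(X2, y))  # R^2 weakly rises; adjusted R^2 typically falls
```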
(figures: ANOVA table; example – water supply variables)
