
Lecture Notes 3

Simple Linear Regression

Tingting Wu

National University of Singapore


Group Project Info Updates
• Purpose: Analyse one dataset through the Econometric methods in this course
• Number of groups: 10
• Group size: 4-5 members (8 groups with 4 members and 2 groups with 5 members)
• Which dataset should you use for the group project?
• 1. One of the datasets that I provide;
• 2. An alternative dataset that you are interested in (please check with me before proceeding)

• So please
Ø Form your own group by the end of week 6.
Ø The group representative should send me the list of group members (names and IDs) and your dataset choice.
Ø Then I can assign the datasets and release the group project details.
Recap for week 2 lecture

Ø Probability, Random variables, and Probability Distribution

Ø Moments of a probability distribution


• Mean; Variance; Standard deviation; Skewness;

Ø Often used probability distributions in econometrics


• Normal; Chi-Squared; Student t; F-distribution

Ø Two Random Variables


• Joint distribution, Marginal distribution, Conditional distribution
• Independence, Covariances and Correlations
Outline

Ø Simple Linear model

Ø The population regression function (PRF) and Relationship between u and x

Ø Deriving the Ordinary Least Squares (OLS) estimates

Ø Properties of OLS on any Sample of Data


Definition of Simple Regression Model
• What type of analysis will we do?
Cross-sectional analysis
• First step:
Clearly define your population of interest (what you want to study).
• Second Step:
There are two variables, x and y, and we would like to “study how y varies with changes in x.”
• Third Step:
We assume we can collect a random sample from the population of interest.

• Now we will learn to write our first econometric model, derive an estimator (what’s an estimator
again?) and use this estimator in our sample.
Definition of Simple Regression Model
We care about:

How does x explain y?

Names for x: explanatory variable, independent variable, conditioning variable, predictor variable, regressor
Names for y: variable to be explained, dependent variable, target variable

We must confront three issues:


a. How do we allow factors other than x to affect y ?
b. What is the functional relationship between y and x?
c. How can we be sure we are capturing a ceteris paribus relationship between y and x?
Simple Linear Model
We usually define the simple linear regression model by the following equation, which explains y in terms of x
in the population of interest:

y = β₀ + β₁x + u,

β₀: intercept parameter    β₁: slope parameter    u: error term or disturbance (unobserved in the model)

β₁: the slope parameter, describing the relationship between y and x, holding other factors fixed.
(the primary interest in applied economics)
β₀: the intercept parameter, sometimes called the constant term.
(also has its use, but rarely central to the analysis)

The simple linear regression model is also called the two-variable linear regression model
or the bivariate linear regression model.
Simple Linear Model
We usually define the simple linear regression model by the following equation, which explains y in terms of x
in the population of interest:

y = β₀ + β₁x + u,

β₀: intercept parameter    β₁: slope parameter    u: error term or disturbance (unobserved in the model)

This equation explicitly allows for other factors, contained in u, to affect y. (a)
This equation addresses the functional-relationship issue (b) and the ceteris paribus issue (c) between y and x by
assuming that the other factors in u are held fixed when x and y change:

Δy = β₁Δx + Δu
Δy = β₁Δx when Δu = 0

⇒ Namely, y is assumed to be linearly related to x when Δu = 0.
⇒ Thus, the change in y is simply β₁ multiplied by the change in x.

You can predict the changes in 𝑦 from the changes in 𝑥!
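For instance (a hypothetical illustration, not from the notes): if β₁ = 2 and x increases by 3 units while Δu = 0, then Δy = β₁Δx = 2 × 3 = 6.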


Simple Linear Model

Example: Wage and Education

A model relating a person's wage to observed education and other unobserved factors is

wage = β₀ + β₁educ + u

where
wage is measured in dollars per hour,
educ is measured in years of education,
u contains somewhat nebulous factors, e.g., ability, labor force experience, or numerous other things.

Then β₁ measures the change in hourly wage given one more year of education, holding the other factors
(u) fixed.
⇒ This can also be represented as

Δwage = β₁Δeduc when Δu = 0


Simple Linear Model

The simple linear regression model explaining y in terms of x in the population of interest is:

y = β₀ + β₁x + u,

How do we estimate β₀ and β₁?

Let's collect a sample and have a look!


Find the value of the dependent variable given the value of x
Example:
Figure. Weekly consumption expenditure in terms of weekly income

Oops! For each x, we have several values of y.


How do we guess the level of y given one specific x?
The most likely value of y given x?
Find the value of the dependent variable given the value of x
Example:
Figure. Conditional Probability of each y given x

=> Get the expected value of y given x by averaging the values of y observed at that x?
The expected value of the dependent variable
Figure. Conditional Distribution of expenditure (y) for various levels of income (x)

Fitting the expected values of y given any x to a straight line

⇒ Regression line

To find the expected value of y given any x from the simple linear model.
The population regression function (PRF)
• Let's take the conditional expectation of our simple linear model for x and y in the population.

• Then, we get:

E(y|x) = β₀ + β₁x + E(u|x)

This function shows that the relationship between x and y is that a one-unit increase in x changes the
expected value of y by the amount β₁ plus the change in E(u|x).

What does u look like for different values of x?


The population regression function (PRF)
Figure. The Population Regression Function (PRF)

• For a given value of x, we see a range of y values, possibly affected by some unobserved factors in u.

=> Notice: In the population, the u in the simple linear model y = β₀ + β₁x + u has a distribution.

• To single out the linearity between x and y, we need to restrict the relationship between u and x.
⇒ In particular, if the value of x changes, how does u change?
Relation between 𝑢 and 𝑥

To ensure that u is unrelated to x, we need to restrict the dependence between x and u.

• Option 1: Uncorrelated

We could assume that u and x are uncorrelated in the population:

Corr(x, u) = 0

It implies only that u and x are not linearly related.

=> Not good enough


Relation between 𝑢 and 𝑥

To ensure that u is unrelated to x, we need to restrict the dependence between x and u.

• Option 2: Mean Independence

The mean of the error (i.e., the mean of the unobserved factors) is the same across all slices of the
population determined by values of x.

We represent it by

E(u|x) = E(u), for all values of x

And we say that u is mean independent of x.


Relation between 𝑢 and 𝑥
• Combining E(u|x) = E(u) (the substantive assumption) with E(u) = 0 (a normalization)
gives

E(u|x) = 0, for all values of x

• Called the zero conditional mean assumption

• Then, with the zero conditional mean assumption, we get the population regression function:

E(y|x) = β₀ + β₁x + E(u|x)
       = β₀ + β₁x

which shows that the population regression function E(y|x) is a linear function of x.
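To make the zero conditional mean assumption concrete, here is a minimal simulation sketch (not from the notes; the parameter values, sample size, and variable names are invented for illustration): we draw u independently of x with mean zero, so E(u|x) = 0 holds by construction, and the average of y within narrow slices of x tracks the linear PRF β₀ + β₁x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented population parameters, purely for illustration.
beta0, beta1 = 1.0, 2.0
n = 100_000

x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)        # drawn independently of x, with E(u) = 0
y = beta0 + beta1 * x + u

# The average of y within a narrow slice of x approximates E(y|x) = beta0 + beta1*x.
for lo in (2.0, 5.0, 8.0):
    mask = (x >= lo) & (x < lo + 0.2)
    mid = lo + 0.1
    print(f"E[y | x near {mid:.1f}] is about {y[mask].mean():.3f}, PRF value {beta0 + beta1 * mid:.3f}")
```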


The population regression function (PRF)
• Assuming we know the population, consider this example:

• Suppose for the population of students attending a university, we know the PRF:

E(colGPA | hsGPA) = 1.5 + 0.5 hsGPA

• For this example, what is y? What is x? What is the slope? What is the intercept?

• If hsGPA = 3.6, what is the expected college GPA in the population? 1.5 + 0.5(3.6) = 3.3

• In practice, we never know the population intercept and slope.

• => We estimate the population intercept and slope from the random sample collected.
Deriving the Ordinary Least Squares (OLS) Estimates
Given data on x and y, how can we estimate the population parameters β₀ and β₁?

From population to sample:

Let's collect a random sample {(xᵢ, yᵢ): i = 1, …, n} with a sample size of n (the number of
observations) from the population.
The simple linear regression model for each xᵢ and yᵢ in the sample can be written as

yᵢ = β₀ + β₁xᵢ + uᵢ

where uᵢ is the error term for observation i, containing all factors affecting yᵢ other than xᵢ.

From the sample values of xᵢ and yᵢ, we compute the estimates β̂₀ and β̂₁ of β₀ and β₁.
Deriving the Ordinary Least Squares (OLS) Estimates
We estimate β₀ and β₁ by imposing the assumptions on the error term u in the PRF:

Mean independence and zero conditional mean => E(u|x) = E(u) = 0

Zero correlation => Cov(x, u) = E(xu) = 0

Therefore, we plug yᵢ = β₀ + β₁xᵢ + uᵢ from our random sample into the sample analogue of E(u) = 0.

Then we get

(1/n) ∑ᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0 ⋯⋯⋯⋯⋯⋯ (1)

Again, we plug yᵢ = β₀ + β₁xᵢ + uᵢ from our random sample into the sample analogue of E(xu) = 0.

Then we get

(1/n) ∑ᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0 ⋯⋯⋯⋯⋯⋯ (2)
Deriving the Ordinary Least Squares (OLS) Estimates
Combining (1) and (2), the estimated slope is

β̂₁ = ∑ᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / ∑ᵢ₌₁ⁿ (xᵢ − x̄)²

This equation is simply the sample covariance between xᵢ and yᵢ divided by the sample variance of xᵢ.

So, we can write it as

β̂₁ = sample covariance(x, y) / sample variance(x) = σ̂_{xy} / σ̂²_x = ρ̂_{xy} · (σ̂_y / σ̂_x)

Once we have β̂₁, we can estimate the intercept as

β̂₀ = ȳ − β̂₁x̄

The estimates β̂₁ and β̂₀ above are called the Ordinary Least Squares estimates of β₁ and β₀.
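As a minimal illustration (not part of the original notes), the sketch below computes β̂₁ and β̂₀ from the covariance/variance formula above using NumPy and a small made-up sample; the data values and variable names (x, y, b1_hat, b0_hat) are invented, and np.polyfit is used only as an independent check.

```python
import numpy as np

# A small made-up sample, purely for illustration (not the lecture's data).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sample covariance of (x, y) divided by the sample variance of x.
b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Intercept: b0_hat = y_bar - b1_hat * x_bar.
b0_hat = y_bar - b1_hat * x_bar

print(f"beta1_hat = {b1_hat:.4f}, beta0_hat = {b0_hat:.4f}")

# Cross-check against NumPy's degree-1 least-squares fit (returns slope, then intercept).
slope_check, intercept_check = np.polyfit(x, y, deg=1)
print(f"np.polyfit: slope = {slope_check:.4f}, intercept = {intercept_check:.4f}")
```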
Deriving the Ordinary Least Squares (OLS) Estimates
For any β̂₀ and β̂₁, we can define a fitted value for y when x = xᵢ:

ŷᵢ = β̂₀ + β̂₁xᵢ

• β̂₀, β̂₁ are the estimated intercept and slope

• ŷᵢ is the fitted/predicted value
• We also have the residuals, ûᵢ, which are the differences between the true values of y and the
predicted values:

ûᵢ = yᵢ − ŷᵢ

• You can think of the residuals as the prediction errors of our estimates.

• So how do we find the best estimators β̂₀ and β̂₁ to predict the value of ŷᵢ in terms of xᵢ?

=> The smaller the prediction error ûᵢ for each observation is, the better the estimators are.
Deriving the Ordinary Least Squares (OLS) Estimates
To summarize the prediction errors over all i, we minimize the sum of the squared
prediction errors, or in other words, minimize the sum of the squared residuals, to find the best
estimators for β₁ and β₀.

• Ordinary Least Squares (OLS):

(β̂₀, β̂₁) = arg min over (b₀, b₁) of ∑ᵢ₌₁ⁿ (yᵢ − b₀ − b₁xᵢ)²

• The first order conditions for the OLS estimates are exactly equations (1) and (2)
without the factor 1/n.

• So our estimators β̂₁ and β̂₀ are OLS estimators.
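As a sketch of this idea (assuming SciPy is available; the data and the helper name ssr are invented), we can verify numerically that minimizing the sum of squared residuals reproduces the closed-form OLS estimates:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up illustrative sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

def ssr(params):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    b0, b1 = params
    return np.sum((y - b0 - b1 * x) ** 2)

# Numerically minimize SSR over (b0, b1), starting from (0, 0).
result = minimize(ssr, x0=[0.0, 0.0])
b0_num, b1_num = result.x

# Closed-form OLS estimates for comparison.
b1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_ols = y.mean() - b1_ols * x.mean()

print(f"numerical minimizer: b0 = {b0_num:.4f}, b1 = {b1_num:.4f}")
print(f"closed-form OLS:     b0 = {b0_ols:.4f}, b1 = {b1_ols:.4f}")
```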


Deriving the Ordinary Least Squares (OLS) Estimates
Graphical Example:
How do we fit the regression line ŷᵢ = β̂₀ + β̂₁xᵢ through the OLS estimators to the data?

Answer: We minimize the sum of squared residuals.
Interpreting the OLS Estimates

Example: Wage and Education

A PRF model relating a person's wage to observed education and other unobserved factors is

wage = β₀ + β₁educ + u

Data: A random sample from the US workforce population in 1976 with 526 observations.
wage: dollars per hour
educ: highest grade completed (years of education)

The estimated equation is

ŵage = −0.90 + 0.54 educ

Ø People with one more year of schooling are estimated to earn $0.54 per hour more.
Interpreting the OLS Estimates

Example: Wage and Education

This function,

ŵage = −0.90 + 0.54 educ,

is the OLS (or sample) regression line.

With this OLS regression line, can you predict the hourly wage for a person with 12 years of
schooling (e.g., high school)?

How about 16 years (e.g., a Bachelor's degree)?
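As a minimal sketch of how the fitted line generates predictions (the coefficients −0.90 and 0.54 are the estimates reported above; the helper name predict_wage is invented):

```python
# OLS regression line from the notes: wage_hat = -0.90 + 0.54 * educ
def predict_wage(educ_years):
    """Predicted hourly wage (1976 dollars) for a given number of years of education."""
    return -0.90 + 0.54 * educ_years

for educ in (0, 8, 12, 16):
    print(f"educ = {educ:2d} years -> predicted wage = {predict_wage(educ):5.2f} $/hour")
# e.g., 12 years gives 5.58 $/hour and 16 years gives 7.74 $/hour.
```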


Interpreting the OLS Estimates

Example: Wage and Education

• When we write the simple linear regression model,

wage = β₀ + β₁educ + u

it applies to the population, so we do not know β₀ and β₁.

• β̂₀ = −0.90, β̂₁ = 0.54 are our estimates from this particular sample.

• These estimates may or may not be close to the population values. If we obtain another sample, the
estimates will almost certainly change.
Interpreting the OLS Estimates

Example: Wage and Education

• If educ = 0, then the predicted wage from our OLS regression line is

ŵage = −0.90 + 0.54 × 0 = −0.90

• This prediction does not make sense in reality.

• Mainly because extrapolating outside the range where most of our data lie can produce strange
predictions.
=> probably because few people in that group were sampled
Interpreting the OLS Estimates

Example: Wage and Education

• When educ = 8, the predicted wage from our OLS regression line is

ŵage = −0.90 + 0.54 × 8 = 3.42

• which we can think of as our estimate of the average wage in the population when educ = 8.

So how should we interpret the predicted wages for people with 12 years of schooling (e.g., high school) and
16 years of schooling (e.g., a Bachelor's degree) that we computed on the previous slides?
Properties of OLS on any Sample of Data
Recall that for the OLS estimates β̂₀ and β̂₁, we have a fitted value for y when x = xᵢ:

ŷᵢ = β̂₀ + β̂₁xᵢ

• β̂₀, β̂₁ are the estimated intercept and slope

• ŷᵢ is the fitted/predicted value

• We also have the residuals, ûᵢ, which are the differences between the true values yᵢ and the
predicted values ŷᵢ based on xᵢ:

ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂₀ − β̂₁xᵢ,  i = 1, 2, …, n


Properties of OLS on any Sample of Data

Some residuals are positive, others are negative

• If ûᵢ is positive ⇒ the line underpredicts yᵢ

• If ûᵢ is negative ⇒ the line overpredicts yᵢ
Properties of OLS on any Sample of Data: Algebraic
(1) The sum of the OLS residuals is 0:

∑ᵢ₌₁ⁿ ûᵢ = 0

(2) The sample covariance between the explanatory variable and the residuals is always zero:

∑ᵢ₌₁ⁿ xᵢûᵢ = 0

Ø Therefore, the sample correlation between x and ûᵢ is also equal to zero.
Ø Because the ŷᵢ are linear functions of the xᵢ, the fitted values and residuals are uncorrelated, too:

∑ᵢ₌₁ⁿ ŷᵢûᵢ = 0
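A minimal numerical check of these algebraic properties, using a made-up sample and the hand-rolled OLS formulas from earlier (the data and variable names are invented):

```python
import numpy as np

# Made-up illustrative sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

# OLS estimates, fitted values, and residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

print(f"sum of residuals:         {np.sum(u_hat): .2e}")        # property (1): ~ 0
print(f"sum of x_i * u_hat_i:     {np.sum(x * u_hat): .2e}")     # property (2): ~ 0
print(f"sum of y_hat_i * u_hat_i: {np.sum(y_hat * u_hat): .2e}") # fitted values uncorrelated with residuals
```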
Properties of OLS on any Sample of Data: Algebraic

(3) The point (x̄, ȳ) is always on the OLS regression line:

ȳ = β̂₀ + β̂₁x̄

=> That is, if we plug in the average for 𝑥 , we predict the sample average of y.

So far, we have no way of measuring how well the explanatory or independent variable, x, explains
the dependent variable, y.

=> Check the Goodness-of-Fit


Properties of OLS on any Sample of Data: Goodness-of-Fit

For each observation, we write

yᵢ = ŷᵢ + ûᵢ

Then we can define:

Total Sum of Squares = SST = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)²

Explained Sum of Squares = SSE = ∑ᵢ₌₁ⁿ (ŷᵢ − ȳ)²

Residual Sum of Squares = SSR = ∑ᵢ₌₁ⁿ ûᵢ²

• Other names for the three quantities are also used,

e.g., SST = TSS (total sum of squares)
SSE = ESS (regression sum of squares)
SSR = RSS (error sum of squares)
Properties of OLS on any Sample of Data: Goodness-of-Fit

Let’s have a look at the Total Sum of Squares (SST)

SST = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)²

    = ∑ᵢ₌₁ⁿ [(yᵢ − ŷᵢ) + (ŷᵢ − ȳ)]²

    = ∑ᵢ₌₁ⁿ [ûᵢ + (ŷᵢ − ȳ)]²

Using the fact that the fitted values and residuals are uncorrelated, we have

SST = SSE + SSR

=> How do we use the SST, SSE and SSR to measure how well our x explains y?
Properties of OLS on any Sample of Data: Goodness-of-Fit

The R-Squared

We want to evaluate how well the independent variable x explains the dependent variable y.

• We want to obtain the fraction of the sample variation in y that is explained by x.
• We will summarize it in one number: R² (the coefficient of determination).

Assuming SST > 0,

R² = SSE / SST = 1 − SSR / SST

Simply put, R² is the ratio of the explained variation to the total variation.
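A short sketch computing SST, SSE, SSR and R² for a made-up sample (data and variable names invented), checking both forms of the formula and the decomposition SST = SSE + SSR:

```python
import numpy as np

# Made-up illustrative sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

# OLS fit.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum(u_hat ** 2)

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}, SSE + SSR = {sse + ssr:.4f}")
print(f"R^2 = SSE/SST = {sse / sst:.4f} = 1 - SSR/SST = {1 - ssr / sst:.4f}")
```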
Properties of OLS on any Sample of Data: Goodness-of-Fit

Since SSE cannot be greater than SST, we have

0 ≤ R² ≤ 1

Interpretations:

• R² = 0 => no linear relationship between yᵢ and xᵢ.

• R² = 1 => perfect linear relationship between yᵢ and xᵢ.
• As R² increases => the yᵢ (actual values) get closer and closer to the OLS regression line ŷᵢ (fitted
values)

Now, it seems that we can analyze our regression results with R².

But should we only focus on R²?


Properties of OLS on any Sample of Data: Goodness-of-Fit
Example: Wage and Education

• Let's recall this OLS (sample) regression line,

ŵage = −0.90 + 0.54 educ

where n = 526, R² = 0.16

• Therefore, years of education explains only about 16% of the variation in hourly wage.
• This means that 84% of the variation in hourly wage is left unexplained!

It is common to have a low R² in a simple regression model for cross-sectional analysis.

Thus, multiple regression analysis is needed.

⇒ Still, we cannot deny the relationship between hourly wage and education.
⇒ R² to some degree describes the relationship between yᵢ and xᵢ.
⇒ Be careful with the interpretation.
Suggested Exercise
Below you have a random sample with 10 data points. Your observations are (xᵢ, yᵢ).

Find β̂₀, β̂₁, and R².


Suggested Exercise
To answer this question, you will need to know:

• β̂₁ = ∑ᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / ∑ᵢ₌₁ⁿ (xᵢ − x̄)²

• β̂₀ = ȳ − β̂₁x̄

• SST = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)²

• SSE = ∑ᵢ₌₁ⁿ (ŷᵢ − ȳ)²

• SSR = ∑ᵢ₌₁ⁿ ûᵢ²

• R² = SSE / SST = 1 − SSR / SST

Please understand them and remember them.

These formulas will not be provided in the exams.
To be continued……

Some suggested exercises will be provided after class for you to practice. Please check
Canvas by tomorrow.

Next week, we will continue to talk about simple linear regression

• Changing units of measurement


• Incorporating nonlinearities
• Unbiasedness of OLS Estimators

And Multiple linear regression


This is it for today!
