Lecture 3 Simple Linear Regression
Tingting Wu
• So please:
Ø Form your own group by the end of week 6.
Ø The group representative should send me your group member list (name and ID of each member, and your dataset).
Ø Then I can assign the datasets and release the group project details.
Recap for week 2 lecture
• Now we will learn to write our first econometric model, derive an estimator (what's an estimator again?), and use this estimator on our sample.
Definition of Simple Regression Model
We care about:
How does 𝑥 explain 𝑦?

𝑥                         𝑦
Explanatory variable      Variable to be explained
Independent variable      Dependent variable
Conditioning variable     Target variable
Predictor variable
Regressor
$\beta_1$: the slope parameter, describing the relationship between $y$ and $x$, holding other factors fixed.
(the primary interest in applied economics)
$\beta_0$: the intercept parameter, sometimes called the constant term.
(also has its uses, but rarely central to the analysis)
The simple linear regression model is also called the two-variable linear regression model or the bivariate linear regression model.
Simple Linear Model
We usually define the simple linear regression model, explaining $y$ in terms of $x$ in the population of interest, by the equation

$y = \beta_0 + \beta_1 x + u$

$\beta_0$: intercept parameter; $\beta_1$: slope parameter; $u$: error term or disturbance (unobserved in the model)
This equation explicitly allows for other factors, contained in $u$, to affect $y$. (a)
This equation addresses the functional relationship (b) and the ceteris paribus issue (c) between $y$ and $x$ by assuming that the other factors in $u$ are held fixed when $x$ and $y$ change:

$\Delta y = \beta_1 \Delta x + \Delta u$

$\Delta y = \beta_1 \Delta x$ when $\Delta u = 0$
A model relating a person's wage to observed education and other unobserved factors is

$wage = \beta_0 + \beta_1 educ + u$

where
$wage$ is measured in dollars per hour,
$educ$ is measured in years of education,
$u$ contains somewhat nebulous factors, e.g., ability, labor force experience, and numerous other things.
Then $\beta_1$ measures the change in hourly wage given one more year of education, holding the other factors (in $u$) fixed.
=> The model can also be represented as follows.
The simple linear regression model explaining $y$ in terms of $x$ in the population of interest is

$y = \beta_0 + \beta_1 x + u$
=> Can we get the expected value of $y$ given $x$ as the average of $y$ over the members of the population with that value of $x$?
The expected value of dependent variable
Figure: Conditional distribution of expenditure ($y$) for various levels of income ($x$).
=> Regression line
• Then, we get:

$E(y \mid x) = \beta_0 + \beta_1 x + E(u \mid x)$

This function shows that a one-unit increase in $x$ changes the expected value of $y$ by the amount $\beta_1$ plus the change in $E(u \mid x)$.
For $E(u \mid x)$ not to change when $x$ changes, we need to restrict the dependence between $x$ and $u$.
• Option 1: Uncorrelated

$Corr(x, u) = 0$
• Option 2: Mean independence, a stronger restriction on the dependence between $x$ and $u$:
The mean of the error (i.e., the mean of the unobserved factors) is the same across all slices of the population determined by the values of $x$.
We represent it by

$E(u \mid x) = E(u)$ for all values of $x$

Normalizing the mean of the unobserved factors in the population to zero then gives

$E(u \mid x) = 0$ for all values of $x$
• Then, with the zero conditional mean assumption, we get the population regression function (PRF):

$E(y \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x$
• Suppose that, for the population of students attending a university, we know the PRF:

$E(colGPA \mid hsGPA) = 1.5 + 0.5 \, hsGPA$

• For this example, what is $y$? What is $x$? What is the slope? What is the intercept?
• If $hsGPA = 3.6$, what is the expected college GPA in the population? $1.5 + 0.5(3.6) = 3.3$
• => We estimate the population intercept and slope from the random sample collected.
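A tiny sketch of this calculation in Python (the function name expected_colGPA is ours, not from the lecture; the PRF coefficients 1.5 and 0.5 come from the slide above):

    def expected_colGPA(hsGPA):
        # PRF from the slide: E(colGPA | hsGPA) = 1.5 + 0.5 * hsGPA
        return 1.5 + 0.5 * hsGPA

    print(expected_colGPA(3.6))  # 3.3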
Deriving the Ordinary Least Squares (OLS) Estimates
Given data on $x$ and $y$, how can we estimate the population parameters $\beta_0$ and $\beta_1$?
Let's collect a random sample $\{(x_i, y_i): i = 1, \ldots, n\}$, with a sample size of $n$ (the number of observations), from the population.
The simple linear regression model for each $x_i$ and $y_i$ in the sample can be written as

$y_i = \beta_0 + \beta_1 x_i + u_i$

where $u_i$ is the error term for observation $i$, containing all factors affecting $y_i$ other than $x_i$.
From the sample values of the variables $x_i$ and $y_i$, we compute the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ of $\beta_0$ and $\beta_1$.
Deriving the Ordinary Least Squares (OLS) Estimates
We estimate $\beta_0$ and $\beta_1$ by making the error term $u$ satisfy, in the sample, the assumptions imposed in the PRF.
Therefore, we plug $y_i = \beta_0 + \beta_1 x_i + u_i$ from our random sample into the sample analogue of $E(u) = 0$:

$\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \quad \cdots \quad (1)$

Again, we plug $y_i = \beta_0 + \beta_1 x_i + u_i$ into the sample analogue of $E(xu) = 0$ and get

$\frac{1}{n} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \quad \cdots \quad (2)$
Deriving the Ordinary Least Squares (OLS) Estimates
Combining (1) and (2), the estimated slope is

$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

This equation is simply the sample covariance between $x_i$ and $y_i$ divided by the sample variance of $x_i$.

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

The estimates $\hat{\beta}_1$ and $\hat{\beta}_0$ given above are called the Ordinary Least Squares (OLS) estimates of $\beta_1$ and $\beta_0$.
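A minimal numerical sketch of these two formulas in Python (assuming NumPy is available; the sample values below are made up purely for illustration):

    import numpy as np

    # Hypothetical sample of n = 5 observations (illustrative values only)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    x_bar, y_bar = x.mean(), y.mean()

    # Slope: sample covariance of x and y divided by the sample variance of x
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

    # Intercept: chosen so the line passes through the point (x_bar, y_bar)
    beta0_hat = y_bar - beta1_hat * x_bar

    print(beta0_hat, beta1_hat)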
Deriving the Ordinary Least Squares (OLS) Estimates
Any $\hat{\beta}_0$ and $\hat{\beta}_1$ define a fitted value for $y$ when $x = x_i$:

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

• So how do we find the best estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ for predicting the value of $y_i$ in terms of $x_i$?
=> The smaller the prediction error $\hat{u}_i$ for each observation is, the better the estimates are.
Deriving the Ordinary Least Squares (OLS) Estimates
To summarize the prediction errors over all $i$, we minimize the sum of squared prediction errors, or in other words, the sum of squared residuals, to find the best estimates of $\beta_0$ and $\beta_1$:

$\min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$

• The first-order conditions for this minimization problem are exactly equations (1) and (2) without the factor $\frac{1}{n}$.
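A small numerical illustration of the minimization (same hypothetical data as in the sketch above): the closed-form OLS estimates attain a smaller sum of squared residuals than any perturbed pair.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    def ssr(b0, b1):
        # Sum of squared residuals for a candidate intercept b0 and slope b1
        return np.sum((y - b0 - b1 * x) ** 2)

    beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0_hat = y.mean() - beta1_hat * x.mean()

    print(ssr(beta0_hat, beta1_hat))        # smallest attainable SSR
    print(ssr(beta0_hat + 0.1, beta1_hat))  # perturbing either estimate increases the SSR
    print(ssr(beta0_hat, beta1_hat - 0.1))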
A PRF model relating a person's wage to observed education and other unobserved factors is

$wage = \beta_0 + \beta_1 educ + u$

Data: a random sample of 526 observations from the US workforce population in 1976.
$wage$: dollars per hour
$educ$: highest grade completed (years of education)
The estimated OLS regression line is $\widehat{wage} = -0.91 + 0.54 \, educ$. With this line, can you predict the hourly wage for a person with 12 years of schooling (e.g., high school)?
• $\hat{\beta}_0 = -0.91$ and $\hat{\beta}_1 = 0.54$ are our estimates from this particular sample.
• These estimates may or may not be close to the population values. If we obtain another sample, the estimates will almost certainly change.
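A minimal sketch of turning these estimates into predictions (the helper predict_wage is hypothetical, just for illustration; the coefficients are the ones reported above):

    # OLS estimates reported above for the 1976 wage sample
    beta0_hat, beta1_hat = -0.91, 0.54

    def predict_wage(educ):
        # Predicted hourly wage (dollars per hour) for a given number of years of education
        return beta0_hat + beta1_hat * educ

    print(predict_wage(12))  # high school:       -0.91 + 0.54 * 12 = 5.57
    print(predict_wage(16))  # bachelor's degree: -0.91 + 0.54 * 16 = 7.73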
Interpreting the OLS Estimates
• If $educ = 0$, then the predicted wage from our OLS regression line is $\widehat{wage} = -0.91$, a negative hourly wage, which makes no sense.
• This happens mainly because extrapolating outside the range where most of our data lie can produce strange predictions.
=> That group of people (those with zero years of education) is probably sparsely sampled.
Interpreting the OLS Estimates
• When $educ = 8$, the predicted wage from our OLS regression line is $\widehat{wage} = -0.91 + 0.54(8) = 3.41$ dollars per hour,
• which we can think of as our estimate of the average wage in the population when $educ = 8$.
So how do we interpret the predicted wages for people with 12 years of schooling (e.g., high school) and 16 years of schooling (e.g., a bachelor's degree) that we computed on the previous slides?
Properties of OLS on any Sample of Data
Recall that any OLS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ give a fitted value for $y$ when $x = x_i$:

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

• We also have the residuals $\hat{u}_i$, which are the differences between the true values $y_i$ and the predicted values $\hat{y}_i$ based on $x_i$:

$\hat{u}_i = y_i - \hat{y}_i$
(1) The sum, and therefore the sample average, of the OLS residuals is zero:

$\sum_{i=1}^{n} \hat{u}_i = 0$

(2) The sample covariance between the explanatory variable and the residuals is always zero:

$\sum_{i=1}^{n} x_i \hat{u}_i = 0$

Ø Therefore, the sample correlation between the $x_i$ and the $\hat{u}_i$ is also equal to zero.
Ø Because the $\hat{y}_i$ are linear functions of the $x_i$, the fitted values and residuals are uncorrelated, too:

$\sum_{i=1}^{n} \hat{y}_i \hat{u}_i = 0$
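These algebraic properties can be verified numerically; here is a short sketch with the same hypothetical data used in the earlier sketches:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    y_hat = beta0_hat + beta1_hat * x  # fitted values
    u_hat = y - y_hat                  # residuals

    print(np.sum(u_hat))          # (1) residuals sum to zero
    print(np.sum(x * u_hat))      # (2) zero sample covariance between x and the residuals
    print(np.sum(y_hat * u_hat))  # fitted values and residuals are uncorrelated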
Properties of OLS on any Sample of Data: Algebraic
$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$

=> That is, if we plug in the average of $x$, we predict the sample average of $y$: the point $(\bar{x}, \bar{y})$ always lies on the OLS regression line.
So far, we have no way of measuring how well the explanatory (independent) variable $x$ explains the dependent variable $y$. Write each observation as its fitted value plus its residual,

$y_i = \hat{y}_i + \hat{u}_i$

so that the total sum of squares satisfies

$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$

Using the fact that the fitted values and residuals are uncorrelated, we then have

$SST = SSE + SSR$

where $SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is the explained sum of squares and $SSR = \sum_{i=1}^{n} \hat{u}_i^2$ is the residual sum of squares.
=> How do we use SST, SSE and SSR to measure how well our $x$ explains $y$?
Properties of OLS on any Sample of Data: Goodness-of-Fit
The R-Squared
We want to evaluate how well the independent variable x explains the dependent variable y.
• We want to obtain the fraction of the sample variation in $y$ that is explained by $x$.
• We will summarize it in one number: $R^2$ (also called the coefficient of determination).

$R^2 = \dfrac{SSE}{SST} = 1 - \dfrac{SSR}{SST}$

Simply put, $R^2$ is the ratio of the explained variation to the total variation.
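A short sketch of the computation (same hypothetical data as in the earlier sketches):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    y_hat = beta0_hat + beta1_hat * x
    u_hat = y - y_hat

    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    ssr = np.sum(u_hat ** 2)               # residual sum of squares

    print(sse / sst, 1 - ssr / sst)        # two equivalent ways to compute R-squared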
Properties of OLS on any Sample of Data: Goodness-of-Fit
$0 \le R^2 \le 1$

Interpretation (in the wage example, $R^2 \approx 0.16$):
• Years of education explains only about 16% of the variation in hourly wage.
• This means that 84% of the variation in hourly wage is left unexplained!
=> Still, we cannot deny the relationship between hourly wage and education.
=> $R^2$ captures the relationship between $y_i$ and $x_i$ only to a certain degree.
=> Be careful with the interpretation.
Suggested Exercise
Below you have a random sample with 10 data points. Your observations are $(x_i, y_i)$.

• $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

• $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

• $R^2 = \dfrac{SSE}{SST} = 1 - \dfrac{SSR}{SST}$
Some suggested exercises will be provided after class for you to practice. Please check
Canvas by tomorrow.