Solutions for Tutorial 2

This document provides a tutorial on simple and multiple regression analysis using STATA, covering key concepts such as coefficient estimation, omitted variable bias, and variance analysis. It discusses the derivation of the intercept and slope coefficients, the implications of including or excluding variables in a regression model, and the trade-offs between bias and variance. Additionally, it explains how to calculate percentiles in a dataset.


Tutorial 2

Simple & Multiple Regression & STATA

 We know that our dependent variable Y is normally distributed with a mean of “a” and a
variance of σ². Our explanatory variable X is also normally distributed with a mean of
“b” and a variance of σ².
 We know that β₀ is the intercept of our regression.
 Therefore, we need to find the estimate for our coefficient: β̂₀
 We know the formulas for the two estimators from the lecture:

 Additionally, we can remember our formulas for ȳ and x̄ (and remember that in this
case, the bar above y and x represents the mean)
 We know the formulas for the means (and even though we do not need them to solve
this exercise, it is good to be aware of them again!):

ȳ = (1/n) ∑ᵢ₌₁ⁿ yᵢ   and   x̄ = (1/n) ∑ᵢ₌₁ⁿ xᵢ

 So for this example, we could also see our equation as:

E[β̂₀] = E[ȳ] − β̂₁·E[x̄] = E[(1/n) ∑ᵢ₌₁ⁿ yᵢ] − β̂₁·E[(1/n) ∑ᵢ₌₁ⁿ xᵢ]

 And given that we know that β̂₁ = 10, the solution is relatively easy:

E[β̂₀] = E[ȳ] − 10·E[x̄] = a − 10b

since ȳ has expectation “a” and x̄ has expectation “b”.
 To really understand what we are doing behind the scenes of this solution, it is
important to remember how we arrive at the formula for β̂₀.

 β̂₀ comes from the minimisation of the sum of squared residuals:

minimise: ∑ᵢ₌₁ⁿ ûᵢ²
 How do we arrive at the sum of squared residuals?

min ∑ᵢ₌₁ⁿ ûᵢ² = min ∑ᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²

 In other words: we have our values of y (our dependent variable/explained variable)
minus the way we predict these values with β̂₀ and β̂₁·xᵢ, all squared.
 We know that in order to solve the minimisation, we have to take the derivative of
the sum of squared residuals with respect to the parameter of interest, β̂₀:
o ! Calculus also provides us with rules, such as the chain rule, in order to solve
the derivative.
o The 2 in front of our equation comes from the previous exponent, and its
negative sign comes from the derivative of the argument of the objective
function with respect to β̂₀, which is −1.

∂(∑ᵢ₌₁ⁿ ûᵢ²) / ∂β̂₀ = −2 ∑ᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)

 And this derivative needs to be set to 0.

∂(∑ᵢ₌₁ⁿ ûᵢ²) / ∂β̂₀ = −2 ∑ᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0

 As we are setting it to 0, we can “ignore” the −2 by dividing both sides of the
equation by −2. Furthermore, we also need to remember that when we have a
summation outside of our brackets, this means that we have to sum up all the
individual terms, which gives us the following:

→ ∑ᵢ₌₁ⁿ yᵢ − ∑ᵢ₌₁ⁿ β̂₀ − ∑ᵢ₌₁ⁿ β̂₁xᵢ = 0
 And again, we can factor our estimators out in front of the summations, which in the
case of β̂₀ leaves us with:

β̂₀ ∑ᵢ₌₁ⁿ 1 = β̂₀·n

 By solving this equation, we can isolate β̂₀:


→ ∑ᵢ₌₁ⁿ yᵢ − n·β̂₀ − β̂₁ ∑ᵢ₌₁ⁿ xᵢ = 0

→ β̂₀ = (1/n) ∑ᵢ₌₁ⁿ yᵢ − β̂₁ · (1/n) ∑ᵢ₌₁ⁿ xᵢ = ȳ − β̂₁x̄

 Now, after we have taken the derivative, we are back at the start, and we can see
where the equation that helped us estimate β̂₀ originally comes from.
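The closed-form result β̂₀ = ȳ − β̂₁x̄ is easy to verify numerically. Below is a small sketch in Python (not part of the original tutorial; the data are simulated and all names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(2.0, 1.0, n)      # explanatory variable
u = rng.normal(0.0, 1.0, n)      # error term
y = 3.0 + 10.0 * x + u           # true beta0 = 3, beta1 = 10

# OLS slope from the usual formula, then the intercept as ybar - beta1_hat * xbar
beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Cross-check against numpy's own least-squares fit
slope, intercept = np.polyfit(x, y, 1)
```

The manually computed intercept matches the one from np.polyfit, confirming that the formula derived above is exactly what OLS uses.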

 We know that β̂₁ is an unbiased estimator in Model 1, which has two explanatory
variables, X1 and X2. In comparison, Model 2 has only one explanatory variable.
 To compare the two estimators of β₁, we can call the one from Model 2 β̃₁ to
differentiate the two coefficients.
 The answer is:
No. Model 1 should not be preferred over Model 2.
 And here is why:
 Whenever we include another variable, we have a trade-off:
o A trade-off between variance and bias
 In Model 2, we exclude X2 and its coefficient  therefore, everything that X2 and its
coefficient capture goes into the error term of Model 2.
 In order to solve this exercise & find an answer, we need to consider a few possible
scenarios concerning the values of our coefficients & investigate the trade-offs.
Step 1. Examining the Bias-Aspect:

Helpful Notes: Remember the definition for Omitted-variable bias

 The OVB occurs when a statistical model leaves out one or more relevant variables. The
bias results in the model attributing the effect of the missing variables to those that were
included. It arises when the regressor X is correlated with an omitted variable. For omitted
variable bias to occur, two conditions must be fulfilled:
1. The omitted variable is correlated with the included regressor: δ̃₁ ≠ 0
2. The omitted variable is a determinant of the dependent variable Y: β₂ ≠ 0

Together, this results in a violation of the OLS assumption E(uᵢ | Xᵢ) = 0.

Case 1: β₂ is different from 0.

β₂ ≠ 0

 In this Case 1, our Model 2 would exclude an important variable. This would
introduce a bias!
 Therefore, our β̃₁ from Model 2 would be biased.
o Because in this case, including X2 in Model 1 actually explains our outcome
better; otherwise, the effect of β₂ would have been “absorbed” by the
error term.
 This bias comes from the relationship between X1 and X2  it can be written as
δ̃₁ · β₂ (this describes the bias in this example).
 (We also know that δ̃₁ is the coefficient from the auxiliary regression where the
excluded variable is the dependent variable.)

Case 2: β₂ is equal to 0.

β₂ = 0

o Here, we know that β̃₁ from Model 2 would be unbiased.
o Why? Because the bias (δ̃₁ · β₂) is a product with one of its elements equal to 0.
It would therefore disappear in this case.
o Note: in Model 1, β̂₁ remains unbiased.

 By simply looking at the bias, we would always choose Model 1, as this prevents us from
having an omitted variable bias.
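The omitted-variable-bias formula can be illustrated with a quick simulation. This is a sketch in Python (simulated data; the values δ₁ = 0.5, β₁ = 2, β₂ = 3 are our own choices, not from the exercise):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, n)   # X2 is correlated with X1 (delta1 = 0.5)
u = rng.normal(0.0, 1.0, n)
y = 2.0 * x1 + 3.0 * x2 + u               # beta1 = 2, beta2 = 3 (beta2 != 0)

# Model 2: regress y on x1 only, omitting x2
beta1_tilde = np.polyfit(x1, y, 1)[0]

# delta1_hat: slope from regressing the omitted x2 on the included x1
delta1_hat = np.polyfit(x1, x2, 1)[0]

# The OVB formula predicts beta1_tilde close to beta1 + delta1 * beta2 = 2 + 0.5 * 3
```

The slope from the short regression absorbs the effect of the omitted X2, exactly by the amount δ̃₁ · β₂ described above.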

Examining the Variance-Aspect:

 We are now looking at the variance


 We know that the variance of β̂₁ in Model 1 is:

var(β̂₁) = σ² / [SST_X1 · (1 − R²_X1)]

 It is the variance of the error divided by the Total Sum of Squares of the relevant
variable for β̂₁, which is X1, times (1 − R²_X1), where R²_X1 is the R-squared from
regressing X1 on X2.

Note & remember: R² is a measure of how closely the data fit the regression line

 Now that we looked at Model 1, we need to look at the variance of β̃₁ in Model 2:

var(β̃₁) = σ² / SST_X1

Comparing these two variances:

Case 1:

 If X1 and X2 are uncorrelated, then R²_X1 is equal to 0:

corr(X1, X2) = 0, then R²_X1 = 0, and then → var(β̂₁) = var(β̃₁)

o When this is the case, the two variances are the same!

Case 2:

 If X1 and X2 are correlated, then our R squared of X1 won’t be equal to 0

~
corr ( X 1 , X 2) ≠ 0 , then R2X 1 ≠ 0∧then→ var ( β^ 1 ) > var ( β 1 )

~
 Here, the variance of β 1 will be smaller than the variance of ^β 1 as we will always have
a smaller number in the denominator
 This is because with ^β 1, we always multiply the Total Sum of Squares with (1−R 2X 1) ,
which is smaller than 1 if they are correlated
~
o Therefore, with ^β 1 from Model 1, the SST is a smaller number than β 1 from
Model 2.
~
o And our variance of ^β 1 will be greater than our variance of β 1
o  this is always the case in this circumstance
Bringing these results together:

Case 1  We choose Model 1

 When we have a Model 1 where β₂ ≠ 0 (in this case, β₂ has explanatory power in
relation to the outcome), then β̃₁ from Model 2 is biased!
o Still, our β̂₁ is unbiased.
 At the same time, the variance of β̃₁ < the variance of β̂₁ (or, as we said above,
var(β̂₁) > var(β̃₁)).
 Therefore, we choose Model 1.
 It is the only model that leads us to a coefficient which is unbiased.

Case 2  Choosing Model 2


~
 When we have a model 1 where β 2=0 , then we know that β 1 from Model 2 is
unbiased (here, we do not have an omitted variable bias).
~
 Then, ^β 1 is also unbiased with variance of β 1 < variance of ^β 1
 Therefore, two coefficients are unbiased
 And the variance of ^β 1 is smaller.  we gain more precision in our estimator of β 1
 More precision. Less Bias. Both in Model 2.  therefore, we choose this one.

 We know that uᵢ is distributed normally with a mean of 0 and a variance of σ².


 Just as last week, it can be incredibly useful to write out the squared term of our
summation:
o Tip: remember the binomial formulas we discussed!

E[∑ᵢ₌₁ⁿ (uᵢ − ū)²] = E[∑ᵢ₌₁ⁿ (uᵢ² − 2·uᵢ·ū + ū²)]

 Once more, to work with the individual terms and isolate the various parts of the
equation, it can be easier to re-write it with the summations inside of our bracket:

E[∑ᵢ₌₁ⁿ uᵢ² − 2·ū·∑ᵢ₌₁ⁿ uᵢ + ∑ᵢ₌₁ⁿ ū²]
 Let’s have a look at the middle term of the equation. We can see that part of it
is very similar to the formula for an average. Remember:

ū = (1/n) ∑ᵢ₌₁ⁿ uᵢ

 What we do not have in this middle term, however, is the factor 1/n. Therefore, if we
want to replace the summation of uᵢ by ū, we also need to multiply the term by n to
compensate for this step.
o We do this so that we can simplify the equation with a squared ū:

E[∑ᵢ₌₁ⁿ uᵢ² − 2·n·ū² + ∑ᵢ₌₁ⁿ ū²]

 We can now look at the last term of the equation.
o Tip: remember that when a term carries a small subscript i, it means that we
are iterating over observations. However, when it carries a bar, it is a
constant factor.
 Given that ū² does not depend on i, we can take it outside of the summation, leaving
us with ∑ᵢ₌₁ⁿ 1. And again, when we have a case like this, the summation simply
equals n. Therefore, we can simplify the equation:

E[∑ᵢ₌₁ⁿ uᵢ² − 2·n·ū² + n·ū²], which is:

E[∑ᵢ₌₁ⁿ uᵢ² − 2nū² + nū²]
 Now, we have a simple calculation of −2 + 1:

E[∑ᵢ₌₁ⁿ uᵢ² − n·ū²]

 In order to continue, we can make use of the properties of the expected value, where
the expected value of a sum of terms is the sum of the expected values of each of the
elements:

∑ᵢ₌₁ⁿ E(uᵢ²) − n·E(ū²)
 In a final step, we need to make a small leap in order to solve the equation further.
 Looking back at the lectures, we know the following:

var(uᵢ) = E(uᵢ²) − [E(uᵢ)]²

 We can re-arrange this formula to simplify our equation above.
 If we simply take [E(uᵢ)]² to the other side of the equation, we get a term that can
replace E(uᵢ²), namely E(uᵢ²) = var(uᵢ) + [E(uᵢ)]² (and likewise for E(ū²)):

∑ᵢ₌₁ⁿ [var(uᵢ) + (E(uᵢ))²] − n·[var(ū) + (E(ū))²]

 All we have to do now is think back to what we know from the exercise:
o We know that uᵢ is distributed normally with a mean of 0 and a variance of σ².
o And we know that var(ū) is σ²/n.
 So in the next step, we simply fill in the terms in our equation:

∑ᵢ₌₁ⁿ [σ² + 0²] − n·[σ²/n + 0²]

= ∑ᵢ₌₁ⁿ σ² + 0 − n·(σ²/n) + 0

 We now have a sum of numbers that adds σ² up n times:

= n·σ² − n·(σ²/n) = σ²·(n − 1)

 Interpretation of the expected value:
o Given that we get σ² multiplied by (n − 1) rather than by n, we know that
(1/n)·∑ᵢ₌₁ⁿ (uᵢ − ū)² is not an unbiased estimator of σ²; dividing by (n − 1)
instead makes it unbiased.
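We can verify E[∑(uᵢ − ū)²] = σ²·(n − 1) with a small Monte Carlo experiment; a sketch in Python (σ² = 4 and n = 10 are our own example values):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 10, 4.0, 200_000

# Draw many samples of size n and compute the sum of squared deviations each time
u = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ssd = ((u - u.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

# The average should be close to sigma^2 * (n - 1) = 4 * 9 = 36, not sigma^2 * n = 40
estimate = ssd.mean()
```

The simulated mean lands near 36 rather than 40, which is exactly the (n − 1) factor derived above.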
STATA Part: Exercise Number 4

Exercise 4 (i)

 We want to know what the value of the 25th Percentile is and what it means.
 The 25th percentile is often also called the First Quartile Q1
o Note: The Second Quartile Q2 is what we know as the Median and it is the 50th
percentile
 We can also picture it as the area under our distribution in which a certain
percentage lies  In this case, it marks the value of our distribution of wages under
which we can find 25% of the observations

Step 1: Load in our Data

Step 2:
 We can use the command “sum wage, detail”
 Alternatively, we can go into “Statistics  Other Tables  Compact table of
summary statistics” and fill in what kind of statistics we want
 Or: we can also use: centile wage, centile(25 75)
 We can now see that the 25th percentile is 3.33
 Below an hourly wage of 3.33, we find 25% of the observations
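The same quartile logic can be reproduced outside STATA. A sketch in Python with a small made-up wage vector (the numbers are illustrative only, not the tutorial’s data set):

```python
import numpy as np

# Hypothetical hourly wages, for illustration only
wages = np.array([2.5, 3.0, 3.1, 3.3, 4.0, 4.5, 5.0, 6.0, 7.5, 12.0])

q1 = np.percentile(wages, 25)       # first quartile (25th percentile)
median = np.percentile(wages, 50)   # second quartile = the median
```

Roughly a quarter of the observations lie at or below q1, mirroring what "sum wage, detail" reports in STATA.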

Exercise 4 (ii)

 We want to compute the 95% confidence interval for the mean of our wage variable
 Firstly: What is the 95% confidence interval?
o A 95% confidence interval for μY is a random variable that contains the true
value of μY in 95% of all possible random samples. Or, another definition:
o “For a given statistic calculated for a sample of observations (e.g. the mean), the
confidence interval is a range of values around that statistic that are believed
to contain, with a certain probability (e.g. 95%), the true value of the statistic
(i.e., the population value).”
 We can of course use STATA to calculate our CIs, but it is important to understand
the maths behind it as well.
 The 95% CI is calculated as follows:

lower boundary of CI = X̄ − (1.96 · Standard Error)

upper boundary of CI = X̄ + (1.96 · Standard Error)

Note. Definition of Standard Error: the standard deviation of the sampling distribution* of a
statistic. For a given statistic (e.g., the mean) it tells us how much variability there is in the
statistic across samples from the same population. Large values, therefore, indicate that a
statistic from a given sample may not be an accurate reflection of the population from which
the sample came.

Note & Reminder. Definition of Sampling Distribution: the probability distribution of a


statistic. We can think of this as follows: if we take a sample from a population and calculate
some statistic (e.g. the mean), the value of this statistic will depend on the sample we took.
As such, the statistic will vary slightly from sample to sample. If we took lots and lots of
samples from the population and calculated the statistic of interest, we could create a
frequency distribution of the values we got. The resulting distribution is what the sampling
distribution represents: the distribution of possible values of a given statistic that we could
expect to get from a given population

 We know that the critical value (z-Score) for 95% CI is 1.96


 Our mean is, as we saw in the table, 5.896
 The standard error is calculated by dividing the standard deviation by the square
root of the sample size (here, n = 526):

SE = 3.693 / √526 ≈ 0.161
 We can then do the calculations in STATA:

Alternatively, we can also go into Statistics  Summary Statistics  Confidence Intervals
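The same calculation can be done by hand, here sketched in Python with the summary statistics from the table above (mean 5.896, standard deviation 3.693, n = 526):

```python
import math

mean, sd, n = 5.896, 3.693, 526

se = sd / math.sqrt(n)      # standard error of the mean
lower = mean - 1.96 * se    # lower boundary of the 95% CI
upper = mean + 1.96 * se    # upper boundary of the 95% CI
# The interval is roughly [5.58, 6.21]
```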


Exercise 4 (iii)

 We are running a simple regression of wage on education to examine the predicted
effect of 4 more years of education on wages.
 The command is: regress wage educ
 It can be found under: Statistics  Linear Models & Related  Linear Regression
 We set our dependent variable as wage and our independent variable as education

Legend:
 SS – These are the Sums of Squares associated with the sources of variance: Total,
Model and Residual.
 df – Degrees of freedom. The total variation has N − 1 degrees of freedom. In this case,
there were 526 participants, and with two estimated parameters (intercept and slope)
the df for the residual is 526 − 2 = 524. The df for our model corresponds to the
number of predictors, here 1.
 MS – These are the Mean Squares, the Sum of Squares divided by their respective df.

 The coefficient indicates the change of our dependent variable wage with one extra
year of education: 0.54135.
 To answer the question, we simply have to multiply this by four: 4 · 0.54135 ≈ 2.165

 The answer to the question is: with 4 more years of education, a person from the
sample earns, on average, 2.165 pounds more per hour.
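The arithmetic behind the answer, as a one-line check (the coefficient is the one reported in the regression output above):

```python
coef_educ = 0.54135             # slope on educ from the STATA output
effect_4_years = 4 * coef_educ  # predicted wage difference for 4 extra years
# effect_4_years is about 2.165
```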
Exercise 4 (iv)

 We are doing & analysing a scatterplot!


 One important thing you can learn when trying to write code for any statistical
program (regardless of whether you use STATA, SPSS or R) is that it can be useful to
research your commands and play around with what you find.
 Here is an example for scatterplots:

https://stats.oarc.ucla.edu/stata/faq/how-can-i-do-a-scatterplot-with-regression-line-in-
stata/

 Alternatively, you can also go into Graphics  Twoway graphs (scatter, line etc.)

 Using the command console gives us more freedom to edit our graphs and to add the
regression line:
 graph twoway (scatter wage educ) (lfit wage educ), graphregion(color(white))

 What can we see from this plot? What would you say about the spread?
 We can see that for lower years of education, the observations are scattered closely
around the regression line, while at higher levels of education the spread around the
fitted values is much wider  this can also be confirmed when we look at:

Exercise 4 (v)

 Where we are going to plot the residuals against the variable of education
 To do this, we first have to use the following command:
o predict residual, residual
 predict gives us either the fitted values or the residuals; here, the
first “residual” tells STATA what we want to call the new variable,
and the residual option tells STATA that we want û
 And to draw the graph we use:
o twoway (scatter residual educ) (lfit residual educ)
 The command says that we want to do a scatterplot where we plot the residuals
against the years of education and add the linear fit.

 We can see that at low levels of education, the residuals are closer to 0
 However, when years of education increase, the spread around 0 gets wider and
wider
 The variation of the residuals increases with education
 This is a sign that the variance of the error term is not constant across values of
years of education. Therefore  we have heteroskedasticity.
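The pattern we see in the plot can be mimicked with simulated data: when the error variance grows with education, the residual spread widens too. A sketch in Python (all numbers are illustrative assumptions, not the tutorial’s data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
educ = rng.uniform(0, 18, n)
u = rng.normal(0.0, 0.2 + 0.2 * educ)   # error sd grows with education
wage = 0.5 * educ + u

# Fit the simple regression and compute residuals, as with predict ..., residual
slope, intercept = np.polyfit(educ, wage, 1)
residual = wage - (intercept + slope * educ)

# Residual spread is visibly larger at high education than at low education
low_spread = residual[educ < 6].std()
high_spread = residual[educ > 12].std()
```

A formal check would use a heteroskedasticity test, but the widening residual spread is already visible in these two numbers.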
