Solutions for Tutorial 2
We know that our dependent variable is normally distributed with a mean of "a" and a variance of σ². Our explanatory variable X is also normally distributed, with a mean of "b" and a variance of σ².
We know that $\beta_0$ is the intercept of our regression.
Therefore, we need to find the expected value of the estimator of our coefficient: $\hat{\beta}_0$
We know the formulas for the two estimators from the lecture:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
Additionally, we can remember our formulas for $\bar{y}$ and $\bar{x}$ (and remember that in this case, the line above y and x represents the mean). Even though we do not need them to solve this exercise, it is good to be aware of them again!

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{and} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$E[\hat{\beta}_0] = E\left[\bar{y}\right] - \hat{\beta}_1 E\left[\bar{x}\right] = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] - \hat{\beta}_1 E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right]$$
And given that we know that $\hat{\beta}_1 = 10$, the solution is relatively easy:

$$E[\hat{\beta}_0] = E[\bar{y}] - 10\,E[\bar{x}] = a - 10b$$
To really understand what we are doing behind the scenes of this solution, it is important to remember how we arrive at the formula for $\hat{\beta}_0$:
$$\min \sum_{i=1}^{n} \hat{u}_i^2 = \min \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$$
$$\frac{\partial \sum_{i=1}^{n} \hat{u}_i^2}{\partial \hat{\beta}_0} = -2\sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)$$

$$\frac{\partial \sum_{i=1}^{n} \hat{u}_i^2}{\partial \hat{\beta}_0} = -2\sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0$$
As we are setting the expression equal to 0, we can "ignore" the −2 by dividing both sides of the equation by −2 (our 0 remains 0). Furthermore, we also need to remember that when we have a summation outside of our brackets, this means that we have to sum up all the individual terms, which gives us the following:
$$\Rightarrow \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \hat{\beta}_0 - \sum_{i=1}^{n} \hat{\beta}_1 x_i = 0$$
And again, we can factor out our estimators in front of the summations, which in the case of $\hat{\beta}_0$ leaves us with:
$$\sum_{i=1}^{n} \hat{\beta}_0 = \hat{\beta}_0 \sum_{i=1}^{n} 1 = n\,\hat{\beta}_0$$
$$\Rightarrow \hat{\beta}_0 = \frac{1}{n}\sum_{i=1}^{n} y_i - \hat{\beta}_1 \frac{1}{n}\sum_{i=1}^{n} x_i$$
Now, after we have taken the derivative, we are back at the start: we can see where the equation that helped us estimate our $\hat{\beta}_0$, namely $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, originally comes from.
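As a quick numerical sanity check, here is a minimal Stata sketch with simulated data (the variable names and parameter values are made up for illustration); it confirms that the intercept reported by regress equals $\bar{y} - \hat{\beta}_1\bar{x}$:

```stata
* Minimal sketch: verify b0_hat = ybar - b1_hat*xbar on simulated data
clear
set seed 42
set obs 500
generate x = rnormal(2, 1)             // illustrative X with mean 2
generate y = 1 + 10*x + rnormal(0, 1)  // true slope of 10, as in the exercise
regress y x
quietly summarize y
scalar ybar = r(mean)
quietly summarize x
scalar xbar = r(mean)
display "b0_hat = " _b[_cons] "   ybar - b1_hat*xbar = " ybar - _b[x]*xbar
```

Both displayed numbers coincide, which is exactly the first-order condition we just derived.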
Omitted variable bias (OVB) occurs when a statistical model leaves out one or more relevant variables. The bias results in the model attributing the effect of the missing variables to those that were included. It arises when the regressor X is correlated with an omitted variable. For omitted variable bias to occur, two conditions must be fulfilled:
1. The omitted variable is correlated with the included regressor, $\tilde{\delta}_1 \neq 0$
2. The omitted variable is a determinant of the dependent variable Y, $\beta_2 \neq 0$
In this case 1, our Model 2 would exclude an important variable. This would introduce a bias!
Therefore, our $\tilde{\beta}_1$ from Model 2 would be biased.
o Because in this case, including X2, as Model 1 does, actually explains our outcome better than Model 2. Otherwise, the effect of X2 ($\beta_2$) would have been "absorbed" by the error term
This bias comes from the relationship between X1 and X2, which can be indicated with $\tilde{\delta}_1 \cdot \beta_2$ (this describes the bias in this example).
(We also know that $\tilde{\delta}_1$ is the regression coefficient from the auxiliary regression where the excluded variable is the dependent variable.)
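To see where this product comes from, here is the standard omitted-variable-bias algebra as a short sketch (assuming, as the notation here suggests, that Model 1 is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$ and that the auxiliary regression is $x_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1 + v$). Substituting the auxiliary regression into Model 1 gives:

$$y = (\beta_0 + \beta_2\tilde{\delta}_0) + (\beta_1 + \beta_2\tilde{\delta}_1)\,x_1 + (u + \beta_2 v)$$

so the short regression (Model 2) estimates $E[\tilde{\beta}_1] = \beta_1 + \beta_2\tilde{\delta}_1$, i.e. a bias of exactly $\tilde{\delta}_1 \cdot \beta_2$.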
In the case $\beta_2 = 0$:
o Here, we know that $\tilde{\beta}_1$ from Model 2 would be unbiased
o Why? Because the bias ($\tilde{\delta}_1 \cdot \beta_2$), which is a product, has one of its elements equal to 0. It would therefore disappear in this case.
o Note: In Model 1, $\hat{\beta}_1$ remains unbiased
By simply looking at the bias, we would always choose Model 1, as this prevents us from having an omitted variable bias.
It is the variance of the error divided by the Total Sum of Squares of the relevant variable for $\hat{\beta}_1$, which is X1, times $(1 - R^2)$ of X1:

$$\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_1\,(1 - R_{X_1}^2)}$$

where $R_{X_1}^2$ is the R squared from regressing X1 on X2.
Note & remember: $R^2$ is a measure of how close the data are fitted to the regression line
Case 1:
If X1 and X2 are uncorrelated, then the R squared of X1 is equal to 0:

$$\operatorname{corr}(X_1, X_2) = 0 \;\Rightarrow\; R_{X_1}^2 = 0 \;\Rightarrow\; \operatorname{var}(\hat{\beta}_1) = \operatorname{var}(\tilde{\beta}_1)$$

o When this is the case, the two variances are the same!
Case 2:

$$\operatorname{corr}(X_1, X_2) \neq 0 \;\Rightarrow\; R_{X_1}^2 \neq 0 \;\Rightarrow\; \operatorname{var}(\hat{\beta}_1) > \operatorname{var}(\tilde{\beta}_1)$$

Here, the variance of $\tilde{\beta}_1$ will be smaller than the variance of $\hat{\beta}_1$, as $\hat{\beta}_1$ will always have a smaller number in the denominator.
This is because with $\hat{\beta}_1$, we always multiply the Total Sum of Squares by $(1 - R_{X_1}^2)$, which is smaller than 1 if the regressors are correlated.
o Therefore, with $\hat{\beta}_1$ from Model 1, the denominator $SST_1(1 - R_{X_1}^2)$ is a smaller number than the denominator $SST_1$ for $\tilde{\beta}_1$ from Model 2.
o And our variance of $\hat{\beta}_1$ will be greater than our variance of $\tilde{\beta}_1$.
o This is always the case in this circumstance.
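To make the comparison explicit, here are the two variance formulas side by side (a sketch in the notation used above, with $\hat{\beta}_1$ from the long Model 1 and $\tilde{\beta}_1$ from the short Model 2):

$$\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_1\,(1 - R_{X_1}^2)} \qquad \text{vs.} \qquad \operatorname{var}(\tilde{\beta}_1) = \frac{\sigma^2}{SST_1}$$

Since $0 < 1 - R_{X_1}^2 < 1$ whenever $\operatorname{corr}(X_1, X_2) \neq 0$, the denominator on the left is smaller, so $\operatorname{var}(\hat{\beta}_1)$ is larger.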
Bringing these results together:
$$E\left[\sum_{i=1}^{n} \left(u_i^2 - 2\,u_i\,\bar{u} + \bar{u}^2\right)\right]$$
Once more, to work with the individual terms and isolate the various parts of the equation, it can be easier to re-write it with a separate summation for each term inside the brackets:

$$E\left[\sum_{i=1}^{n} u_i^2 - 2\sum_{i=1}^{n} u_i\,\bar{u} + \sum_{i=1}^{n} \bar{u}^2\right]$$
Let's have a look at the blue part of the equation (the middle term, $-2\sum_{i=1}^{n} u_i\,\bar{u}$). We can see that the first part of it is very similar to the formula of an average. Remember:

$$\bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i$$
What we do not have in this particular term, however, is the factor $\frac{1}{n}$. Since $\sum_{i=1}^{n} u_i = n\,\bar{u}$, we can replace the summation of $u_i$ by $\bar{u}$ as long as we multiply the term by $n$ to compensate for this step.
o We do this to replace the summation of $u_i$ with another $\bar{u}$, so that we can simplify the equation with a squared $\bar{u}$:
$$E\left[\sum_{i=1}^{n} u_i^2 - 2\,n\,\bar{u}^2 + \sum_{i=1}^{n} \bar{u}^2\right]$$
And again, when we have a case like this, summing the constant $\bar{u}^2$ leaves us with $n$ times the term instead of a summation. Therefore, we can simplify the equation:

$$E\left[\sum_{i=1}^{n} u_i^2 - 2n\,\bar{u}^2 + n\,\bar{u}^2\right]$$
Now, we have a simple calculation of $-2 + 1 = -1$:

$$E\left[\sum_{i=1}^{n} u_i^2 - n\,\bar{u}^2\right]$$
In order to continue, we can make use of the properties of the expected value, where the expected value of a sum of terms is the sum of the expected values of each of the elements:

$$\sum_{i=1}^{n} E[u_i^2] - n\,E[\bar{u}^2]$$
All we have to do now is think back to what we know from the exercise:
o We know that $u_i$ is distributed normally with a mean of 0 and a variance of $\sigma^2$
o And we know that $\operatorname{var}(\bar{u})$ is $\frac{\sigma^2}{n}$
o (Recall that $E[X^2] = \operatorname{var}(X) + (E[X])^2$; this is where the "+ 0" terms in the next step come from)
So in the next step, we simply fill in the terms into our equation:

$$\sum_{i=1}^{n} E[u_i^2] - n\,E[\bar{u}^2] = \sum_{i=1}^{n} \left(\sigma^2 + 0\right) - n\left(\frac{\sigma^2}{n} + 0\right)$$

$$= n\sigma^2 - n\,\frac{\sigma^2}{n} = \sigma^2(n-1)$$
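As a quick sanity check, here is a minimal Stata sketch (the sample size, seed and σ are made-up illustrative values): averaging the sum of squared deviations over many simulated samples should come out close to $(n-1)\sigma^2$.

```stata
* Illustrative simulation: E[sum (u_i - ubar)^2] = (n-1)*sigma^2
* Here n = 10 and sigma = 2, so the target value is 9*4 = 36
clear all
set seed 123
program define devsum, rclass
    clear
    set obs 10
    generate u = rnormal(0, 2)
    quietly summarize u
    generate dev2 = (u - r(mean))^2    // squared deviation from the sample mean
    quietly summarize dev2
    return scalar ssd = r(sum)         // sum of squared deviations for this sample
end
simulate ssd = r(ssd), reps(2000) nodots: devsum
summarize ssd                          // the mean should be close to 36
```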
Exercise 4 (i)
We want to know what the value of the 25th Percentile is and what it means.
The 25th percentile is often also called the First Quartile Q1
o Note: The Second Quartile Q2 is what we know as the Median and it is the 50th
percentile
We can also picture it as the area under our distribution in which a certain percentage lies. In this case, it marks the value of our distribution of wages under which we can find 25% of the observations.
Step 2:
We can use the command "sum wage, detail"
Alternatively, we can go into "Statistics → Other tables → Compact table of summary statistics" and fill in what kind of statistics we want
Or: we can also use: centile wage, centile(25 75)
We can now see that the 25th percentile is 3.33
o Below an hourly wage of 3.33, we find 25% of the observations
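Put together, the command-line route looks like this (a sketch, assuming the tutorial's wage dataset is already loaded):

```stata
* Percentiles of wage
sum wage, detail                 // detailed summary, incl. 25th/50th/75th percentiles
centile wage, centile(25 75)     // just the 25th and 75th percentiles
```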
Exercise 4 (ii)
We want to compute the 95% confidence interval for the mean of our wage variable
Firstly: What is the 95% confidence interval?
o A 95% confidence interval for μY is a random variable that contains the true
value of μY in 95% of all possible random samples. Or, another definition:
o "For a given statistic calculated for a sample of observations (e.g. mean), the confidence interval is a range of values around that statistic that are believed to contain, with a certain probability (e.g. 95%), the true value of the statistic (i.e., the population value)."
We can of course use STATA to calculate our CIs, but it is important to understand the maths behind it as well.
The 95% CI is calculated as follows:

$$\bar{y} \pm 1.96 \cdot SE(\bar{y})$$
Note. Definition of Standard Error: the standard deviation of the sampling distribution* of a
statistic. For a given statistic (e.g., the mean) it tells us how much variability there is in the
statistic across samples from the same population. Large values, therefore, indicate that a
statistic from a given sample may not be an accurate reflection of the population from which
the sample came.
$$SE = \frac{3.693}{\sqrt{526}} \approx 0.161$$
We can then do the calculations in STATA:
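One way to do the calculation (the exact syntax depends on the Stata version; `ci means` is the modern form):

```stata
* 95% confidence interval for the mean of wage
ci means wage                      // Stata 14+; in older versions: ci wage
* Or by hand, using the normal approximation from above:
quietly summarize wage
display r(mean) - 1.96*r(sd)/sqrt(r(N)) ", " r(mean) + 1.96*r(sd)/sqrt(r(N))
```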
Legend:
SS – These are the Sums of Squares associated with the sources of the variance: Total, Model and Residual
df – Degrees of freedom. The total variance has N−1 degrees of freedom. In this case, there were 526 participants, so the df for the residual is 524. And the df for our model corresponds to the number of estimated parameters (including the intercept) minus 1 (K−1)
MS – These are the Mean Squares, the Sums of Squares divided by their respective df.
Exercise 4 (iii)
The coefficient indicates the change of our dependent variable wage with one extra year of education = .54135
To answer the question, we simply have to multiply this by four: 4 × 0.54135 ≈ 2.165
The answer to the question is: with 4 more years of education, a person from the sample earns, on average, 2.165 pounds more
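As a sketch, the corresponding commands (again assuming the wage dataset is loaded) would be:

```stata
* Simple regression of wage on years of education
regress wage educ
* Effect of four additional years of education:
display "4-year effect: " 4*_b[educ]
```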
Exercise 4 (iv)
https://stats.oarc.ucla.edu/stata/faq/how-can-i-do-a-scatterplot-with-regression-line-in-
stata/
Alternatively, you can also go into Graphics → Twoway graph (scatter, line, etc.)
Using the command console gives us more freedom to edit our graphs and to add the
regression line:
graph twoway (scatter wage educ) (lfit wage educ), graphregion(color(white))
What can we see from this plot? What would you say about the spread?
We can see that for lower years of education, the observations are scattered closely around the regression line, while at higher levels of education, the spread around the fitted values is much wider → this can also be confirmed when we look at:
Exercise 4 (v)
Where we are going to plot the residuals against the variable of education
To do this, we first have to use the following command:
o predict residual, residual
predict gives us either the fitted values or the residuals & here, the first "residual" tells STATA what to call the new variable. The option "residual" after the comma tells STATA that we want $\hat{u}$
And to draw the graph we use:
o twoway (scatter residual educ) (lfit residual educ)
The command says that we want to do a scatterplot where we plot the residuals
against the years of education and add the linear fit.
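Note that predict needs a regression in memory, so the full workflow for this plot (a sketch, assuming the wage dataset is loaded) is:

```stata
* Residuals-vs-education plot, start to finish
regress wage educ                  // predict draws on these estimates
predict residual, residuals        // store u-hat as a new variable named "residual"
twoway (scatter residual educ) (lfit residual educ)
```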
We can see that at low levels of education, the residuals are closer to 0
However, when years of education increase, the spread around 0 gets wider and
wider
The variation of the residuals increases with education
This is a sign that the variance of the error term is not constant for every given value of years of education. Therefore, we have heteroskedasticity.