Wooldridge 7e Ch03 SM

Uploaded by

lubiandiego
CHAPTER 3
Multiple Regression Analysis: Estimation
SOLUTIONS TO PROBLEMS

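3.1 (i) hsperc is the student's percentile in the high school graduating class, measured from the
top, so the smaller it is, the higher the student's standing in high school (for example,
hsperc = 5 means the top 5% of the class). Everything else equal, the worse the student's
standing in high school (the larger hsperc), the lower his/her expected college GPA.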

(ii) Just plug these values into the equation:

$$\widehat{colgpa} = 1.392 - .0135(20) + .00148(1050) = 2.676.$$

(iii) The difference between A and B is simply 140 times the coefficient on sat, because
hsperc is the same for both students. So A is predicted to have a score $.00148(140) \approx .207$
higher.

(iv) With hsperc fixed, $\Delta\widehat{colgpa} = .00148\,\Delta sat$. Now, we want to find $\Delta sat$ such that
$\Delta\widehat{colgpa} = .5$, so $.5 = .00148(\Delta sat)$ or $\Delta sat = .5/(.00148) \approx 338$. Perhaps not surprisingly, a
large ceteris paribus difference in SAT score – almost two and one-half standard deviations – is
needed to obtain a predicted difference in college GPA of half a point.
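
The arithmetic in parts (ii) through (iv) can be checked with a short Python sketch. The coefficients are the ones quoted in part (ii); the helper name `predict_colgpa` is only illustrative and does not come from the text.

```python
# Coefficients from the estimated equation used in part (ii).
B0, B_HSPERC, B_SAT = 1.392, -0.0135, 0.00148


def predict_colgpa(hsperc, sat):
    """Predicted college GPA for a given hsperc and sat (illustrative helper)."""
    return B0 + B_HSPERC * hsperc + B_SAT * sat


gpa_ii = predict_colgpa(20, 1050)  # part (ii): hsperc = 20, sat = 1050
diff_iii = B_SAT * 140             # part (iii): 140-point SAT gap, same hsperc
sat_iv = 0.5 / B_SAT               # part (iv): SAT gap that gives a 0.5 GPA gap

print(round(gpa_ii, 3), round(diff_iii, 3), round(sat_iv))
```

Because hsperc is held fixed in parts (iii) and (iv), only the sat coefficient enters those two calculations.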

3.3 (i) If adults trade off sleep for work, more work implies less sleep (other things equal), so
$\beta_1 < 0$.

(ii) The signs of $\beta_2$ and $\beta_3$ are not obvious, at least to me. One could argue that more
educated people like to get more out of life, and so, other things equal, they sleep less ($\beta_2 < 0$).
The relationship between sleeping and age is more complicated than this model suggests, and
economists are not in the best position to judge such things.

(iii) Since totwrk is in minutes, we must convert five hours into minutes: $\Delta totwrk = 5(60) = 300$.
Then sleep is predicted to fall by .148(300) = 44.4 minutes. For a week, 45
minutes less sleep is not an overwhelming change.

(iv) More education implies less predicted time sleeping, but the effect is quite small. If
we assume the difference between college and high school is four years, the college graduate
sleeps about 45 minutes less per week, other things equal.

© 2016 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website,
in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise
on a password-protected website or school-approved learning management system for classroom use.

(v) Not surprisingly, the three explanatory variables explain only about 11.3% of the
variation in sleep. One important factor in the error term is general health. Another is marital
status and whether the person has children. Health (however we measure that), marital status,
and number and ages of children would generally be correlated with totwrk. (For example, less
healthy people would tend to work less.)

3.5 (i) No. By definition, study + sleep + work + leisure = 168. Therefore, if we change study,
we must change at least one of the other categories so that the sum is still 168.

(ii) From part (i), we can write, say, study as a perfect linear function of the other
independent variables: study = 168 − sleep − work − leisure. This holds for every observation,
so MLR.3 is violated.

(iii) Simply drop one of the independent variables, say leisure:

$$GPA = \beta_0 + \beta_1\,study + \beta_2\,sleep + \beta_3\,work + u.$$

Now, for example, $\beta_1$ is interpreted as the change in GPA when study increases by one hour,
where sleep, work, and u are all held fixed. If we are holding sleep and work fixed but increasing
study by one hour, then we must be reducing leisure by one hour. The other slope parameters
have a similar interpretation.
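
The violation of MLR.3 in part (ii), and the fix in part (iii), can be seen numerically. The sketch below uses made-up weekly hours for five students (the numbers are assumptions, not data from the problem) and NumPy's `matrix_rank`. With an intercept and all four activities the design matrix has five columns but only rank four; dropping leisure leaves four columns of full rank.

```python
import numpy as np

# Hypothetical weekly hours; leisure is whatever remains of the 168 hours.
study = np.array([20.0, 35.0, 10.0, 25.0, 30.0])
sleep = np.array([56.0, 49.0, 60.0, 52.0, 50.0])
work = np.array([10.0, 0.0, 30.0, 15.0, 20.0])
leisure = 168.0 - study - sleep - work

ones = np.ones_like(study)
X_all = np.column_stack([ones, study, sleep, work, leisure])  # 5 columns
X_drop = np.column_stack([ones, study, sleep, work])          # leisure dropped

rank_all = np.linalg.matrix_rank(X_all)    # rank deficient: exact linear dependence
rank_drop = np.linalg.matrix_rank(X_drop)  # full column rank

print(X_all.shape[1], rank_all, X_drop.shape[1], rank_drop)
```

A rank below the number of columns means the OLS normal equations have no unique solution, which is exactly what MLR.3 rules out.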

3.7 Only (ii), omitting an important variable, can cause bias, and this is true only when the
omitted variable is correlated with the included explanatory variables. The homoskedasticity
assumption, MLR.5, played no role in showing that the OLS estimators are unbiased.
(Homoskedasticity was used to obtain the usual variance formulas for the $\hat\beta_j$.) Further, the
degree of collinearity between the explanatory variables in the sample, even if it is reflected in a
correlation as high as .95, does not affect the Gauss-Markov assumptions. Only if there is a
perfect linear relationship among two or more explanatory variables is MLR.3 violated.

3.9 (i) $\beta_1 < 0$ because more pollution can be expected to lower housing values; note that $\beta_1$ is
the elasticity of price with respect to nox. $\beta_2$ is probably positive because rooms roughly
measures the size of a house. (However, it does not allow us to distinguish homes where each
room is large from homes where each room is small.)

(ii) If we assume that rooms increases with quality of the home, then log(nox) and rooms
are negatively correlated when poorer neighborhoods have more pollution, something that is
often true. We can use Table 3.2 to determine the direction of the bias. If $\beta_2 > 0$ and
Corr(x1,x2) < 0, the simple regression estimator $\tilde\beta_1$ has a downward bias. But because $\beta_1 < 0$,
this means that the simple regression, on average, overstates the importance of pollution. [E($\tilde\beta_1$)
is more negative than $\beta_1$.]

(iii) This is what we expect from the typical sample based on our analysis in part (ii). The
simple regression estimate, −1.043, is more negative (larger in magnitude) than the multiple
regression estimate, −.718. As those estimates are only for one sample, we can never know
which is closer to $\beta_1$. But if this is a “typical” sample, $\beta_1$ is closer to −.718.

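3.11 From equation (3.22), we have

$$\tilde\beta_1 = \frac{\sum_{i=1}^n \hat r_{i1} y_i}{\sum_{i=1}^n \hat r_{i1}^2},$$

where the $\hat r_{i1}$ are defined in the problem. As usual, we must plug in the true model for $y_i$:

$$\tilde\beta_1 = \frac{\sum_{i=1}^n \hat r_{i1}(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + u_i)}{\sum_{i=1}^n \hat r_{i1}^2}.$$

The numerator of this expression simplifies because $\sum_{i=1}^n \hat r_{i1} = 0$, $\sum_{i=1}^n \hat r_{i1} x_{i2} = 0$, and
$\sum_{i=1}^n \hat r_{i1} x_{i1} = \sum_{i=1}^n \hat r_{i1}^2$. These all follow from the fact that the $\hat r_{i1}$ are the residuals from the
regression of $x_{i1}$ on $x_{i2}$: the $\hat r_{i1}$ have zero sample average and are uncorrelated in sample with
$x_{i2}$. So the numerator of $\tilde\beta_1$ can be expressed as

$$\beta_1 \sum_{i=1}^n \hat r_{i1}^2 + \beta_3 \sum_{i=1}^n \hat r_{i1} x_{i3} + \sum_{i=1}^n \hat r_{i1} u_i.$$

Putting these back over the denominator gives

$$\tilde\beta_1 = \beta_1 + \beta_3 \frac{\sum_{i=1}^n \hat r_{i1} x_{i3}}{\sum_{i=1}^n \hat r_{i1}^2} + \frac{\sum_{i=1}^n \hat r_{i1} u_i}{\sum_{i=1}^n \hat r_{i1}^2}.$$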

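Conditional on all sample values on x1, x2, and x3, only the last term is random due to its
dependence on $u_i$. But E($u_i$) = 0, and so

$$E(\tilde\beta_1) = \beta_1 + \beta_3 \frac{\sum_{i=1}^n \hat r_{i1} x_{i3}}{\sum_{i=1}^n \hat r_{i1}^2},$$

which is what we wanted to show. Notice that the term multiplying $\beta_3$ is the regression
coefficient from the simple regression of $x_{i3}$ on $\hat r_{i1}$.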
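
A numerical check can make this algebra concrete. The sketch below uses simulated data (the design, the coefficient values, and the seed are assumptions chosen only for illustration) and NumPy's `lstsq`. It checks the exact in-sample analogue of the result, in which the population parameters are replaced by OLS estimates from the regression that includes x3: the slope on x1 from the short regression equals the slope from the long regression plus the long-regression coefficient on x3 times the ratio $\sum \hat r_{i1} x_{i3} / \sum \hat r_{i1}^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Simulated regressors (assumed design); x3 is correlated with x1.
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
x3 = 0.8 * x1 + rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + u


def ols(dep, *cols):
    """OLS coefficients (intercept first) computed by least squares."""
    X = np.column_stack([np.ones(len(dep)), *cols])
    return np.linalg.lstsq(X, dep, rcond=None)[0]


# r1: residuals from regressing x1 on x2 (with an intercept).
a = ols(x1, x2)
r1 = x1 - (a[0] + a[1] * x2)

b_short = ols(y, x1, x2)      # regression that omits x3
b_long = ols(y, x1, x2, x3)   # regression that includes x3

ratio = (r1 @ x3) / (r1 @ r1)          # the term multiplying the x3 coefficient
lhs = b_short[1]                       # short-regression slope on x1
rhs = b_long[1] + b_long[3] * ratio    # long slope plus omitted-variable term

print(lhs, rhs)
```

The two printed numbers agree up to floating-point error in any sample; the unbiasedness argument then replaces the estimates with their expectations.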

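3.13 (i) For notational simplicity, define $s_{zx} = \sum_{i=1}^n (z_i - \bar z)x_i$; this is not quite the sample
covariance between z and x because we do not divide by n – 1, but we are only using it to
simplify notation. Then we can write $\tilde\beta_1$ as

$$\tilde\beta_1 = \frac{\sum_{i=1}^n (z_i - \bar z) y_i}{s_{zx}}.$$

This is clearly a linear function of the $y_i$: take the weights to be $w_i = (z_i - \bar z)/s_{zx}$. To show
unbiasedness, as usual we plug $y_i = \beta_0 + \beta_1 x_i + u_i$ into this equation, and simplify:

$$\tilde\beta_1 = \frac{\sum_{i=1}^n (z_i - \bar z)(\beta_0 + \beta_1 x_i + u_i)}{s_{zx}} = \frac{\beta_0 \sum_{i=1}^n (z_i - \bar z) + \beta_1 s_{zx} + \sum_{i=1}^n (z_i - \bar z)u_i}{s_{zx}} = \beta_1 + \frac{\sum_{i=1}^n (z_i - \bar z)u_i}{s_{zx}},$$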

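where we use the fact that $\sum_{i=1}^n (z_i - \bar z) = 0$ always. Now $s_{zx}$ is a function of the $z_i$ and $x_i$ and the
expected value of each $u_i$ is zero conditional on all $z_i$ and $x_i$ in the sample. Therefore, conditional
on these values,

$$E(\tilde\beta_1) = \beta_1 + \frac{\sum_{i=1}^n (z_i - \bar z)E(u_i)}{s_{zx}} = \beta_1$$

because E($u_i$) = 0 for all i.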

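(ii) From the fourth equation in part (i), $\tilde\beta_1 = \beta_1 + \sum_{i=1}^n (z_i - \bar z)u_i / s_{zx}$, we have (again
conditional on the $z_i$ and $x_i$ in the sample),

$$Var(\tilde\beta_1) = \frac{Var\left[\sum_{i=1}^n (z_i - \bar z)u_i\right]}{s_{zx}^2} = \frac{\sum_{i=1}^n (z_i - \bar z)^2 Var(u_i)}{s_{zx}^2} = \sigma^2 \frac{\sum_{i=1}^n (z_i - \bar z)^2}{s_{zx}^2}$$

because of the homoskedasticity assumption [Var($u_i$) = $\sigma^2$ for all i]. Given the definition of $s_{zx}$,
this is what we wanted to show.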

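(iii) We know that $Var(\hat\beta_1) = \sigma^2 / \left[\sum_{i=1}^n (x_i - \bar x)^2\right]$. Now we can rearrange the inequality in the
hint, drop $\bar x$ from the sample covariance, and cancel $n^{-1}$ everywhere, to get

$$\frac{\sum_{i=1}^n (z_i - \bar z)^2}{s_{zx}^2} \ge \frac{1}{\sum_{i=1}^n (x_i - \bar x)^2}.$$

When we multiply through by $\sigma^2$ we get $Var(\tilde\beta_1) \ge Var(\hat\beta_1)$, which is what we
wanted to show.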
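
The variance ranking can be illustrated with a small pure-Python sketch. The x values, the choice g(x) = x², and the error variance of 1 are all assumptions made for illustration; any other g with a nonzero $s_{zx}$ would do.

```python
# Made-up x values; z = g(x) = x**2 is one arbitrary choice of g.
x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
z = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
z_bar = sum(z) / n

s_zx = sum((zi - z_bar) * xi for zi, xi in zip(z, x))  # sum of (z_i - z_bar) * x_i
sst_z = sum((zi - z_bar) ** 2 for zi in z)
sst_x = sum((xi - x_bar) ** 2 for xi in x)

sigma2 = 1.0  # assumed error variance; it scales both variances equally
var_tilde = sigma2 * sst_z / s_zx ** 2  # variance of the estimator built from z
var_ols = sigma2 / sst_x                # variance of the OLS slope estimator

print(var_tilde, var_ols)
```

Setting z equal to x itself makes the two variances coincide, which is the case where the alternative estimator is just OLS.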

3.15 (i) The degrees of freedom of the first regression is n – k – 1 = 353 – 1 – 1 = 351. The
degrees of freedom of the second regression is n – k – 1 = 353 – 2 – 1 = 350. The standard error
of the regression is smaller in the second regression than in the simple regression because one
more explanatory variable is included. The SSR falls from 326.196 to 198.475 when another
explanatory variable is added, and the degrees of freedom also fall by one; both enter the
standard error of the regression, $\hat\sigma = \sqrt{SSR/(n - k - 1)}$.
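
As a rough check, and assuming the standard error discussed here is the standard error of the regression, $\hat\sigma = \sqrt{SSR/(n - k - 1)}$, the two values can be computed from the SSR and degrees of freedom quoted above. The variable names are illustrative.

```python
import math

n = 353
ssr_simple, ssr_multiple = 326.196, 198.475  # SSR values quoted in part (i)

df_simple = n - 1 - 1    # one regressor plus an intercept
df_multiple = n - 2 - 1  # two regressors plus an intercept

# Standard error of the regression: square root of SSR over degrees of freedom.
ser_simple = math.sqrt(ssr_simple / df_simple)
ser_multiple = math.sqrt(ssr_multiple / df_multiple)

print(df_simple, df_multiple, round(ser_simple, 3), round(ser_multiple, 3))
```

Losing one degree of freedom raises the standard error slightly on its own, but here the drop in SSR dominates.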

(ii) Yes, there is a positive moderate correlation between years and rbisyr.

$$VIF_{years} = \frac{1}{1 - R^2_{years}} = \frac{1}{1 - 0.597} = 2.48139;$$

from this value, we can say that there is little collinearity between years and rbisyr.

(iii) The standard error for the coefficient on years in the multiple regression is:

$$se(\hat\beta_{years}) = \frac{\hat\sigma}{\sqrt{SST_{years}\,(1 - R^2_{years})}}.$$

The standard error is smaller than in the simple regression because one more explanatory
variable is included in the second regression: the SSR falls from 326.196 to 198.475 when
another explanatory variable is added, and the degrees of freedom also fall by one, both of
which affect $\hat\sigma$. Therefore, the standard error for the coefficient on years in the multiple
regression is smaller than in the simple regression.

3.17 In this example, each additional year of education is associated with one less year of
experience. Thus, when considering the effect of an additional year of education on wages, we
must also consider the effect of one less year of experience. As education increases by one year,
log wage rises by 0.094, but then experience falling by one year will reduce log wage by 0.026.
The net effect is a 0.068 increase in log wage. As this is a semi-log model, we estimate that an
additional year of education will increase wages by about 6.8%.¹
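
The net effect and the size of the log approximation can be checked in a few lines of Python. The coefficient values are the ones quoted above; the variable names are illustrative.

```python
import math

b_educ = 0.094   # effect of one more year of education on log(wage)
b_exper = 0.026  # effect of one year of experience on log(wage)

# One more year of education comes with one less year of experience.
net_log_change = b_educ - b_exper

approx_pct = 100 * net_log_change                  # usual 100 * (log change) approximation
exact_pct = 100 * (math.exp(net_log_change) - 1)   # exact percentage change

print(round(net_log_change, 3), round(approx_pct, 1), round(exact_pct, 2))
```

The gap between the approximate and exact figures grows with the size of the log change, so the approximation is best for small effects.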

SOLUTIONS TO COMPUTER EXERCISES

C3.1 (i) Probably $\beta_2 > 0$, as more income typically means better nutrition for the mother and
better prenatal care.

(ii) On the one hand, an increase in income generally increases the consumption of a good,
and cigs and faminc could be positively correlated. On the other, family incomes are also higher
for families with more education, and more education and cigarette smoking tend to be
negatively correlated. The sample correlation between cigs and faminc is about −.173, indicating
a negative correlation.

(iii) The regressions without and with faminc are

and

¹ Footnote to Problem 3.17: the exact effect is given by 100·[exp(0.068) − 1]% ≈ 7.04%.

The effect of cigarette smoking is slightly smaller when faminc is added to the regression, but the
difference is not great. This is due to the fact that cigs and faminc are not very correlated, and
the coefficient on faminc is practically small. (The variable faminc is measured in thousands, so
$10,000 more in 1988 income increases predicted birth weight by only .93 ounces.)

C3.3 (i) The constant elasticity equation is

(ii) We cannot include profits in logarithmic form because profits are negative for nine of
the companies in the sample. When we add it in levels form, we get

The coefficient on profits is very small. Here, profits are measured in millions, so if profits
increase by $1 billion, which means $\Delta profits = 1{,}000$ – a huge change – predicted salary
increases by only about 3.6%. However, remember that we are holding sales and market value
fixed.
Together, these variables (and we could drop profits without losing anything) explain
almost 30% of the sample variation in log(salary). This is certainly not “most” of the variation.

(iii) Adding ceoten to the equation gives

This means that one more year as CEO increases predicted salary by about 1.2%.

(iv) The sample correlation between log(mktval) and profits is about .78, which is fairly
high. As we know, this causes no bias in the OLS estimators, although it can cause their
variances to be large. Given the fairly substantial correlation between market value and firm
profits, it is not too surprising that the latter adds nothing to explaining CEO salaries. Also,
profits is a short term measure of how the firm is doing, while mktval is based on past, current,
and expected future profitability.

C3.5 The regression of educ on exper and tenure yields

$$educ = 13.57 - .074\,exper + .048\,tenure + \hat r_1,$$

n = 526, $R^2$ = .101.

Now, when we regress log(wage) on $\hat r_1$ we obtain

$$\widehat{\log(wage)} = 1.62 + .092\,\hat r_1,$$

n = 526, $R^2$ = .207.

As expected, the coefficient on $\hat r_1$ in the second regression is identical to the coefficient on educ
in equation (3.19). Notice that the R-squared from the above regression is below that in (3.19).
In effect, the regression of log(wage) on $\hat r_1$ explains log(wage) using only the part of educ that is
uncorrelated with exper and tenure; separate effects of exper and tenure are not included.
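
The same two-step "partialling out" logic can be reproduced on simulated data. The sketch below does not use the data set from the exercise: the variables are simulated stand-ins, and the data-generating coefficients and seed are assumptions. It checks that the slope from the regression on the residuals matches the educ coefficient from the full multiple regression, while the R-squared is lower.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Simulated stand-ins for exper, tenure, educ and log(wage); not the real data.
exper = rng.uniform(0, 40, size=n)
tenure = 0.3 * exper + rng.uniform(0, 10, size=n)
educ = 13.5 - 0.07 * exper + 0.05 * tenure + rng.normal(size=n)
lwage = 0.3 + 0.09 * educ + 0.004 * exper + 0.02 * tenure + 0.4 * rng.normal(size=n)


def fit(y, X):
    """Return OLS coefficients and residuals for design matrix X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return coef, y - X @ coef


def r_squared(y, resid):
    """R-squared computed as 1 - SSR/SST."""
    dev = y - y.mean()
    return 1 - (resid @ resid) / (dev @ dev)


ones = np.ones(n)
# Step 1: residuals from regressing educ on exper and tenure.
_, r1 = fit(educ, np.column_stack([ones, exper, tenure]))
# Step 2: regress log(wage) on those residuals alone.
coef_r, resid_r = fit(lwage, np.column_stack([ones, r1]))
# Full multiple regression for comparison.
coef_full, resid_full = fit(lwage, np.column_stack([ones, educ, exper, tenure]))

r2_partial = r_squared(lwage, resid_r)
r2_full = r_squared(lwage, resid_full)

print(coef_r[1], coef_full[1], r2_partial, r2_full)
```

The slopes agree to floating-point precision, and the second-step R-squared cannot exceed the full-regression R-squared because the residuals carry only part of the information in the three regressors.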

C3.7 (i) The results of the regression are

n = 408, $R^2$ = .180.

The signs of the estimated slopes imply that more spending increases the pass rate (holding
lnchprg fixed) and a higher poverty rate (proxied well by lnchprg) decreases the pass rate
(holding spending fixed). These are what we expect.

(ii) As usual, the estimated intercept is the predicted value of the dependent variable when
all regressors are set to zero. Setting lnchprg = 0 makes sense, as there are schools with low
poverty rates. Setting log(expend) = 0 does not make sense, because it is the same as setting
expend = 1, and spending is measured in dollars per student. Presumably this is well outside any
sensible range. Not surprisingly, the prediction of a pass rate is nonsensical.

(iii) The simple regression results are

n = 408, $R^2$ = .030,
and the estimated spending effect is larger than it was in part (i) – almost double.

(iv) The sample correlation between lexpend and lnchprg is about , which means that,
on average, high schools with poorer students spent less per student. This makes sense,
especially in 1993 in Michigan, where school funding was essentially determined by local
property tax collections.

(v) We can use equation (3.23). Because Corr(x1,x2) < 0, which means $\tilde\delta_1 < 0$, and $\hat\beta_2 < 0$,
the simple regression estimate, $\tilde\beta_1$, is larger than the multiple regression estimate, $\hat\beta_1$. Intuitively,
failing to account for the poverty rate leads to an overestimate of the effect of spending.

C3.9 (i) The estimated equation is

The R-squared is now about .083, compared with about .014 for the simple regression case.
Therefore, the variables giftlast and propresp help to explain significantly more variation in gifts
in the sample (although still just over eight percent).

(ii) Holding giftlast and propresp fixed, one more mailing per year is estimated to increase
gifts by 2.17 guilders. The simple regression estimate is 2.65, so the multiple regression estimate
is somewhat smaller. Remember, the simple regression estimate holds no other factors fixed.

(iii) Because propresp is a proportion, it makes little sense to increase it by one. Such an
increase can happen only if propresp goes from zero to one. Instead, consider a .10 increase in
propresp, which means a 10 percentage point increase. Then, gift is estimated to be $15.36(.1) \approx 1.54$
guilders higher.

(iv) The estimated equation is

After controlling for the average past gift level, the effect of mailings becomes even smaller:
1.20 guilders, or less than half the effect estimated by simple regression.

(v) After controlling for the average of past gifts – which we can view as measuring the
“typical” generosity of the person and is positively related to the current gift level – we find that
the current gift amount is negatively related to the most recent gift. A negative relationship
makes some sense, as people might follow a large donation with a smaller one.

C3.11 (i) The regression results are:

$$\widehat{math4} = 96.7704 - 0.8328\,pctsgle.$$

The percentage of children not in married-couple families has a negative effect on the
percentage of students at a satisfactory level in 4th grade math. Assuming pctsgle is measured
in percentage points (0 to 100), which the size of the intercept suggests, a ten percentage point
increase in pctsgle lowers the predicted percentage at a satisfactory level by
0.8328(10) ≈ 8.3 percentage points, which is a sizable effect rather than a small one.

(ii) The estimated regression results are:

$$\widehat{math4} = 51.723 - 0.1996\,pctsgle - 0.3964\,free + 3.5601\,lmedinc.$$

The coefficient on pctsgle shrinks in magnitude, from −0.8328 to −0.1996, once free and lmedinc
are held fixed. It remains negative: as the percentage of children not in married-couple families
increases, the percentage at a satisfactory level in 4th grade math still decreases, but by much
less.

(iii) The sample correlation between lmedinc and free is −0.74. This is the expected relationship
because, as median income increases, eligibility for free lunch decreases.

(iv) No, because high correlations among the variables lmedinc and free do not make it more
difficult to determine the causal effect of single parenthood on student performance.

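(v) $$VIF_{pctsgle} = \frac{1}{1 - R^2_{pctsgle}} = \frac{1}{1 - 0.3795} = 1.6116.$$

$$VIF_{free} = \frac{1}{1 - R^2_{free}} = \frac{1}{1 - 0.4455} = 1.8034.$$

$$VIF_{lmedinc} = \frac{1}{1 - R^2_{lmedinc}} = \frac{1}{1 - 0.3212} = 1.4732.$$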

Comparing the three variables, free clearly has the highest VIF. No, this knowledge does not
change the model used to study the causal effect of single parenthood on math performance.
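
The three variance inflation factors follow directly from the auxiliary R-squared values quoted in part (v). A minimal sketch, in which the function name `vif` and the dictionary layout are illustrative choices:

```python
def vif(r_squared):
    """Variance inflation factor from the R-squared of regressing one
    explanatory variable on all the others."""
    return 1.0 / (1.0 - r_squared)


# Auxiliary R-squared values quoted in part (v).
aux_r2 = {"pctsgle": 0.3795, "free": 0.4455, "lmedinc": 0.3212}

vifs = {name: vif(r2) for name, r2 in aux_r2.items()}
largest = max(vifs, key=vifs.get)  # variable with the highest VIF

for name, value in vifs.items():
    print(name, round(value, 4))
print("largest:", largest)
```

All three values are well below the rule-of-thumb levels sometimes used to flag multicollinearity, which fits the conclusion that collinearity is not a serious concern here.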

C3.13 (i) Regressing colGPA on PC yields

$$\widehat{colGPA} = \underset{(0.040)}{2.989} + \underset{(0.063)}{0.170}\,PC.$$

Students without a PC have on average a 2.989 GPA, while students who own a PC have college
GPAs on average 0.17 points higher.

(ii) $$\widehat{colGPA} = \underset{(0.333)}{1.264} + \underset{(0.057)}{0.157}\,PC + \underset{(0.094)}{0.447}\,hsGPA + \underset{(0.011)}{0.009}\,ACT.$$

Including information on high school GPA and ACT scores does reduce the magnitude of the
coefficient on PC, but not by a large amount. Thus, a small portion of the effects of these other
variables was being picked up by PC
ownership, but even after controlling for these variables, PC ownership still has a significant
impact on college GPA.

(iii) A 10 point increase in ACT score is predicted to raise college GPA by 0.09 points. This is a
smaller effect than owning a PC, all else equal.

(iv) Adding binary indicators for whether an individual’s mother and father are college
graduates does not have much of an impact on the estimated effect of PC ownership. The
coefficient falls from 0.157 in part (ii) to 0.152. Thus, it does not appear that parents’ education
has much of an impact on the effect of PC ownership on college GPA. The regression that
includes all of these variables explains 22.2% of the variation in college GPA.

(v) Though high school GPA and ACT scores are correlated, they are not perfectly correlated.
Thus, each variable is adding some unique variation to the regression. On one hand, including
highly correlated variables will cause the precision of our estimators to decline. On the other
hand, omitting a relevant variable could lead to biased estimates. Would we rather have less
precise, but unbiased estimators or more precise, but biased estimators? Most econometricians
would argue that the former is preferred.
