
Chapter 6 of Wooldridge’s textbook

1
Big Picture

In this lecture you will learn

1. how to relax the assumption of linearity

2. models with log terms, squared terms, and interaction terms

3. decision-making based on residual analysis

2
Assumption of Linearity

1. Consider a simple regression


y = β0 + β1 x + u (1)

2. This regression is linear in the sense that it assumes constant marginal effect of x on y:

dy/dx = β1 = constant (2)

So when x changes, y changes at a constant rate

3. In the graph, the linear model can be represented by a straight line

4. For example, if y is house price and x is age, then the linear model assumes the
depreciation rate is constant

5. In reality the relationship between y and x can be nonlinear, or the marginal effect can vary. That is the motivation for nonlinear models such as the log-level model

3
Log-level model

1. Consider an exponential growth model

y = e^(β0 + β1 x + u) (3)

For instance, y can be the number of confirmed cases of coronavirus and x is time

2. We get the log-level model after we take natural log

log(y) = β0 + β1 x + u (log-level model) (4)

3. We can show the marginal effect of the log-level model is not constant:

dy/dx = β1 e^(β0 + β1 x + u) = β1 y ≠ constant (5)
4. (critical thinking) Can we use log-level model when y takes negative values?

5. (critical thinking) Which model is easier to estimate, (3) or (4)?

4
An Approximation

1. There is an approximation when A and B are close


(A − B)/B ≈ log(A) − log(B) (6)
2. Proof (optional): for x ≈ 0 we have log(1 + x) ≈ x. We can prove (6) by letting x ≡ (A − B)/B and applying the properties of the log function:

(A − B)/B ≡ x
≈ log(1 + x)
= log(1 + (A − B)/B) = log(B/B + (A − B)/B)
= log(A/B)
= log(A) − log(B)

3. In short, 100 times the log difference approximates the percentage change (a quick numeric check follows below)
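As a quick check of approximation (6), these Stata lines compare the exact percentage change with 100 times the log difference (a minimal sketch; the values 105 and 100 are chosen here only for illustration):

// exact percentage change when moving from 100 to 105
display 100*(105 - 100)/100
// 100 times the log difference, which should be close when the change is small
display 100*(log(105) - log(100))

The first line gives 5 and the second about 4.88, so the two measures are close when the change is small.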


5
Log-Level Model and Percentage Change

Consider taking the derivative of (4) with respect to x:

β1 = d log(y)/dx (7)
Thus when dx = 1 it follows that

β1 = d log(y) ⇒ 100β1 = percentage change of y (8)

In short, 100 times β1 in the log-level model gives the percentage change of y when x
changes by one unit

6
Example 1

7
Example 1

1. We use House data

2. We regress rprice onto age

3. The coefficient of age is -337. So a house being one year older is associated with the price decreasing by 337 dollars

4. We wonder whether 337 dollars is a big or small change. To put that number into perspective, we divide it by the average house price of 83721 dollars. The ratio times 100 equals 0.4031 percent

5. Alternatively, we get a similar percentage change using the log-level model (see the Stata sketch below)
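A minimal Stata sketch of these two regressions, assuming the House data are loaded and use the variable names rprice and age from the slides (lprice is a name chosen here for the logged price):

// level model: the coefficient of age is in dollars per year
regress rprice age
summarize rprice              // mean price, used to put -337 into perspective

// log-level model: 100 times the coefficient of age is a percentage change per year
gen lprice = log(rprice)
regress lprice age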

8
Example 1—continued

9
In Class Discussion

Consider a new level-log model

y = β0 + β1 log(x) + u (Level-log model)


1. Find the marginal effect dy/dx

2. Is the marginal effect constant?

3. How to tell which model to use, log-level or level-log? (Hint: compare the graphs of y = e^x and y = log(x))

10
Log-log Model

If we take logs of both dependent and independent variables, we get the log-log model that
indicates elasticity

log(y) = β0 + β1 log(x) + u (log-log model) (9)

where β1 measures the percentage change of y when x changes by one percent (not one unit).
β1 = d log(y)/d log(x) = 100 d log(y)/(100 d log(x)) = (percent change of y)/(percent change of x) = elasticity (10)
In short β1 measures elasticity

11
Example 2

1. We generate the log values using Stata's log() function and the gen command

2. We then fit the log-log model (a Stata sketch follows this list)

3. The coefficient of log area is 0.7845. So area rising by one percent is associated with
price increasing by 0.7845 percent (less than one percent)

4. The price-area relationship is inelastic
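A minimal sketch of Example 2 in Stata (lprice and larea are names assumed here for the logged variables; rprice and area are from the House data):

// log-log model: the slope is the price-area elasticity
gen lprice = log(rprice)
gen larea = log(area)
regress lprice larea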

12
Example 2

13
Model with Quadratic (Squared) Term

1. Another way to account for non-linearity (non-constant marginal effect) is using a quadratic model:

y = β0 + β1 x + β2 x^2 + u (Quadratic Model) (11)

2. The marginal effect depends on x, so is non-constant:

dy/dx = β1 + 2β2 x (12)
3. (True or False) β1 measures the marginal effect

4. Compared to models using log values, the quadratic model can allow for a turning point.

5. There is a minimum if β2 > 0 and a maximum if β2 < 0. By setting (12) to zero we locate the turning point at

x^(turning point) = −β1/(2β2) (13)
14
Testing Constant Marginal Effect

1. From (12) it is evident that testing

H0 : β2 = 0 (14)

is the same as testing the marginal effect is constant

2. We reject the null hypothesis if the t statistic of β2 exceeds 1.96 in absolute value, or its
p-value is less than 0.05

3. We can start with a quadratic model. If it turns out that β2 is insignificant, we just drop the squared term x^2 and run a linear model. This is the general-to-specific modeling strategy (a Stata sketch of this workflow follows below)
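A minimal sketch of this general-to-specific workflow in Stata (y, x, and xsq are hypothetical names used only for illustration):

// start general: fit the quadratic model
gen xsq = x^2
regress y x xsq
// test H0: the coefficient of the squared term is zero (constant marginal effect)
test xsq
// if we fail to reject H0, drop the squared term and fit the linear model
regress y x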

15
Example 3

16
Example 3

1. We first regress house price onto age without a squared term

2. The fitted line is a downward-sloping straight line (why?)

3. Obviously the linear model does a poor job of predicting the old houses with age greater than 100

4. The fact that most old houses lie above the fitted line implies systematic prediction errors and suggests a turning point or a nonlinear relationship

17
Example 3—continued

18
Example 3—continued

1. We next regress house price onto age and its squared term

2. The fitted curve is a parabola opening upward (cup-shaped)

3. We get a better fit because now the old houses scatter around the new fitted curve

19
Example 3—continued

20
Example 3—continued

1. The coefficient of the squared term age^2 is 8.29, positive and statistically significant at the 5% level

2. The t statistic of β2 is 10.62 > 1.96, so we reject hypothesis (14) of a constant marginal effect

3. We use formula (13) to show that the minimum (turning point) is located at age = 87.87 (see the Stata sketch after this list)

4. Before the turning point, the house price falls when a house gets older. After the turning
point, the house price starts to rise.
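A minimal sketch of Example 3 in Stata (agesq is a name assumed here for the squared term; rprice and age are from the House data):

// quadratic model in age
gen agesq = age^2
regress rprice age agesq
// turning point from formula (13): -beta1/(2*beta2)
display -_b[age]/(2*_b[agesq])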

21
In Class Discussion

Consider the relationship between rprice and area

1. Can a house be too big, in the sense that there is a turning point after which rprice starts to fall as area keeps rising?

2. What sign do you expect for the coefficient of the squared area term?

3. How to find the optimal area that maximizes the house price?

22
Model with Interaction Term

1. We can include an interaction term (product of two regressors) to allow the marginal
effect of one regressor to depend on the other regressor

y = β0 + β1 x1 + β2 x1 x2 + u (Model with Interaction Term) (15)

2. The marginal effect of x1 depends on x2 (called the interaction effect):

dy/dx1 = β1 + β2 x2 (16)

3. Testing the hypothesis of no interaction effect amounts to testing

H0 : β2 = 0 (17)

23
Example 4

24
Example 4

1. A house is called young if its age is less than 18 years, the average age. Otherwise a house is old

2. We run two regressions separately using young and old houses (see the sketch after this list)

3. We find that for young houses the marginal effect of baths on rprice is 28580, greater than the marginal effect of 21032 for old houses

4. In short, we find evidence supporting the interaction effect (i.e., age matters for the
marginal effect of baths on rprice)
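A minimal sketch of the split-sample comparison in Stata (the threshold of 18 is the average age from the slides; rprice, baths, and age are from the House data):

// young houses: age below the average of 18
regress rprice baths if age < 18
// old houses: age at or above 18 (excluding missing ages)
regress rprice baths if age >= 18 & !missing(age)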

25
Example 4—continued

26
Example 4—continued

1. We generate the interaction term (see the Stata sketch after this list)

2. The coefficient of the interaction term is -30.46

3. That number is negative, so age affects the marginal effect of baths negatively

4. That number is significant at the 10% level (according to the p-value), so there is evidence of an interaction effect

5. From (16) we know that for a brand-new house (age = 0), one more bathroom is associated with a 29350 dollar price increase. For a one-year-old house (age = 1), the marginal effect is about 29350 - 30 = 29320
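A minimal sketch of the interaction regression in Stata, following model (15) (bathsage is a name assumed here for the interaction term):

// interaction term: product of baths and age
gen bathsage = baths*age
regress rprice baths bathsage
// marginal effect of baths from (16), evaluated at age = 1
display _b[baths] + _b[bathsage]*1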

27
In Class Exercise

1. We want to know whether the depreciation rate of an aging house depends on the
number of bathrooms

2. Please specify a proper model and run a regression to find the answer.

3. Is the interaction effect statistically significant?

4. Which house depreciates faster when it gets old, the one with one bathroom or two
bathrooms?

28
Prediction and Residual Analysis

1. Consider a simple model


y = β0 + β1 x + u

2. The fitted or predicted value (denoted by ŷ) for given x = c is computed as

ŷ = β̂0 + β̂1 c (18)

where β̂0 and β̂1 are the estimated coefficients

3. The prediction error is called residual (denoted by û)

û = y − ŷ (19)

4. A model over-predicts when y < ŷ (i.e., û < 0) and under-predicts when y > ŷ (i.e., û > 0)

5. ŷ is part of y explained by the model, while û captures the unexplained part

29
Example 5

30
Example 5

1. A real-estate investor wants to find the most under-valued houses (bargains), whose actual prices are less than their predicted prices

2. We obtain the fitted values and residuals using Stata's predict command (see the sketch after this list)

3. We sort houses based on residuals

4. We list the five houses with most negative residuals

5. The best bargain is a house sold at the price of 76804. The predicted price from the
regression is 161742
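A minimal sketch of Example 5 in Stata (yhat and uhat are names assumed here; the regressors shown, area and age, are only an illustrative choice of pricing model):

// fit the pricing model, then compute fitted values and residuals
regress rprice area age
predict yhat                  // fitted (predicted) prices
predict uhat, residuals       // residuals: actual price minus predicted price
// sort from the most negative residual (the biggest bargain) upward
sort uhat
list rprice yhat uhat in 1/5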

31
Application of Regression Analysis

1. How can we run a regression to help the IRS detect potential tax cheaters?

2. How can we design a cellphone app that uses regression to report the calories you burn given the number of steps you walk?

32
