311 Slide ch6
Big Picture
Assumption of Linearity
2. This regression is linear in the sense that it assumes a constant marginal effect of x on y:
$$\frac{dy}{dx} = \beta_1 = \text{constant} \qquad (2)$$
So when x changes, y changes at a constant rate
4. For example, if y is house price and x is age, then the linear model assumes the
depreciation rate is constant
5. In reality the relationship between y and x can be nonlinear or the marginal effect can be
varying. That is the motivation for nonlinear models such as the log-level model
Log-level model
For instance, y can be the number of confirmed cases of coronavirus and x is time
An Approximation
In short, 100 times β1 in the log-level model gives the percentage change of y when x
changes by one unit
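The approximation can be checked numerically. The sketch below assumes the standard log-level form log(y) = β0 + β1 x + u, under which the exact percentage change for a one-unit change in x is 100(exp(β1) − 1). The coefficient values are hypothetical and chosen only for illustration.

```python
import math

def approx_pct_change(beta1: float) -> float:
    """Approximate percentage change of y per one-unit change in x: 100 * beta1."""
    return 100.0 * beta1

def exact_pct_change(beta1: float) -> float:
    """Exact percentage change implied by log(y) = b0 + b1*x: 100 * (exp(beta1) - 1)."""
    return 100.0 * (math.exp(beta1) - 1.0)

# Hypothetical coefficients (not taken from the slides)
for b in (0.01, 0.05, 0.30):
    print(f"beta1={b:.2f}  approx={approx_pct_change(b):.2f}%  exact={exact_pct_change(b):.2f}%")
```

The two numbers are close when β1 is small and drift apart as β1 grows, so the 100 times β1 reading is most reliable for small coefficients.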
Example 1
3. The coefficient of age is -337. So age rising by one year is associated with price
decreasing by 337 dollars
4. We may wonder whether 337 dollars is a big or small change. To put that number into
perspective, we divide it by the average house price, 83721. The ratio times 100 equals
0.4031 percent
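The ratio can be reproduced in a couple of lines. This sketch uses the rounded coefficient 337 shown on the slide, which gives roughly 0.40 percent. The slide's 0.4031 presumably comes from the unrounded estimate, which is an assumption here since the full regression output is not reproduced.

```python
coef_age = 337.0       # rounded magnitude of the age coefficient, from the slide
avg_price = 83721.0    # average house price, from the slide

# Yearly price change expressed as a percentage of the average price
pct_of_avg = 100.0 * coef_age / avg_price
print(f"{pct_of_avg:.2f} percent of the average price")
```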
Example 1—continued
In Class Discussion
3. How to tell which model to use, log-level or level-log? (Hint: compare the graphs of
y = e^x and y = log(x))
Log-log Model
If we take logs of both dependent and independent variables, we get the log-log model that
indicates elasticity
where β1 measures the percentage change of y when x changes by one percent (not one unit).
$$\beta_1 = \frac{d\log(y)}{d\log(x)} = \frac{100\, d\log(y)}{100\, d\log(x)} = \frac{\text{percent change of } y}{\text{percent change of } x} = \text{elasticity} \qquad (10)$$
In short β1 measures elasticity
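A small simulation illustrates the elasticity interpretation. This is a sketch on synthetic data, not the course data set. The true elasticity of 0.8, the range of x, and the noise level are all hypothetical choices, and NumPy's `polyfit` stands in for the regression command.

```python
import numpy as np

rng = np.random.default_rng(0)
true_elasticity = 0.8                        # hypothetical value chosen for the simulation
x = rng.uniform(500.0, 4000.0, size=500)     # hypothetical regressor, e.g. area
# y = A * x^elasticity * multiplicative noise, so log(y) is linear in log(x)
y = 50.0 * x**true_elasticity * np.exp(rng.normal(0.0, 0.05, size=500))

# OLS of log(y) on log(x); for degree 1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(f"estimated elasticity: {slope:.3f}")
```

The fitted slope on the logged variables recovers the exponent of x, which is exactly the percentage change of y per one percent change of x.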
Example 2
1. We generate the log values using Stata's log function and the gen command
3. The coefficient of log area is 0.7845. So area rising by one percent is associated with
price increasing by 0.7845 percent (less than one percent)
Model with Quadratic (Squared) Term
4. Compared to models using log values, the quadratic model can allow for a turning point.
5. There is a minimum if β2 > 0 and maximum if β2 < 0. By setting (12) to zero we locate
the turning point at
$$x^{\text{turning point}} = -\frac{\beta_1}{2\beta_2} \qquad (13)$$
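Formula (13) and the sign rule for β2 can be wrapped in a short helper. This is a minimal sketch. The function name `turning_point` and the example coefficients are hypothetical.

```python
def turning_point(beta1: float, beta2: float) -> tuple[float, str]:
    """Locate the turning point of y = b0 + b1*x + b2*x^2 and classify it."""
    if beta2 == 0:
        raise ValueError("beta2 = 0: the model is linear and has no turning point")
    x_star = -beta1 / (2.0 * beta2)                 # formula (13)
    kind = "minimum" if beta2 > 0 else "maximum"    # sign of beta2 decides the shape
    return x_star, kind

# Hypothetical coefficients (not taken from the slides)
print(turning_point(-4.0, 0.5))
print(turning_point(6.0, -1.5))
```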
Testing Constant Marginal Effect
H0 : β2 = 0 (14)
2. We reject the null hypothesis if the t statistic of β2 exceeds 1.96 in absolute value, or its
p-value is less than 0.05
3. We can start with a quadratic model. If it turns out that β2 is insignificant, we just drop
the squared term x^2 and run a linear model. This is the general-to-specific modeling
strategy
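The test of (14) can be sketched on simulated data. The example below fits a quadratic model by OLS with NumPy and computes the t statistic of β2 using conventional (homoskedastic) standard errors. The data-generating values are hypothetical, and the helper `ols_t_stats` is written for illustration rather than taken from any library.

```python
import numpy as np

def ols_t_stats(X: np.ndarray, y: np.ndarray):
    """Return OLS coefficients and their t statistics (homoskedastic standard errors)."""
    n, k = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)    # normal equations
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance matrix of beta
    se = np.sqrt(np.diag(cov))
    return beta, beta / se

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=400)
u = rng.normal(0.0, 1.0, size=400)
y_quad = 1.0 - 2.0 * x + 0.5 * x**2 + u         # simulated with a true squared term

# Columns: intercept, x, x^2
X = np.column_stack([np.ones_like(x), x, x**2])
beta, t = ols_t_stats(X, y_quad)
reject = abs(t[2]) > 1.96                       # two-sided test at the 5% level
print(f"beta2 = {beta[2]:.3f}, t = {t[2]:.2f}, reject H0: {reject}")
```

If the null were not rejected, the general-to-specific strategy would drop the x^2 column and refit with the intercept and x only.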
Example 3
3. The linear model clearly does a poor job predicting the prices of old houses with age
greater than 100
4. Most old houses lie above the fitted line, which indicates systematic prediction errors
and suggests a turning point or nonlinear relationship
Example 3—continued
1. We next regress house price on age and its squared term
3. We get a better fit because now those old houses scatter around the new fitted line
Example 3—continued
1. The coefficient of the squared term age^2 is 8.29, positive and statistically significant at
the 5% level
2. The t statistic of β2 is 10.62 > 1.96, so we reject the hypothesis (14) of constant marginal effect
3. We use formula (13) to show that the minimum (turning point) is located where age =
87.87.
4. Before the turning point, the house price falls when a house gets older. After the turning
point, the house price starts to rise.
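The sign change around the turning point can be verified from the two reported numbers alone. In the quadratic model the marginal effect of age is β1 + 2β2·age, and substituting β1 = −2β2·(turning point) from (13) gives 2β2·(age − turning point). The sketch below uses only the slide's values 8.29 and 87.87. The ages evaluated are arbitrary illustrations.

```python
BETA2 = 8.29          # coefficient of age^2, from the slide
TURNING_AGE = 87.87   # turning point, from the slide

def marginal_effect_of_age(age: float) -> float:
    """d(price)/d(age) = beta1 + 2*beta2*age, rewritten as 2*beta2*(age - turning point)."""
    return 2.0 * BETA2 * (age - TURNING_AGE)

# Negative before the turning point, zero at it, positive after it
for age in (10, 50, 87.87, 120):
    print(age, round(marginal_effect_of_age(age), 1))
```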
In Class Discussion
1. Can a house be too big in the sense that there is a turning point, after which rprice starts to
fall when area keeps rising?
3. How to find the optimal area that maximizes the house price?
Model with Interaction Term
1. We can include an interaction term (product of two regressors) to allow the marginal
effect of one regressor to depend on the other regressor
H0 : β2 = 0 (17)
Example 4
1. A house is young if its age is less than 18, the average age. Otherwise a house is old
3. We find that for young houses the marginal effect of baths on rprice is 28580, greater
than the marginal effect of 21032 for old houses
4. In short, we find evidence supporting the interaction effect (i.e., age matters for the
marginal effect of baths on rprice)
Example 4—continued
3. That number is negative, so age affects the marginal effect of baths negatively
4. That number is significant at the 10% level (according to the p-value), so the interaction
effect exists
5. From (16) we know that for a brand-new house (age=0), one more bathroom is
associated with a price increase of 29350. For a one-year-old house (age=1), the marginal
effect is 29350-30 = 29320
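The age-dependent marginal effect can be written as a one-line function. This sketch uses the rounded values from the slide (29350 for baths and −30 for the interaction term, as implied by 29350-30); the exact estimates may differ slightly. Age 18 is included because it is the average age used earlier to split young and old houses.

```python
BETA_BATHS = 29350.0     # marginal effect of baths when age = 0, from the slide
BETA_INTERACT = -30.0    # interaction coefficient, rounded as in the slide's 29350-30

def marginal_effect_of_baths(age: float) -> float:
    """Marginal effect of one more bathroom on rprice for a house of the given age."""
    return BETA_BATHS + BETA_INTERACT * age

for age in (0, 1, 18):
    print(age, marginal_effect_of_baths(age))
```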
In Class Exercise
1. We want to know whether the depreciation rate of an aging house depends on the
number of bathrooms
2. Please specify a proper model and run a regression to find the answer.
4. Which house depreciates faster when it gets old, the one with one bathroom or two
bathrooms?
Prediction and Residual Analysis
û = y − ŷ (19)
Example 5
1. A real-estate investor wants to find the most undervalued houses (bargains), whose actual
prices are less than the predicted prices
2. We obtain the fitted values and residuals using Stata's predict command
5. The best bargain is a house sold at the price of 76804. The predicted price from the
regression is 161742
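The bargain search amounts to finding the most negative residual in (19). The sketch below shows the idea with NumPy. The first house uses the actual and predicted prices from the slide, while the other two rows are hypothetical fillers, and `best_bargain` is an illustrative helper rather than a Stata or library function.

```python
import numpy as np

def best_bargain(actual: np.ndarray, predicted: np.ndarray) -> tuple[int, float]:
    """Return the index and residual of the most undervalued observation."""
    residual = actual - predicted          # u-hat = y - y-hat, as in (19)
    i = int(np.argmin(residual))           # most negative residual = best bargain
    return i, float(residual[i])

# First entry from the slide; remaining entries are hypothetical
actual = np.array([76804.0, 120000.0, 95000.0])
predicted = np.array([161742.0, 115000.0, 99000.0])

idx, resid = best_bargain(actual, predicted)
print(idx, resid)
```

For the house on the slide the residual is 76804 − 161742 = −84938, meaning it sold for about 85 thousand dollars less than the model predicts.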
Application of Regression Analysis
2. How to design a cellphone app that uses regression to report the calories you burn given
the number of steps you walk?