Econometrics__2__Notes (2)
able Trap
1.1 Notes from the Lecture
1. If a qualitative variable has m categories, introduce only m − 1 dummy
variables. Including all m dummies together with an intercept causes
perfect collinearity, known as the dummy variable trap.
3. The intercept value represents the mean value of the base category.
6. If you introduce a dummy for each category, you must omit the
intercept term. For example:
Here, the coefficients of the dummy variables represent the mean values
of each category.
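A minimal sketch of the trap described above, using a made-up three-category variable (the `regions` data and the helper `dummies` are hypothetical, not from the lecture). It checks that the m dummy columns add up to the intercept column, which is the exact linear dependence behind the trap, and that dropping one category removes it.

```python
# Hypothetical sample: a qualitative variable with m = 3 categories.
regions = ["north", "south", "west", "south", "north", "west"]
categories = sorted(set(regions))  # ['north', 'south', 'west']


def dummies(values, cats):
    """Return one 0/1 column per category in cats."""
    return [[1 if v == c else 0 for v in values] for c in cats]


intercept = [1] * len(regions)

# Trap: intercept plus m dummies. Row by row, the dummy columns sum to the
# intercept column, so the regressors are perfectly collinear.
all_dummies = dummies(regions, categories)
row_sums = [sum(col[i] for col in all_dummies) for i in range(len(regions))]
trap = row_sums == intercept

# Fix: keep the intercept and drop one category (the base category).
base, kept = categories[0], categories[1:]
m_minus_1 = dummies(regions, kept)
row_sums_fixed = [sum(col[i] for col in m_minus_1) for i in range(len(regions))]
dependence_removed = row_sums_fixed != intercept

print(trap, dependence_removed)
```

With m − 1 dummies, rows belonging to the base category have all dummies equal to zero, so the "dummies sum to the intercept" identity no longer holds.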
where:
• Yi is the dependent variable (e.g., hourly wage).
Yi = β1 + β2 Xi + ui (1)
Properties of LPM
1. E(Yi | Xi) = β1 + β2 Xi, which is the conditional probability
P(Yi = 1 | Xi).
2. Disadvantages of LPM:
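One commonly cited drawback of the LPM is that fitted probabilities need not lie between 0 and 1. The sketch below fits equation (1) by OLS on a small made-up data set (the income and ownership values are hypothetical) and lists the fitted values that fall outside the unit interval.

```python
def ols(x, y):
    """Simple-regression OLS: returns (b1, b2) for y = b1 + b2*x + u."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2


# Hypothetical data: X is income, Y = 1 if the household owns a house.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b1, b2 = ols(x, y)
fitted = [b1 + b2 * xi for xi in x]  # estimated P(Y = 1 | X)
outside = [round(p, 3) for p in fitted if p < 0 or p > 1]
print(round(b1, 3), round(b2, 3), outside)
```

In this sample the fitted value at X = 1 is negative and the one at X = 10 exceeds one, which cannot be read as probabilities.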
3 Logit Model
The logit model uses the logistic distribution function for modeling binary
outcomes:
\[ P_i = \frac{e^{Z_i}}{1 + e^{Z_i}}, \quad Z_i = \beta_1 + \beta_2 X_i \qquad (2) \]
The log of the odds ratio is linear:
\[ L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = \beta_1 + \beta_2 X_i \qquad (3) \]
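As a quick numerical check of equations (2) and (3), the sketch below evaluates the logistic function and its inverse, the log-odds. The coefficient values are hypothetical and chosen only for illustration.

```python
import math


def logistic(z):
    """P = e^z / (1 + e^z), as in equation (2)."""
    return math.exp(z) / (1.0 + math.exp(z))


def logit(p):
    """L = ln(P / (1 - P)), as in equation (3)."""
    return math.log(p / (1.0 - p))


beta1, beta2 = -4.0, 0.5  # hypothetical coefficients

for x in (0, 8, 16):
    z = beta1 + beta2 * x
    p = logistic(z)
    # The log-odds recovers Z, which is linear in X even though P is not.
    print(x, round(p, 4), round(logit(p), 4))
```

The probability stays strictly between 0 and 1 for any Z, while the log-odds moves by the same amount (β2) for each one-unit change in X.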
Odds ratio: The factor by which the odds change for a one-unit increase
in a predictor variable:
\[ e^{\beta} \]
Proportional change in odds: The relative change in the odds for a one-unit
increase in the predictor:
\[ e^{\beta} - 1 \]
The marginal effect measures the change in probability resulting from a
one-unit change in a predictor variable. In the logit model it depends on
the probability P at which it is evaluated:
\[ \text{Marginal Effect} = P(1 - P)\,\beta \]
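The three quantities above can be computed directly. The sketch below uses a hypothetical slope β = 0.5 and evaluates the marginal effect at a hypothetical probability P = 0.3. It also verifies that the odds are multiplied by exactly e^β after a one-unit increase in the predictor.

```python
import math

beta = 0.5  # hypothetical logit slope coefficient
p = 0.3     # hypothetical probability at which effects are evaluated

odds_ratio = math.exp(beta)            # factor change in the odds
prop_change = odds_ratio - 1           # relative change in the odds
marginal_effect = p * (1 - p) * beta   # slope of P with respect to X at p

# Check the odds-ratio interpretation: raise Z by beta (one-unit increase in X).
odds_before = p / (1 - p)
z = math.log(odds_before)
p_after = 1 / (1 + math.exp(-(z + beta)))
odds_after = p_after / (1 - p_after)

print(round(odds_ratio, 4), round(prop_change, 4), round(marginal_effect, 4))
print(round(odds_after / odds_before, 4), round(p_after - p, 4))
```

The marginal effect is a derivative, so it only approximates the actual change in probability for a full one-unit change (`p_after - p`); the odds ratio, by contrast, is exact.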
Form of Y   Form of X   Interpretation
Y           X           When X changes by 1 unit, Y changes by β1 units.
ln Y        ln X        When X changes by 1%, Y changes by β1% (elasticity).
Y           ln X        When X changes by 1%, Y changes by β1/100 units.
ln Y        X           When X changes by 1 unit, Y changes by (β1 × 100)%.
In this table β1 denotes the slope coefficient on X (or ln X).
4 Probit Model
The probit model uses the standard normal cumulative distribution function
to model binary outcomes:
\[ P(Y = 1 \mid X) = \Phi(\beta_1 + \beta_2 X) \qquad (4) \]
where Φ is the cumulative distribution function (CDF) of the standard
normal distribution:
\[ \Phi(X) = \int_{-\infty}^{X} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt \qquad (5) \]
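Equation (5) has no closed form, but it can be evaluated through the error function, since Φ(x) = ½[1 + erf(x/√2)]. The sketch below does this with Python's `math.erf` and plugs it into equation (4) with hypothetical coefficients.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


beta1, beta2 = -1.0, 0.25  # hypothetical probit coefficients


def probit_prob(x):
    """P(Y = 1 | X = x) under the probit model of equation (4)."""
    return std_normal_cdf(beta1 + beta2 * x)


for x in (0, 4, 8):
    print(x, round(probit_prob(x), 4))
```

At X = 4 the index β1 + β2X equals zero, so the predicted probability is exactly one half.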
odds = Pi / (1 − Pi) = e^{Zi} (logit model)
A disadvantage of the Logit and Probit models is that the parameters are
not easily interpreted.
Summary of Time Series Models and Their
Properties
White Noise (WN)
• Definition: A purely random process with constant mean and variance
and no autocorrelation.
• Equation: ut ∼ IIDN(0, σ²).
Moving Average (MA) Model
• Properties:
– Always stationary.
– The ACF cuts off after lag q.
– The PACF decays gradually.
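An ACF that cuts off after lag q is the signature of a moving-average process of order q. The sketch below simulates an MA(1) series (so q = 1) with a hypothetical coefficient θ = 0.8 and computes the sample ACF. The helper `sample_acf` is written here for illustration and is not taken from the notes.

```python
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations rho_1 .. rho_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    gamma0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n / gamma0
        for k in range(1, max_lag + 1)
    ]


random.seed(0)
n, theta = 5000, 0.8  # hypothetical sample size and MA coefficient
u = [random.gauss(0, 1) for _ in range(n + 1)]          # white noise
y = [u[t] + theta * u[t - 1] for t in range(1, n + 1)]  # MA(1) series

acf = sample_acf(y, 4)
print([round(r, 3) for r in acf])
```

For an MA(1) process the theoretical lag-1 autocorrelation is θ / (1 + θ²), about 0.49 here, and all higher-lag autocorrelations are zero, so the sample values beyond lag 1 should be close to zero.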
Autoregressive Moving Average (ARMA) Model
• Definition: Combines the AR and MA models to explain a time series
using both past values and past error terms.
• Properties:
• Covariance depends only on the lag (distance) between two time
periods, not the actual time.
Key Properties:
• Mean Reversion: The series tends to return to its mean over time.
• Constant Fluctuations: Variance remains stable, indicating
consistent amplitude of fluctuations.
Importance of Stationarity:
• Nonstationary series are specific to the observed time period and
unsuitable for generalization or forecasting.
Key Concepts
• Random walks (with or without drift) are nonstationary processes.
• Random walks exhibit stochastic trends.
Types of Trends
Deterministic Trend:
• Predictable and constant over time.
• Equation: Yt = β1 + β2 t + ut
• Subtracting the trend (β1 + β2 t) results in a stationary series. This
process is called detrending.
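A minimal sketch of detrending, assuming the trend coefficients are unknown and estimated by OLS of Yt on t. The simulated series and the values β1 = 10, β2 = 0.5 are hypothetical.

```python
import random


def ols(x, y):
    """Simple-regression OLS: returns (b1, b2) for y = b1 + b2*x + u."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2


random.seed(1)
beta1, beta2 = 10.0, 0.5  # hypothetical trend parameters
t = list(range(1, 201))
y = [beta1 + beta2 * ti + random.gauss(0, 1) for ti in t]

b1, b2 = ols(t, y)
# Detrended series: what is left after subtracting the fitted trend line.
detrended = [yi - (b1 + b2 * ti) for ti, yi in zip(t, y)]

print(round(b1, 3), round(b2, 4))
```

The detrended series is the OLS residual, so it has mean zero by construction and fluctuates around zero with no remaining linear drift.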
Stochastic Trend:
• Unpredictable and nonstationary.
• Found in random walks with or without drift.
Integrated Processes
A stochastic process requiring differencing d times to achieve stationarity is
said to be integrated of order d, denoted as I(d).
• Example:
– I(0): Stationary time series.
– I(1): Requires first differencing.
– I(2): Requires second differencing.
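To illustrate the I(1) case, the sketch below builds a random walk without drift from simulated shocks and differences it once, which recovers the stationary shocks. The helper `difference` is hypothetical and written only for this example.

```python
import random


def difference(series, d=1):
    """Apply first differencing d times."""
    for _ in range(d):
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
    return series


random.seed(2)
u = [random.gauss(0, 1) for _ in range(500)]  # stationary shocks, I(0)

# Random walk without drift: Y_t = Y_{t-1} + u_t, an I(1) process.
y, level = [], 0.0
for shock in u:
    level += shock
    y.append(level)

dy = difference(y, 1)  # first difference equals u_t for t >= 2
print(len(y), len(dy), len(difference(y, 2)))
```

Each round of differencing shortens the series by one observation, so an I(d) series loses d observations before it becomes stationary.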
• Range: −1 ≤ ρk ≤ 1.
Correlogram:
• A plot of ρk against k (lags).
• High and slowly decaying ρk : Indicates nonstationarity.
Testing Autocorrelation Significance
Box–Pierce Q Statistic:
• Approximation: Q ∼ χ²(m), where m is the lag length.
• Ljung–Box (LB) statistic: LB ∼ χ²(m).
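The sketch below computes both statistics from the sample autocorrelations, using the standard textbook definitions Q = n Σ ρ̂k² and LB = n(n + 2) Σ ρ̂k² / (n − k), with sums over k = 1, …, m. These formulas are not reproduced in the notes above, and the simulated white-noise series is hypothetical.

```python
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations rho_1 .. rho_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    gamma0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n / gamma0
        for k in range(1, max_lag + 1)
    ]


def box_pierce(acf, n):
    """Q = n * sum of squared autocorrelations."""
    return n * sum(r * r for r in acf)


def ljung_box(acf, n):
    """LB = n(n + 2) * sum over k of rho_k^2 / (n - k)."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


random.seed(6)
n, m = 500, 10
series = [random.gauss(0, 1) for _ in range(n)]  # white noise
acf = sample_acf(series, m)
q, lb = box_pierce(acf, n), ljung_box(acf, n)
print(round(q, 3), round(lb, 3))  # compare each with a chi-square(m) critical value
```

LB always exceeds Q because each squared autocorrelation is weighted by (n + 2)/(n − k) > 1; the two converge as n grows.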
Yt = ρYt−1 + ut, where −1 ≤ ρ ≤ 1
Yt − Yt−1 = (ρ − 1)Yt−1 + ut, that is, ∆Yt = δYt−1 + ut with δ = ρ − 1
Dickey-Fuller (DF) Test
• If H0 is true (δ = 0, i.e., the series has a unit root), the t-statistic
for the coefficient of Yt−1 in the regression follows the τ (tau)
distribution rather than the usual t distribution (critical values are
available in specialized tables).
∆Yt = δYt−1 + ut
∆Yt = β1 + δYt−1 + ut
∆Yt = β1 + β2 t + δYt−1 + ut
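A rough sketch of the first (no constant, no trend) DF regression, computed by hand on two simulated series: a random walk (ρ = 1) and a stationary AR(1) with ρ = 0.5. The function name `df_tau` and the simulated data are hypothetical. The resulting statistic has to be compared with Dickey-Fuller critical values, not with Student t tables.

```python
import math
import random


def df_tau(y):
    """Return (delta_hat, tau) from the regression dY_t = delta * Y_{t-1} + u_t."""
    lagged = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(v * v for v in lagged)
    delta = sum(l * d for l, d in zip(lagged, dy)) / sxx
    resid = [d - delta * l for l, d in zip(lagged, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)  # residual variance
    return delta, delta / math.sqrt(s2 / sxx)       # tau = delta / se(delta)


random.seed(3)
shocks = [random.gauss(0, 1) for _ in range(500)]
random_walk, ar1 = [0.0], [0.0]
for e in shocks:
    random_walk.append(random_walk[-1] + e)  # rho = 1, so delta = 0
    ar1.append(0.5 * ar1[-1] + e)            # rho = 0.5, so delta = -0.5

delta_rw, tau_rw = df_tau(random_walk)
delta_ar, tau_ar = df_tau(ar1)
print(round(delta_rw, 4), round(tau_rw, 3))
print(round(delta_ar, 4), round(tau_ar, 3))
```

For the stationary series the estimate of δ is clearly negative and τ is large in absolute value, whereas for the random walk the estimate of δ stays close to zero. The other two forms add a constant, or a constant and a trend, to the same regression.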
The F-statistic is given by the equation:
\[ F = \frac{\text{Explained variation per degree of freedom}}{\text{Unexplained variation per degree of freedom}} = \frac{SSR/(k-1)}{SSE/(n-k)} \qquad (6) \]
where:
• n: Number of observations
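A small worked example of equation (6), with made-up sums of squares. It assumes k is the number of estimated parameters including the intercept, which is the usual convention behind the k − 1 and n − k degrees of freedom but is not stated in the notes above.

```python
def f_statistic(explained_ss, unexplained_ss, n, k):
    """F = (SSR / (k - 1)) / (SSE / (n - k)), as in equation (6).

    explained_ss   -- SSR, the explained (regression) sum of squares
    unexplained_ss -- SSE, the unexplained (error) sum of squares
    n              -- number of observations
    k              -- number of estimated parameters, intercept included (assumed)
    """
    return (explained_ss / (k - 1)) / (unexplained_ss / (n - k))


# Hypothetical numbers: SSR = 80, SSE = 20, n = 25 observations, k = 3 parameters.
f = f_statistic(80.0, 20.0, n=25, k=3)

# Equivalent form in terms of R^2 = SSR / (SSR + SSE).
r2 = 80.0 / (80.0 + 20.0)
f_from_r2 = (r2 / (3 - 1)) / ((1 - r2) / (25 - 3))

print(round(f, 4), round(f_from_r2, 4))
```

Both expressions give the same value, since dividing the numerator and denominator by the total sum of squares turns SSR and SSE into R² and 1 − R².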
Engle–Granger (EG) or Augmented Engle–Granger (AEG) Test
• The DF or ADF unit root tests can be applied by estimating a regression
of the form
Yt = β1 + β2 Xt + ut,
obtaining the residuals ût, and applying the DF or ADF test to ût.
• Critical values for this test were provided by Sargan and Bhargava.
• In the short run, disequilibrium may occur. The error term in the
cointegrating equation can be treated as the equilibrium error.
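The two steps described above can be sketched as follows on simulated data: X is a random walk and Y = 2 + 0.8X plus stationary noise, so the two series are cointegrated by construction. All names and parameter values are hypothetical. Because the residuals are estimated, the τ statistic needs residual-based (EG/AEG) critical values rather than the ordinary DF ones.

```python
import math
import random


def ols(x, y):
    """Simple-regression OLS: returns (b1, b2) for y = b1 + b2*x + u."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2


def df_tau(series):
    """Return (delta_hat, tau) from d(series)_t = delta * series_{t-1} + e_t."""
    lagged = series[:-1]
    diff = [series[t] - series[t - 1] for t in range(1, len(series))]
    sxx = sum(v * v for v in lagged)
    delta = sum(l * d for l, d in zip(lagged, diff)) / sxx
    resid = [d - delta * l for l, d in zip(lagged, diff)]
    s2 = sum(e * e for e in resid) / (len(diff) - 1)
    return delta, delta / math.sqrt(s2 / sxx)


random.seed(4)
x, level = [], 0.0
for _ in range(500):
    level += random.gauss(0, 1)  # X is a random walk, I(1)
    x.append(level)
y = [2.0 + 0.8 * xi + random.gauss(0, 1) for xi in x]  # cointegrated with X

# Step 1: cointegrating regression and its residuals (the equilibrium error).
b1, b2 = ols(x, y)
u_hat = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# Step 2: DF regression on the residuals.
delta, tau = df_tau(u_hat)
print(round(b2, 3), round(delta, 3), round(tau, 2))
```

A strongly negative τ on the residuals points toward stationary equilibrium errors, which is what cointegration means here.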
– Each endogenous variable is explained by its own lagged values
and the lagged values of all other endogenous variables in the
model.
– Typically, there are no exogenous variables in the model.
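A minimal sketch of the structure described above, assuming a two-variable model with one lag and no intercepts, estimated equation by equation with OLS. The function `fit_var1`, the coefficient values, and the simulated data are all hypothetical.

```python
import random


def fit_var1(y1, y2):
    """Equation-by-equation OLS for a zero-mean VAR(1) without intercepts.

    Returns [(a11, a12), (a21, a22)], where each endogenous variable is
    regressed on the first lag of itself and of the other variable.
    """
    l1, l2 = y1[:-1], y2[:-1]
    s11 = sum(a * a for a in l1)
    s22 = sum(b * b for b in l2)
    s12 = sum(a * b for a, b in zip(l1, l2))
    det = s11 * s22 - s12 * s12
    rows = []
    for target in (y1[1:], y2[1:]):
        c1 = sum(a * t for a, t in zip(l1, target))
        c2 = sum(b * t for b, t in zip(l2, target))
        # Solve the 2x2 normal equations by Cramer's rule.
        rows.append(((c1 * s22 - c2 * s12) / det, (c2 * s11 - c1 * s12) / det))
    return rows


random.seed(5)
a11, a12, a21, a22 = 0.5, 0.2, 0.1, 0.4  # hypothetical, stationary VAR(1)
y1, y2 = [0.0], [0.0]
for _ in range(3000):
    p1, p2 = y1[-1], y2[-1]
    y1.append(a11 * p1 + a12 * p2 + random.gauss(0, 1))
    y2.append(a21 * p1 + a22 * p2 + random.gauss(0, 1))

eq1, eq2 = fit_var1(y1, y2)
print([round(c, 3) for c in eq1], [round(c, 3) for c in eq2])
```

Both equations share the same right-hand-side variables, which is why the lagged cross-products only need to be computed once.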
7. Regression of one time series variable on one or more time series vari-
ables often can give nonsensical or spurious results. This phenomenon
is known as spurious regression. One way to guard against it is to find
out if the time series are cointegrated.
8. The Engle–Granger (EG), Augmented Engle–Granger (AEG), and
Cointegrating Regression Durbin–Watson (CRDW) tests can be used to find
out if two or more time series are cointegrated.
9. Cointegration of two (or more) time series suggests that there is a long-
run, or equilibrium, relationship between them.
10. The error correction mechanism (ECM) developed by Engle and Granger
is a means of reconciling the short-run behavior of an economic variable
with its long-run behavior.
11. The field of time series econometrics is evolving. The established results
and tests are in some cases tentative, and a lot more work remains. An
important question that needs an answer is why some economic time
series are stationary and some are nonstationary.