Econometrics Cheat Sheet
by Tyler Ransom, University of Oklahoma
@tyleransom

Data & Causality
Basics about data types and causality.

Types of data
  Experimental             Data from a randomized experiment
  Observational            Data collected passively
  Cross-sectional          Multiple units, one point in time
  Time series              Single unit, multiple points in time
  Longitudinal (or Panel)  Multiple units followed over multiple time periods

Experimental data
  • Correlation =⇒ Causality
  • Very rare in the social sciences

Statistics basics
We examine a random sample of data to learn about the population.
  Random sample            Representative of the population
  Parameter (θ)            Some number describing the population
  Estimator of θ           Rule assigning a value of θ to a sample;
                           e.g. the sample average Ȳ = (1/N) Σᵢ₌₁ᴺ Yᵢ
  Estimate of θ            What the estimator spits out for a particular sample (θ̂)
  Sampling distribution    Distribution of estimates across all possible samples
  Bias of estimator W      E(W) − θ
  Efficiency               W is efficient if Var(W) < Var(W̃)
  Consistency              W is consistent if θ̂ → θ as N → ∞

Simple regression
Regression is useful because we can estimate a ceteris paribus relationship between some variable x and our outcome y:
    y = β₀ + β₁x + u
We want to estimate β̂₁, which gives us the effect of x on y.

OLS formulas
To estimate β̂₀ and β̂₁, we make two assumptions:
  1. E(u) = 0
  2. E(u|x) = E(u) for all x
When these hold, we get the following formulas:
    β̂₀ = ȳ − β̂₁x̄
    β̂₁ = Ĉov(y, x) / V̂ar(x)
  Fitted values (ŷᵢ)       ŷᵢ = β̂₀ + β̂₁xᵢ
  Residuals (ûᵢ)           ûᵢ = yᵢ − ŷᵢ
  Total Sum of Squares     SST = Σᵢ₌₁ᴺ (yᵢ − ȳ)²
  Expl. Sum of Squares     SSE = Σᵢ₌₁ᴺ (ŷᵢ − ȳ)²
  Resid. Sum of Squares    SSR = Σᵢ₌₁ᴺ ûᵢ²
  R-squared (R²)           R² = SSE/SST; "frac. of var. in y explained by x"

Algebraic properties of OLS estimates
  • Σᵢ₌₁ᴺ ûᵢ = 0 (mean & sum of residuals is zero)
  • Σᵢ₌₁ᴺ xᵢûᵢ = 0 (zero covariance between x and residuals)
  • The OLS line (SRF) always passes through (x̄, ȳ)
  • SSE + SSR = SST
  • 0 ≤ R² ≤ 1

Multiple regression
Multiple regression is more useful than simple regression because we can more plausibly estimate ceteris paribus relationships (i.e. E(u|x) = E(u) is more plausible):
    y = β₀ + β₁x₁ + ··· + βₖxₖ + u
β̂₁, …, β̂ₖ give the partial effect of each of the x's on y.
    β̂₀ = ȳ − β̂₁x̄₁ − ··· − β̂ₖx̄ₖ
    β̂ⱼ = Ĉov(y, residualized xⱼ) / V̂ar(residualized xⱼ)
where "residualized xⱼ" means the residuals from an OLS regression of xⱼ on all other x's (i.e. x₁, …, xⱼ₋₁, xⱼ₊₁, …, xₖ).

Gauss-Markov Assumptions
  1. y is a linear function of the β's
  2. y and the x's are randomly sampled from the population
  3. No perfect multicollinearity
  4. E(u|x₁, …, xₖ) = E(u) = 0 (Unconfoundedness)
  5. Var(u|x₁, …, xₖ) = Var(u) = σ² (Homoskedasticity)
When (1)–(4) hold: OLS is unbiased, i.e. E(β̂ⱼ) = βⱼ
When (1)–(5) hold: OLS is the Best Linear Unbiased Estimator (BLUE)

Variance of u (a.k.a. "error variance")
    σ̂² = SSR / (N − K − 1) = [1/(N − K − 1)] Σᵢ₌₁ᴺ ûᵢ²

Variance and Standard Error of β̂ⱼ
    Var(β̂ⱼ) = σ² / [SSTⱼ(1 − Rⱼ²)],  j = 1, 2, …, k
where SSTⱼ = (N − 1)Var(xⱼ) = Σᵢ₌₁ᴺ (xᵢⱼ − x̄ⱼ)² and Rⱼ² is the R² from regressing xⱼ on all the other x's.
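Worked example: sampling distribution of an estimator. The estimator/estimate/sampling-distribution definitions can be illustrated by simulation — draw many hypothetical samples, compute the sample average Ȳ in each, and check its bias. This is a minimal sketch using numpy; the population parameter and sample sizes are made up for illustration.

```python
import numpy as np

# Hypothetical population: normal with mean theta (the parameter we estimate).
rng = np.random.default_rng(42)
theta = 5.0      # population parameter (known here only because we simulate)
N = 50           # size of each random sample
reps = 10_000    # number of hypothetical samples drawn

# Each row is one sample; applying the estimator (the sample average)
# to every sample traces out the sampling distribution of Y-bar.
estimates = rng.normal(loc=theta, scale=2.0, size=(reps, N)).mean(axis=1)

# Bias of the estimator: E(W) - theta, approximated by averaging the estimates.
bias = estimates.mean() - theta   # should be close to zero (Y-bar is unbiased)
```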
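Worked example: simple OLS by hand. The formulas β̂₁ = Ĉov(y, x)/V̂ar(x) and β̂₀ = ȳ − β̂₁x̄, plus the algebraic properties of the residuals and the sum-of-squares identity, can all be checked numerically. A minimal sketch with numpy and made-up data:

```python
import numpy as np

# Made-up sample (x, y) purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope: sample covariance of (y, x) over sample variance of x.
beta1 = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
# Intercept: the OLS line passes through the point of means (x-bar, y-bar).
beta0 = y.mean() - beta1 * x.mean()

# Fitted values and residuals.
y_hat = beta0 + beta1 * x
u_hat = y - y_hat

# Sums of squares and R^2 = SSE/SST.
SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)
R2 = SSE / SST
```

The residuals satisfy Σûᵢ = 0 and Σxᵢûᵢ = 0 up to floating-point error, and SSE + SSR = SST holds by construction.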
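Worked example: the "residualized xⱼ" (partialling-out) formula for multiple regression. Regressing xⱼ on the other regressors, keeping the residuals, and then computing Ĉov(y, residualized xⱼ)/V̂ar(residualized xⱼ) reproduces the coefficient from the full multiple regression. A sketch with numpy and simulated data (coefficients and sample size are arbitrary):

```python
import numpy as np

# Simulate data with two correlated regressors.
rng = np.random.default_rng(0)
N = 200
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)     # x2 is correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=N)

# Full multiple regression of y on (1, x1, x2) via least squares.
X = np.column_stack([np.ones(N), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Partialling out: residualize x1 on the other regressors (constant and x2).
Z = np.column_stack([np.ones(N), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ g                         # "residualized x1"

# The slope on x1 equals Cov-hat(y, r1) / Var-hat(r1).
beta1_fwl = np.cov(y, r1, ddof=1)[0, 1] / np.var(r1, ddof=1)
```

The two estimates agree to floating-point precision; this is the Frisch-Waugh-Lovell result behind the residualization formula.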
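Worked example: error variance and Var(β̂ⱼ). The formulas σ̂² = SSR/(N − K − 1) and Var(β̂ⱼ) = σ²/[SSTⱼ(1 − Rⱼ²)] can be cross-checked against the matrix expression σ̂²(X′X)⁻¹. A sketch with numpy and simulated data (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 300
x1 = rng.normal(size=N)
x2 = 0.6 * x1 + rng.normal(size=N)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=N)

X = np.column_stack([np.ones(N), x1, x2])
K = 2                                   # number of slope coefficients
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta

# Error variance: sigma^2-hat = SSR / (N - K - 1).
sigma2 = np.sum(u_hat ** 2) / (N - K - 1)

# Var(beta1-hat) = sigma^2 / (SST1 * (1 - R1^2)), where R1^2 comes
# from regressing x1 on the other regressors (constant and x2).
Z = np.column_stack([np.ones(N), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ g
SST1 = np.sum((x1 - x1.mean()) ** 2)
R1sq = 1.0 - np.sum(r1 ** 2) / SST1
var_beta1 = sigma2 / (SST1 * (1.0 - R1sq))

# Cross-check against the matrix formula sigma^2-hat * (X'X)^{-1}.
var_matrix = sigma2 * np.linalg.inv(X.T @ X)
```

The scalar formula matches the (x₁, x₁) diagonal entry of the matrix version, since SSTⱼ(1 − Rⱼ²) is exactly the residual sum of squares from residualizing xⱼ.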