Day 9.2

Instrumental Variables

Caveat
• An Instrumental Variable is a somewhat
complicated methodological idea.
– Can be technically challenging at first.
– Good applications often are embedded in the
language and questions of specific fields of study.
– Academic politics about what methods work best
and who should be cited play an outsized role in
teaching.
• We don’t have enough time to master the
idea today.
Purpose
• Review the basics with some of the “new” lingo.

• Provide some advice about how to apply "design-based" thinking to IV settings.

• Give examples of IV studies.

• Inspire you to learn more on your own time.


Instrumental Variables As A
More General Method
Learning About Causal Effects
• Randomized experiments have played a
crucial role in establishing causality and in
estimating the magnitude of effects in many
fields.
– But randomized experiments (obviously) are not
the only way that people generate evidence of
causality.
– And it is not the only way that people use IV to
estimate the magnitude of causal effects.
IV and RCT
• We have been talking about how to “repair” a
broken RCT using a method called instrumental
variables.

• Use the Wald ratio or 2SLS.

• The idea here is that an RCT is really a "special case" of the more general method of IV.
– A very important and convincing special case.
– But still: a special case…
Instrumental Variables
• The logic of IV applies outside the domain of
formally conducted RCTs.

• A variety of statistical tools, robustness checks,


and analytical norms have grown up around the
use of instrumental variables.

• Learning about IV methods has tremendous spillover effects. It helps you be a more critical reader of research papers.
– Even papers that don't use IV methods at all.
The Plan
• Instrumental Variables
– Simple Introduction
– Better Notation and Assumptions
– Wald Ratio and LATE
• IV as a “generalized” research design.
• Implementation from a design based perspective:
– Probing Assumptions
– Assessing potential threats to validity
– Interpreting results
Starting with an Easy Case
Constant Linear Effects
Y = β0 + β1D + ε
• In the past, this was the standard way to
analyze causal relationships. (Perhaps with
covariates.)
• Still serves as an off-the-shelf model for framing the discussion.
But Treatment is Not Randomly
Assigned.
Y = β0 + β1D + ε
• Cov(D, ε) is nonzero.

• Positive Cov(D, ε) means that treated people would have had higher Y even in the absence of treatment.

• Selection bias.
If we estimate the model using
“standard” regression then the
coefficients will be biased.
What to do?
Enter the IV
• Suppose we find a variable/situation/sub-population that satisfies:

• E[ε | Z = 1] = E[ε | Z = 0]

• The instrument is not associated with any other factors that affect Y.
– Combines Independence and Exclusion.
What Can You Do With Z?
• You want to know: Y = β0 + β1D + ε
• But OVB means that the direct approach will
provide a biased estimate of β1.
What can you do with Z
• Condition on Z = 1 and take expectations.
– E[Y | Z = 1] = E[β0 + β1D + ε | Z = 1]
– E[Y | Z = 1] = β0 + β1E[D|Z=1] + E[ε|Z=1]
• Condition on Z = 0 and take expectations.
– E[Y | Z = 0] = E[β0 + β1D + ε | Z = 0]
– E[Y | Z = 0] = β0 + β1E[D|Z=0] + E[ε|Z=0]
• Difference the two sides:
E[Y | Z = 1] - E[Y | Z = 0] =
{β0 + β1E[D|Z=1] + E[ε|Z=1]} –
{β0 + β1E[D|Z=0] + E[ε|Z=0]}
• Cancel and invoke exclusion/independence:
E[Y | Z = 1] - E[Y | Z = 0] = β1 {E[D|Z=1] –E[D|Z=0]}
What Can You Do With Z?
E[Y | Z = 1] - E[Y | Z = 0] = β1 {E[D|Z=1] –E[D|Z=0]}
• Divide both sides by the first stage:

β1 = { E[Y | Z = 1] - E[Y | Z = 0] } / {E[D|Z=1] –E[D|Z=0]}

• Success.
• For this to work, you need:
– Independence/Exclusion
– First Stage
– Constant Effects/Linearity
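The recipe above is easy to check by simulation. Here is a minimal sketch (all numbers are invented for illustration): the treatment is confounded, so the naive OLS slope is biased upward, while the Wald ratio recovers the true constant effect.

```python
# Hedged sketch: simulate a confounded treatment with a valid binary
# instrument, then compare the (biased) OLS slope to the Wald ratio.
# All parameter values are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta1 = 2.0                      # true constant treatment effect

z = rng.integers(0, 2, n)        # instrument, "as good as random"
u = rng.normal(size=n)           # confounder hiding in epsilon
# Take-up depends on the instrument AND on the confounder (selection).
d = ((0.8 * z + u + rng.normal(size=n)) > 0.5).astype(float)
y = 1.0 + beta1 * d + 2.0 * u + rng.normal(size=n)

# Naive OLS slope: Cov(D, Y) / Var(D) -> biased upward here.
ols = np.cov(d, y)[0, 1] / np.var(d)

# Wald ratio: reduced form over first stage.
wald = (y[z == 1].mean() - y[z == 0].mean()) / \
       (d[z == 1].mean() - d[z == 0].mean())

print(f"OLS:  {ols:.2f}")   # well above 2.0
print(f"Wald: {wald:.2f}")  # close to 2.0
```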
Moving To A More General Setting:
Notation and Assumptions for
“Modern” IV
Notation
• 𝑍 is the randomly assigned study arm.

• D = D(1)Z + D(0)(1 − Z) is the treatment received, expressed in terms of "potential treatments" under alternative values of Z.

• Y(d, z) is a potential outcome under hypothetical values of Z and D.
– Imagine a set of variables for every possible combination of Z and D.
Assumptions
1. First Stage: Z does affect D

2. Independence: Z is randomly assigned.

3. Exclusion: Z has no direct effect on outcomes.

4. Monotonicity: Z always affects D in the same


direction.

5. SUTVA: Non-interference/No Spillovers


The First Stage Assumption 1
• Treatment take-up rates vary across sub-populations defined by Z.
• The treatment exposure rate is different in the
treatment and control arms of the study.
• Z affects D.
• The instrument affects how many people are
treated.
The First Stage Assumption 2
• E[D | Z = 1] – E[D | Z = 0] ≠ 0

• Pr[D = 1 | Z = 1] – Pr[D = 1 | Z = 0] ≠ 0

• Di = α0i + α1i Zi → E[α1i] ≠ 0

• D = α0 + α1 Z + υ → α1 ≠ 0

• Is the first stage assumption "testable"?
Independence Assumption 1
• The instrument is statistically independent of the
potential outcome and potential treatment
variables.
– People who receive Z = 1 are not more likely to
respond to the instrument than people who receive Z
= 0. (No assignment by people most likely to “take
advantage” of the opportunity.)
– People who receive Z = 1 are not more likely to benefit
from the treatment.
– People who receive Z = 0 are not more likely to “do
well” on the outcome even in the absence of the
treatment.
Independence Assumption 2
• Pr(Y(d,z), D(z) | Z = 1) = Pr(Y(d,z), D(z) | Z = 0)

• Is the independence assumption testable?

• Can the independence assumption be


“investigator” controlled?
Some Reinforcement
• The first two IV assumptions are:
– First Stage and Independence
• Both assumptions are at least "partially" testable:
– First Stage is easily tested: are treatment take-up rates different in the Z = 1 and Z = 0 groups?
– Independence is partially testable through balancing tests: are covariates similar in the Z = 1 and Z = 0 groups?
– Independence is very well justified by a known
random assignment procedure. Then Z should be
“uncorrelated with covariates by construction”
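A balance test of this kind takes only a few lines of code. Here is a sketch with hypothetical covariate names and simulated data: compare covariate means across instrument arms and report a rough t-statistic for each difference.

```python
# Hedged sketch: a simple balance check on covariates across instrument
# arms. Column names and the data-generating numbers are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5_000
df = pd.DataFrame({
    "z": rng.integers(0, 2, n),          # instrument assignment
    "age": rng.normal(40, 10, n),        # pre-determined covariates
    "female": rng.integers(0, 2, n),
    "income": rng.lognormal(10, 0.5, n),
})

# Mean difference and a rough t-statistic for each covariate.
for cov in ["age", "female", "income"]:
    g1, g0 = df.loc[df.z == 1, cov], df.loc[df.z == 0, cov]
    diff = g1.mean() - g0.mean()
    se = np.sqrt(g1.var() / len(g1) + g0.var() / len(g0))
    print(f"{cov:>7}: diff = {diff:10.3f}, t = {diff / se:5.2f}")
```

With a valid (as-good-as-random) Z, all of the t-statistics should look like draws from a standard normal.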
Exclusion Restriction 1
• The instrumental variable has no direct effect
on the outcome.
• The instrument only affects outcomes through
its first stage effect on treatment exposure.
• The IV causal chain:
– Z affects D and (then) D affects Y.
Exclusion Restriction 2
• Y(d,z) = Y(d)
• Before the exclusion restriction, there are four
potential outcomes:
• Y(1,1) : outcome when D = 1, Z = 1
• Y(1,0) : outcome when D = 1, Z = 0
• Y(0,1) : outcome when D = 0, Z = 1
• Y(0,0) : outcome when D = 0, Z = 0
• Exclusion restriction collapses the four d,z pairs
down to the standard pair of potential outcomes.
Exclusion Restriction 3
• Typically not testable against data.
– RCTs are not an exception and many
social/economic/educational RCTs likely violate the
exclusion restriction.
• This point is difficult to see at first but here are
some examples of threats to the exclusion
restriction:
• Placebo effects, demoralization effects, Hawthorne effects,
substitution effects, income effects.
• These things can all be created by commonly employed
random assignment procedures.
Exclusion Restriction 4
• Good papers justify the exclusion restriction using:
– Substantive theory and logic.
– Sensitivity analysis.
– Smart "partial tests" that flow from the specific design and theory: the instrument should not affect outcomes for group X, where it has no first stage.

• Bad papers justify the exclusion restriction by appealing to:
– Random assignment.
– Covariate balance.
– The idea that the results make sense.
• Older papers and bad new papers ignore the exclusion
restriction entirely or bury it in a footnote.
Monotonicity Assumption 1
• The instrument affects treatment exposure in the
same direction for each person.
– Either the instrument nudges everyone towards the
treatment or away from the treatment.
– Not both.

• Important in the newish "heterogeneous treatment effects" approach.
• Not testable. Appeal to logic. Perhaps try to investigate "two-way flows".
Monotonicity Assumption 2
• D(1) ≥ D(0) for all subjects; or
• D(1) ≤ D(0) for all subjects.

• Recall:
– D = D(0) + Z(D(1) – D(0))
– Di = α0i + α1i Zi
• Monotonicity requires that:
– α1i ≥ 0 for each i; equivalently,
– D(1) – D(0) ≥ 0 for each i.
Spillover Effects (SUTVA)
• The instruments and treatments associated with
person i have no influence on the instruments and
treatments associated with person j.

• Y(di, dj) = Y(di)

• No “herd immunity”, peer effects, etc.

• Economists sometimes say that quasi-experiments are typically "partial equilibrium" studies.
Assumption summary (story; testable against data?; strategies):
• First Stage: Z affects D. Testable? Yes. Test the first stage; both precision and magnitude are important.
• Independence: Z is as good as randomly assigned. Testable? Partially. Conduct balance tests on covariates.
• Exclusion: Z has no direct effect on Y. Testable? Usually not. Examine logic and theory; conduct study-specific tests and sensitivity analysis if possible.
• Monotonicity: Z weakly affects D in the same direction for everyone. Testable? Usually not. Use sub-group analysis to check the sign of the first stage; "two-way flows" may indicate a problem.
• No Spillovers: Other people's treatments don't matter. Testable? Usually not. Tricky, but cluster-level instruments can sometimes provide a solution; conduct the analysis at a level that "contains" the spill.
Getting To LATE

I will skip the derivation. But we can do it this afternoon if there is interest.
Combine Assumptions with Data
• Suppose we believe that assumptions 1
through 5 are reasonable in the context of
some study.
• And suppose our study produces
observations of (𝑌, 𝐷, 𝑍) for each person.
– (Notice that the study only gives us data on
realized outcomes and not potential outcomes.)
• What can we learn?
Intent To Treat
• ITT = E[Y | Z = 1] – E[Y | Z = 0]

• Y = β0 + β1Z + ε

• Also called the “reduced form”


– Effect of the instrument on the outcome.
– Mean differencing and regression give the same
answer in the simple case.
What Does ITT Show?
• A fairly simple derivation shows that:

ITT = E[Y(1) – Y(0) | D(1) – D(0) = 1] × Pr(D(1) – D(0) = 1)

• Average effect for "compliers" times the size of the complier population.
What Does The First Stage Show?
• FS = E[D | Z = 1] – E[D | Z = 0]

• A fairly simple derivation shows that:

• FS = E[D(1) – D(0)] = Pr(D(1) – D(0) = 1)

• The first stage reveals the fraction of the population who are "compliers".
Wald Ratio Produces IV
• ITT/FS = E[Y(1) – Y(0) | D(1) – D(0) = 1]
Who Are Compliers?
• Recall from earlier presentations:
– D(1) = 1, D(0) = 0 → Complier
– D(1) = 1, D(0) = 1 → Always Taker
– D(1) = 0, D(0) = 0 → Never Taker
– D(1) = 0, D(0) = 1 → Defier

• Monotonicity assumption rules out defiers.
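A quick simulation makes the link between these types and the first stage concrete. Here is a sketch with invented type shares: under monotonicity (no defiers), the first stage equals the complier share.

```python
# Hedged sketch: simulate potential treatments D(0), D(1) with
# always-takers, never-takers, and compliers (no defiers, so
# monotonicity holds), and confirm that the first stage equals the
# complier share. The shares below are made up for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
# Latent types: 20% always-takers, 30% never-takers, 50% compliers.
types = rng.choice(["always", "never", "complier"], size=n,
                   p=[0.2, 0.3, 0.5])

d0 = (types == "always").astype(float)                            # D(0)
d1 = ((types == "always") | (types == "complier")).astype(float)  # D(1)
assert (d1 >= d0).all()          # monotonicity: no defiers

z = rng.integers(0, 2, n)
d = d1 * z + d0 * (1 - z)        # realized treatment

first_stage = d[z == 1].mean() - d[z == 0].mean()
print(f"first stage = {first_stage:.3f}")  # ≈ 0.50, the complier share
```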


Two Stage Least Squares
Wald Ratio
• Great for understanding the logic of IV and for
clarifying identification of causal effects.
• But:
– Tedious and statistically costly to deal with
covariates.
– Slightly annoying to compute standard errors.
– What if you have more than one Z?
– What if you have more than one D?
Practicality
• In practice, lots of people "think" in terms of Wald Ratios.
– Maybe they even start the analysis with Wald Ratios.
– Often they calculate simple Wald Ratios in their heads during seminars.
Practicality
• But when it comes down to actually doing
their work and writing a paper…

• They employ a method called Two Stage Least Squares.
Two Stage Least Squares
• We’ll present things in regression notation
because that is the easier way to understand this
material.

• As in the previous example, you can always re-write things in terms of potential outcomes to make causality and treatment effects clearer.

• For now, we'll start with "constant treatment effects".
Two Equations in TSLS

1. Causal Model (Structural Model)


2. First Stage
Causal Equation
• The causal or “structural” equation:
– Y = β0 + β1D + ε

• Assume that D is not randomly assigned and omitted variable bias is a concern.
• Notice that Z does not appear in the model.
Terminology
• When 𝐷 is assigned in a non-random way that we
think will produce biased estimates, people say
that:
– D is endogenous.
– D is determined “simultaneously” with Y
– D is subject to omitted variables bias (OVB)
– D is “self-selected”.
– Conditioning on D creates a “selected sample”.
• They all mean (basically) the same thing.
– We hate your paper.
– You need an instrument. (We'll hate that too.)
Stages of TSLS
• The first stage of TSLS involves estimating the
first stage regression.
• The second stage involves estimating the
causal model by replacing D with its predicted
value.
First Stage
• D = α0 + α1 Z + υ
• Regression of treatment on instrument.
• Core idea is that we can “decompose” the
variation in treatment exposure into two parts:
– The part generated by the design/instrument.
– The part generated by non-random choices.
• Compute predicted value from the equation:
– Dhat is the “good variation” or “design based
variation”
– Uhat is the remaining (bad) variation.
Second Stage
• Y = β0 + β1D + ε
• Y = β0 + β1 (α0 + α1 Z + υ) + ε
• Y = {β0 + β1 α0 }+ {β1 α1 Z} + {β1 υ + ε}
• Y = θ0 + θ1Z + ξ
• θ1 = β1 α1 is the “reduced form” or ITT effect.

• What should you do if you want to know β1 ?


Insight: No Need To Divide If You Only
Include The Good Variation From The
First Stage
• Y = β0 + β1 (α0 + α1 Z) + ε
• Y = β0 + β1 (Dhat) + ε

• You obtain a consistent estimate of β1 because α0 + α1 Z consists of only the variation in D that is explained by Z.
Recap
• Intuitive TSLS Algorithm:
– Estimate the first stage model.
– Compute predicted values
– Regress Y on the predicted values.

• In practice, the whole thing is computed in


one step.
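The two-step logic above can be sketched by hand with ordinary least squares at each stage. This is only illustrative: the standard errors from a manual second stage are wrong, which is one reason practitioners use a one-step routine (e.g., IV2SLS in the linearmodels package) instead. All simulation numbers are invented.

```python
# Hedged sketch of the two-step logic behind TSLS, using plain numpy
# OLS for each stage. For real work, use a one-step routine so the
# standard errors are computed correctly.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta1 = 2.0                      # true effect (made-up number)
z = rng.integers(0, 2, n).astype(float)
u = rng.normal(size=n)           # confounder
d = ((0.8 * z + u + rng.normal(size=n)) > 0.5).astype(float)
y = 1.0 + beta1 * d + 2.0 * u + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients; X must include a constant column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
# First stage: regress D on Z, keep the predicted ("good") variation.
a = ols(np.column_stack([const, z]), d)
d_hat = a[0] + a[1] * z
# Second stage: regress Y on Dhat.
b = ols(np.column_stack([const, d_hat]), y)
print(f"TSLS estimate of beta1: {b[1]:.2f}")  # close to 2.0
```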
Extensions of TSLS
Extensions of TSLS: Use The Whiteboard

• Adding covariates to the model.
• Multiple Instrumental Variables.
• Multiple Treatment Variables.

Implementation Strategies and
Rules of Thumb
Implementation
• Test the first stage first. F > 10 is a common rule of
thumb.
– Remember to investigate first stage by sub-population.
– Think about the logic of the first stage. Consider
monotonicity assumption.
• Conduct balance tests.
• Develop theory needed to assess exclusion restriction.
• Devise sensitivity analysis for exclusion restriction.
• Try to find partial tests of exclusion.
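With a single instrument, the first-stage F statistic is just the squared t-statistic on Z in the first-stage regression, so the F > 10 rule of thumb is easy to check by hand. A sketch on simulated data (the data-generating numbers are invented):

```python
# Hedged sketch: first-stage F statistic for a single binary instrument.
# With one instrument and no covariates, F = t^2 on Z.
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
z = rng.integers(0, 2, n).astype(float)
# Take-up nudged by the instrument (0.3 is an arbitrary choice).
d = ((0.3 * z + rng.normal(size=n)) > 0.0).astype(float)

n1, n0 = (z == 1).sum(), (z == 0).sum()
fs = d[z == 1].mean() - d[z == 0].mean()          # first stage
se = np.sqrt(d[z == 1].var() / n1 + d[z == 0].var() / n0)
F = (fs / se) ** 2
print(f"first stage = {fs:.3f}, F = {F:.1f}")
```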
Why Covariates?
• To make it more likely that the independence and
exclusion restrictions actually hold.
• Remember that we may not have explicitly randomly
assigned values of Z.
• Controlling for covariates increases power if the
covariates explain variation in Y.
• Covariates can also be used to assess
assumptions: balance tests.
• Covariates also facilitate sub-population analysis.
Balancing Tests?
• If Z is as good as randomly assigned then if
you make a table of means for sub-groups
with different values of Z, you should see
balance.

• Unless you think Z is only as good as randomly assigned after controlling/stratifying on X. Then you need to adjust for X directly.
Why should you avoid IV methods?
• Very hard to come up with instruments that will
credibly satisfy the assumptions. Why delude
yourself?
• Even when it works, the standard errors from IV
estimates are apt to be very large.
• Interpretation is weird: LATE or weighted average
of LATEs may be a treatment effect for a special
complier group that you do not care about.
• Weak instruments problem.
Examples With Investigator Control
• Oregon Health Insurance Experiment
• Moving To Opportunity Experiment
Examples with non-investigator
randomization
• Vietnam Draft Lottery
• Charter School Lottery
Examples where "quasi-experiments", theories, and models are used to justify IV
Effects of Obesity on Health Care Costs
• Cawley and Meyerhoefer.
• Problem is measurement error in weight and omitted
variable bias in weight.
• Weight of relative is an instrument for the individual’s
weight.
• Claim is that shared household environment has no
effect on weight.
– Controversial for lay people.
– Research from behavioral genetics finds no support for the
claim that shared household environments affect weight.
• OLS finds that obesity raises costs by $650 per year.
• IV suggests that obesity raises costs by $2700 per year.
Effects of HIV Prevalence on Sexual
Behavior
• Oster (2011)
• How come people in Africa don’t practice safer
sex now that HIV prevalence is so high?
• Problem: Local HIV Prevalence is not randomly
assigned and is (obviously) partly determined by
risky sexual practices.
• Instrument: Distance from the origin of the
epidemic (center of Congo)
• Finds that after accounting for endogeneity,
people do reduce risky sex when HIV prevalence
is higher.
Other Examples
• Settler Mortality and Economic Institutions
• Family Size and Longer Run Well-being (Twins and
Gender Hunting)
• Effects of Eminent Domain Laws on Investment
Levels and Patterns (Heterogenous Judges)
• Effects of Foster Care on Child Outcomes
(Heterogeneous Judges)
• Demand Curves for Fish: Fulton Fish Market
(Stormy Days and Sunny Days)
Class Exercise:
A Hypothetical Study
Causal Effects of Flu Vaccination on
Younger Adults
Side Track To Talk About Weak
Instruments
Weak Instruments
• An instrument may be weak if the coefficient
on the instrumental variable in the first stage
is small.

– Di = Xi γ + π1 Zi + μi
– Suppose that π1 > 0, but not by very much.
Remember the Wald Ratio
• Wald ratio is the building block of more
complicated IV strategies.

• W = { E[Y | Z = 1] − E[Y | Z = 0] } / { E[D | Z = 1] − E[D | Z = 0] }

• W = ITT / FS

• What if FS ≅ 0?
[Figure: Value of the Wald Ratio With Fixed ITT and Shrinking First Stage. As the first stage shrinks from 1.000 toward 0.001, the Wald ratio blows up.]
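The qualitative pattern in that plot is easy to reproduce numerically: hold the ITT fixed and shrink the first stage, and the implied Wald ratio grows without bound (illustrative numbers only).

```python
# Hedged sketch: fixed ITT, shrinking first stage -> exploding Wald
# ratio. The starting values are arbitrary.
itt = 0.5
fs = 1.0
for _ in range(6):
    print(f"first stage = {fs:.6f}  ->  Wald ratio = {itt / fs:,.1f}")
    fs /= 10
```

Small estimation error in a tiny first stage translates into enormous swings in the ratio, which is the heart of the weak instruments problem.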
Very Fast Background
• Healthy People 2020 report:
– About 25% of 18 to 64 year olds were vaccinated
for influenza in 2008.
• The goal for 2020 is to push that number up to 80%.
– About 67% of adults over age 65 were vaccinated
in 2008.
• The goal for 2020 is to get the number up to 90%.
Simple Question
• Does the general population of non-elderly adults actually benefit from increased vaccination?
How should we model this
question?
Effects of Flu Vaccines

– How should we measure health?


– What do the coefficients mean?
– What would you hope to include in X?
– Why does the model include an interaction term?
– What are some possible threats to validity if we
tried to estimate the model using OLS?
Think About Assumptions
1. Independence: Z is randomly assigned.

2. SUTVA: Non-interference/No Spillovers

3. Exclusion: Z has no direct effect on outcomes.

4. First Stage: Z does affect D

5. Monotonicity: Z always affects D in the same


direction.
What should we use as an
instrumental variable?
Candidate Instruments
• Dummy variable indicating whether the
person works in a health care occupation.
• Price of flu vaccines by city and year in the US.
• State regulations that require insurance plans
to cover flu vaccinations.
• Time series data on the total supply of
vaccines by year.
• Other ideas?
Suppose we settle on one of the
instruments.
How do we proceed?
A Rough Plan
• Collect data on [Y, X, GT65, FluVac].
• Collect data on Z.

• Often Z and [Y, X, GT65, FluVac] will come from different sources and you will need to merge two data sets together.
• Collecting data on Z may be a lot of work.
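The merge step is usually a join on whatever identifiers link the instrument to the survey, for example state and year. Here is a sketch with hypothetical file contents and column names:

```python
# Hedged sketch: merging an instrument file into the outcome data.
# All data, identifiers, and column names here are hypothetical.
import pandas as pd

# Outcome data: one row per person, with state and year identifiers.
people = pd.DataFrame({
    "person_id": [1, 2, 3],
    "state": ["OR", "OR", "CA"],
    "year": [2008, 2009, 2008],
    "hosp_visit": [0, 1, 0],
    "flu_vac": [1, 0, 0],
})
# Instrument data: e.g., a state-year mandate indicator from another source.
mandates = pd.DataFrame({
    "state": ["OR", "CA"],
    "year": [2008, 2008],
    "z_mandate": [1, 0],
})

# Left join keeps every person; unmatched rows get a missing Z.
merged = people.merge(mandates, on=["state", "year"], how="left")
print(merged[["person_id", "state", "year", "z_mandate"]])
```

Rows with no matching instrument record (here OR in 2009) come through with a missing Z value, and you have to decide what to do with them: drop them, or go collect more instrument data.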
Implementation
HospVisiti = Xi β0 + β1 GT65i + β2 FluVaci + β3 (FluVaci × GT65i) + εi

• What is the First Stage?

• What is the Second Stage?


Stata Tips
