A Practical Introduction to Regression Discontinuity Designs: Extensions
∗ Department of Operations Research and Financial Engineering, Princeton University.
† Department of Political Science, University of Pennsylvania.
‡ Department of Politics, Princeton University.
Contents

Acknowledgments
1 Introduction
Bibliography
Acknowledgments
This monograph, together with its accompanying first part (Cattaneo, Idrobo, and Titiu-
nik, 2020), collects and expands the instructional materials we prepared for more than 40
short courses and workshops on Regression Discontinuity (RD) methodology that we taught
between 2014 and 2022. These teaching materials were used at various institutions and pro-
grams, including the Asian Development Bank, the Philippine Institute for Development
Studies, the International Food Policy Research Institute, the ICPSR’s Summer Program
in Quantitative Methods of Social Research, the Abdul Latif Jameel Poverty Action Lab,
the Inter-American Development Bank, the Georgetown Center for Econometric Practice,
the Universidad Católica del Uruguay’s Winter School in Methodology and Data Analysis,
the Centre for Research in Economics and Management (NIPE), the Summer Institute of
the Econometric Society (SIES), the International Initiative for Impact Evaluation (3ie), the
National Bureau of Economic Research (NBER), the Bogota Summer School in Economics,
Amazon, Inc., the Summer School of the Italian Econometric Association (SIdE), and the
Northwestern Workshop on Research Design for Causal Inference. We also used these ma-
terials for teaching at the undergraduate and graduate level at Brigham Young University,
Cornell University, Instituto Tecnológico Autónomo de México, Pennsylvania State Univer-
sity, Pontificia Universidad Católica de Chile, Princeton University, University of Michigan,
University of Washington, and Universidad Torcuato Di Tella. We thank all these institu-
tions and programs, as well as their many audiences, for the interest, feedback, and support
we received over the years.
The work collected in our two-volume monograph evolved and benefited from many in-
sightful discussions with our current and former collaborators: Sebastián Calonico, Rajita
Chandak, Robert Erikson, Juan Carlos Escanciano, Max Farrell, Yingjie Feng, Brigham
Frandsen, Sebastián Galiani, Michael Jansson, Luke Keele, Marko Klašnja, Xinwei Ma,
Kenichi Nagasawa, Brendan Nyhan, Filippo Palomba, Jasjeet Sekhon, Gonzalo Vazquez-
Bare, Rae Yu, and José Zubizarreta. Their intellectual contribution to our research program
on RD designs has been invaluable, and certainly has made our monographs much better
than they would have been otherwise. We also thank Alberto Abadie, Joshua Angrist, Ivan
Canay, Richard Crump, David Drukker, Jianqing Fan, Sebastian Galiani, Guido Imbens,
Patrick Kline, Jason Lindo, Justin McCrary, David McKenzie, Douglas Miller, Aniceto Or-
beta, Zhuan Pei, and Andres Santos for the many stimulating discussions and criticisms we
received from them over the years, which also shaped our work in important ways. The co-
Editors Michael Alvarez and Nathaniel Beck offered insightful and constructive comments
on several preliminary drafts of our manuscripts, including the suggestion of splitting the
content into two stand-alone volumes. They were also infinitely patient with us when our
plan to complete this volume was massively delayed due to the Covid-19 pandemic. Last but
not least, we gratefully acknowledge the support of the National Science Foundation through
grants SES-1357561 and SES-2019432.
The goal of our two-part monograph is purposely practical and hence we focus on the
empirical analysis of RD designs. We do not seek to provide a comprehensive review of the
methodological literature on RD designs, which we do in Cattaneo and Titiunik (2022),
nor discuss theoretical aspects in detail. As we did in the first volume, we mostly refrain from citing prior literature in the main sections; instead, we provide a short list of
references at the end of each section to guide readers who are interested in further method-
ological details and formal theoretical results. In this second part, we employ the replication
data from Cattaneo, Frandsen, and Titiunik (2015), Lindo, Sanders, and Oreopoulos (2010),
Londoño-Vélez, Rodríguez, and Sánchez (2020), and Keele and Titiunik (2015) for empirical
illustration of the different topics. We thank these authors for making their data and codes
publicly available. We provide complete replication codes in both R and Stata for all the
empirical analyses discussed throughout the monograph. The general purpose, open-source
software we use, as well as all replication codes and other supplementary materials, can be
found at:
https://rdpackages.github.io/
1 Introduction
The Regression Discontinuity (RD) design has emerged as one of the most credible research
designs in the social, behavioral, biomedical, and statistical sciences for program evalua-
tion and causal inference in the absence of an experimentally assigned treatment. In this
manuscript, we continue the discussion in Cattaneo, Idrobo, and Titiunik (2020), covering
practical topics in the analysis and interpretation of RD designs that were not included
in our first monograph due to space constraints. While our discussion is meant to be self-
contained, we recommend that readers who are unfamiliar with the basic features of the RD
design consult our first monograph before reading this one, as several concepts and ideas
discussed previously will be assumed known in this volume. In what follows, we refer to the
first monograph as Foundations, and to this monograph as Extensions.
The RD design is defined by three fundamental ingredients: a score (also known as a
running variable, forcing variable, or index), a cutoff, and a treatment rule that assigns units
to treatment or control based on a hard-thresholding rule. All units are assigned a score,
and the treatment is assigned to units whose value of the score exceeds the cutoff and not
assigned to units whose value of the score is below the cutoff. This assignment rule implies
that the probability of treatment assignment changes abruptly at the known cutoff. If units
are not able to perfectly determine or manipulate the exact value of the score that they
receive, the discontinuous change in the probability of treatment assignment can be used to
study the effect of the treatment on outcomes of interest, at least locally, because units with
scores barely below the cutoff can be used as comparisons or “counterfactuals” for units with
scores barely above it.
To formalize, we assume that there are n units, indexed by i = 1, 2, . . . , n, and each unit
receives a score Xi . Units with Xi ≥ c are assigned to the treatment condition, and units
with Xi < c are assigned to the untreated or control condition, where c denotes the cutoff.
We summarize this in the treatment assignment rule Ti = 1(Xi ≥ c), where 1(·) is the
indicator function. In Foundations, we focused exclusively on the canonical Sharp RD design
where the running variable is continuous and univariate, there is a single cutoff determining
treatment assignment, compliance with treatment assignment is perfect, and the analysis
is conducted using continuity-based methods (e.g., local polynomial approximations with
robust bias correction inference). The goal of this manuscript is to discuss practical RD analysis when these assumptions are relaxed or extended.
We adopt the potential outcomes framework—see Imbens and Rubin (2015) for an intro-
duction to potential outcomes and causality, and Abadie and Cattaneo (2018) for a review
of program evaluation methodology. We assume that each unit has two potential outcomes,
Yi (1) and Yi (0), which correspond, respectively, to the outcomes that would be observed
under treatment or control. Treatment effects are therefore defined in terms of comparisons
between features of (the distribution of) both potential outcomes, such as their mean or
quantiles. If unit i receives the treatment, we observe the unit’s outcome under treatment,
Yi (1), but Yi (0) remains unobserved, while if unit i is untreated, we observe Yi (0) but not
Yi (1). This is known as the fundamental problem of causal inference. The observed outcome
Yi is therefore defined as
Yi = (1 − Ti) · Yi(0) + Ti · Yi(1), so that Yi = Yi(0) if Xi < c and Yi = Yi(1) if Xi ≥ c.
In the Sharp RD design, the parameter of interest is the average treatment effect at the cutoff,

τSRD ≡ E[Yi(1) − Yi(0) | Xi = c] = limx↓c E[Yi | Xi = x] − limx↑c E[Yi | Xi = x],   (1.1)

where µ0(x) ≡ E[Yi(0)|Xi = x] and µ1(x) ≡ E[Yi(1)|Xi = x]. This parameter is called the Sharp RD treatment effect, and is depicted in Figure 1.1, where we also plot the regression functions µ0(x) and µ1(x) for values of the score Xi = x, with solid and dashed lines.

[Figure 1.1: The Sharp RD treatment effect τSRD at the cutoff c, shown as the vertical distance between the regression functions µ1(x) and µ0(x).]
Equation (1.1) says that, if the average potential outcomes given the score are continuous
functions of the score at c, the difference between the limits of the treated and control average
observed outcomes as the score approaches the cutoff is equal to the average treatment effect
at the cutoff. This identification result is due to Hahn, Todd, and van der Klaauw (2001),
and has spurred a large body of methodological work on identification, estimation, inference,
graphical presentation, and falsification for various RD design settings. In Foundations, we
focused exclusively on the canonical Sharp RD design, presenting a practical discussion of
the methods developed by Hahn, Todd, and van der Klaauw (2001), Lee (2008), McCrary
(2008), Calonico, Cattaneo, and Titiunik (2014b, 2015a), Calonico, Cattaneo, and Farrell
(2018, 2020, 2022), Calonico, Cattaneo, Farrell, and Titiunik (2019), and Cattaneo, Jansson,
and Ma (2020), among others.
In this second monograph, we discuss several topics in RD methodology that build on
and extend the analysis of RD designs introduced in Foundations. Our first goal is to present
an alternative RD conceptual framework based on local randomization ideas, introduced
by Cattaneo, Frandsen, and Titiunik (2015) and further developed by Cattaneo, Titiunik,
and Vazquez-Bare (2016, 2017). This methodological approach can be useful in RD designs
with discretely-valued scores, and can also be used more broadly as a complement to the
continuity-based approach in other settings. Then, employing both continuity-based and local
randomization approaches, we extend the canonical Sharp RD design in multiple directions:
fuzzy RD designs, RD designs with discrete scores, and multi-dimensional RD designs. Most
of the methods we discuss build on Calonico, Cattaneo, and Titiunik (2014b), Calonico,
Cattaneo, Farrell, and Titiunik (2019), Keele and Titiunik (2015), Cattaneo, Keele, Titiunik,
and Vazquez-Bare (2016, 2021), Cattaneo, Titiunik, and Vazquez-Bare (2020), and references
therein.
We start in Section 2 by introducing the local randomization framework for RD designs.
In this framework, the score values are assumed to be as-if randomly assigned in a small
window around the cutoff, so that placement above or below the cutoff and hence treatment
assignment can be interpreted to be as-if experimental. This contrasts with the continuity-
based approach, where extrapolation to the cutoff plays a predominant role. Once the local
randomization assumption is invoked, the analysis can proceed by using tools from the
analysis of experiments. This alternative approach, which we call the local randomization
approach to RD analysis, often requires stronger assumptions than the continuity-based
approach discussed in Foundations, and for this reason it is not always applicable. We discuss
the main features of this alternative framework in Section 2, including how to interpret the
required assumptions, and how to perform estimation, inference, and falsification.
We continue in Section 3 with a discussion of the Fuzzy RD design where, in contrast to
the Sharp RD design, compliance with the treatment assignment is imperfect: some units
above the cutoff fail to take the treatment despite being assigned to take it, and/or some units
below the cutoff take the treatment despite being assigned to the untreated condition. We
define several parameters of interest that can be recovered under noncompliance, and discuss
how to employ both continuity-based and local randomization approaches for analysis. We
also discuss how to perform falsification analysis under noncompliance.
In Section 4, we discuss RD designs where the running variable is discrete instead of
continuous, and hence multiple units share the same value of the score. For example, the
Grade Point Average (GPA) used by universities is often calculated up to one or two decimal
places, and collecting data on all students in a college campus results in a dataset where
hundreds or thousands of students have the same GPA value. In the RD design, the existence
of such “mass points” in the score variable sometimes requires using alternative methods,
as the standard continuity-based methods discussed in Foundations are no longer generally
applicable without modifications. We discuss when and why continuity-based methods may
be inadequate to analyze RD designs with discrete scores, and also describe how the local
randomization approach can be a useful alternative framework for analysis.
We devote the last section, Section 5, to generalize the assumption of a treatment as-
signment rule that depends on a single score and a single cutoff. We start by discussing
Multi-Cutoff RD designs, settings where units have a single score, but different subsets of
units face different cutoff values. We then discuss RD designs with multiple running vari-
ables, which we refer to as Multi-Score RD designs, where the treatment rule requires that
two or more scores be above a cutoff in order to receive the treatment. Our discussion in-
cludes a particular case of the Multi-Score RD design where assignment to treatment changes
discontinuously at the border that separates two geographic areas, typically known as the
Geographic RD design. Throughout this section, we explain how to generalize the methods
discussed both in Foundations and in the first sections of this monograph to both types of
Multi-Dimensional RD designs.
Each section illustrates the methods with a different empirical application. In Section
2, we use the data employed by Cattaneo, Frandsen, and Titiunik (2015) to study the
incumbency advantage of political parties in elections for the U.S. Senate. In Sections 3 and
5, we use the data provided by Londoño-Vélez, Rodríguez, and Sánchez (2020) to study the
effect of a government subsidy in Colombia on enrollment in higher-education institutions.
In Section 4, we use the data in Lindo, Sanders, and Oreopoulos (2010), who analyze the
effects of academic probation on subsequent academic achievement. Finally, in Section 5, we
use the geographic data in Keele and Titiunik (2015) that studies the effect of campaign ads
on voter turnout.
As in Foundations, all the RD methods we discuss and illustrate are implemented using
various general-purpose software packages, which are free and available for R, Stata, and
Python, three leading statistical software environments in the social sciences. Each numerical
illustration we present includes an R command with its output, and the analogous Stata
command that reproduces the same analysis—though we omit the Stata output to avoid
repetition, and we truncate the R output when appropriate to conserve space. The Python
replication code is not shown in the manuscript but is available online.
The local polynomial methods for continuity-based RD analysis are implemented in the
package rdrobust, which is discussed in three companion software articles: Calonico, Catta-
neo, and Titiunik (2014a), Calonico, Cattaneo, and Titiunik (2015b), and Calonico, Catta-
neo, Farrell, and Titiunik (2017); see also Cattaneo, Titiunik, and Vazquez-Bare (2018) for
power calculations and related methods. The rdrobust package has three functions specifi-
cally designed for continuity-based RD analysis: rdbwselect for data-driven bandwidth se-
lection methods, rdrobust for local polynomial point estimation and inference, and rdplot
for graphical RD analysis. In addition, the package rddensity, discussed by Cattaneo, Jans-
son, and Ma (2018), provides manipulation tests of density discontinuity based on local
polynomial density estimation methods. The accompanying package rdlocrand, which is dis-
cussed by Cattaneo, Titiunik, and Vazquez-Bare (2016), implements all the local randomiza-
tion RD methods that we use throughout this volume. This package has two main functions:
rdwinselect selects the local randomization window around the cutoff using pre-treatment
covariates, and rdrandinf performs finite-sample and large sample inference in the selected
window. Finally, to analyze multi-dimensional RD designs we employ the package rdmulti,
which is discussed by Cattaneo, Titiunik, and Vazquez-Bare (2016). This package has three
main functions: rdmc for multi-cutoff estimation and inference, rdmcplot for multi-cutoff RD
plots, and rdms for multi-score estimation and inference. The R, Stata, and Python codes that
replicate all our analysis are available at https://rdpackages.github.io/replication.
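For convenience, the following minimal R sketch installs and loads these four packages; all are available on CRAN, and the functions noted in the comments are the ones described above.

# Install and load the general-purpose RD packages used throughout.
install.packages(c("rdrobust", "rddensity", "rdlocrand", "rdmulti"))
library(rdrobust)   # rdbwselect(), rdrobust(), rdplot()
library(rddensity)  # manipulation testing based on local polynomial density estimation
library(rdlocrand)  # rdwinselect(), rdrandinf()
library(rdmulti)    # rdmc(), rdmcplot(), rdms()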
We also provide further references for readers who wish to go beyond the contents we
cover. For readers interested in other practical introductions to RD designs with additional
2 Local Randomization RD Approach
running variable, by virtue of being a randomly generated number, is unrelated to the average
potential outcomes. This is the reason why, in Figure 2.1(a), µ1 (x) = E[Yi (1)|Xi = x] and
µ0 (x) = E[Yi (0)|Xi = x] are constant for all values of x. Since the regression functions
are flat, the vertical distance between them can be recovered by the difference between
the average observed outcomes among all units in the treatment and control groups, i.e.
E[Yi |Xi ≥ 50] − E[Yi |Xi < 50] = E[Yi (1)|Xi ≥ 50] − E[Yi (0)|Xi < 50] = E[Yi (1)] − E[Yi (0)].
[Figure 2.1: (a) A randomized experiment with a randomly generated score: the regression functions µ1(x) and µ0(x) are flat, and their vertical distance is the average treatment effect. (b) A continuity-based Sharp RD design: the regression functions vary with the score, and the effect τSRD is their vertical distance at the cutoff c.]
In the continuity-based RD design depicted in Figure 2.1(b), in contrast, the parameter of interest τSRD is the difference between the limits of the average of the observed outcomes given the score as the score approaches the cutoff for the treatment and control groups separately, limx↓c E[Yi | Xi = x] − limx↑c E[Yi | Xi = x]. As we dis-
cussed extensively in Foundations, the estimation of these limits requires that the researcher
approximate the regression functions, and this approximation will typically contain an error
that may directly affect estimation and inference. This is in stark contrast to the experiment
depicted in Figure 2.1(a), where estimation does not require functional form assumptions: by
construction, the regression functions are constant in the entire region where the score is ran-
domly assigned. This shows that RD designs are not canonical randomized experiments but
rather natural experiments (Titiunik, 2021), and thus belong to the toolkit of observational
studies methods.
A point often overlooked is that the known functional form of the regression functions in
a true experiment does not follow from the random assignment of the score per se, but rather
from the lack of relationship between the score and the potential outcomes that is assumed to
be a consequence of the randomization. If the value of the score were randomly assigned but
had a direct effect on the average outcomes, the regression functions in Figure 2.1(a) would
not necessarily be flat. Such direct effects are common and occur in any study where the score
affects the outcome directly, separately from the treatment. For example, if 70 is the passing
grade in a 100-point exam, students who receive a score of 68 or 69 might feel discouraged
because they failed to pass by a narrow margin, while students who scored 70 or 71 would not
experience this adverse psychological effect. Imagine that we send a congratulatory certificate
to all students who score 70 and above, and we are interested in the effect of the certificate on
future academic performance. If this discouragement affected future academic achievement,
we might observe a difference in outcomes between students who scored 68-69 and students
who scored 70-71, even if the certificate itself had no effect. Importantly, the spurious effect
would occur even if the true grades of students who originally scored between 68 and 71 were
randomly shuffled and students were notified of their “randomly selected” grade. This kind
of direct effect is the reason why many medical trials are “double blind” and do not reveal
to patients whether they are treated or control until the end of the experiment.
A local randomization approach to RD analysis must thus be based not only on the
assumption that placement above or below the cutoff is randomly assigned within a window
of the cutoff, but also on the assumption that the value of the score within this window
is unrelated to the potential outcomes—a condition that is not guaranteed by the random
assignment of the score Xi (nor by the random assignment of the treatment Ti ). To formalize,
let W = [c − w, c + w] for some window length w > 0, and XW be the vector of scores for all
i such that Xi ∈ W. The basic local randomization framework can be summarized by the following two conditions:

(LR1) The joint probability distribution of the scores within W is known.
(LR2) The potential outcomes are not affected by the score within W.
The first condition requires that, inside the window, the assignment mechanism of the
score is known, as would happen in a randomized experiment. More formally, define PW [·]
to be the probability computed conditionally for those units with Xi ∈ W. Importantly,
in the local randomization framework all probability and moment calculations as well as all
parameter definitions are often done conditionally for those units whose scores fall within the
window W. With these conventions, LR1 requires that PW [XW ≤ x] = F (x) for some known
joint c.d.f. F (x). For example, this condition holds when all units have the same probability
of receiving all possible score values in W, and therefore equal probability of being assigned to
control (Xi < c) or treatment (Xi ≥ c) when the window W is symmetric around the cutoff c.
The second condition, LR2, is the exclusion restriction ensuring that the potential outcomes
are not a function of the score for those units with score inside W, as would be expected in a
true double-blind randomized experiment. To formalize this condition, let Yi (0, x) and Yi (1, x)
denote the potential outcomes with their dependence on the score variable made explicit, and occurring only through the second argument, so that Yi(0) = Yi(0, Xi) and Yi(1) = Yi(1, Xi). Then, if the
potential outcomes are non-random, the second condition means that Yi (0, x0 ) = Yi (0, x) and
Yi (1, x0 ) = Yi (1, x), for all x, x0 ∈ W and all units such that Xi ∈ W. If the potential outcomes
are random, the condition means PW [Yi (0, x0 ) = Yi (0, x)] = 1 and PW [Yi (1, x0 ) = Yi (1, x)] = 1
for all x, x0 ∈ W.
Under LR1 and LR2, for all units with Xi ∈ W = [c−w, c+w], placement above or below
the cutoff is unrelated to the potential outcomes, and the potential outcomes are unrelated to
the running variable; therefore, the regression functions are flat inside W. This is illustrated
in Figure 2.2, where for the case of random potential outcomes µ1 (x) = E[Yi (1)|Xi = x] and
µ0 (x) = E[Yi (0)|Xi = x] are constant for all values of x in W, but can have non-zero slopes
outside of it.
The contrast between Figures 2.1(a), 2.1(b), and 2.2 illustrates the differences between
an actual experiment where the score is a randomly generated number, a continuity-based
RD design, and a local randomization RD design. In the actual experiment, the potential
outcomes are unrelated to the score for all possible score values. In this case, the functional
forms of E[Yi (1)|Xi = x] and E[Yi (0)|Xi = x] are known. In the continuity-based RD design,
the potential outcomes can be related to the score everywhere; the functions E[Yi (1)|Xi = x]
and E[Yi (0)|Xi = x] are unknown but assumed to be smooth, and estimation and inference is
based on approximating them near the cutoff.

[Figure 2.2: A local randomization RD design: the regression functions µ1(x) and µ0(x) can have non-zero slopes outside the window [c − w, c + w], but are flat inside it, where their vertical distance is the RD effect.]

Finally, in the local randomization RD design,
the potential outcomes can be related to the running variable far from the cutoff, but there
is a window around the cutoff where this relationship ceases. In this case, the functions
E[Yi (1)|Xi = x] and E[Yi (0)|Xi = x] are unknown over the entire support of the running
variable, but inside the window W they are assumed to be constant functions of x—and are
therefore known.
In many applications, assuming that the score has no effect on the potential outcomes
near the cutoff may be regarded as unrealistic or too restrictive. However, such an assump-
tion can be taken as an approximation, at least for the very few units with scores extremely
close to the RD cutoff. As we will discuss below, a key advantage of the local randomization
approach is that it enables finite sample inference methods, which remain valid and can
be used even when only a handful of observations very close to the cutoff are included in
the analysis. Furthermore, the restriction that the score cannot directly affect the (average)
potential outcomes near the cutoff could be relaxed under additional assumptions. A modi-
fied version of assumption (LR2) can be invoked where the potential outcomes are allowed
to depend on the running variable, but there exists a transformation that, once applied
to the potential outcomes of the units inside W, leads to transformed potential outcomes
that are unrelated to the running variable. This transformation has the advantage of linking
the local randomization approach to RD analysis with the continuity-based approach dis-
cussed in Foundations, but would require large sample approximations (or other additional
assumptions) whenever parameters need to be estimated to implement the transformation.
We illustrate the local randomization methods with the study originally conducted by Cat-
taneo, Frandsen, and Titiunik (2015), which uses a Sharp RD design in the United States
to study the electoral advantage of incumbent political parties in U.S. Senate
elections between 1914 and 2010. In winner-takes-all elections, there is a discontinuous re-
lationship between the incumbency status of a political party and the vote share that the
party obtains in an election: if there are only two parties competing for a seat, the party
that gets just above 50% of the vote wins the election and becomes the incumbent, while
the opponent loses. Thus, party incumbency advantages can be studied with an RD design.
In the U.S., there are two U.S. Senate seats in each of the 50 states, for a total of 100
seats. Each seat is up for election every six years, but the seats are staggered so that one
third of seats is up for election every two years, and the two seats in the same state are
never up for election simultaneously. We re-estimate the RD effect of the Democratic party
winning a Senate seat on its vote share in the following election for that seat. In this RD
design, the unit of analysis is the U.S. state, and the score is the Democratic party’s margin
of victory at election t—defined as the difference between the vote share obtained by the
Democratic party minus the vote share obtained by its strongest opponent. The outcome of
interest is the vote share of the Democratic party in the following election for that same seat;
we denote this election t + 2 because the election immediately following election t, which we
denote t + 1, is for the other Senate seat in the same state.
The Democratic margin of victory can be positive or negative, and the cutoff that deter-
mines a Democratic party victory is located at zero. The treatment group is the set of states
that elect a U.S. Senator from the Democratic party at t, and the control group is the set of
states that elect a U.S. Senator from another party (most of whom are from the Republican
party). The index t covers every even year between 1914 and 2010, inclusive.
We rename the variables as follows:
• Y: vote share obtained by the Democratic party in the U.S. Senate election t + 2 in a
given state.
• X: margin of victory obtained by the candidate from the Democratic party running for
a U.S. Senate seat on election t in a given state, measured as the vote share obtained
by the Democratic party minus the vote share obtained by its strongest
opponent.
• T: electoral victory of the Democratic party’s candidate running for a U.S. Senate seat
on election t in a given state, equal to 1 if the Democratic candidate won the election
and 0 if the candidate lost.
The dataset also contains several predetermined covariates: the Democratic vote share
obtained in the previous presidential election in that state, the Democratic vote share ob-
tained in the Senate election immediately prior to election t in the same state (which we
denote t − 1 and is for the other seat in the state), the Democratic vote share obtained in
the prior Senate election for the same seat (which we denote t − 2), and indicators for (i)
whether the Democratic Party won the t − 1 and t − 2 Senate elections in that state, (ii)
whether election t occurred during a midterm election year where the U.S. presidency is not
up for election, and (iii) whether there are no incumbent candidates running at t for the
Senate seat in the state (a so-called “open seat”).
Table 2.1 presents descriptive statistics for the three RD variables (Y, X, and T), and
the state-year level predetermined covariates. The outcome of interest (Y) has a minimum
of −100 (Democratic party receives zero votes) and a maximum of 100 (Democratic party
receives 100% of the vote), indicating that the Democratic and Republican parties run unop-
posed in some elections; the mean is 7.17, showing that Democrats have a moderate average
advantage over this period. Consistent with this, the mean of the treatment variable (T) is
0.54, indicating that the Democratic Party wins 54% of all Senate races between 1914 and
2010.
Figure 2.3 presents an RD plot of the outcome Y against the score X that illustrates the
continuity-based average treatment effect at the cutoff.

[Figure 2.3: RD plot of the Democratic vote share at election t + 2 (outcome) against the Democratic margin of victory at election t (score).]

The solid line represents a third-
order (p = 3) global polynomial fit, instead of the default fourth-order (p = 4) global
polynomial fit, to avoid Runge's phenomenon (over-fitting) near the cutoff; see Section
3 in Foundations for details. The dots represent local means computed using evenly-spaced
bins with a mimicking-variance optimal number of bins. The observations above the cutoff
correspond to elections where the Democratic party won the t election, while observations
below the cutoff correspond to elections where the Democratic party lost the t election.
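A plot with these features can be produced with rdplot. The sketch below shows one way such a call might look, assuming Y and X are defined as above; the axis labels are our own additions, and binselect = "esmv" requests evenly-spaced bins chosen with the mimicking-variance criterion.

# Sketch of an rdplot call consistent with the description of Figure 2.3:
# third-order global polynomial fit and evenly-spaced mimicking-variance bins.
rdplot(Y, X, p = 3, binselect = "esmv",
       x.label = "Score", y.label = "Outcome")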
At the cutoff, the average Democratic vote share is lower for states where the Democratic
party loses than for states where the Democratic party wins. Employing the continuity-based
analysis discussed in Foundations, we use rdrobust to fit a local linear polynomial within a
mean-squared-error (MSE) optimal bandwidth and find that this effect is large and positive:
states where the Democratic party barely wins the U.S. Senate election at t receive on average
7.4 additional percentage points in their vote share in the following election for the same
seat. (We show an abbreviated output for future comparison with the local randomization
results.)
R Snippet 2.1
> out <- rdrobust(Y, X, kernel = "triangular", p = 1, bwselect = "mserd")
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 7.414 1.459 5.083 0.000 [4.555 , 10.273]
Robust - - 4.311 0.000 [4.094 , 10.919]
=============================================================================
Adopting a local randomization approach to RD analysis implies assuming that the assign-
ment of units above or below the cutoff was as if random inside the window W (condition
LR1), and that in this window the potential outcomes are unrelated to the score (condition
LR2). The implementation of experimental methods to analyze RD designs thus requires
knowledge or estimation of two important ingredients: (i) the window W where the local
randomization assumption is invoked; and (ii) the randomization mechanism that is needed
to approximate the assignment of units within W to the treatment and control conditions
(i.e., to placement above or below the cutoff). In real applications, W is often unknown and
must be selected by the researcher. Once W has been chosen, the researcher must also specify the randomization mechanism assumed to hold within it.
The combination of non-stochastic potential outcomes and the sharp null hypothesis leads
to inferences that are (type-I error) correct for any sample size because, under HF0 , the ob-
served outcome of each unit is equal to the unit’s two potential outcomes, Yi = Yi (1) = Yi (0),
and there is no missing data. When the assignment mechanism is known, the full knowledge
of all potential outcomes under the null hypothesis allows us to derive the null distribution
of any test statistic from the randomization distribution of the treatment assignment alone.
Since the latter distribution is known exactly in finite samples, the Fisherian framework
allows researchers to make inferences without relying on large sample approximations.
We illustrate with a hypothetical example where there are five units inside W, and we
randomly assign NW,+ = 3 units to treatment and NW,− = NW − NW,+ = 5 − 3 = 2 units to
control, where NW is the total number of units inside W. We choose the difference-in-means
as the test-statistic:
S = ȲW,+ − ȲW,−,   ȲW,+ = (1/NW,+) Σ_{i: Xi ∈ W} Yi Ti,   ȲW,− = (1/NW,−) Σ_{i: Xi ∈ W} Yi (1 − Ti).
With five units and three assigned to treatment, the number of possible treatment assignments is (5 choose 3) = 10—there are ten different ways to assign five units to two groups of size three and two. We
assume that Yi (1) = 5 and Yi (0) = 2 for all units, so that Yi (1) − Yi (0) = 3 for all i. The top
panel of Table 2.2 shows the ten possible treatment assignment vectors, t1 , . . . , t10 , and the
two potential outcomes for each unit.
Suppose that the observed treatment assignment inside W is t6 , so that units 1, 4 and 5
are assigned to treatment, and units 2 and 3 are assigned to control. Given this assignment,
the vector of observed outcomes is Y = (5, 2, 2, 5, 5), and the observed value of the difference-in-means statistic is S^obs = ȲW,+ − ȲW,− = (5 + 5 + 5)/3 − (2 + 2)/2 = 5 − 2 = 3. The bottom panel of
Table 2.2 shows the distribution of the test statistic under the null—that is, the ten different
possible values that the difference-in-means can take when HF0 is assumed to hold. The
observed difference-in-means S obs is the largest of the ten, and the exact p-value is therefore
pF = 1/10 = 0.10. This p-value is finite-sample exact, because the null distribution in Table
2.2 was derived directly from the randomization distribution of the treatment assignment,
and does not rely on any statistical model or large sample approximations.
Table 2.2 (bottom panel): null distribution of the difference-in-means test statistic.

              t1    t2    t3    t4    t5    t6    t7    t8    t9    t10
Ȳ+            3     4     4     4     4     5     3     3     4     4
Ȳ−            5     3.5   3.5   3.5   3.5   2     5     5     3.5   3.5
Ȳ+ − Ȳ−      −2     0.5   0.5   0.5   0.5   3    −2    −2     0.5   0.5
PW(S = Stj)   1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10
This example illustrates that implementation of the local randomization approach re-
quires specifying a particular window W and the particular way in which the treatment
assignment was randomized within W. Naturally, the distribution of the treatment assign-
ment within W is unknown; in practice, it can be approximated by assuming a complete
randomization within W. Implementation also requires choosing a particular test statistic;
the difference-in-means is a simple choice, but below we discuss other options.
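To make the enumeration concrete, the following R sketch (ours, not part of the rdlocrand package) reproduces the five-unit calculation: it lists all ten fixed-margins assignments, evaluates the difference-in-means under the sharp null, and recovers the exact p-value of 0.10.

# Exact Fisherian p-value for the hypothetical five-unit example.
Y5 <- c(5, 2, 2, 5, 5)        # observed outcomes under the assignment t6
t6 <- c(1, 0, 0, 1, 1)        # units 1, 4, and 5 treated; units 2 and 3 control
diff_means <- function(t, y) mean(y[t == 1]) - mean(y[t == 0])
S_obs <- diff_means(t6, Y5)   # observed statistic: 5 - 2 = 3

treated_sets <- combn(5, 3)   # all ways of choosing 3 treated units out of 5
S_null <- apply(treated_sets, 2, function(idx) {
  t <- rep(0, 5); t[idx] <- 1
  diff_means(t, Y5)           # under the sharp null, the outcomes are fixed
})
mean(S_null >= S_obs)         # one-sided exact p-value: 1/10 = 0.10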
We can now generalize the above example. We define the assignment mechanism within
W in general as PW [TW = t], where TW denotes the vector of treatment assignment variables
for all units with Xi ∈ W, and t ∈ TW . Again, we collect in the set TW all possible treatment
assignments that can occur given the assumed randomization mechanism. In a complete or
fixed margins randomization, TW includes all vectors of length NW such that each vector
has NW,+ ones and NW,− = NW − NW,+ zeros. Similarly, YW collects the NW observed
outcomes for units with Xi ∈ W. We also need to choose a test statistic, which we denote
S = S(TW , YW ), a function of TW and YW .
Of all the possible values of the treatment vector TW that can occur, only one will have occurred in W; we call this value the observed treatment assignment, t^obs_W, and we denote by S^obs the observed value of the test statistic associated with it, i.e. S^obs = S(t^obs_W, YW). (In our example, we had t^obs_W = t6.) Then, the one-sided finite-sample exact p-value associated with a test of the sharp null hypothesis HF0 is the probability that the test statistic is larger than or equal to its observed value, pF = PW[S(TW, YW) ≥ S^obs].
Under HF0 , all potential outcomes are known and can be imputed, YW = YW (1) = YW (0),
so that S(TW , YW ) can be computed for all treatment assignments. Thus, under HF0 , the
only randomness in S(TW , YW ) comes from the random assignment of the treatment, which
is assumed to be known.
In practice, it often occurs that the total number of different treatment vectors tW that
can occur inside the window W is too large, and enumerating them exhaustively is unfeasible.
For example, assuming a fixed-margins randomization inside W with 15 observations on each
side of the cutoff, there are (NW choose NW,+) = (30 choose 15) = 155,117,520 possible treatment assignments.
When exhaustive enumeration is unfeasible, we can approximate pF using simulations by randomly sampling different vectors of treatment assignment.
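A minimal sketch of this simulation-based approximation, assuming yw and tw (our notation) hold the outcomes and treatment indicators of the units inside W, and that the assignment is of the fixed-margins type so that re-randomizing amounts to permuting tw:

# Simulation-based approximation of the Fisherian p-value pF.
set.seed(50)
diff_means <- function(t, y) mean(y[t == 1]) - mean(y[t == 0])
S_obs <- diff_means(tw, yw)
S_sim <- replicate(1000, diff_means(sample(tw), yw))  # random fixed-margins draws
mean(S_sim >= S_obs)                                   # approximate one-sided p-value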
Fisherian confidence intervals can be obtained by specifying sharp null hypotheses about
treatment effects, and then inverting these tests. This requires specifying a treatment effect
model, and testing hypotheses about the specified parameters. A simple choice is a constant
treatment effect model, Yi (1) = Yi (0) + τ , which leads to the null hypothesis HFτ0 : τ = τ0 .
(Note that HF0 is a special case of HFτ0 when τ0 = 0.) Under this model, a 1 − α confidence
interval for τ can be obtained by collecting the set of all the values τ0 that fail to be rejected
when we test HFτ0 : τ = τ0 with an α-level test.
To test HFτ0 , we build test statistics based on an adjustment to the potential outcomes
that renders them constant under this null hypothesis. Under HFτ0 , the observed outcome
is Yi = Ti · τ0 + Yi (0) and the adjusted outcome Ÿi ≡ Yi − Ti τ0 = Yi (0) is constant. A
randomization-based test of HFτ0 proceeds by first calculating the adjusted outcomes Ÿi for
all the units in the window, and then computing the test statistic using the adjusted outcomes
instead of the raw outcomes, i.e. computing S(TW , ŸW ). Once the adjusted outcomes are
used to calculate the test statistic for all possible treatment assignments, a test of HFτ0 : τ = τ0
can be implemented as a test of the sharp null hypothesis HF0 , using S(TW , ŸW ) instead of
S(TW , YW ). We use pFτ0 to refer to the p-value associated with a randomization-based test
of HFτ0 .
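As an illustration, the sketch below (again ours, for illustration only) tests HFτ0 with τ0 = 3 in the five-unit example used earlier; because the adjusted outcomes remove the hypothesized constant effect, which here coincides with the true effect, the resulting p-value is 1 and τ0 = 3 is not rejected.

# Randomization-based test of the constant-effect null Yi(1) = Yi(0) + tau0.
Y5   <- c(5, 2, 2, 5, 5)          # observed outcomes under assignment t6
t6   <- c(1, 0, 0, 1, 1)
tau0 <- 3                         # hypothesized constant treatment effect
Y_adj <- Y5 - t6 * tau0           # adjusted outcomes; constant under the null

diff_means <- function(t, y) mean(y[t == 1]) - mean(y[t == 0])
S_obs <- diff_means(t6, Y_adj)
treated_sets <- combn(5, 3)       # all fixed-margins assignments
S_null <- apply(treated_sets, 2, function(idx) {
  t <- rep(0, 5); t[idx] <- 1
  diff_means(t, Y_adj)
})
mean(abs(S_null) >= abs(S_obs))   # two-sided p-value; equals 1 here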
In practice, assuming that τ takes values in [τmin , τmax ], computing these confidence in-
tervals requires building a grid Gτ0 = {τ01, τ02, . . . , τ0G}, with τ01 ≥ τmin and τ0G ≤ τmax, and
collecting all τ0 ∈ Gτ0 that fail to be rejected with an α-level test of HFτ0. Thus, the Fisherian confidence interval should be interpreted in light of the assumed constant treatment effect model.
Although we used the difference-in-means test statistic and the fixed margins random-
ization mechanism for illustration, the principle of Fisherian inference is general and
works for any appropriate choice of test statistic and randomization mechanism. Other possi-
ble test statistics include the Kolmogorov-Smirnov (KS) statistic and the Wilcoxon rank sum
statistic. Other randomization mechanisms include the Bernoulli assignment, where each unit
is assigned independently to treatment with the same probability—for implementation, it is
common to choose either 1/2 or the proportion of treated units in W. In practice, complete
randomization and Bernoulli randomization often lead to similar conclusions.
Finally, while the main goal of Fisherian methods is inference and not point estimation,
it is possible to define parameters of interest and point estimate them. However, any point
estimator based on the Fisherian framework requires assuming a sharp treatment effect model
that allows full imputation of all potential outcomes under the null hypothesis, as we did to
build confidence intervals by inversion. In particular, because null hypotheses about the average treatment effect are not sharp, Fisherian methods do not provide a general way to estimate this parameter.
This is sometimes seen as a limitation, since most common parameters do not allow for null
hypotheses that are sharp.
In some RD applications, even the smallest windows have many observations. In these cases,
although Fisherian methods continue to be valid and can certainly be used, researchers may
choose to use more standard methods that rely on large sample approximations. Compared
to Fisherian methods, the main advantage of large sample methods is that they provide
consistent point estimators of common parameters of interest, in addition to leading to
statistical inferences based on asymptotic distributional approximations.
All large sample methods assume that the sample size is “large” (the formal requirement
is that the sample size tends to infinity). The application of these methods to the local
randomization RD context thus requires that the number of observations within the window,
NW , be large enough. There are two kinds of frameworks for large sample methods, depending
on whether the potential outcomes are seen as fixed or random.
In the Neyman framework the potential outcomes (Yi (0), Yi (1)), i = 1, 2, . . . , n, are non-
stochastic, so all parameters are, in this sense, conditional on the potential outcomes. Neyman
envisions an urn model of assignment, where there is one urn per treatment condition and
each urn has the potential outcomes corresponding to that treatment condition for each unit.
In the binary treatment case, and proceeding conditionally for those units with Xi ∈ W, the
treatment urn contains the NW “balls” Y1 (1), Y2 (1), . . . , YNW (1), and the control urn contains
Y1(0), Y2(0), . . . , YNW(0). Estimates of the average potential outcomes, µW,+ ≡ (1/NW) Σ_{i: Xi ∈ W} Yi(1) and µW,− ≡ (1/NW) Σ_{i: Xi ∈ W} Yi(0), are created by drawing balls from the urns. For example, in
a fixed margins randomization, NW,+ balls are taken from the treated urn, and NW,− =
NW − NW,+ are taken from the control urn, in such a way that once a ball is taken from one
urn, it disappears from the other. Because the sampling is without replacement, the draws
are not independent.
The Neyman approach relies on large sample approximations, imagining that the urn
model is used many, many times to produce different assignments of units to treatment and
control. In particular, it considers the problem of using ȲW,+ to estimate µW,+ and ȲW,−
to estimate µW,− . This is achieved by assuming that NW,+ → ∞ and NW,− → ∞ and
invoking the law of large numbers to conclude that ȲW,+ and ȲW,− are consistent estimators.
Similarly, the approach invokes appropriate central limit theorems to perform statistical
inferences based on a Gaussian distributional approximation.
In the super-population framework, the n units are assumed to be drawn from a larger
population using independently and identically distributed (i.i.d.) sampling. This sampling
scheme results in the potential outcomes (Yi (0), Yi (1)), i = 1, 2, . . . , n, being random variables
rather than fixed quantities. Thus, in the super-population framework, there are two sources
of randomness: the sampling from the super-population, and the assignment of the sampled
units to treatment or control. In contrast, in the Neyman framework (and also in Fisher’s)
the only source of randomness is the treatment assignment. Table 2.3 compares the three
approaches.
Regardless of whether a Fisher, Neyman or super-population approach is adopted, we can
now define parameters of interest. Let EW [·] denote the expectation computed with respect
to the probability PW , that is, the expectation computed conditionally for those units with
Xi ∈ W. The local randomization sharp RD treatment effect is the average treatment effect
inside W:
θSRD ≡ (1/NW) Σ_{i: Xi ∈ W} EW[Yi(1) − Yi(0)].
The definition of θSRD is designed to cover both random and non-random potential out-
comes under different sampling schemes. In a Neyman framework, it reduces to θSRD = (1/NW) Σ_{i: Xi ∈ W} [Yi(1) − Yi(0)] because the potential outcomes are fixed and the (conditional)
expectation integrates to one. In the super-population framework under i.i.d. sampling, we
have θSRD = E [Yi (1) − Yi (0)|Xi ∈ W].
The parameter θSRD is different from the more conventional continuity-based RD param-
eter τSRD defined in the introduction and discussed in Foundations. While θSRD is an average
effect inside an interval (the window W), τSRD is an average at a single point (the cutoff c)
where the number of observations is zero whenever the score is continuously distributed. This
means that the decision to adopt a continuity-based approach versus a local randomization
approach directly affects the definition of the parameter of interest. Naturally, the smaller
the window W is, the more conceptually similar θSRD and τSRD become.
Under the local randomization assumptions invoked within W, we have
θSRD = (1/NW) Σ_{i: Xi ∈ W} EW[ Ti Yi / PW[Ti = 1] ] − (1/NW) Σ_{i: Xi ∈ W} EW[ (1 − Ti) Yi / (1 − PW[Ti = 1]) ],
regardless of whether the potential outcomes are fixed or random. This identification result
expresses the counterfactual RD effect θSRD as a function of observed random variables, and
suggests the weighted difference-in-means estimator
θ̂SRD = ȲW,+ − ȲW,−,   ȲW,+ = (1/NW,+) Σ_{i: Xi ∈ W} ωi Ti Yi,   ȲW,− = (1/NW,−) Σ_{i: Xi ∈ W} ωi (1 − Ti) Yi,
where ωi denotes an appropriately defined weighting scheme for unit i. (We use the same
notation ȲW,+ and ȲW,− for simplicity.)
For example, when the assignment mechanism is Bernoulli, we have PW [Ti = 1] = p ∈
(0, 1) for all units with Xi ∈ W. In this case, defining the weights as
ωi = NW,+/(NW · p) · Ti + NW,−/(NW · (1 − p)) · (1 − Ti),
we have EW[θ̂SRD] = EW[ȲW,+] − EW[ȲW,−] = θSRD, that is, θ̂SRD is unbiased for θSRD. This result follows from noting that EW[Ti Yi] = EW[Ti] EW[Yi(1)] and EW[(1 − Ti) Yi] = EW[1 − Ti] EW[Yi(0)], which follow from fixed potential outcomes in the Neyman framework or from independence between the treatment assignment and the potential outcomes in the super-population framework.
The standard difference-in-means estimator is a particular case of θ̂SRD with ωi = 1 for all units. When the assignment mechanism follows a fixed-margins randomization, this choice of weighting scheme makes θ̂SRD unbiased for θSRD, that is, EW[θ̂SRD] = θSRD. This result follows from noting that EW[Ti] = PW[Ti = 1] = NW,+/NW for all i with Xi ∈ W, and under the specific
conditions on the potential outcomes imposed in each framework. By implication, whenever
the assignment mechanism does not follow a fixed-margins randomization, the unweighted
difference-in-means estimator is not unbiased for θSRD , although it is consistent under standard
large sample arguments. Thus, whenever the randomization mechanism is assumed to be
different from a fixed-margins randomization, the use of the unweighted difference-in-means
estimator must be justified based on large sample approximations.
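A minimal sketch of the weighted estimator under an assumed Bernoulli assignment with known probability p, with yw and tw (our notation) denoting the outcomes and treatment indicators of the units inside W:

# Weighted difference-in-means estimator of theta_SRD under Bernoulli assignment.
p  <- 1/2                         # assumed (known) assignment probability
N  <- length(yw)
Np <- sum(tw); Nm <- N - Np       # realized numbers of treated and control units
omega <- Np / (N * p) * tw + Nm / (N * (1 - p)) * (1 - tw)
theta_hat <- sum(omega * tw * yw) / Np - sum(omega * (1 - tw) * yw) / Nm
theta_hat   # compare with the unweighted estimator mean(yw[tw == 1]) - mean(yw[tw == 0])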
For inference, both the Neyman and the super-population approaches rely on a Gaus-
sian approximation justified by appropriate central limit theorems. A possibly conservative
estimator of the variance of θ̂SRD can be constructed using standard least squares results. A
100(1 − α)% confidence interval can be constructed in the usual way by relying on a Gaus-
sian large sample approximation to the statistic of interest. For example, an approximate
two-sided 95% confidence interval is
CILS = [ θ̂SRD ± 1.96 · √V̂ ],
where V̂ denotes an appropriate choice of variance estimator, which can depend on the
specific framework considered. A conservative choice is obtained if the so-called HC2 or HC3
heteroskedastic-robust variance estimators are used. Hypothesis testing is based on Gaussian
approximations as well. The Neyman or super-population null hypothesis is
H0 : (1/NW) Σ_{i: Xi ∈ W} EW[Yi(1)] = (1/NW) Σ_{i: Xi ∈ W} EW[Yi(0)].
In contrast to Fisher’s sharp null hypothesis HF0 , this null hypothesis does not allow us to
calculate the full profile of potential outcomes for every possible realization of the treatment
assignment vector. Thus, unlike the Fisherian approach, the large sample approach to hy-
pothesis testing relies on an approximation and is therefore not exact but, when valid, it allows us to rely on well-known methods for estimation and inference based on least squares.
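A minimal large-sample sketch, assuming Y and X are defined as in the application below and using the ad-hoc window [−2.5, 2.5] employed there: the difference in means is obtained from a least squares regression of the outcome on the treatment indicator, with an HC2 heteroskedasticity-robust variance; the estimatr package (not used elsewhere in this monograph) is one convenient way to obtain it.

# Neyman / super-population estimation and inference inside W = [-2.5, 2.5].
library(estimatr)
keep <- abs(X) <= 2.5
Tw   <- as.numeric(X[keep] >= 0)  # treatment indicator inside W (cutoff at zero)
Yw   <- Y[keep]
fit  <- lm_robust(Yw ~ Tw, se_type = "HC2")
summary(fit)   # the coefficient on Tw estimates theta_SRD; the CI uses a Gaussian approximation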
We start the local randomization analysis of the U.S. Senate application using the function
rdrandinf, which is part of the rdlocrand library. The main arguments of rdrandinf
include the outcome variable Y, the running variable X, and the upper and lower limits of the
window where inferences will be performed (wr and wl). We first choose the ad-hoc window
[−2.5, 2.5], postponing the discussion of automatic data-driven window selection until the
next section. To make inferences in W = [−2.5, 2.5], we set wl = −2.5 and wr = 2.5. Since
Fisherian methods are simulation-based, we also choose the number of simulations via the
argument reps, in this case choosing 1,000. Finally, in order to ensure the replicability of the
simulation-based results at a later time, we set the random seed using the seed argument.
R Snippet 2.2
> out <- rdrandinf(Y, X, wl = -2.5, wr = 2.5, seed = 50)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 5.313
================================================================================
Diff. in means 9.167 0.000 0.000 0.866
================================================================================
The output is divided into three panels. The top panel first presents the total number of
observations in the entire support of the running variable, the order of the polynomial used
to transform the outcomes, and the kernel function that is used to weigh the observations.
By default, rdlocrand uses a polynomial of order zero, which means the outcomes are not
transformed. The default is also to use a uniform kernel, that is, to compute the test statistic
using the unweighted observations (this can be changed with the option kernel). The rest
of the top panel reports the number of simulations used for Fisherian inference, the method
used to choose the window, and the null hypothesis that is tested (default is τ0 = 0, i.e. a
test of HF0 and H0 ). Finally, the last row of the top panel reports the chosen randomization
mechanism, which by default is fixed margins (i.e. complete) randomization.
The middle panel reports the number of observations to the left and right of the cutoff in
both the entire support of the running variable, and in the chosen window. Although there is
a total of 595 control observations and 702 treated observations, the number of observations
in the window [−2.5, 2.5] is much smaller, with only 63 elections below the cutoff and 57
elections above it. The middle panel also reports the mean and standard deviation of the
outcome inside the chosen window.
The last panel reports the results. The first column reports the type of test statistic
employed for testing the Fisherian sharp null hypothesis (the default is the difference-in-
means), and the column labeled T reports its value. In this case, the difference-in-means
is 9.167; given the information in the Mean of outcome row in the middle panel, we see
that this is the difference between a Democratic vote share of 53.235 percentage points in
elections where the Democratic party barely wins, and 44.068 percentage points in elections
where the Democratic party barely loses. The Finite sample column reports the p-value
(pF ) associated with a randomization-based test of the Fisherian sharp null hypothesis HF0
(or the alternative sharp null hypothesis HFτ0 based on a constant treatment effect model if
the user sets τ0 ≠ 0 via the option nulltau). This p-value is 0.000, which means we reject the sharp null hypothesis at the 5%, 1%, 0.1%, and even lower significance levels.
Finally, the Large sample columns in the bottom panel report inferences based on the
large sample approximate behavior of the (distribution of the) statistic. The p-value reported
in the large sample columns is thus associated with a test of the null hypothesis H0 that the
average treatment effect is zero. The last column in the bottom panel reports the power of
the test to reject a true average treatment effect equal to d, where by default d is set to
one half of the standard deviation of the outcome variable for the control group, which in
this case is 10.627 percentage points. The value of d can be modified with the options d or
dscale. Like for the p-value, the calculation of the power versus the alternative hypothesis
d is based on the Gaussian approximation. The large sample p-value is 0.000, indicating
that this null hypothesis is also easily rejected at conventional levels. The power calculation
indicates that the probability of rejecting the null hypothesis when the true effect is equal
to half a (control) standard deviation is high, at 0.866. The estimated effect is large, roughly
equal to one standard deviation of the control outcome.
We note the different interpretation of the difference-in-means test statistic in the Fishe-
rian versus large sample framework. In Fisherian inference, the difference-in-means is simply
one of the various test statistics that can be chosen to test the sharp null hypothesis, and
should not be interpreted as an estimated effect; this is because the focus is on hypothesis
testing, not on point estimation. In contrast, in the large sample framework, the focus is on
the sample average treatment effect; since the difference-in-means is a consistent estimator
of this parameter under the assumptions we have made, it can be appropriately interpreted
as an estimated effect under those assumptions.
To illustrate how robust Fisherian inferences can be to the choice of randomization mech-
anism and test statistic, we modify our call to rdrandinf to use a Bernoulli randomization
mechanism, where every unit in the ad-hoc window [−2.5, 2.5] has a 1/2 probability of being
assigned to treatment. For this, we first create an auxiliary variable that contains the treat-
ment assignment probability of every unit in the window; this variable is then passed as an
argument to rdrandinf.
R Snippet 2.3
> bern_prob <- numeric(length(X))
> bern_prob[abs(X) > 2.5] <- NA
> bern_prob[abs(X) <= 2.5] <- 1/2
> out <- rdrandinf(Y, X, wl = -2.5, wr = 2.5, seed = 50, bernoulli = bern_prob)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 5.313
================================================================================
Diff. in means 9.167 0.000 0.000 0.866
================================================================================
The last row of the top panel now says Randomization = Bernoulli, indicating that
the Fisherian randomization-based test of the sharp null hypothesis assumes a Bernoulli
assignment; in this case, given our construction of the bern prob variable, this probability
is 1/2 for all units. The Fisherian p-value is again 0.000, the same p-value obtained above
under the assumption of a fixed margins randomization. The conclusion of rejection of HF0 is
therefore unchanged. This robustness of the Fisherian p-value to the choice of fixed margins
versus Bernoulli randomization is typical in applications. Note also that the large sample
results are exactly the same as before—this is expected, since the choice of randomization
mechanism does not affect the large sample inferences.
We could also change the test statistic used to test the Fisherian sharp null hypothesis. For
example, to use the Kolmogorov-Smirnov test statistic instead of the difference-in-means, we
can use the option statistic = "ksmirnov" (output not shown).
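For instance, a call along the following lines (output omitted, as in the text) would use the Kolmogorov-Smirnov statistic while keeping the same ad-hoc window and seed; this is only an illustration of the option mentioned above.

out <- rdrandinf(Y, X, wl = -2.5, wr = 2.5, seed = 50, statistic = "ksmirnov")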
To obtain confidence intervals, we must specify a grid Gτ0 of treatment effect values
to invert tests of the sharp null hypothesis. The function rdrandinf tests the null hy-
potheses HFτ0 : Yi(1) − Yi(0) = τ0 for all values of τ0 in the grid, and collects in the confidence interval all the values of τ0 whose null hypothesis fails to be rejected in a randomization-based test
of the desired level (default is α = 0.05). To calculate these confidence intervals, we cre-
ate the grid, and then call rdrandinf with the ci option. For this example, we choose a
grid of values for τ0 between −20 and 20, with 0.10 increments. Thus, we test Hτ0 for all
τ0 ∈ Gτ0 = {−20, −19.90, −19.80, . . . , 19.80, 19.90, 20}.
R Snippet 2.4
> ci_vec <- c(0.05, seq(from = -20, to = 20, by = 0.1))
> out <- rdrandinf(Y, X, wl = -2.5, wr = 2.5, seed = 50, reps = 1000,
+ ci = ci_vec)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 5.313
================================================================================
Diff. in means 9.167 0.000 0.000 0.866
================================================================================
The Fisherian 95% confidence interval is [5.7, 12.6]. As explained, this confidence interval
assumes a constant treatment effect model. The interpretation is therefore that, given the
assumed randomization mechanism, all values of τ between 5.7 and 12.6 in the constant
treatment effect model Yi (1) = Yi (0) + τ fail to be rejected with a 5%-level randomization-
based Fisherian test.
In practice, the window W is almost always unknown and must be chosen; this is an impor-
tant step in the implementation of the local randomization RD approach. For simplicity, the
windows we consider are symmetric around the cutoff, i.e., they have the form W = [c − w, c + w]
for w ≥ 0. One option is to choose W in an ad hoc way. For example, a scholar may believe
that elections decided by 0.5 percentage points or less are essentially decided as if by the flip
of a coin, and choose W = [c − 0.5, c + 0.5]. The disadvantage of an ad-hoc method is that
it lacks transparency and objectivity.
A preferred alternative is to use a principled data-driven procedure. A leading example
is based on predetermined covariates—variables that capture important characteristics of
the units and whose values are determined before the treatment is assigned and received.
This approach requires assuming that there exists at least one predetermined covariate, Zi ,
that is associated with the running variable only outside the window W = W0 where the
local randomization assumptions hold. Specifically, the requirement is that Zi be associated
with the score in windows larger than W0 , possibly due to correlation between the score and
another characteristic that also affects Zi , but independent of the score in W0 and all smaller
windows. Moreover, because Zi is a predetermined covariate, the effect of the treatment on Zi
is zero by construction. Figure 2.4 shows a hypothetical illustration based on the conditional
expectation of Zi given the score. (We focus on the conditional expectation of a random
covariate for illustration purposes only, but the idea applies more generally.)
This motivates a data-driven method to choose W. We define a generic null hypothesis H0
stating that the treatment is unrelated to Zi (or that Zi is “balanced” between the groups).
This hypothesis could be the Fisherian hypothesis HF0 or the large sample hypothesis H0 . The
procedure starts with the smallest possible window—W1 in Figure 2.4—and tests H0 . Since
there is no treatment effect inside W1 , H0 will fail to be rejected. A larger window W2 is
selected, and the null hypothesis is tested again inside W2 . The procedure keeps increasing
the length of the window and re-testing H0 in each larger window, until a window is reached
where H0 is rejected at the chosen significance level α⋆ ∈ (0, 1). In the figure, assuming the
test has perfect power, H0 will not be rejected in W0 , nor will it be rejected in W2 or W1 . The
chosen window is the largest window such that H0 fails to be rejected inside that window
and in all windows contained in it.
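Before turning to the software implementation, the following minimal sketch illustrates the logic of this iterative procedure for a single hypothetical predetermined covariate Z, using a simple large-sample balance test; the rdwinselect function introduced below implements a refined version of this idea, with Fisherian tests by default and support for multiple covariates.

# Minimal sketch of the nested-window selection procedure, assuming a single
# predetermined covariate Z, a cutoff normalized to zero, and threshold 0.15.
alpha_star   <- 0.15
half_lengths <- seq(0.5, 5, by = 0.5)        # hypothetical sequence of windows
chosen_w     <- NA
for (w in half_lengths) {
  inW  <- !is.na(Z) & abs(X) <= w
  pval <- t.test(Z[inW & X >= 0], Z[inW & X < 0])$p.value  # balance test in window
  if (pval < alpha_star) break               # stop at the first imbalanced window
  chosen_w <- w                              # otherwise keep enlarging the window
}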
Figure 2.4: Hypothetical conditional expectation E[Z|X = x] of a predetermined covariate given the score, with nested symmetric windows W1, W2, W0, W3, W4, W5, and W6 around the cutoff c.
• Null hypothesis. Since the procedure will typically involve some windows with very few
observations, we recommend using the Fisherian methods for the sharp null hypothesis,
HF0 : Zi (1) = Zi (0) for all i.
• Relevant covariates. The covariates employed should be related to both the outcome and the treatment assignment. If multiple covariates are chosen, the procedure can be applied using either the p-value of an omnibus test statistic, or by testing H0 for each covariate separately and using the minimum p-value across all covariates (see the sketch after this list).
• Test statistic. Typical choices of the statistic used to test H0 include the difference-in-
means, the Kolmogorov-Smirnov statistic, and the Wilcoxon rank-sum statistic.
• Minimum number of observations in the smallest window. If the smallest window where
H0 is tested is too small, it will contain too few observations and the power to reject
the null hypothesis when it is false will be too low. Thus, the smallest window should
contain a minimum number of observations to ensure acceptable power; we recommend
at least roughly ten observations on either side of the cutoff.
• Level α⋆. Because the main concern is failing to reject a false H0, the threshold significance level that determines when H0 is rejected should be higher than the usual 0.05. When we test H0 at a higher level, we tolerate a higher probability of Type I error in exchange for a lower probability of concluding that the covariate is unrelated to the treatment assignment when in fact it is related (a Type II error). We recommend setting α⋆ ≥ 0.15 if possible, and in any case no smaller than 0.10.
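As an illustration of the minimum p-value criterion, a minimal sketch for a single window, assuming a covariate matrix Z with one predetermined covariate per column (a large-sample test is used here only for brevity; Fisherian tests work analogously):

# Minimal sketch: balance p-values for several covariates in one window, taking
# the minimum across covariates; Z is a hypothetical covariate matrix and the
# window half-length is chosen arbitrarily for illustration.
inW   <- abs(X) <= 0.75
pvals <- apply(Z, 2, function(z) {
  t.test(z[inW & X >= 0], z[inW & X < 0])$p.value
})
p_min <- min(pvals, na.rm = TRUE)   # the window "passes" if p_min >= alpha_star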
We use this procedure to select a window in the U.S. Senate application using the predeter-
mined covariates described in Table 2.1. We use the function rdwinselect, which is part of
the rdlocrand library. The main arguments are the score variable X, the matrix of prede-
termined covariates, and the sequence of nested windows. We also choose 1,000 simulations
for the calculation of Fisherian p-values in each window.
There are two ways to increment the length of the windows in rdwinselect. One is
to increase the length in fixed steps, which can be implemented with the option wstep.
For example, if the first window selected is [−0.1, 0.1] and wstep = 0.1, the sequence is
W1 = [−0.1, 0.1], W2 = [−0.2, 0.2], W3 = [−0.3, 0.3], etc. The other is to increase window length
so that the number of observations increases by a minimum fixed amount on every step,
which can be done via the option wobs. For example, by setting wobs = 5, every window in
the sequence is the smallest symmetric window such that the number of added observations
on each side of the cutoff relative to the prior window is at least 5. By default, rdwinselect
starts with the smallest window that has at least 10 observations on either side, but this
default behavior can be changed with the options wmin or obsmin. Finally, rdwinselect
uses the chosen level α⋆ to recommend the chosen window; the default is α⋆ = 0.15, but this
can be modified with the level option.
We start by considering a sequence of symmetric windows where the number of observa-
tions in each step increases by at least two observations on either side (option wobs=2).
R Snippet 2.5
> Z <- cbind(data$presdemvoteshlag1, data$demvoteshlag1, data$demvoteshlag2,
+ data$demwinprv1, data$demwinprv2, data$dmidterm, data$dpresdem,
+ data$dopen)
> colnames(Z) <- c("presdemvoteshlag1", "demvoteshlag1", "demvoteshlag2",
+ "demwinprv1", "demwinprv2", "dmidterm", "dpresdem", "dopen")
> out <- rdwinselect(X, Z, seed = 50, reps = 1000, wobs = 2)
Mass points detected in running variable
You may use wmasspoints option for constructing windows at each mass point
================================================================================
Window p-value Var. name Bin.test Obs<c Obs>=c
================================================================================
-0.5287 0.5287 0.186 demvoteshlag2 0.327 10 16
-0.5907 0.5907 0.404 dopen 0.362 12 18
-0.6934 0.6934 0.464 dopen 0.311 14 21
-0.7652 0.7652 0.241 dopen 0.154 15 25
-0.9694 0.9694 0.076 dopen 0.135 17 28
-1.0800 1.0800 0.034 dopen 0.119 19 31
-1.1834 1.1834 0.097 dopen 0.134 21 33
-1.2960 1.2960 0.115 dopen 0.245 25 35
-1.3289 1.3289 0.225 dmidterm 0.382 28 36
-1.4174 1.4174 0.126 dmidterm 0.396 30 38
================================================================================
Recommended window is [-0.7652;0.7652] with 40 observations (15 below, 25 above).
The top and middle panels in the rdwinselect output are very similar to the correspond-
ing panels in the rdrandinf output. One difference is the Testing method, which indicates
whether Fisherian methods are used to test HF0, or Normal approximation methods are used
to test H0 . The default is Fisherian methods, but this can be changed with the approximate
option. The other difference is the Balance test row, which indicates the type of test statis-
tic used for testing the null hypothesis—the default is diffmeans, the difference-in-means.
The option statistic allows the user to select a different test statistic; the available options
are the Kolmogorov-Smirnov statistic (ksmirnov), the Wilcoxon-Mann-Whitney studentized
statistic (ranksum), and Hotelling’s T-squared statistic (hotelling).
The bottom panel shows tests of the null hypothesis for each window considered. By
default, rdwinselect starts with the smallest symmetric window that has at least 10 obser-
vations on either side of the cutoff. Since we set wobs=2, we continue to consider the smallest
possible (symmetric) windows so that at least 2 observations are added on each side of the
cutoff in every step. For every window, the column p-value reports the minimum of all the
p-values associated with the tests of the null hypothesis performed for each covariate (pmin ),
or the unique p-value if an omnibus test is used. The column Var. name reports the covariate
associated with the minimum p-value—that is, the covariate Zk such that pk = pmin .
Finally, the column Bin.test reports the p-value of a binomial test for observing NW,+ successes out of NW trials, where NW,+ is the number of observations within the window that are at or above the cutoff (reported in column Obs>=c) and NW is the total number
of observations within the window (the sum of the values reported in Obs<c and Obs>c). We
postpone discussion of this test until the upcoming section on falsification.
The output indicates that the p-values are above 0.15 in all windows between the min-
imum window [−0.5287, 0.5287] and the window [−0.7652, 0.7652]. In the window immedi-
ately after [−0.7652, 0.7652], the p-value drops to 0.076, considerably below the suggested
0.15 threshold. The data-driven window is therefore W0 = [−0.7652, 0.7652]. After this win-
dow, the p-values start decreasing, albeit initially this decrease is not monotonic. By default,
rdwinselect only shows the first 20 windows, but this number can be increased with the
option nwindows. We can also set the option plot=TRUE to create a plot of the minimum
p-values associated with the length of each window considered; we show the plot in Figure
2.5 for the first 200 windows.
Figure 2.5: Minimum covariate balance p-value (Pvals) as a function of the window's upper limit (wlist_right).
Figure 2.5 shows that the minimum p-value decreases sharply with window length, stay-
ing below 0.10 for all windows approximately larger than [−2.5, 2.5]. Although the p-values
increase above 0.10 for some windows between [−1, 1] and [−2.5, 2.5], they decrease sharply
once windows larger than [−3, 3] are considered. The pattern in this plot is common in most
applications: a strong negative relationship between p-values and window length, with high
p-values for the smallest windows that decrease rapidly (albeit not necessarily monotoni-
cally) and stay at zero once the window length is large enough. Although in this example
the absolute value of the running variable ranges from 0 to 100, the p-values become ap-
proximately zero for windows larger than [−3, 3]. This shows that there are sharp differences
between states where the Democratic party wins versus loses even for elections decided by
moderate margins. For this reason, the window selector chose [−0.7652, 0.7652], suggesting
that the local randomization assumptions, if they hold at all, hold in a very small window
near the cutoff.
To assess the sensitivity of the window selector, we can call rdwinselect with wstep=0.1.
This option starts at the minimum window and increases the length by 0.1 at each side
of the cutoff. The suggested window in this case is [−0.8287, 0.8287], very similar to the
[−0.7652, 0.7652] window chosen above with wobs=2. We omit the output to conserve space.
We can now use rdrandinf to perform a local randomization analysis in the chosen
window. For this, we use the options wl and wr to input, respectively, the lower and upper
limit of the chosen window. We also use the option d = 7.414 to calculate the power of a
large sample test to reject the null hypothesis of a zero average treatment effect when the
true average difference is 7.414. This value is the continuity-based linear polynomial point
estimate shown in Snippet 2.1.
The difference-in-means in W0 = [−0.7652, 0.7652] is 10.203, larger than the continuity-
based local linear point estimate of 7.414 but leading to the same conclusion of a positive
advantage. Both the large sample and Fisherian approaches reject the null hypothesis of,
respectively, no average treatment effect and no treatment effect for any unit. As shown in
the last column, the large sample power to detect a difference of around 7 percentage points is
87.2%, a large value. In accordance with these results, the Fisherian 95% confidence interval
under a constant treatment effect model is [5, 15.3], showing positive effects of Democratic
victory on future vote share. To calculate these confidence intervals, we use the option ci
to pass a grid of treatment effect values, each of which is used as a null hypothesis in a
Fisherian test to collect all hypotheses that fail to be rejected in a confidence interval.
R Snippet 2.6
> ci_vec <- c(0.05, seq(from = -20, to = 20, by = 0.1))
> out <- rdrandinf(Y, X, wl = -0.7652, wr = 0.7652, seed = 50,
+ reps = 1000, ci = ci_vec, d = 7.414)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 7.414
================================================================================
Diff. in means 10.203 0.000 0.000 0.872
================================================================================
Finally, we mention that instead of calling rdwinselect first and rdrandinf second, we
can choose the window and perform inference in one step by using the covariates option
in rdrandinf. However, it is usually better to first choose the window using rdwinselect
and then use rdrandinf. The reason is that calling rdwinselect by itself will never show
outcome results, which reduces the possibility of choosing the window where the outcome
results are in the “expected” direction—in other words, choosing the window without looking
at the outcome results minimizes pre-testing and specification-searching issues.
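A hypothetical one-step call would look as follows, with Z the covariate matrix constructed in Snippet 2.5; as argued above, we nonetheless recommend the two-step approach of running rdwinselect first.

out <- rdrandinf(Y, X, covariates = Z, seed = 50, reps = 1000)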
This crucial falsification test focuses on two types of variables: predetermined covariates—
variables that are determined before the treatment is assigned, and placebo outcomes—
variables that are determined after the treatment is assigned but are known to be unaffected
by the treatment for scientific reasons. The idea is that, in a valid RD design, there should
be no systematic differences between treated and control groups at the cutoff in terms of
both placebo outcomes and predetermined covariates, because these variables could not have
been affected by the treatment. For implementation, the researcher conducts a test of the
hypothesis that the treatment effect is zero for each predetermined covariate and placebo
outcome. If the treatment does have an effect on these variables, the plausibility of the RD
assumptions is called into question.
An important principle behind this type of falsification analysis is that all predetermined
covariates and placebo outcomes should be analyzed in the same way as the outcome of inter-
est. In the local randomization approach, this means that the null hypothesis of no treatment
effect should be tested within the window where the assumption of local randomization is
assumed to hold, using the same inference procedures and the same treatment assignment
mechanism and test statistic used for the analysis of the outcome. Since the local random-
ization assumptions are assumed to hold in W = W0 , all covariates and placebo outcomes
should be analyzed within this window. This illustrates a fundamental difference between the
approaches: in the continuity-based approach, estimation and inference require approximat-
ing unknown regression functions, which requires estimating different bandwidths for each
covariate or placebo variable analyzed; in contrast, in the local randomization approach, all
analyses occur within the same window W0 .
In order to test if the predetermined covariates are balanced within our chosen window
W0 = [−0.7652, 0.7652], we analyze their behavior in this window using Fisherian methods.
We use the difference-in-means statistic, the same statistic we used for the outcome. Under
the local randomization assumptions, we expect the difference-in-means between treated and
control groups for each covariate to be indistinguishable from zero within W0 . Naturally, we
already know that the covariates that we used to choose the window are balanced in W0 . In
this sense, the window selector procedure is itself a validation procedure. We note, however,
that it is possible (and indeed common) for researchers to choose the window based on a
given set of covariates, and then assess balance on a different set.
In order to test this formally, we use the rdrandinf function, using each covariate as
the outcome of interest. For example, when we study the covariate presdemvoteshlag1, we
see that the difference-in-means statistic is relatively small (46.415 − 44.463 = 1.952), and
the finite sample p-value is large (0.461), showing that this covariate is balanced inside the
chosen window.
R Snippet 2.7
> out <- rdrandinf(data$presdemvoteshlag1, X, seed = 50, reps = 1000,
+ wl = -0.7652, wr = 0.7652)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 5.006
================================================================================
Diff. in means 1.952 0.461 0.495 0.418
================================================================================
Table 2.4 contains a summary of the balance analysis for all covariates using Fisherian
methods. We cannot conclude that the control and treatment means are different for any
covariate, since the p-values are above 0.14 in all cases. The number of observations is essentially the same in all cases (the small differences are due to missing values for particular covariates) because the window is fixed at W0 for every covariate.
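A minimal sketch of the analysis summarized in Table 2.4 simply loops over the covariates, calling rdrandinf within the chosen window with each covariate in place of the outcome (Z is the covariate matrix constructed in Snippet 2.5).

# Minimal sketch of the covariate balance analysis behind Table 2.4: each
# predetermined covariate is analyzed as if it were the outcome, inside the same
# window W0 and with the same settings used for the outcome analysis.
for (j in seq_len(ncol(Z))) {
  out <- rdrandinf(Z[, j], X, wl = -0.7652, wr = 0.7652, seed = 50, reps = 1000)
}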
Another important falsification test is a density test that analyzes whether the number of
observations just above the cutoff is roughly similar to the number of observations just
below the cutoff. The idea is that, if units lack the ability to control precisely the value of
the score they receive, they should be as likely to receive a score value just above the cutoff
as they are to receive a score value just below it. In a local randomization approach, this
falsification analysis is implemented by testing the null hypothesis that, within the window
W where the treatment is assumed to be randomly assigned, the number of treated and
control observations is consistent with whatever assignment mechanism is assumed inside
W.
For example, assuming a simple “coin flip” or Bernoulli trial with probability of success
q, we would expect the control sample size, NW,− , and treatment sample size, NW,+ , within
W to be compatible with the numbers generated by these NW,− + NW,+ = NW Bernoulli
trials. In this case, the number of treated units in W follows a binomial distribution, and the
null hypothesis of the test is that the probability of success in the NW Bernoulli experiments
is q. As discussed, the true probability of treatment is unknown. In practice, researchers can
choose q = 1/2 (a choice that can be justified from a large sample perspective when the
score is continuous).
The binomial test is implemented in all common statistical software, and is also part
of the rdlocrand package via the rdwinselect command. Using the Senate data, we can
implement this falsification test in our selected window W0 = [−0.7652, 0.7652] employing
rdwinselect with the score as the single argument. Since we only want to see the binomial
test in this window, we use the option nwindows = 1.
R Snippet 2.8
> out <- rdwinselect(X, wmin = 0.7652, nwindows = 1)
Mass points detected in running variable
You may use wmasspoints option for constructing windows at each mass point
================================================================================
Window p-value Var. name Bin.test Obs<c Obs>=c
================================================================================
-0.7652 0.7652 NA NA 0.211 16 25
================================================================================
Note: no covariates specified.
There are 16 control observations and 25 treated observations in the window. The column
Bin. test shows the p-value of a binomial test that uses a success probability equal to 1/2.
The p-value is 0.211, so we find no evidence against the null hypothesis: the difference in the
number of treated and control observations in W0 = [−0.7652, 0.7652] is generally consistent
with what would be expected if states were assigned to a Democratic win or loss by the flip
of an unbiased coin.
We can also implement the binomial test using the base distribution in R or Stata.
R Snippet 2.9
> binom.test(25, 41, 1/2)
data: 25 and 41
number of successes = 25, number of trials = 41, p-value = 0.211
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.4450478 0.7579890
sample estimates:
probability of success
0.6097561
This falsification test chooses one or more artificial cutoff values at which the probability of
treatment assignment does not change, and analyzes the outcome of interest at these cutoffs
using the same methods used to conduct the analysis at the actual cutoff. The expectation
is that no effect should be found at any of the artificial cutoffs. To avoid contamination from
the actual treatment effect, only treated observations are included for artificial cutoffs above
the actual cutoff, and only control observations are included for cutoffs below the actual
cutoff.
In the local-randomization approach, one possible implementation is to choose several
artificial cutoff values, and then conduct a randomization-based analysis of the outcome
using a symmetric window of the same length as the original window W0 around each of the
cutoffs. Since our chosen window in the U.S. Senate application is W0 = [−0.7652, 0.7652], we
consider windows of length ±0.7652 around each artificial cutoff. For example, for the cutoff
c = 1, we analyze the outcome in the window [0.235, 1.765] (only using treated observations);
the output is below.
R Snippet 2.10
> out <- rdrandinf(Y, X, cutoff = 1, wl = 0.2348, wr = 1.7652,
+ seed = 50)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 3.556
================================================================================
Diff. in means 2.297 0.382 0.375 0.279
================================================================================
We perform a similar analysis for the artificial cutoff c = −1 with window given by the
cutoff −1 ± 0.7652, this time only using control observations. We summarize the results in
Table 2.5, where we see that the point estimate of roughly 10 percentage points that we
saw around the real cutoff is dramatically reduced to 2.30 and −0.33 at the artificial cutoffs,
with very large Fisherian p-values. In contrast to the true cutoff, there is no evidence of a
treatment effect at the artificial cutoffs; this is reassuring.
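The calculations behind Table 2.5 can be organized in a short loop over the artificial cutoffs; a minimal sketch, using the same window half-length as in the main analysis, is below.

# Minimal sketch of the placebo-cutoff analysis in Table 2.5. With these windows,
# the artificial cutoff at 1 uses only treated observations and the artificial
# cutoff at -1 uses only control observations, avoiding contamination from the
# true treatment effect at the real cutoff.
for (c0 in c(-1, 1)) {
  out <- rdrandinf(Y, X, cutoff = c0, wl = c0 - 0.7652, wr = c0 + 0.7652,
                   seed = 50)
}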
Table 2.5: Local Randomization Analysis for Placebo Cutoffs—U.S. Senate data
Just like in a continuity-based approach researchers are interested in the sensitivity of the
results to the bandwidth choice, in a local randomization approach we are often interested
in sensitivity to the window choice. To assess this sensitivity, researchers can simply con-
sider windows smaller than the chosen W0 and repeat the randomization-based analysis
for the outcome of interest as conducted in the original window—that is, using the same
test-statistic, same randomization mechanism, etc.
This analysis should be implemented carefully, however. If W0 was chosen based on co-
variate balance as we recommend, results in windows larger than W0 will not be reliable
because in such windows the treated and control groups will be imbalanced in important
covariates. Thus, the sensitivity analysis should only consider windows smaller than W0 ;
unfortunately, in many applications this analysis will be limited by the small number of obser-
vations that is likely to occur in these windows. In the Senate example, our chosen window
W = [−0.7652, 0.7652] has only 16 observations below and 25 above the cutoff, so our
ability to explore smaller windows is very restricted. Nonetheless, we consider the smaller
window W = [−0.6934, 0.6934]. The output, presented below, shows that the conclusion of
a positive party advantage remains unchanged when this smaller window is considered.
R Snippet 2.11
> out <- rdrandinf(Y, X, wl = -0.6934, wr = 0.6934, seed = 50)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 3.619
================================================================================
Diff. in means 9.124 0.001 0.000 0.303
================================================================================
The RD treatment assignment rule 1(Xi ≥ c) does not imply that the treatment is randomly
assigned within some window. Like the continuity assumption, the local randomization as-
sumption must be made in addition to the RD assignment mechanism, and is inherently
untestable. When the score is continuous, the local randomization assumption is strictly
stronger than the continuity assumption, in the sense that if there is a window around c in
which the regression functions are constant functions of the score, these regression functions
will also be continuous functions of the score at c. But the converse is not true: there may be
applications where the regression functions satisfy the continuity assumptions even though
there is no window around the cutoff that satisfies the local randomization assumptions. Why,
then, would researchers want to impose stronger assumptions to make their inferences?
Although the continuity-based approach relies on the weaker condition of continuity,
it unavoidably requires extrapolation because there are no observations with score exactly
equal to the cutoff. The extrapolation consists of using observations in a neighborhood of the
cutoff to approximate the unknown regression function, and then calculating the value of the
regression function exactly at the cutoff using the approximated functional form. Although
the smoothness assumptions required for this approximation to be valid do not impose
parametric restrictions, the approximation does introduce an error that is only negligible if
the sample size is large enough. This makes the continuity-based approach more appealing
when there are enough observations near the cutoff to approximate the regression functions
with reasonable accuracy—but possibly inadequate when the number of observations is small.
In applications with few observations, the local randomization approach has the advantage
of requiring minimal extrapolation and avoiding the use of smoothing methods.
Another situation in which a local randomization approach may be preferable to a
continuity-based approach is when the running variable is discrete—i.e., when multiple units
share the same value of the score. When the score is discrete, the continuity-based approach
is not directly applicable, and the local randomization is often a natural and useful alterna-
tive. We consider this issue in Section 4, where we discuss how to analyze RD designs with
discrete running variables.
Textbook reviews of Fisherian and Neyman estimation and inference methods in the con-
text of the analysis of experiments are given by Rosenbaum (2002, 2010) and Imbens and
Rubin (2015); the latter also discusses super-population approaches and their connections to
finite population inference methods. Ernst (2004) discusses the connection and distinctions
between randomization and permutation inference methods. Cattaneo, Frandsen, and Titiu-
nik (2015) propose Fisherian randomization-based inference to analyze RD designs based on
a local randomization assumption, and the window selection procedure based on covariate
balance tests. Cattaneo, Titiunik, and Vazquez-Bare (2017) use transformations of the potential outcomes to relax the local randomization assumption and allow for a weaker exclusion
restriction; they also compare RD analysis in continuity-based and randomization-based ap-
proaches. See also Cattaneo, Titiunik, and Vazquez-Bare (2016). The interpretation of the
RD design as a local experiment and its connection to the continuity-based framework is also
discussed by Sekhon and Titiunik (2016, 2017). Other refinements are surveyed in Cattaneo
and Titiunik (2022). For an RD application where the treatment is truly randomized in a
window around the cutoff, see Hyytinen, Meriläinen, Saarimaa, Toivanen, and Tukiainen
(2018).
3 THE FUZZY RD DESIGN
We now discuss how to modify the analysis and interpretation of the RD design when some
units fail to comply with the treatment condition that is assigned to them. In all RD designs,
the assignment of treatment follows the rule Ti = 1(Xi ≥ c), which assigns all units whose
score is below the cutoff c to the control condition, and all units whose score is above c
to the treatment condition. In the Sharp RD design, all units assigned to the treatment
condition do in fact take the treatment, and no units assigned to the control condition take
the treatment. In this case, the rule Ti = 1(Xi ≥ c) indicates not only the treatment assigned
to the units, but also the treatment received by the units.
However, it is common in practice to encounter RD designs where either some of the
units with Xi ≥ c fail to receive the treatment or some of the units with Xi < c receive
the treatment anyway—or both. The phenomenon of units receiving a treatment condition
different from the condition that is originally assigned to them is generally known as imperfect
compliance or non-compliance. The RD design with imperfect compliance is usually referred
to as the Fuzzy RD design, to distinguish it from the Sharp RD design where compliance is
perfect. Imperfect compliance is common in randomized experiments, and is no less common
in RD designs.
The Fuzzy RD treatment assignment rule is still Ti = 1(Xi ≥ c) but compliance with this
assignment is imperfect. As a consequence, although the probability of receiving treatment
still jumps abruptly at the cutoff, it no longer changes from 0 to 1 as in the Sharp RD case.
(Naturally, the probability of being assigned to treatment still jumps from 0 to 1 at c.) We
use the binary variable Di to denote whether the treatment was actually received by unit
i. Our notation now distinguishes between the treatment assigned, Ti , and the treatment
received, Di. We can thus say that the key characteristic of the Fuzzy RD design is that there are some units for which Ti ≠ Di.
We illustrate the difference between the Sharp and Fuzzy RD designs in Figure 3.1, where
we plot the conditional probability of receiving treatment given the score, P(Di = 1|Xi = x),
for different values of the running variable Xi . As shown in Figure 3.1(a), in a Sharp RD
design the probability of receiving treatment changes exactly from zero to one at the cutoff.
In contrast, in a Fuzzy RD design, the change in the probability of being treated at the cutoff
is always less than one. Figure 3.1(b) illustrates a Fuzzy RD design with so-called two-sided
non-compliance: near the cutoff, some control units receive the treatment, and some treated
units fail to receive the treatment.
Figure 3.1: Conditional Probability of Receiving Treatment in Sharp vs. Fuzzy RD Designs
The treatment received Di, also known as the treatment take-up, has two potential values:
Di (1) is the treatment received by i when this unit is assigned to the treatment condition
(i.e., when Xi ≥ c and Ti = 1) and Di(0) is the treatment received when this unit is assigned to the control condition (i.e., when Xi < c and Ti = 0), with Di(1), Di(0) ∈ {0, 1}. For
example, if unit i receives the treatment when assigned to the control condition, we write
Di (0) = 1, and if this unit complies with the control assignment, we write Di (0) = 0. The
observed treatment taken is Di = Ti · Di (1) + (1 − Ti ) · Di (0) and the fundamental problem of
causal inference now extends to the treatment received in addition to the outcome: for every
unit, we observe either Di (1) or Di (0), but never both. The quantities Di (1) and Di (0) are
thus the potential decisions to comply with the treatment assignment; for brevity, we refer
to them as potential treatments.
Given the possibility of noncompliance, we generalize the notation for the potential out-
comes to Yi (Ti , Di (Ti )), which now includes both the treatment assigned (Ti ) and the treat-
ment received (Di ) as arguments. Because Ti and Di are both binary, we now have four
potential outcomes instead of two. The potential outcome when unit i is assigned to treat-
ment is Yi (1, Di (1)) = Di (1)Yi (1, 1) + (1 − Di (1))Yi (1, 0), which results in Yi (1, 1) or Yi (1, 0)
depending on whether Di (1) is equal to 1 or 0. Similarly, the potential outcome when i is
assigned to control is Yi (0, Di (0)) = Di (0)Yi (0, 1)+(1−Di (0))Yi (0, 0). The observed outcome
is now Yi = Ti Yi (1, Di (1)) + (1 − Ti )Yi (0, Di (0)).
In the Fuzzy RD design, researchers are usually interested in the effects of both assigning
the treatment and receiving the treatment on the outcome of interest. Since it is always the
case that all units below the cutoff are assigned to control and all units above it are assigned
to treatment, the analysis of the effect of assigning the treatment follows standard Sharp
RD design methods. In contrast, the study of the effect of receiving the treatment requires
modifications and different assumptions. We devote this section to discuss both types of
effects, organizing our discussion around the same topics previously discussed in Foundations
and in the last section: estimation of effects, inference, falsification, graphical illustration, and
interpretation. As in the Sharp RD case, the analysis of Fuzzy RD designs can be based on a
continuity-based approach or a local randomization approach, depending on the assumptions
invoked. After introducing our empirical example, we discuss and illustrate both approaches
together in Section 3.2.
We re-analyze the study by Londoño-Vélez, Rodríguez, and Sánchez (2020) of the effects of
a governmental subsidy for post-secondary education in Colombia. The program, Ser Pilo
Paga (SPP), funds the full tuition of a four-year or five-year undergraduate program in any
government-certified higher education institution (HEI) that satisfies minimum quality stan-
dards. Program eligibility depends on both merit and economic need: in order to qualify for
the program, students must obtain a high grade in Colombia’s national standardized high
school exit exam, SABER 11, and they must also come from economically disadvantaged
families, measured by a survey-based wealth index known as SISBEN. In both cases, eligi-
bility follows a deterministic rule with fixed cutoffs: students must obtain a SABER 11 score
in the top 9 percent of scores, and they must come from a household with SISBEN index
below a region-specific threshold.
The analysis includes only students who took the SABER 11 test in the fall of 2014; this
is the first cohort of beneficiaries of the SPP program. Because program eligibility is based
on whether observed scores exceed fixed cutoffs, the SPP program is a clear example of a RD
design. However, it differs from the setup discussed in Foundations and in the prior section
in this Element in two ways. First, because some eligible students did not receive the SPP
subsidy, there is imperfect compliance with the treatment assignment, making this a Fuzzy
RD design. Second, program eligibility is determined by two scores as opposed to one, which
makes this a multi-dimensional RD design in general.
For the purposes of this section, we transform this two-dimensional RD design into a one-
dimensional RD design by considering only the subset of students whose SABER 11 score is
above the merit cutoff. For this subsample, program eligibility obeys a one-dimensional Fuzzy
RD design where the score is the SISBEN wealth index. (We re-analyze this application
using the full sample and considering both scores simultaneously in Section 5, where we
discuss multi-dimensional RD designs.) In this one-dimensional Fuzzy RD design, the unit
of analysis is a student in 2014, the running variable is the student’s SISBEN wealth index,
the treatment is receipt of the SPP subsidy, and the cutoff varies according to the student’s
area of residence (40.75 in rural areas, 57.21 in the fourteen main metropolitan areas, and
56.32 in other urban areas). The SISBEN wealth index is continuous and ranges between
0 (poorest) and 100 (richest); it is constructed based on a household survey that measures
housing quality, ownership of durable goods, public utility services, and other indicators of
wealth. The main outcome of interest is enrollment in a high-quality HEI (a binary indicator).
We present descriptive statistics for the main variables in Table 3.1. In our replication
dataset, there are 23,132 total observations (the differences in sample size across variables
are due to missing values), corresponding to students whose SABER 11 score was above
the cutoff and whose household received welfare benefits. The running variable X1 is the
difference between the student’s SISBEN wealth index and her corresponding cutoff, ranging
from −43.84 to 56.23 in the sample; the cutoff is thus normalized to zero. The treatment
assignment (Ti ) is an indicator equal to one when the running variable is below zero, which
indicates that the student is eligible to receive the SPP subsidy. The treatment received
(Di ) is an indicator equal to one if the student actually received the subsidy, regardless of
their SISBEN score value. As shown in the table, 66.7% of the students in this sample are
eligible to receive the SPP program, and 40% actually receive it. As we will see, the 40%
of students who do receive the program does not include any students with SISBEN score
below the eligibility cutoff, which makes this an example of a RD design with one-sided
non-compliance.
Table 3.1 also shows the main outcome of interest: an indicator equal to one if the
student enrolled in a HEI immediately after receiving the subsidy (Yi). Finally, the table
also shows six predetermined covariates, measured on the day of the SABER 11 exam, that
we use below for the falsification analysis: an indicator equal to one if the student identifies
as female (icfes female), the student’s age (icfes age), an indicator equal to one if the
student identifies as an ethnic minority (icfes urm), the residential stratum of the student’s
household (icfes stratum), an indicator equal to one if the student attends a private high
school (icfes privatehs), and the student’s family size (icfes famsize).
Applying the Sharp RD estimation strategy to the observed outcome in the continuity-based framework yields
\[
\tau_Y \equiv \lim_{x \downarrow c} E[\,Y_i \mid X_i = x\,] - \lim_{x \uparrow c} E[\,Y_i \mid X_i = x\,] = \lim_{x \downarrow c} E[\,Y_i(1, D_i(1)) \mid X_i = x\,] - \lim_{x \uparrow c} E[\,Y_i(0, D_i(0)) \mid X_i = x\,],
\]
where the equality follows from the more general definition of the observed outcome given above and thus requires no special assumptions.
Analogously, applying the Sharp RD estimation strategy in the local randomization
framework yields
\[
\theta_Y \equiv \frac{1}{N_W} \sum_{i:\, X_i \in W} E_W\!\left[\frac{T_i Y_i}{P_W[T_i = 1]}\right] - \frac{1}{N_W} \sum_{i:\, X_i \in W} E_W\!\left[\frac{(1 - T_i) Y_i}{1 - P_W[T_i = 1]}\right].
\]
In words, τY and θY are the parameters that are estimated in a Fuzzy RD design when we
compare the average outcome of observations just below the cutoff to the average outcome
of observations just above the cutoff using, respectively, a continuity-based approach that
takes the limit to the cutoff or a local randomization approach that compares observations
in the small window W.
A natural approach to the analysis of a Fuzzy RD design is to investigate different assump-
tions under which these quantities yield parameters that are of interest to the researcher.
There are two main strategies. One is to focus on assumptions that allow us to interpret τY
and θY as the effect of assigning the treatment on the outcome. The other is to focus on as-
sumptions that allow us to learn about the effect of receiving the treatment on the outcome,
at least for some subpopulation of units. Whether one or the other strategy is preferable
depends on the specific application and goals of the researcher.
We start by considering the first strategy, where the focus is on learning about the effects of
the treatment assignment, not the treatment received. Following the experimental literature,
we call the effects of assigning the treatment on any outcome of interest intention-to-treat
(ITT) effects.
To obtain ITT parameters in the continuity-based framework, we generalize the condi-
tions discussed in Foundations, and assume that the regression functions E[Yi (1, Di (1))|Xi =
x] and E[Yi (0, Di (0))|Xi = x] are smooth near the cutoff c. In other words, seeing Ti as
the intervention of interest, we ask that the regression functions for both values of this
variable be continuous in the score at the cutoff. This assumption implicitly restricts how
compliance decisions change at the cutoff: for example, if Di (1), seen as a function of the
score x, changes discontinuously at x = c for some units and that leads to discontinuity of
E[Yi (1, Di (1))|Xi = x] at c, the assumption would not hold. Under continuity, we have
\[
\tau_Y = \tau_{ITT}, \qquad \tau_{ITT} \equiv E[\,Y_i(1, D_i(1)) - Y_i(0, D_i(0)) \mid X_i = c\,],
\]
and the estimated jump in the average observed outcome at the cutoff recovers the average effect of Ti on Yi at c, which we denote τITT.
In the local randomization framework, adapting the assumptions discussed in Section 2,
we continue to require that the joint distribution of the score Xi be known in W, but now
we also require that the augmented potential outcomes Yi (1, Di (1)) and Yi (0, Di (0)) not be
functions of Xi inside W. Under these assumptions, we have
\[
\theta_Y = \theta_{ITT}, \qquad \theta_{ITT} \equiv \frac{1}{N_W} \sum_{i:\, X_i \in W} E_W[\,Y_i(1, D_i(1)) - Y_i(0, D_i(0))\,],
\]
and thus the estimated difference in the average observed outcomes inside the window re-
covers the average ITT effect of Ti on Yi in W, which we call θITT .
The perfect compliance Sharp RD setting can now be understood as a particular case of the Fuzzy RD design in which P[Di(0) = 0|Xi = x] = 1 for x < c (no units with score below the cutoff receive the treatment) and P[Di(1) = 1|Xi = x] = 1 for x ≥ c (all units with score above the cutoff receive the treatment). In this case, the treatment assignment rule reduces to the sharp rule, Di = Ti = 1(Xi ≥ c), the four potential outcomes reduce to Yi(1, 1) = Yi(1) and Yi(0, 0) = Yi(0), and the ITT parameters become τITT = E[Yi(1) − Yi(0)|Xi = c] and θITT = EW[Yi(1) − Yi(0)|Xi ∈ W]. Thus, when compliance is perfect, the ITT effects of the treatment assignment on the outcome reduce to the Sharp RD effects of the treatment received.
In addition to investigating the effects of the treatment assignment on the outcome, an
ITT analysis of a Fuzzy RD design should include a study of how the RD assignment rule
affects the probability of receiving the treatment. The effect of the treatment assignment on
the treatment received reveals information about compliance and the effectiveness of the RD
rule in inducing individuals to take the treatment. We define parameters analogous to τY and
θY , but this time treating Di as the outcome. Applying a Sharp RD estimation strategy that
compares observations just above and below the cutoff, we estimate the following parameters,
\[
\tau_D \equiv \lim_{x \downarrow c} E[\,D_i \mid X_i = x\,] - \lim_{x \uparrow c} E[\,D_i \mid X_i = x\,]
\]
and
\[
\theta_D \equiv \frac{1}{N_W} \sum_{i:\, X_i \in W} E_W\!\left[\frac{T_i D_i}{P_W[T_i = 1]}\right] - \frac{1}{N_W} \sum_{i:\, X_i \in W} E_W\!\left[\frac{(1 - T_i) D_i}{1 - P_W[T_i = 1]}\right],
\]
for the continuity-based and local randomization frameworks, respectively. Since Di is binary, τD and θD capture the difference in the probability of receiving the treatment between units just above and just below the cutoff. Under the continuity and local randomization assumptions introduced above for the ITT effects, we have
\[
\tau_D = \tau_{FS}, \qquad \tau_{FS} \equiv E[\,D_i(1) - D_i(0) \mid X_i = c\,],
\]
and
\[
\theta_D = \theta_{FS}, \qquad \theta_{FS} \equiv \frac{1}{N_W} \sum_{i:\, X_i \in W} E_W[\,D_i(1) - D_i(0)\,].
\]
The parameters τFS and θFS thus capture the effect of assigning the treatment on receiving
the treatment for units with scores near or at the cutoff. Following the instrumental variables
(IV) literature, we call them first-stage effects.
In sum, to study ITT effects in a Fuzzy RD design we must augment the continuity
and local randomization assumptions appropriately to cover the regression functions of the
augmented potential outcomes Yi (t, d), and the additional potential treatments Di (t). In the
continuity-based framework, we require continuity of the regression functions of the potential
outcomes, E[Yi (1, Di (1))|Xi = x] and E[Yi (0, Di (0))|Xi = x], and the potential treatments,
E[Di (1)|Xi = x] and E[Di (0)|Xi = x]. In the local randomization framework, the exclusion
restriction requires that both the potential outcomes and the potential treatments be unaf-
fected by the score within W. Informally, these extended continuity and local randomization
assumptions require that near the cutoff, the outcomes of units with scores below the cutoff
be similar to the outcomes that the units with scores above the cutoff would have had if
they had been assigned to the control condition instead of the treatment. As in any Sharp
RD design, if any important variable other than the treatment assignment changes abruptly
at the cutoff, these assumptions will fail to hold.
Once we generalize the continuity and local randomization conditions to accommodate
non-compliance, estimation, inference, and validation for the ITT parameters θITT, θFS, τITT,
and τFS proceed by applying the methods of analysis for Sharp RD designs outlined in
Foundations. More precisely, local randomization and continuity-based Sharp RD methods
are deployed where Xi remains the RD score, Ti = 1(Xi ≥ c) is seen as the “treatment” of
interest, and now both Yi and Di are viewed as outcomes for the analysis. In the continuity-
based framework, the ITT parameters can be estimated with the difference in the intercepts
of local polynomials of the observed outcome on the score, fit separately for observations
above and below the cutoff,
\[
\hat{\tau}_{ITT} = \lim_{x \downarrow c} \hat{E}[\,Y_i \mid X_i = x\,] - \lim_{x \uparrow c} \hat{E}[\,Y_i \mid X_i = x\,]
\]
and
\[
\hat{\tau}_{FS} = \lim_{x \downarrow c} \hat{E}[\,D_i \mid X_i = x\,] - \lim_{x \uparrow c} \hat{E}[\,D_i \mid X_i = x\,],
\]
with bandwidth selection and inference methods as discussed in Sections 4.2 and 4.3 in
Foundations.
In the local randomization framework, θITT and θFS can be estimated by calculating sample difference-in-means between units above and below the cutoff, among units with scores in W:
\[
\hat{\theta}_{ITT} = \bar{Y}_{W,+} - \bar{Y}_{W,-} \qquad \text{and} \qquad \hat{\theta}_{FS} = \bar{D}_{W,+} - \bar{D}_{W,-},
\]
where
\[
\bar{Y}_{W,+} = \frac{1}{N_{W,+}} \sum_{i:\, X_i \in W} \omega_i T_i Y_i, \qquad \bar{Y}_{W,-} = \frac{1}{N_{W,-}} \sum_{i:\, X_i \in W} \omega_i (1 - T_i) Y_i,
\]
and
\[
\bar{D}_{W,+} = \frac{1}{N_{W,+}} \sum_{i:\, X_i \in W} \omega_i T_i D_i, \qquad \bar{D}_{W,-} = \frac{1}{N_{W,-}} \sum_{i:\, X_i \in W} \omega_i (1 - T_i) D_i,
\]
with the weights appropriately selected as discussed in Section 2.2.2, and after the window
W has been selected, preferably based on pre-treatment covariates as discussed in Section
2.2.4. Inference can similarly proceed based on the methods we discussed in Section 2, using
either Fisherian or super-population methods, depending on whether the potential outcomes
are seen as fixed or stochastic.
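As an illustration, a minimal sketch of these estimators with equal weights, a cutoff normalized to zero, and a hypothetical window half-length w; here Y, D, and X are assumed variable names for the outcome, the treatment received, and the score.

# Minimal sketch of the local randomization ITT and first-stage estimators
# (equal weights, cutoff normalized to 0, hypothetical window half-length w).
w   <- 0.75
inW <- abs(X) <= w
Tw  <- as.numeric(X[inW] >= 0)   # treatment assigned inside the window
Yw  <- Y[inW]
Dw  <- D[inW]
theta_ITT_hat <- mean(Yw[Tw == 1]) - mean(Yw[Tw == 0])   # effect of Ti on Yi
theta_FS_hat  <- mean(Dw[Tw == 1]) - mean(Dw[Tw == 0])   # effect of Ti on Di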
For interpretation, it is important to remember that, under the appropriate assumptions,
θITT and τITT capture the average overall effect of assigning the treatment on the outcome, not
of receiving the treatment. In some applications, this effect will be of primary interest. For
example, when households with income below a cutoff are eligible to receive a cash transfer,
households whose income would have been above but near the cutoff might decrease their
labor supply so that they become eligible for the program. In this case, the main effect of
the program on total household income at the cutoff may be null, as households’ income
decreases when their labor supply is reduced but simultaneously increases when they receive
the transfer for which they are now eligible. A policy maker who is interested in assessing
whether the program affects labor supply decisions will be interested primarily in the effect
of program eligibility on household income, not on the effect of the cash transfer itself.
In some Fuzzy RD applications, the effect of Ti on Yi is not of interest per se, and researchers
are instead interested in learning about the effect of receiving the treatment itself, not of
its assignment. In these cases, it is common to consider other parameters which, under dif-
ferent assumptions, can provide information about the treatment received, at least for a
subpopulation of units.
When interest is on the effect of the treatment received, it is common to focus on the
parameters
\[
\tau_{FRD} \equiv \frac{\tau_Y}{\tau_D} \qquad \text{and} \qquad \theta_{FRD} \equiv \frac{\theta_Y}{\theta_D}
\]
for the continuity-based and local randomization frameworks, respectively. We call these
parameters the Fuzzy RD parameters, and discuss conditions under which they can be inter-
preted as the average effect of the treatment for some subpopulations.
Under the augmented continuity and local randomization assumptions discussed above
for the ITT effects, these Fuzzy parameters will be equal to the ratio of the effects of the
treatment assignment on the outcome and the treatment received, τFRD = τITT/τFS and θFRD = θITT/θFS. This interpretation of the Fuzzy RD parameters as the ratio of two ITT effects is analogous
This interpretation of the Fuzzy RD parameters as the ratio of two ITT effects is analogous
to results in the IV literature. However, our discussion below does not assume that these
conditions hold.
Putting aside conditions to recover ITT effects, we now explore assumptions under which
the Fuzzy RD parameters θFRD and τFRD can be directly interpreted as treatment effects. The
first assumption we discuss is required by all interpretations and is analogous to the exclu-
sion restriction in IV settings: the treatment assignment must affect the potential outcomes
and potential treatments only via the treatment received, but not directly. In other words,
given a particular value of the treatment received, Di = d, the potential outcomes (or their
distributions) should not be affected by the value of Ti , at least not near the cutoff.
This exclusion restriction is already imposed in the local randomization framework when
we assume that the potential outcomes and potential treatments cannot be a function of
the score within W. Since the assignment Ti = 1(Xi ≥ c) is a function of Xi, assuming
that Yi (Ti , 1) and Yi (Ti , 0) (or their distributions) are not functions of Xi within W implies
assuming that, given a value of the treatment received, Di = d, the potential outcomes (or
their distributions) do not depend on Ti . In contrast, in the continuity-based framework, it
is still possible for Xi to affect the potential outcomes directly because all parameters are
defined at the same point Xi = c, which makes any direct effects of Xi irrelevant. Under the exclusion restriction and an additional assumption of monotonicity, the Fuzzy RD parameters recover the average effect of the
treatment at the cutoff for compliers. Following the IV literature, these effects are sometimes
called Local Average Treatment Effects (LATE) or Complier Average Treatment Effects
(CATE).
In the local randomization framework, the formal definitions of complier strata and mono-
tonicity are analogous to those in the IV literature, restricted to units whose scores are in W.
In contrast, in the continuity-based framework, the formalization of monotonicity and related conditions is less straightforward, and several alternatives have been proposed. The technical details are beyond the scope of our practical guide, but we offer references at the end of this section for the interested reader. The general conclusion is that, once a monotonicity assumption is added to the continuity conditions, the Fuzzy RD parameters can be interpreted as the average treatment effect at the cutoff for the compliers.
Naturally, estimation and inference for the Fuzzy RD parameters θFRD and τFRD proceeds
in the same manner regardless of what particular assumptions are invoked to interpret them.
Estimation proceeds by simply using local polynomials or difference-in-means to estimate
the numerator and denominator, and taking their ratio, which entails taking the ratio of the
estimators defined above,
\[
\hat{\tau}_{FRD} = \frac{\hat{\tau}_Y}{\hat{\tau}_D}
\]
and
\[
\hat{\theta}_{FRD} = \frac{\hat{\theta}_Y}{\hat{\theta}_D} = \frac{\bar{Y}_{W,+} - \bar{Y}_{W,-}}{\bar{D}_{W,+} - \bar{D}_{W,-}}.
\]
Inference methods are analogous to those used in the Sharp RD design, with some modi-
fications. In the local randomization approach, inferences in the super-population framework
rely on standard IV large sample approximations (based on the Delta method) to the sam-
pling distribution of the ratio of the two effects, applied to observations with scores inside W.
In the Fisherian framework, inferences are implemented as before by permuting the vector
of treatment assignments Ti according to the assumed distribution. Under the sharp null
hypothesis of no treatment effect for any unit, no modifications are needed; under the null
hypothesis that the effect is the same γ for every unit, implementation relies on testing the
sharp null on the adjusted observed outcomes Yi − Di γ. In all cases, the implementation
permutes the treatment assignment, not the treatment received.
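A minimal sketch of this randomization-based test inversion under the constant-effect null, with the same simplifications as before (equal weights, cutoff normalized to zero, hypothetical window and grid of effects), is the following.

# Minimal sketch: Fisherian confidence interval for the fuzzy RD parameter by
# inverting tests of the constant-effect null on the adjusted outcomes Y - g*D,
# permuting the treatment assignment (not the treatment received).
set.seed(50)
w   <- 0.75
inW <- abs(X) <= w
Tw  <- as.numeric(X[inW] >= 0)
Yw  <- Y[inW]
Dw  <- D[inW]
gammas <- seq(-1, 1, by = 0.05)              # hypothetical grid of effect values
pvals  <- sapply(gammas, function(g) {
  Yadj <- Yw - g * Dw                        # adjusted outcomes under the null
  obs  <- mean(Yadj[Tw == 1]) - mean(Yadj[Tw == 0])
  perm <- replicate(1000, {
    Tp <- sample(Tw)
    mean(Yadj[Tp == 1]) - mean(Yadj[Tp == 0])
  })
  mean(abs(perm) >= abs(obs))
})
ci <- range(gammas[pvals >= 0.05])           # effects not rejected at the 5% level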
In the continuity-based framework, inferences can be based on robust local polynomial
methods. As discussed in Foundations for the Sharp RD case, these methods use polynomials
to approximate the unknown regression functions in a neighborhood or bandwidth around
the cutoff, and in general contain an error because the approximation is not exact. This
error of approximation (also known as bias) is controlled by the bandwidth, and affects the
large sample distribution of the test statistics of interest. For this reason, it is recommended
to employ robust bias-correction methods for inference when local polynomial methods are
used for RD analysis, particularly when the bandwidth is chosen to be mean-squared-error
optimal, as is common (and recommended) in practice. Analogously to the Sharp RD case,
the resulting bias-corrected robust confidence interval is not centered around τ̂FRD , but rather
around τ̂FRD minus its estimated bias. All conceptual issues are the same as those discussed in
Foundations for the case of Sharp RD (Sections 4.2 and 4.3). Similarly, covariate adjustment
for the ITT parameters directly follows the procedures discussed in Foundations and Section
2, depending on the framework adopted. As discussed in Cattaneo, Keele, and Titiunik
(2022a), covariate adjustment can be used for efficiency gains without altering the parameter
of interest, but not to “fix” RD designs where predetermined covariates are imbalanced at
or near the cutoff.
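For instance, in the continuity-based framework, covariate adjustment can be implemented with the covs option of rdrobust; the sketch below uses the SPP notation introduced in Section 3.1 (Y, X1, D, and the data frame data), and the particular covariates shown are an illustrative choice rather than a prescribed set.

# Sketch: covariate-adjusted Fuzzy RD estimation for efficiency gains.
# Z collects predetermined covariates; the estimand is still the Fuzzy RD parameter.
Z <- cbind(data$icfes_female, data$icfes_age)
out <- rdrobust(Y, X1, fuzzy = D, covs = Z)
summary(out)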
The main points of our discussion so far can be summarized as follows:
• In the Fuzzy RD design, some units fail to comply with the treatment they are assigned,
which introduces a distinction between the treatment assignment and the treatment
received. Researchers must decide whether they are interested in the effect of the
treatment assignment, the effect of the treatment received, or both.
• When interest is on the effect of the treatment assignment, estimation and inference
proceed analogously to the Sharp RD case. The effect of the treatment assignment
(Ti ) on both the outcome (Yi ) and the treatment received (Di ) can be studied in a
straightforward manner using Sharp RD methods where Ti is seen as the treatment
of interest and the outcomes are Yi and Di . The local randomization and continuity
assumptions required are the same as those discussed in Section 2 and Foundations,
respectively, with appropriate extensions. These are called the intention-to-treat ef-
fects, and capture the effect of assigning the treatment, not the effect of receiving the
treatment.
• When interest is on the effect of the treatment received, it is common to focus on the
Fuzzy RD parameter, which is the ratio (at or near the cutoff) of the difference in
average outcomes and the difference in the probability of receiving treatment between
units above and below the cutoff. Under appropriate assumptions, this is equal to
the effect of the treatment received for all or a subset of units near or at the cutoff.
For example, under monotonicity, it is equal to the effect of the treatment received
near the cutoff for compliers; under local independence, it is equal to the effect
of the treatment received for all units near the cutoff. Moreover, under appropriate
assumptions, the Fuzzy RD parameter can be interpreted as the ratio (at or near the
cutoff) between the ITT effect of the treatment assignment on the outcome and the
ITT effect of the treatment assignment on the treatment received.
In the continuity-based framework, the bandwidth selection for estimation of the ITT pa-
rameters proceeds exactly as explained in Foundations (Section 4.2.2). The bandwidth for
τFRD , however, requires further consideration. Because this parameter is a ratio, the question
arises of whether researchers should use a different bandwidth for the denominator and nu-
merator, or the same bandwidth for both. If the focus is on ITT effects, the numerator and
denominator parameters will be of independent interest; in this case, the researcher should
estimate these parameters separately, selecting a separate optimal bandwidth for each. But
if the researcher is also (or only) interested in τFRD , using different bandwidths for the
numerator and denominator has the disadvantage that the two estimators will be constructed
from different sets of observations (naturally, one set will be a strict subset of the other).
Thus, when interest is on the ratio parameter τFRD , it is sensible to use the same bandwidth
to estimate both the numerator and the denominator. This adds transparency to the analysis,
as researchers can clearly explain which observations are included in the calculations. This is
the approach adopted by rdrobust, where the default is to choose a single mean-squared-
error (MSE) optimal bandwidth by minimizing the MSE of a linear approximation of the
ratio estimator τ̂FRD . By using a linear approximation to τ̂FRD , we build a single MSE objective
function that leads to a single bandwidth for estimation of the Fuzzy RD parameter and
avoids using different bandwidths for numerator and denominator.
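In practice, the single bandwidth chosen for the ratio can be inspected directly in the object returned by rdrobust; the two lines below are a small sketch using the SPP notation introduced in Section 3.1.

# Sketch: with the fuzzy option, rdrobust selects one bandwidth for the ratio estimator;
# the bws element stores the estimation (h) and bias (b) bandwidths used on each side.
out <- rdrobust(Y, X1, fuzzy = D)
out$bws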
This issue does not arise in the local randomization framework, because the local ran-
domization assumptions are invoked in a single window that applies to all outcomes. The
window selection should be implemented by assessing whether pre-determined covariates are
balanced in nested windows around the cutoff, exactly as discussed in Section 2 for the Sharp
RD case. In other words, window selection in the Fuzzy RD design is unchanged.
Because the Fuzzy RD parameter is the ratio of two parameters, it will be undefined if the
denominator is zero. Thus, the study of this parameter requires the additional assumption
that the denominator is non-zero. This assumption is equivalent to the notion of “relevant
instrument” in the IV literature, and can be studied empirically with a test of the null
hypothesis that τD or θD is zero; this test should be the first step in the analysis of the Fuzzy
RD parameters τFRD or θFRD . If the p-value associated with this null hypothesis is smaller than
conventional thresholds, we can conclude that the first-stage effect is non-zero.
However, the assumption of non-zero first stage effect is not enough. When the RD
rule has a non-zero but very small effect on the probability of receiving the treatment, the
standard Gaussian approximations to the distributions of the RD test statistics (i.e., those
based on difference-in-means in W or limits of local polynomial estimators at the cutoff)
are not reliable, and statistical inference based on those approximations will be invalid. In
the IV literature, this is a well-understood problem known as weak instruments or, more
generally, weak identification, which may persist even when the number of observations is
very large.
In the context of RD designs, it is often not particularly interesting to study fuzzy RD
treatment effects under weak identification, since their policy relevance is already quite lim-
ited. Nevertheless, it is possible to construct inference procedures that are robust to the weak
instrument problem in the usual way. For example, in the context of local randomization
methods, Fisherian inference methods continue to be valid in the sense that randomization-
based tests of the sharp null hypothesis will continue to be valid even if the treatment
assignment is a weak instrument for the treatment received. Furthermore, all standard super-
population approaches based on large sample approximations under weak-IV asymptotics can
be deployed for units with score within W. Similar approaches can be implemented within
the continuity-based framework based on local polynomial methods. The practical applicability
of these results is often limited, however, because confidence intervals for the Fuzzy
RD treatment effects often become long and practically uninformative when the treatment
assignment is a weak instrument.
Although different methods have been developed in the IV literature to obtain valid
inferences in the presence of weak instruments and some of those methods can be extended
to the RD context, our practical recommendation is for researchers to avoid Fuzzy RD designs
where the RD treatment assignment has a weak and small effect on the treatment received.
A weak effect implies that the RD assignment rule failed to induce a large change in the
probability of taking the treatment. When this occurs, any attempt to learn about the effects
of the treatment received on the outcome will be severely limited. The recommendation is
therefore to always investigate the first stage effect (τD or θD ) first, and only proceed with
the analysis of the ratio parameter if the estimated first stage is strong. Analogously to IV
settings, strength can be measured empirically by the size of the F-statistic in the first-stage
regression. The rule of thumb in the IV literature is to conclude that an instrument is weak
if the F-statistic is less than 10; in the RD context, this threshold is likely to be too low,
and recommendations point to a minimum F-statistic of 20 or more. If the first-stage effect is
weak, researchers should report only ITT effects and interpret these parameters carefully in
light of the very weak relationship between treatment assignment and treatment received.
An analysis that reveals a weak first stage effect, however, can be very valuable, as it will
likely offer important lessons about the design of the program and the strategic compliance
decisions of individuals near the cutoff. In fact, in this case, the ITT effect can serve as a useful
check on the exclusion restriction: if the assignment barely changes the probability of receiving
the treatment but nonetheless affects the outcome, the assignment is likely influencing the
outcome through channels other than the treatment received.
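As a rough empirical check of first-stage strength, an F-statistic can be computed from a local regression of the treatment received on the assignment within a chosen bandwidth. The sketch below is ours (not part of rdrobust), uses the sandwich and lmtest packages for robust standard errors, and assumes that units with scores at or above the cutoff are assigned to treatment.

# Sketch: heteroskedasticity-robust first-stage F-statistic within bandwidth h.
library(sandwich)
library(lmtest)
first_stage_F <- function(D, X, h, cutoff = 0) {
  keep <- abs(X - cutoff) <= h                   # observations within the bandwidth
  Xc <- X[keep] - cutoff
  T <- as.numeric(Xc >= 0)                       # assignment indicator
  fit <- lm(D[keep] ~ T * Xc)                    # local linear first-stage regression
  tstat <- coeftest(fit, vcov. = vcovHC(fit, type = "HC1"))["T", "t value"]
  unname(tstat^2)                                # F = t^2 with a single instrument
}

With a single instrument, the square of the robust t-statistic on the assignment indicator plays the role of the first-stage F-statistic.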
As our discussion has emphasized, the methods of analysis for Fuzzy RD designs largely
resemble those of Sharp RD designs, either exactly (in the case of ITT parameters) or con-
ceptually (in the case of Fuzzy parameters). Nonetheless, the presence of non-compliance
leads to some specific issues that may be important for implementation. The strategies for
validation and falsification in the Fuzzy RD design are largely the same as those discussed in
Foundations (Section 5) and in Section 2 for the Sharp RD design: density test, treatment
effects on covariates and placebo outcomes, artificial cutoffs, and sensitivity to local neigh-
borhood. In order to avoid repetition, we only focus on the modifications that are required
to accommodate imperfect compliance.
For implementation of the density test and estimation of effects on predetermined covari-
ates and placebo outcomes, researchers should focus on intention-to-treat effects. Because
the goal of these falsification analyses is to assess whether the observations just above the
cutoff are similar to the observations just below the cutoff, the effects of interest are those
of Ti (the treatment assignment) on the covariates and placebo outcomes. Similarly, because
the goal of the density test is to assess whether the number of observations above the cutoff is
similar to the number of observations below the cutoff, the relevant density test is one that
compares the number of observations with Ti = 1 and Ti = 0 near the cutoff, not those with
Di = 1 and Di = 0.
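In practice, these checks use the score (which determines the assignment Ti ), exactly as in the Sharp RD case; the short sketch below uses the SPP variables as an illustration.

# Sketch: falsification based on the assignment, not the treatment received.
summary(rddensity(X1))                    # density of the score at the cutoff
summary(rdrobust(data$icfes_female, X1))  # ITT effect on a predetermined covariate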
Finally, when interest is on treatment effects for subpopulations of units, it is natural to
investigate departures from the two underlying identifying assumptions: exclusion restriction
and instrument relevance. These assumptions can be assessed empirically via both continuity-
based and local randomization methods.
We illustrate how to analyze a Fuzzy RD design using the SPP application introduced
in Section 3.1. We first implement a continuity-based analysis based on local polynomial
methods. Since we illustrated Sharp RD methods in Foundations, we omit all details except
when they pertain to issues that arise specifically due to non-compliance. We then illustrate
the use of local randomization methods.
We start by investigating the ITT (sharp) RD effects of assigning SPP eligibility. The first
step is to analyze the first stage relationship between the eligibility to receive SPP funding
(T) and the actual receipt of SPP funds (D). Figure 3.2 shows the corresponding RD plot
with default choices, as discussed in Foundations. The figure shows that this application has
one-sided non-compliance: no student whose SISBEN score is below the cutoff receives SPP
funding, but some of the students whose scores are above the cutoff fail to receive funding
despite being eligible. The estimated probability of receiving SPP funding
thus jumps from zero to around 0.60 at the cutoff.
Figure 3.2: RD plot of SPP funding receipt (vertical axis: SPP recipient; horizontal axis: Distance to SISBEN cutoff).
A formal continuity-based analysis of the first stage can be conducted with local poly-
nomial methods. Using the command rdrobust, we first choose a MSE-optimal bandwidth
and then fit two linear polynomials of the outcome (in this case, the treatment received Di ) on Xi within this bandwidth, separately for obser-
vations above and below the cutoff. The RD effect is the difference between the estimated
intercepts in both regressions, and we build confidence intervals using robust bias-corrected
inference—for details, see Section 3 in Foundations. Because compliance is one-sided and all
students with SISBEN score below the cutoff fail to receive SPP funding, there is no need to
fit a polynomial with observations below the cutoff: all those observations have Di = 0 and
thus the intercept is zero. It follows that there is no need to select a bandwidth below the
cutoff either; in fact, the bandwidth selector is undefined when the observations are constant.
We use rdrobust to estimate the first-stage relationship between SPP eligibility and the
actual receipt of funding.
R Snippet 3.1
> out <- rdrobust(D, X1)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.625 0.012 51.592 0.000 [0.601 , 0.649]
Robust - - 43.115 0.000 [0.595 , 0.652]
=============================================================================
The lack of variability below the cutoff leads rdrobust to behave as follows: (i) it prints
a warning (not shown) stating that there is not enough variability in the data, (ii) it sets the
bandwidth below the cutoff equal to the smallest value of the score, and (iii) it implements
the bandwidth selector above the cutoff as usual. Although the output shows that 18.511 is
the bandwidth on both sides, this value corresponds to the MSE-optimal bandwidth above
the cutoff. Re-running the command allowing for different bandwidths above and below the
cutoff (option bwselect = "msetwo") reports the bandwidth below the cutoff to be 43.480,
which is the minimum value of the SPP score in the data (see Table 3.1).
The output shows that the first-stage effect is 0.625, consistent with the jump observed
in Figure 3.2. The effect is highly statistically significant, with a tight 95% robust confidence
interval between 0.595 and 0.652. This is evidence of a very strong effect of eligibility on
receiving SPP funding, showing that approximately 62% of those who are barely eligible
to receive SPP funding do in fact receive it, compared with 0% of those who are barely
ineligible (an infinitely large relative change).
We continue with the analysis of the ITT effect of SPP eligibility on the outcome of
interest, attending an HEI immediately after becoming eligible (Y). The RD plot of this
effect is shown in Figure 3.3, where we present a third-order (p = 3) global polynomial fit to
avoid over-fitting near the boundary; formal estimation and inference are again conducted
with rdrobust.
R Snippet 3.2
> out <- rdrobust(Y, X1)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.269 0.023 11.709 0.000 [0.224 , 0.314]
Robust - - 10.047 0.000 [0.221 , 0.328]
=============================================================================
The plot and the formal analysis both show a large effect at the cutoff: students whose
SISBEN scores are barely above the cutoff and are thus barely eligible to receive SPP funding
are about 27 percentage points more likely to enroll in a HEI, jumping from near 50%
enrollment just below the cutoff to about 77% just above the cutoff.
In order to point-estimate the fuzzy RD parameter, τFRD , we could simply take the ratio
between the two ITT effects estimated above, 0.269/0.625 = 0.4304. If we follow this ap-
proach, each effect is estimated using a different bandwidth, chosen to be MSE-optimal for
each individual effect. The difference in bandwidths is considerable, 18.511 for the denom-
inator versus 9.041 for the numerator, implying that many more observations are used to
estimate the denominator than the numerator. Although this poses no problems theoretically,
applied researchers may prefer to use the same observations to estimate both quantities; this
can be accomplished with the fuzzy option of rdrobust, which we illustrate next.
Figure 3.3: RD plot of immediate HEI enrollment (horizontal axis: Distance to SISBEN cutoff).
R Snippet 3.3
> out <- rdrobust(Y, X1, fuzzy = D)
> summary(out)
Fuzzy RD estimates using local polynomial regression.
First-stage estimates.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.619 0.017 35.857 0.000 [0.585 , 0.653]
Robust - - 29.885 0.000 [0.575 , 0.656]
=============================================================================
Treatment effect estimates.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.434 0.034 12.773 0.000 [0.368 , 0.501]
Robust - - 11.026 0.000 [0.366 , 0.524]
=============================================================================
When the fuzzy option is specified, rdrobust first computes a single optimal band-
width, and then uses this bandwidth to estimate the denominator, the numerator, and the
ratio. When non-compliance is one-sided as in our SPP example, rdrobust uses the opti-
mal bandwidth for estimation of the Sharp ITT effect τY . When compliance is imperfect on
both sides of the cutoff, it chooses a bandwidth that is optimal for point-estimation of the
linearized ratio τY /τD . In both cases, the result is a single optimal bandwidth that is used
to estimate all effects. An additional advantage of using the fuzzy option is that it reports
robust bias-corrected confidence intervals for the fuzzy RD effect τY /τD .
The estimated fuzzy RD effect using the fuzzy option is 0.434, very similar to the ratio
of 0.430 obtained above using different bandwidths (the difference occurs because the first-
stage point estimate changes from 0.625 within the 18.511 bandwidth to 0.619 within the
9.041 bandwidth). As discussed before, the interpretation of this parameter depends on
the assumptions invoked. Under appropriate continuity and monotonicity conditions, it
indicates that receiving SPP funding increased the probability of enrolling in a HEI at the
cutoff by roughly 43 percentage points for the subset of students who are
compliers.
We now present the analysis based on local randomization methods. The first step is
to select the local randomization window W using the predetermined covariates presented above:
icfes female, icfes age, icfes urm, icfes stratum, icfes privatehs, and icfes famsize.
R Snippet 3.4
> Z <- cbind(data$icfes_female, data$icfes_age, data$icfes_urm,
+ data$icfes_stratum, data$icfes_famsize)
> colnames(Z) <- c("icfes_female", "icfes_age", "icfes_urm", "icfes_stratum",
+ "icfes_famsize")
> out <- rdwinselect(X1, Z)
Mass points detected in running variable
You may use wmasspoints option for constructing windows at each mass point
================================================================================
Window p-value Var. name Bin.test Obs<c Obs>=c
================================================================================
-0.0400 0.0400 0.213 icfes_stratum 0.720 14 17
-0.0700 0.0700 0.442 icfes_urm 0.471 21 27
-0.0700 0.0700 0.364 icfes_stratum 0.526 28 34
-0.1000 0.1000 0.322 icfes_stratum 1.000 44 43
-0.1200 0.1200 0.354 icfes_stratum 1.000 52 51
-0.1300 0.1300 0.204 icfes_female 0.582 63 56
-0.1400 0.1400 0.111 icfes_female 0.386 72 61
-0.1800 0.1800 0.611 icfes_famsize 0.630 81 74
-0.2100 0.2100 0.369 icfes_female 0.554 96 87
-0.2400 0.2400 0.275 icfes_female 0.259 109 92
================================================================================
Recommended window is [-0.13;0.13] with 119 observations (63 below, 56 above).
The minimum p-value is above 0.200 in the first six windows, dropping to 0.111 in the
seventh window. The chosen window is therefore [−0.13, 0.13], which has a total of 63 control
and 56 treated observations. Once the window is selected, we use rdrandinf to estimate the
first-stage parameter, to assess whether the RD eligibility rule (T) did in fact have the effect
of changing the probability of receiving SPP funding (D) for students with scores within
this window. We perform this analysis by specifying D as the outcome of interest in our call to
rdrandinf.
R Snippet 3.5
> out <- rdrandinf(D, X1, wl = -0.13000107, wr = 0.13000107)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 0.000
================================================================================
Diff. in means 0.571 0.000 0.000 0.050
================================================================================
Consistent with the continuity-based analysis, the output shows a very strong first stage:
within the window, 57.1% of students above the cutoff received SPP funding, and no students below
the cutoff received it (as reported in the Mean of outcome row of the output). This leads to a difference-in-means of
57.1%, as shown in the main output row, an extremely large effect that is statistically different
from zero according to both Fisherian and large sample tests. In sum, using either continuity-
based or local randomization methods, the evidence is clear that SPP eligibility induces a
large take-up of the program near the cutoff. This local randomization first-stage point
estimate is very similar to the value of 0.625 that we estimated above using continuity-based
methods. This similarity shows that the first-stage effect in this application is remarkably
robust, as the conclusions from the empirical analysis are similar whether we use continuity-
based methods with approximately 8,000 observations in a bandwidth of ±18.5 or local
randomization methods with just 130 observations in a ±0.13 window.
We continue by considering the ITT effect of being eligible to receive SPP (Ti ) on our out-
come of interest, HEI enrollment. We estimate it inside the chosen window using rdrandinf,
this time using Y as the outcome of interest.
R Snippet 3.6
> out <- rdrandinf(Y, X1, wl = -0.13000107, wr = 0.13000107)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 0.252
================================================================================
Diff. in means 0.171 0.064 0.056 0.804
================================================================================
The results estimate the effect of being barely eligible to receive SPP on HEI enrollment
within [−0.13, 0.13], the ITT parameter θY . As shown in the output, this effect is estimated
to be 0.171, with a Fisherian p-value of 0.064 and a large sample p-value of 0.056. This
estimated effect is lower than the 0.269 effect estimated with continuity-based methods,
and the p-values are larger. The increase in p-values is expected, as the number of effective
observations decreases from over 8,000 to just 130. The decrease in the point estimate is
considerable. Nevertheless, the overall conclusion is the same: becoming just eligible to receive
SPP funding increases enrollment in HEI.
In order to study the ratio parameter θFRD , we call rdrandinf using the option fuzzy =
c(D, "tsls"), where the first argument (D) is the indicator for treatment received and the
second argument requests the two-stage least-squares (TSLS) statistic, that is, the estimate of the
ratio θY /θD .
R Snippet 3.7
> out <- rdrandinf(Y, X1, wl = -0.13000107, wr = 0.13000107, fuzzy = c(D,
+ "tsls"))
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 0.252
================================================================================
TSLS 0.299 NA 0.038 0.416
================================================================================
When the tsls option is chosen, available inference results are only based on large sample
approximations, and the finite sample p-value is not returned. The column labeled T reports
the test statistic, which is 0.299: the ratio between the two ITT effects
reported above, the effect of SPP eligibility on HEI enrollment (θY ), 0.171, over the effect of
SPP eligibility on SPP funding (θD ), 0.571. The associated large sample p-value is
0.038. This point estimate is smaller than the continuity-based estimate of 0.434, a difference
mostly due to the smaller estimate of θY (0.171 versus 0.269).
As part of the falsification analysis, we also conduct density tests of the score around the cutoff,
using rddensity in the continuity-based framework and the binomial test in the local randomization
framework; part of the continuity-based output is reproduced below.

                        c = 0    Left of c   Right of c
Number of obs                    7709        15423
Eff. Number of obs               2729        3703
Order est. (p)                   2           2
Order bias (q)                   3           3
BW est. (h)                      6.208       8.622
The results indicate that, in both frameworks, the null hypothesis fails to be rejected
(with p-values of 0.4243 and 0.5825, respectively), so there is no evidence of ‘sorting’ around
the cutoff based on this measure.
We then formally estimate ITT effects on pre-determined covariates in the continuity-
based and local randomization frameworks using, respectively, the commands rdrobust and
rdrandinf. For example, for the covariate icfes female, we first estimate the effect with continuity-based methods,
R Snippet 3.9
> out <- rdrobust(data$icfes_female, X1, bwselect = "cerrd")
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional -0.020 0.034 -0.608 0.543 [-0.086 , 0.045]
Robust - - -0.648 0.517 [-0.092 , 0.046]
=============================================================================
R Snippet 3.10
> out <- rdrandinf(data$icfes_female, X1, wl = -0.13000107, wr = 0.13000107)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 0.252
================================================================================
Diff. in means -0.131 0.190 0.152 0.786
================================================================================
and then with local randomization methods; we show only abbreviated outputs to conserve space (details
are provided in Foundations).
The ITT falsification effects for all the pre-determined covariates are reported in Table
3.2 using continuity-based methods and Table 3.3 using local randomization methods. The
goal of these falsification analyses is inference, not point estimation, because we know that
the effect of the treatment assignment on any pre-determined covariate is zero. For this
reason, we implement the local polynomial analysis with a bandwidth that optimizes the
coverage-error (CER) of the confidence intervals.
Both frameworks lead to the same conclusion: there is no evidence that the treatment
assignment is correlated with the covariates at or near the cutoff. In other words, students
who are barely eligible to receive SPP funding are similar to the students who are barely
ineligible in terms of age, sex, minority status, household stratum, and family size. This kind
of evidence suggests that the continuity and local randomization assumptions are plausible
in this application.
Identification in Fuzzy RD designs was first discussed in Hahn, Todd, and van der Klaauw
(2001), and later in a sequence of papers including Dong (2018), Cattaneo, Keele, Titiunik,
and Vazquez-Bare (2016) and Arai, Hsu, Kitagawa, Mourifié, and Wan (2022). Estimation
and inference methods are discussed in Calonico, Cattaneo, and Titiunik (2014b), Calonico,
Cattaneo, Farrell, and Titiunik (2019) and Calonico, Cattaneo, and Farrell (2020) within the
continuity-based framework, and in Cattaneo, Frandsen, and Titiunik (2015) and Cattaneo,
Titiunik, and Vazquez-Bare (2017) within the local randomization framework. Weak instru-
ments issues are discussed in Feir, Lemieux, and Marmer (2016). See Cattaneo and Titiunik
(2022) for details and more references.
4 RD DESIGNS WITH DISCRETE RUNNING VARIABLES
The canonical continuity-based RD design assumes that the score that determines treatment
assignment is a continuous random variable. A random variable is continuous when the set of
values it can take is uncountable. For example, a share such
as a party’s proportion of the vote is continuous because it can take any value in the [0, 1]
interval. In practical terms, when the score is continuous, all the observations in the dataset
have distinct score values—i.e., there are no ties. In contrast, a discrete score such as date
of birth can only take a finite number of values; as a result, a random sample of a discrete
running variable will exhibit “mass points”—that is, many observations share the same value
of the score.
When the RD score is not a continuous random variable, the continuity-based local poly-
nomial methods we discussed in Foundations are not directly applicable. This is practically
important because many RD applications have a discrete score. The key issue when deciding
how to analyze a RD design with a discrete score is the number of distinct mass points. Lo-
cal polynomial methods will behave essentially as if each mass point is a single observation;
therefore, if the score is discrete but the number of mass points is sufficiently large (and
ideally close to the cutoff), then using local polynomial methods may still be appropriate
under reasonable assumptions. In contrast, if the number of mass points is small (and sparse,
away from the cutoff), then local polynomial methods will not be directly applicable in the
absence of more restrictive assumptions. In this case, analyzing the RD design using the
local randomization approach is a natural alternative. When the score is discrete, the local
randomization approach has the advantage that the window selection procedure is often no
longer needed, as the smallest window is well defined and typically has enough observations.
We devote the rest of this section to discussing an empirical RD example with a discrete
running variable in order to illustrate how identification, estimation, and inference can be
modified when the dataset contains multiple observations with the same value of the RD
score.
We re-analyze the study by Lindo, Sanders, and Oreopoulos (2010), who used an RD design
to investigate the impact of placing students on academic probation on their future academic
performance. Our choice of an education example is intentional. The origins of the RD
design can be traced to the education literature, and RD methods continue to be used
extensively in education because interventions such as scholarships or remedial programs
are often assigned on the basis of a test score and a fixed approval threshold. Moreover,
despite being continuous in principle, it is common for test scores and grades to be discrete
in practice.
This application analyzes a policy at a Canadian university that places students on
academic probation when their grade point average (GPA) falls below a certain threshold.
The treatment of placing a student on academic probation involves setting a standard for
the student’s future academic performance: a student who is placed on probation in a given
term must improve her GPA in the next term according to campus-specific standards, or
face suspension. Thus, in this RD design, the unit of analysis is the student, the score is the
student’s GPA, the treatment of interest is placing the student on probation, and the cutoff
is the GPA value that triggers probation placement. Students come from three different
campuses. In campuses 1 and 2, the cutoff is 1.5; in campus 3 the cutoff is 1.6. In their
original analysis, Lindo, Sanders, and Oreopoulos (2010) normalized the score, centering
each student’s GPA at the appropriate cutoff, and pooling the observations from the three
campuses in a single dataset. (This approach is standard in Multi-Cutoff RD design settings,
as we discuss in Section 5.) The resulting running variable is therefore the difference between
the student’s GPA and the cutoff; this variable ranges from −1.6 to 2.8, with negative values
indicating that the student was placed on probation, and positive values indicating that the
student was not placed on probation.
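The normalization can be carried out directly in R; the sketch below uses hypothetical variable names (GPA and campus are not the names in the replication files) purely to illustrate the construction of the pooled running variable.

# Sketch: center each student's GPA at her campus-specific cutoff and pool.
cutoff_by_campus <- c("1" = 1.5, "2" = 1.5, "3" = 1.6)
X_norm <- GPA - cutoff_by_campus[as.character(campus)]   # negative values => on probation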
Table 4.1 contains basic descriptive statistics for the score, treatment, outcome and pre-
determined covariates that we use in our re-analysis. There are 40,582 student-level obser-
vations coming from the 1996–2005 period. The outcome we analyze is the GPA obtained
by the student in the term that immediately follows the probation decision (Next Term GPA).
Naturally, this variable is only observed for students who decide to continue at the univer-
sity; thus, the effects of probation on this outcome must be interpreted with caution, as the
decision to leave the university may itself be affected by the treatment. (The original study
contains an analysis of the effects of the treatment on dropout rates; we omit this outcome to
simplify our illustration.) We also investigate some predetermined covariates: the percentile
of the student’s average GPA in standard classes taken in high school (hsgrade pct), the
total number of credits for which the student enrolled in the first year (totcredits year1),
the student’s age at entry (age), an indicator for whether the student is male (male), and
an indicator for whether the student was born in North America (bpl north america).
The crucial issue in the practical analysis of RD designs with discrete scores is the number
of mass points (i.e., unique values) that actually occur in the dataset. When this number is
large, it may be possible to apply continuity-based methods for RD analysis, after changing
the interpretation of the treatment effect of interest and imposing additional assumptions
enabling extrapolation. In contrast, when the number of unique score values is either mod-
erately small or very small, a local randomization approach may be more appropriate. With
few mass points, local or global polynomial fitting will be useful only as an exploratory strat-
egy but not as a formal method of analysis, unless the researcher is willing to impose strong
parametric assumptions. Therefore, the first step in the analysis of an RD design with a
discrete running variable is to analyze the empirical distribution of the score and determine
(i) the total number of observations, (ii) the total number of mass points, and (iii) the total
number of observations per mass point. We illustrate this step with the academic probation
application.
Since only students who have GPA below a threshold are placed on probation, the treat-
ment is administered to students whose GPA is to the left of the cutoff. It is customary
to define the RD treatment indicator as equal to one for units whose score is above (i.e.,
to the right of) the cutoff. To conform to this convention, we multiply the original running
variable—the distance between GPA and the campus cutoff—by −1, so that students placed
on probation are now above the cutoff, and students not placed on probation are now below
the cutoff. For example, a student who has Xi = −0.2 in the original score is placed on
probation because her GPA is 0.2 units below the threshold. The value of the transformed
running variable for this treated student is X̃i = 0.2. Moreover, since we define the treatment
as 1(X̃i ≥ 0), this student will now be placed above the cutoff. The only caveat is that
we must shift slightly those students whose original GPA is exactly equal to the cutoff (and
thus are not placed on probation), since for these students the original normalized running
variable is exactly zero and thus multiplying by −1 does not alter their score. In the scale of
the transformed variable, we need these students to be below zero to continue to assign them
to the control condition (i.e., the non-probation condition). We manually change the score
of students who are exactly at zero to X̃i = −0.000005 so that the rule 1(X̃i ≥ 0) correctly
identifies treated and control students. A histogram of the transformed running variable is
shown in Figure 4.1a.
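In code, the transformation described above can be sketched as follows (X_orig denotes the original normalized score, GPA minus the campus cutoff; the variable names are illustrative).

# Sketch: flip the sign of the normalized score so that probation students lie
# above the cutoff, and nudge students exactly at the cutoff so that the rule
# 1(X >= 0) keeps them in the control (no probation) group.
X <- -1 * X_orig
X[X_orig == 0] <- -0.000005
T <- as.numeric(X >= 0)          # treatment indicator: placed on probation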
We first check how many total observations we have in the dataset, that is, how many
observations have a non-missing value of the score.
R Snippet 4.1
> length(X[!is.na(X)])
[1] 40582
The total sample size in this application is large: 40,582 observations. However, because
the running variable is discrete, the crucial step is to calculate how many mass points we
have.
R Snippet 4.2
> length(unique(X))
[1] 429
The 40,582 total observations in the dataset take only 429 distinct values. This means
that, on average, there are roughly 95 observations per value. To have a better idea of the
density of observations near the cutoff, Table 4.2 shows the number of observations for the
six mass points closest to the cutoff; this table also illustrates how the score is transformed.
Since the original score ranges between −1.6 and 2.8, our transformed score ranges from
−2.8 to 1.6. Both the original and the transformed running variables are discrete, because
the GPA increases in increments of 0.01 units and there are many students with the same
GPA value. For example, there are 72 students who are 0.02 GPA units above the cutoff.
Of these 72 students, 41 + 5 = 46 have a GPA of 1.52 (because the cutoff in Campuses 1
and 2 is 1.5), and 26 students have a GPA of 1.62 (because the cutoff in Campus 3 is 1.6).
The same phenomenon of multiple observations with the same value of the score occurs at
all other values of the score; for example, there are 208 students who have a value of zero in
the original score (and −0.000005 in our transformed score).
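A quick way to see these counts in R is to tabulate the score and sort the mass points by their distance to the cutoff; the short sketch below reproduces the kind of counts reported in Table 4.2.

# Sketch: number of observations at the six mass points closest to the cutoff.
tab <- table(X)
vals <- as.numeric(names(tab))
cbind(score = vals, n = as.integer(tab))[order(abs(vals)), ][1:6, ]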
When the number of mass points in the discrete score is sufficiently large, we can use the
continuity-based approach to RD analysis that we discussed extensively in Foundations. The
academic probation application illustrates a case in which a continuity-based analysis might
be possible, since the total number of mass points is 429, a moderate value. Because there
are mass points, extrapolation between these points is unavoidable; however, in practical
terms, this is no different from analyzing a dataset from any continuous score RD design
with a sample of size 429.
We start with a falsification analysis, doing a continuity-based density test and a continuity-
based analysis of the effect of the treatment on predetermined covariates. First, we use
rddensity to test whether the density of the score is continuous at the cutoff.
R Snippet 4.3
> out <- rddensity(X, bino = FALSE)
> summary(out)
c = 0 Left of c Right of c
Number of obs 34854 5728
Eff. Number of obs 5249 3503
Order est. (p) 2 2
Order bias (q) 3 3
BW est. (h) 0.432 0.532
The p-value is 0.082, and we fail to reject the null hypothesis that the density of the score
is continuous at the cutoff at the conventional 5% level. However, the p-
value is below 10%, suggesting a possible density imbalance. This is consistent with the local
randomization binomial density test that we discuss below, and with the jump from 208 to
67 observations in the mass points closest to the cutoff shown in Table 4.2.
Next, we employ local polynomial methods to perform falsification analyses on several
predetermined covariates. We use rdrobust with the default polynomial of order one. Be-
cause the focus is on inference and not on point estimation, we select the bandwidth to be
coverage-error (CER) optimal. (For further discussion, see Section 5 in Foundations.)
For example, we estimate the RD effect of probation on hsgrade pct, the measure of
high school performance.
R Snippet 4.4
> out <- rdrobust(data$hsgrade_pct, X, bwselect = "cerrd")
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 1.428 1.336 1.069 0.285 [-1.190 , 4.045]
Robust - - 1.076 0.282 [-1.248 , 4.288]
=============================================================================
We also explore the RD effect graphically using rdplot, with the plot shown in Figure
4.2(a).
R Snippet 4.5
> rdplot(data$hsgrade_pct, X, x.label = "Score", y.label = "",
+ title = "")
Both the formal analysis and the graphical analysis indicate that, according to this
continuity-based local polynomial analysis, the students right above and below the cutoff
are similar in terms of their high school performance.
We repeat this analysis for the five predetermined covariates in Table 4.1. Table 4.3
presents a summary of the results, and Figure 4.2 shows the associated default RD plots.
The results indicate that the probation treatment has no effect on the predetermined
covariates, with the exception of the effect on totcredits year1, which has an associated
p-value of 0.001, rejecting the hypothesis of no effect at standard levels. The point estimate of
the effect on this covariate (estimated with MSE-optimal bandwidth, not shown) is relatively
small: treated students take an additional 0.08 credits in the first year, but the average value
of totcredits year1 in the overall sample is 4.43, with a standard deviation of roughly 0.5.
Next, we analyze the effect of being placed on probation on the outcome of interest,
nextGPA, the GPA in the following academic term. We first use rdplot to visualize the
effect with default options in Figure 4.3.
R Snippet 4.6
> out <- rdplot(nextGPA, X, binselect = "esmv")
[1] "Mass points detected in the running variable."
> summary(out)
Call: rdplot
IMSE-optimal bins 42 14
Mimicking Variance bins 624 391
Relative to IMSE-optimal:
Implied scale 14.857 27.929
WIMSE variance weight 0.000 0.000
WIMSE bias weight 1.000 1.000
Figure 4.3: RD plot of the outcome (next-term GPA) against the score.
The RD plot suggests a negative relationship between the running variable and the out-
come: students who have a low GPA in the current term (and thus have a higher value of the
running variable) tend to also have a low GPA in the following term. The plot also shows that
students with scores just above the cutoff (who are just placed on probation) tend to have
a higher GPA in the following term relative to students who are just below the cutoff and
just avoided probation. These results are confirmed when we use a local linear polynomial
and robust bias-corrected inference to provide a formal statistical analysis of the RD effect.
R Snippet 4.7
> out <- rdrobust(nextGPA, X, kernel = "triangular", p = 1, bwselect = "mserd")
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.224 0.038 5.852 0.000 [0.149 , 0.299]
Robust - - 4.726 0.000 [0.126 , 0.304]
=============================================================================
As shown, students who are just placed on probation improve their GPA in the following
term by approximately 0.224 additional points, relative to students who just miss probation.
The robust p-value is less than 0.00005, and the robust 95% confidence interval ranges from
0.126 to 0.304. Thus, the evidence indicates that, conditional on not leaving the university,
being placed on academic probation translates into an increase in future GPA. The point
estimate of 0.224—obtained with rdrobust within a MSE-optimal bandwidth of 0.470—is
very similar to the effect of 0.23 grade points found in the original study (which employed
an ad-hoc bandwidth of 0.6).
To better understand this treatment effect, we may be interested in knowing the point
estimates for control and treated students separately. To obtain them, we explore
the object returned by rdrobust.
R Snippet 4.8
> rdout <- rdrobust(nextGPA, X, kernel = "triangular", p = 1, bwselect = "mserd")
> print(names(rdout))
[1] "Estimate" "bws" "coef" "se" "z" "pv"
[7] "ci" "beta_Y_p_l" "beta_Y_p_r" "V_cl_l" "V_cl_r" "V_rb_l"
[13] "V_rb_r" "N" "N_h" "N_b" "M" "tau_cl"
[19] "tau_bc" "c" "p" "q" "bias" "kernel"
[25] "all" "vce" "bwselect" "level" "masspoints" "rdmodel"
[31] "beta_covs" "call"
> print(rdout$beta_Y_p_r)
[1] 2.0681763 -0.6804732
> print(rdout$beta_Y_p_l)
[1] 1.8444877 -0.6853278
This output shows the estimated intercept and slope from the two local regressions esti-
mated separately to the right (beta_Y_p_r) and left (beta_Y_p_l) of the cutoff. At the cutoff,
the average GPA in the following term for control students who just avoid probation is
1.8444877, while the average future GPA for treated students who are just placed on pro-
bation is 2.0681763. The increase is the estimated RD effect reported above, 2.0681763 −
1.8444877 = 0.2236886. This represents an increase of approximately 12% relative to the
control group mean.
In some applications, it may be desirable to cluster the standard errors at the level of the
score values. We implement this using the cluster option in rdrobust.
R Snippet 4.9
> clustervar <- X
> out <- rdrobust(nextGPA, X, kernel = "triangular", p = 1, bwselect = "mserd",
+ vce = "hc0", cluster = clustervar)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.221 0.032 6.991 0.000 [0.159 , 0.283]
Robust - - 5.768 0.000 [0.140 , 0.284]
=============================================================================
The conclusions remain essentially unaltered, as the 95% robust confidence interval
changes only slightly from [0.126, 0.304] to [0.140, 0.284]. The point estimate moves slightly
from 0.224 to 0.221 because the MSE-optimal bandwidth with clustering shrinks to 0.428
from 0.470, and the bias bandwidth also decreases.
As we have shown, provided that the number of mass points in the score is reasonably
large, it is possible to analyze an RD design with a discrete score using the tools from the
continuity-based framework. Because local polynomial methods effectively operate on the
mass points, a useful check is to collapse the data by averaging the outcome at each distinct
value of the score and then re-run the analysis on the collapsed dataset.
R Snippet 4.10
> data2 <- data.frame(nextGPA, X)
> dim(data2)
[1] 40582 2
> collapsed <- aggregate(nextGPA ~ X, data = data2, mean)
> dim(collapsed)
[1] 429 2
> out <- rdrobust(collapsed$nextGPA, collapsed$X)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.246 0.032 7.650 0.000 [0.183 , 0.308]
Robust - - 6.278 0.000 [0.166 , 0.316]
=============================================================================
The estimated effect is 0.246, with robust p-value less than 0.00005. This is similar to
the 0.224 point estimate obtained with the raw dataset. The two estimates and inference
procedures are very similar, even though the former was calculated using 429 observations,
while the latter was calculated using 40,582 observations. Indeed, the inference conclusions
from both analyses are consistent, as the robust 95% confidence interval using the raw data
is [0.126, 0.304], while the robust confidence interval for the collapsed data is [0.166, 0.316],
both indicating that the values of the effect that are not rejected are in roughly the same
positive range.
This analysis shows that the seemingly large number of observations in the raw dataset
is effectively much smaller, and that the behavior of the continuity-based results is governed
by the average behavior of the data at every mass point. Thus, a natural point of departure
for researchers who wish to study a RD design with a discrete score and many mass points
is to collapse the data and estimate the effects using the aggregated observations. As a second step,
these aggregate results can be compared to the results using the raw data—in most cases,
both sets of results should lead to the same conclusions.
While the mechanics of local polynomial fitting using a discrete running variable are
clear, the actual relevance and interpretation of the treatment effect may change. As we
discuss below, researchers may want to change the parameter of interest when the score
is discrete; if they do not, then parametric extrapolation will be unavoidable to achieve
point identification. Because the score is discrete, it is not possible to nonparametrically
point identify the continuity-based RD treatment effect at the cutoff, τSRD = E[Yi (1)|Xi =
c]−E[Yi (0)|Xi = c], because the lack of denseness of Xi near the cutoff makes it impossible to
appeal to limit arguments and large sample approximations. Put differently, if the researcher
insists on retaining the same parameter of interest as in the canonical RD design, then
extrapolation via additional parametric assumptions from the closest mass points above and
below the cutoff to the cutoff point will be needed, no matter how large the sample size is.
Since parametric extrapolation is unavoidable when the running variable is discrete and
the parameter τSRD is still of interest, a simple local linear extrapolation towards the cut-
off may be a reasonable strategy. This extrapolation approach will always operate in the
background when continuity-based methods are used to analyze a RD design with discrete
score. However, if the number of mass points is very small, bandwidth selection methods will
not be appropriate; in this case, the researcher may conduct linear parametric extrapolation
globally, fitting the polynomial using all the observations (i.e., employing the few unique
values of the score). This runs counter to the local nature of the RD parameter, but it is
essentially the only possibility for implementation if the goal is to estimate the canonical
continuity-based RD parameter and the number of mass points is small.
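A minimal sketch of such a global parametric extrapolation is given below; it fits a linear specification with separate intercepts and slopes on each side of the cutoff using all observations, and the variable names follow the academic probation application (X is the transformed score and T = 1(X >= 0), as constructed earlier).

# Sketch: global linear extrapolation to the cutoff with a discrete score.
fit <- lm(nextGPA ~ T * X)                      # separate lines on each side of the cutoff
summary(fit)$coefficients["T", ]                # extrapolated RD effect at X = 0

Under the assumed linear specification, the coefficient on T is the extrapolated jump in the regression function at the cutoff.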
A natural alternative for the analysis of an RD design with a discrete running variable is to
use the local randomization approach discussed in Section 2, which effectively changes the
parameter of interest from the RD treatment effect at the cutoff (τSRD ) to the RD treatment
effect in the neighborhood W around the cutoff where local randomization is assumed to
hold (θSRD ). A key advantage of this alternative conceptual framework is that, unlike the
continuity-based approach, it can be used even when there are very few mass points in the
running variable; indeed, it can be used with as few as two mass points (one on each side of
the cutoff).
To illustrate the change in the RD parameter of interest, we consider a hypothetical
example where the score takes five values, Xi ∈ {−2, −1, 0, 1, 3}, the RD cutoff is c = 0,
and the treatment assignment is therefore Ti = 1(Xi ≥ 0). In this case, the continuity-
based RD treatment effect at the cutoff is τSRD = E[Yi (1)|Xi = 0] − E[Yi (0)|Xi = 0], which
is not identifiable nonparametrically because the score can never get close enough to 0 for
untreated observations (the closest value an untreated observation can have is −1). However,
if the local randomization assumptions hold in the window W = [−1, 0], we can define
the local randomization parameter θSRD = E[Yi (1)|Xi = 0] − E[Yi (0)|Xi = −1], which is
nonparametrically identifiable under the conditions discussed in Section 2. Going from θSRD
to τSRD requires extrapolating from E[Yi (0)|Xi = −1] to E[Yi (0)|Xi = 0], which is impossible
without additional parametric assumptions on the conditional expectation E[Yi (0)|Xi = x]
because of the intrinsic discreteness of the running variable. In some specific applications,
additional features may allow researchers to extrapolate (e.g., rounding or heaping), but in
general extrapolation will require additional restrictions on the data generating process.
More generally, if Xi ∈ {x−K− , . . . , x−2 , x−1 , c, x1 , x2 , . . . , xK+ }, a natural local randomiza-
tion RD treatment effect parameter is θSRD = E[Yi (1)|Xi = c] − E[Yi (0)|Xi = x−1 ], that
is, the average treated potential outcome for observations with scores equal to the smallest
value that leads to treatment and the average control potential outcome for observations
with scores equal to the largest value that leads to control assignment. In applications with
a large number of observations per mass point, there will be enough observations with Xi = c
and Xi = x−1 so that Ȳc and Ȳ−1 can be used as consistent estimators of E[Yi (1)|Xi = c] and
E[Yi (0)|Xi = x−1 ], respectively. In this case, window selection is not necessary, because the
smallest possible window is W = [x−1 , c], and the number of observations permits estimation
of the effect in this window. Because this is the window where extrapolation is smallest, if
the effect can be estimated in this window, it is not necessary to consider other windows.
However, in some applications the smallest window will contain too few observations and
estimation of θSRD in this window will not be possible. In such cases, researchers can use the
covariate-based window selection procedure discussed in Section 2 to select a larger window
with enough observations where pre-treatment covariates are still balanced.
In some applications, little will be lost by studying the parameter θSRD = E[Yi (1)|Xi = c]−
E[Yi (0)|Xi = x−1 ]. For example, if the running variable is date of birth measured in days and
individuals become eligible to vote when they turn 18 years old, E[Yi (0)|Xi = x−1 ] represents
the average control outcome the day before turning 18. Since age is measured in days for most
social science purposes, we do not expect that the 23 hours and 59 minutes of additional age
will significantly affect the average potential outcomes, and thus we expect E[Yi (0)|Xi = x−1 ]
and E[Yi (0)|Xi = c] to be largely similar. In contrast, in other applications the extrapolation
may be significant and have stronger conceptual consequences. For example, if the policy is
receiving social security benefits at age 65, the running variable is measured in years, and
the outcome is overall health, the difference between E[Yi (0)|Xi = x−1 ] and E[Yi (0)|Xi = c]
may be considerable if one extra year of age at 64 is enough to affect overall average health.
When the score is discrete, using the local randomization approach for inference does
not require choosing a window in most applications. In other words, with a discrete running
variable the researcher knows the exact location of the minimum window around the cutoff:
this window is the interval of the running variable that contains the two mass points, one
on each side of the cutoff, that are immediately consecutive to the cutoff value. Crucially, if
local randomization holds, then it must hold for the smallest window in the absence of design
failures such as manipulation of the running variable. To illustrate, as shown in Table 4.2,
in the academic probation application the original score has a mass point at zero where all
observations are control (because they reach the minimum GPA required to avoid probation),
and the mass point immediately below it occurs at −0.01, where all students are placed on
probation because they fall short of the threshold to avoid probation. Thus, the smallest
window around the cutoff in the scale of the original score is W = [−0.01, 0.00]. In the scale of the transformed score, the minimum window is W = [−0.000005, 0.01].
Regardless of the scale used, the important point is that the minimum window around
the cutoff in a local randomization analysis of an RD design with a discrete score is precisely
the interval between the two consecutive mass points where the treatment status changes
from zero to one. The particular values taken by the score are irrelevant, as the analysis proceeds under the assumption that the treated and control groups were assigned to treatment as-if randomly, and typically under the exclusion restriction that the particular value of the score has no direct impact on the outcome of interest. Moreover, the location of the cutoff is no longer meaningful, as any value of the cutoff between the minimum value of the score on the treated side and the maximum value of the score on the control side will
produce identical treatment and control groups.
Once the researcher finds the treated and control observations located at the two mass
points around the cutoff, the local randomization analysis can proceed as explained in Section
2. We first conduct a falsification analysis to determine whether the assumption of local randomization in the window [−0.000005, 0.01] seems consistent with the empirical evidence. We conduct a density test with the rdwinselect function, using the option nwindows = 1 to report results only for this window, and test whether the density of observations in this window is consistent with the density that would have been observed in a series of unbiased coin flips.
R Snippet 4.11
> out <- rdwinselect(X, wmin = 0.01, nwindows = 1, cutoff = 5e-06)
Mass points detected in running variable
You may use wmasspoints option for constructing windows at each mass point
================================================================================
Window p-value Var. name Bin.test Obs<c Obs>=c
================================================================================
-0.0100 0.0100 NA NA 0.000 208 67
================================================================================
Note: no covariates specified.
As shown in the rdwinselect output and also shown previously in Table 4.2, there are
208 control observations immediately below the cutoff, and 67 above the cutoff. In other
words, there are 208 students who get exactly the minimum GPA needed to avoid probation,
and 67 students who get the maximum possible GPA that still places them on probation. The
number of control observations is roughly three times the number of treated observations, a ratio that is inconsistent with the assumption that the probability of treatment assignment in this window was 1/2—the p-value of the binomial test reported in the column Bin.test is indistinguishable from zero.
We can also obtain this result by using the Binomial test commands directly.
R Snippet 4.12
> binom.test(67, 275, 1/2)
Although these results alone do not imply that the local randomization RD assumptions
are violated, the fact that there are many more control than treated students is consistent
with what one would expect if students were actively avoiding an undesirable outcome. The
results raise some concern that students may have been aware of the probation cutoff, and
may have tried to appeal their final GPA in order to avoid being placed on probation.
Strictly speaking, an imbalanced number of observations would not pose any problems if
the types of students in the treated and control groups were on average similar. To establish
whether treated and control students at the cutoff are similar in terms of observable char-
acteristics, we use rdrandinf to estimate the RD effect of probation on the predetermined
covariates introduced above.
We report the full results for the covariate hsgrade pct.
R Snippet 4.13
> out <- rdrandinf(data$hsgrade_pct, X, wl = -0.005, wr = 0.01,
+ seed = 50)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 10.976
================================================================================
Diff. in means 4.009 0.167 0.197 0.942
================================================================================
We repeat this analysis for the predetermined covariates in Table 4.1, but do not present
the individual runs to conserve space. A summary of the results is reported in Table 4.4.
As shown, treated and control students seem indistinguishable in terms of prior high school
grade, total number of credits, age, sex, and place of birth. The minimum p-value is 0.138,
which is slightly smaller than our recommended value of 0.15, but still considerably larger
than conventional levels.
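The covariate-by-covariate analysis summarized in Table 4.4 can be reproduced with a simple loop. The sketch below is not the authors' replication code; it assumes that the rdlocrand package is loaded, that the covariates are stored in data under the names used in this section, and that the element p.value of the list returned by rdrandinf stores the finite-sample randomization p-value (the exact return structure should be checked in the rdlocrand documentation).

covs <- c("hsgrade_pct", "totcredits_year1", "age_at_entry",
          "male", "bpl_north_america")
pvals <- sapply(covs, function(v) {
  out <- rdrandinf(data[[v]], X, wl = -0.005, wr = 0.01, seed = 50)
  out$p.value   # finite-sample (randomization inference) p-value
})
round(pvals, 3)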
In order to compare the smallest window around the cutoff, which includes only two mass points, with slightly larger windows, we employ the window selector discussed in Section 2. This selector
considers a sequence of nested windows, starting with the smallest, and in each window
conducts balance tests for each covariate specified. We use the command rdwinselect with
the default randomization inference method for the difference in means test statistic:
R Snippet 4.14
> Z <- cbind(data$hsgrade_pct, data$totcredits_year1, data$age_at_entry,
+ data$male, data$bpl_north_america)
> colnames(Z) <- c("hsgrade_pct", "totcredits_year1", "age_at_entry",
+ "male", "bpl_north_america")
> out <- rdwinselect(X, Z, seed = 50, wmin = 0.01, wstep = 0.01,
+ cutoff = 5e-06, level = 0.135)
Mass points detected in running variable
You may use wmasspoints option for constructing windows at each mass point
================================================================================
Window p-value Var. name Bin.test Obs<c Obs>=c
================================================================================
-0.0100 0.0100 0.138 totcredits_year 0.000 208 67
-0.0200 0.0200 0.000 totcredits_year 0.000 273 189
-0.0300 0.0300 0.010 totcredits_year 0.000 345 236
-0.0400 0.0400 0.000 totcredits_year 0.000 452 326
-0.0500 0.0500 0.077 totcredits_year 0.000 587 365
-0.0600 0.0600 0.033 totcredits_year 0.000 656 430
-0.0700 0.0700 0.240 male 0.000 740 583
-0.0800 0.0800 0.280 bpl_north_ameri 0.000 807 638
-0.0900 0.0900 0.177 totcredits_year 0.000 964 719
-0.1000 0.1000 0.075 totcredits_year 0.000 1038 854
================================================================================
Recommended window is [-0.01;0.01] with 275 observations (208 below, 67 above).
The empirical results show that the minimum p-value in the smallest window is 0.138, as
we had seen in Table 4.4. The results also show that as soon as we consider the next largest
window, the minimum p-value drops to less than 0.00005, suggesting the treated and control
students are not comparable in larger windows around the cutoff. Given this, we only report
the outcome analysis in the smallest window.
We investigate the local randomization RD treatment effect on the main outcome of
interest using rdrandinf.
R Snippet 4.15
> out <- rdrandinf(nextGPA, X, wl = -0.005, wr = 0.01, seed = 50)
================================================================================
Finite sample Large sample
------------------ -----------------------------
Statistic T P>|T| P>|T| Power vs d = 0.434
================================================================================
Diff. in means 0.234 0.057 0.051 0.952
================================================================================
The difference-in-means between the 208 control students and the 67 treated students
in the smallest window around the cutoff is 0.234 grade points, remarkably similar to the
continuity-based local polynomial RD effects of 0.224 and 0.246 that we found using the
raw and aggregated data, respectively. Moreover, we can reject the null hypothesis of no effect at the 6% level using both the Fisherian and the large-sample inference approaches. This
shows that the results for next term GPA are robust: we found similar results using the
208 + 67 = 275 observations closest to the cutoff in a local randomization analysis, the full sample of 40,582 observations using a continuity-based analysis, and the 429 collapsed observations in
a continuity-based analysis.
Lee and Card (2008) discuss alternative local polynomial methods in the continuity-based
RD framework when the running variable is discrete. Dong (2015) and Barreca, Lindo, and
Waddell (2016) discuss issues of rounding and heaping in the running variable. Cattaneo,
Frandsen, and Titiunik (2015, Section 6.2) discuss explicitly the connections between discrete
scores and the local randomization approach; see also Cattaneo, Titiunik, and Vazquez-Bare
(2017). Cattaneo and Titiunik (2022) review other methods and extensions.
[Figure: RD plots of the predetermined covariates against the score. Panels include (a) High School Grade Percentile and (b) Total Credits in First Year; horizontal axis: Score.]
5 Multi-Dimensional RD Designs
The standard RD design assumes that the treatment is assigned on the basis of a score Xi
and a cutoff c according to the rule Ti = 1(Xi ≥ c), where both the score and the cutoff
are scalars (i.e., one dimensional variables). In contrast, multi-dimensional RD designs occur
when the treatment is assigned on the basis of more than one score or more than one cutoff—
or both.
In the Multi-Cutoff RD design, the treatment is assigned on the basis of a scalar score,
but different groups of units face different cutoff values. A common instance occurs when a
federal program is administered by sub-national agencies, and each agency chooses a differ-
ent cutoff value to determine program eligibility. For example, in order to target the most
disadvantaged households in a given area, the Mexican conditional cash transfer program
Progresa determined program eligibility based on a household-level poverty index. In ru-
ral areas, the cutoff that determined program eligibility varied geographically, with seven
different cutoffs used in seven different regions.
In the Multi-Score RD design, the treatment is assigned on the basis of two or more
scores, where typically a different cutoff is used for each score and the treatment is assigned
to a unit only if all scores simultaneously exceed their respective cutoffs. For example, in
education settings it is common to award scholarships to students who score above a cutoff
in both a mathematics exam and a language exam. This leads to two running variables—the
student’s grade in the mathematics exam and her grade in the language exam—and two
(possibly different) cutoffs. Another common example of a Multi-Score RD design is the
Geographic RD design, where treatment eligibility is determined based on the location of
the units relative to a geographic boundary. This type of RD setting is sometimes also called a Boundary Discontinuity Design.
We discuss both types of multi-dimensional RD designs. We start with the Multi-Cutoff
RD design in the next section, and continue with the Multi-Score RD design. In both cases,
we present conceptual distinctions that are central for interpretation and analysis, and use
different empirical examples to illustrate how to implement estimation, inference, and falsi-
fication using both the continuity-based and the local randomization frameworks.
To formalize the Multi-Cutoff RD design, we assume that the cutoff is a random variable Ci taking on J distinct values C = {c1, c2, . . . , cJ}, instead of a single known constant as in the standard single-cutoff RD design.
An important practical issue in the Multi-Cutoff RD design is the relationship between the
multiple cutoffs and the score induced by the treatment assignment mechanism. If a unit
with score Xi = x can be exposed to any cutoff c ∈ C, we say that the cutoffs are non-
cumulative. Figure 5.1a shows a hypothetical Multi-Cutoff RD design with three different
non-cumulative cutoffs, c1 , c2 and c3 , where a particular value x is shown for the three
subpopulations exposed to each of the three cutoffs. Panels I, II and III show that a unit
with Xi = x can be exposed to any one of the three cutoff values. Although the process that
determines whether a unit faces c1 , c2 or c3 may be related to Xi , the support of the score
is common across the three subpopulations.
In contrast, when cutoffs are cumulative, a unit’s score value restricts the number of
cutoffs to which the unit can be exposed. This case arises most frequently when different
doses of a treatment are given for different ranges of the running variable, making the cutoff
faced by each unit a deterministic function of the unit’s score value. In Figure 5.1b, units
with Xi < c1 receive treatment A, units with c1 ≤ Xi < c2 receive treatment B, units with
c2 ≤ Xi < c3 receive treatment C, and units with c3 ≤ Xi receive treatment D. Thus, a
unit’s score value is sufficient to know which cutoff (or pair of cutoffs) the unit faces: a unit
with Xi = x for c1 < x < c2 may only face cutoffs c1 or c2 , but not c3 .
There are three important practical consequences of this distinction. First, in Multi-
Cutoff RD designs with non-cumulative cutoffs, it is common for all units to receive the
same treatment regardless of which cutoff they are exposed to. In contrast, when cutoffs
are cumulative, it is common for the treatment to vary by cutoff. For example, in the SPP
program introduced in Section 3, all students in Colombia receive the same subsidy, but the
cutoff for eligibility varies by geographic region. This is a case of a non-cumulative Multi-
Cutoff RD design. In contrast, with cumulative cutoffs, municipalities may receive a different amount
of federal transfers depending on the municipality’s population, or patients may receive dif-
ferent medicine dosages depending on the result of some continuously distributed laboratory
result. With cumulative cutoffs, every time a different cutoff is crossed, the treatment received
typically increases or decreases—but it could also change altogether. Researchers interested
in an overall effect may need to redefine the treatment of interest. In the federal transfers
example, we can redefine the treatment as receiving higher transfers regardless of the partic-
ular amount, and treat all units exposed to different cutoffs as receiving the same treatment.
From this perspective, the presence of multiple cutoffs can imply observable heterogeneity
in the treatment.
Second, the cumulative rule implies a lack of common support in the value of the running
variable for units facing different cutoffs. For example, in Figure 5.1b, a unit with Xi = x
for c1 < x < c2 can only be exposed to cutoffs c1 or c2 but not c3 , and all units exposed to
the highest cutoff c3 must have score higher than or equal to c2 . In general, with cumulative
cutoffs, the subpopulations of units exposed to different cutoffs will have systematically
different values of the running variable. If the score is related to the potential outcomes, as
is common, the type of units exposed to one cutoff may thus be different from the type of
units exposed to a different cutoff, which may lead to important heterogeneity in treatment
effects, and even lack of meaningful comparability across treatment effects.
Finally, the subpopulations exposed to the different cutoffs are well defined in the non-
cumulative case but are ambiguously defined in the cumulative case. When cutoffs are non-
cumulative as in the scenario in Figure 5.1a, every unit is exposed to exactly one cutoff,
and the subpopulations exposed to each cutoff c1 , c2 , . . . , cJ are defined straightforwardly by
selecting units with Ci = c1 , Ci = c2 , . . ., Ci = cJ . In contrast, when cutoffs are cumulative,
the same unit may be exposed to two cutoffs. For example, in Figure 5.1b, the unit with
Xi = x is above the cutoff c1 and below the cutoff c2 . Thus, a cutoff-specific analysis of c1
may include this unit as a treated unit, while a cutoff-specific analysis of c2 may include the
same unit as a control. This would lead the estimated cutoff-specific effects to be correlated
with each other, in addition to altering the interpretation of the treatment effects. To avoid
this, researchers can calculate some midpoint between c1 and c2, such as c21 = c1 + (c2 − c1)/2 or c21 = median(Xi : c1 ≤ Xi < c2), and use units with Xi ≤ c21 in the analysis of the effect at the cutoff c1, and units with Xi > c21 in the analysis of the effect at the cutoff c2.
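As an illustration of this splitting rule, the following minimal sketch assigns each observation to at most one cutoff-specific analysis; the score vector x and the consecutive cutoffs c1 < c2 are hypothetical placeholders.

c21 <- c1 + (c2 - c1) / 2      # midpoint; median(x[x >= c1 & x < c2]) is an alternative
use_c1 <- x <= c21             # observations used in the analysis at cutoff c1
use_c2 <- x >  c21             # observations used in the analysis at cutoff c2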
[Figure 5.1: Multi-Cutoff RD designs. (a) Non-cumulative cutoffs: Panels I, II, and III show the subpopulations exposed to cutoffs c1, c2, and c3; within each panel units are assigned to control below the cutoff and to treatment above it, and a unit with score X = x may face any of the three cutoffs. (b) Cumulative cutoffs: units are assigned to treatments A, B, C, or D as the score crosses c1, c2, and c3; a unit with X = x between c1 and c2 may not face c3. Horizontal axis: Score (X).]
[Figure: Cutoff-specific RD treatment effects in a Multi-Cutoff RD design. The regression functions µ1,c1(x) and µ0,c1(x) for the population exposed to cutoff c1, and µ1,c2(x) and µ0,c2(x) for the population exposed to cutoff c2, are plotted against the score, together with the effects τSRD(c1), τSRD(c2), τc1(c2), and τc2(c1). Vertical axis: Outcome; horizontal axis: Score (x).]
Estimation, inference, and falsification for τSRD (c) and θSRD (c) can be implemented us-
ing the one-dimensional continuity-based methods discussed in Foundations and the one-
dimensional local randomization methods discussed in Section 2, respectively. These meth-
ods are implemented by considering each subsample defined by all units i with Ci = c, for
c ∈ C, and analyzing each subsample separately. These effects can then be collected for fur-
ther analysis and interpretation under additional conditions. To be more precise, restricting
the analysis to units with cutoff equal to c, a continuity-based analysis can be based on the
standard single-cutoff identification result for the sharp RD design,

τSRD(c) ≡ E[Yi(1, c) − Yi(0, c) | Xi = c, Ci = c] = lim_{x↓c} E[Yi | Xi = x, Ci = c] − lim_{x↑c} E[Yi | Xi = x, Ci = c],

while a local randomization analysis can be based on the parameter

θSRD(c) ≡ (1/NWc) Σ_{i: Xi ∈ Wc} EWc[Yi(1, c) − Yi(0, c)],

where NWc, PWc and EWc are defined as in Section 2 for each Wc, c ∈ C. Plug-in estimators
and related inference procedures proceed as discussed in previous sections.
When the cutoffs are cumulative, cutoff-specific treatment effects can also be defined.
Denoting the different values or doses of the treatment as tj, j = 0, 1, . . . , J, the treatment level variable is Li = Σ_{j=1}^{J} (tj − tj−1) 1(Xi ≥ cj) ∈ {t0, t1, . . . , tJ}, with t0 = 0. This assignment rule continues to be a standard RD assignment rule local to the cutoff because
for each cutoff the treatment assignment rule still is Ti (c) = 1(Xi ≥ c). It follows that τSRD (c)
and θSRD (c) continue to have the same interpretation as before with the caveat that now
each treatment effect is defined relative to the previous treatment level in a cumulative way.
Observations that are exposed to two different cutoffs can be used to estimate two different
but consecutive treatment effects. For example, a unit with score cj < Xi < cj+1 will receive
treatment dosage tj and could be used both as a treatment unit when estimating τSRD (cj ) and
as a control unit when estimating τSRD (cj+1 ). As a result, cutoff-specific estimators may not
be independent, although the dependence disappears if the bandwidths or window lengths
are chosen so that they do not overlap across cutoffs, which implies that units contribute
to estimation and inference for only one cutoff. Once the data has been assigned to each
cutoff under analysis, local polynomial methods can also be applied cutoff by cutoff in the
cumulative multiple cutoffs case. The same logic can be used for the local randomization
framework applied to the cutoff-specific treatment effects analysis.
Normalized-and-Pooled Treatment Effects
To estimate a single treatment effect for all units, we define the normalized score X̃i =
Xi − Ci , pool all observations using the normalized score instead of the original score, and
use zero as the cutoff for all observations. The treatment assignment indicator is therefore
Ti = 1(Xi − Ci ≥ 0) = 1(X̃i ≥ 0) for all units. This normalizing-and-pooling strategy
combines all observations exposed to different cutoffs into a single parameter, called the
normalized-and-pooled RD treatment effect. For example, for the continuity-based Sharp
RD case, this treatment effect is
τ^P_SRD = lim_{x↓0} E[Yi | X̃i = x] − lim_{x↑0} E[Yi | X̃i = x].
Like in the cutoff-specific case, estimation and inference for τ^P_SRD can proceed in the same way as in the standard sharp RD design with a single cutoff, using the methods discussed in
Foundations with X̃i as the score and c = 0 as the cutoff. The local randomization parameter
can be defined analogously using the notation introduced in Section 2:
θ^P_SRD = (1/NW̃) Σ_{i: X̃i ∈ W̃} EW̃[ Ti Yi / PW̃[Ti = 1] ] − (1/NW̃) Σ_{i: X̃i ∈ W̃} EW̃[ (1 − Ti) Yi / (1 − PW̃[Ti = 1]) ],
where W̃ = [−w, w] for w > 0, NW̃ denotes the number of observations with normalized
score X̃i within W̃, and PW̃ [·] and EW̃ [·] denote probability and expectations computed
conditionally for those units with normalized score X̃i within W̃.
Under regularity conditions, these pooled treatment effects are equal to a weighted aver-
age of the corresponding cutoff-specific RD treatment effects. For example, in the continuity-
based framework,
τ^P_SRD = Σ_{c∈C} τSRD(c) ω(c),   ω(c) ≡ P[Ci = c | X̃i = 0] = fX|C(c|c) P[Ci = c] / Σ_{c∈C} fX|C(c|c) P[Ci = c],
with fX|C(x|c) denoting the conditional density of the score given the cutoff. In the local randomization framework, a similar representation of θ^P_SRD as a weighted average of θSRD(c), c ∈ C, can be derived. These results show that the cutoff-specific effects that contribute the most to τ^P_SRD and θ^P_SRD are those whose cutoffs have a relatively high number of observations near them.
Extrapolation Treatment Effects
A third class of parameters of interest within the Multi-Cutoff RD design consists of extrapolation parameters, which capture the effect of the treatment at values of the score other than the cutoffs c ∈ C. These parameters are useful because they allow researchers to learn about the
effect of the treatment for units whose score values are not necessarily close to the specific
cutoff used for treatment assignment. Because the treatment assignment is still based on the
RD rule, the fundamental problem of causal inference makes it impossible to learn about
effects arbitrarily far away from the cutoff in the absence of additional assumptions. We
discuss one possible assumption that explicitly exploits the presence of multiple cutoffs.
We focus on the continuity-based Sharp RD case for simplicity. Letting the treatment
effect include an additional subscript indicating the cutoff to which units are exposed, we
define
τc (x) ≡ E[Yi (1) − Yi (0)|Xi = x, Ci = c] = µ1,c (x) − µ0,c (x),
where µ1,c (x) = E[Yi (1)|Xi = x, Ci = c] and µ0,c (x) = E[Yi (0)|Xi = x, Ci = c] denote the
regression functions of the potential outcomes under treatment and control, respectively, for
a given cutoff c. This notation separates the cutoff to which each population is exposed (c
subindex) from the value of the score being conditioned on (x argument). In the Multi-Cutoff
RD design, τc (x) is the average treatment effect that a population exposed to cutoff c would
exhibit at the score value x. For a fixed cutoff c, this parameter captures how the average
treatment effect varies for a subpopulation as the score changes. Our previously defined
continuity-based Sharp RD parameter τSRD (c) is thus τc (c).
Suppose we have two cutoffs, c1 and c2 , with c2 > c1 . This means that there are two
subpopulations: units exposed to the low cutoff, c1 , and units exposed to the high cutoff, c2 .
The standard RD effects at each cutoff are τc1 (c1 ) and τc2 (c2 ), which are easily estimated
with the methods already discussed. In contrast, the problem of extrapolation is to study
an effect such as τc1 (x), for c1 < x ≤ c2 , that is, the average effect of the treatment at a
score value away from the cutoffs for the subpopulation of units exposed to the low cutoff
c1 . The main challenge to identification is that all units exposed to the cutoff c1 are treated
for values of the score above c1 . That is, for x ∈ (c1 , c2 ], the treatment response function
µ1,c1 (x) and the control response function µ0,c2 (x) are estimable, while the needed control
response of the population exposed to c1 , µ0,c1 (x), is not.
A natural approach to identify and estimate τc1 (x) = µ1,c1 (x) − µ0,c1 (x) for c1 < x ≤
c2 is to use the control group of the subpopulation of units exposed to cutoff c2 to learn
about µ0,c1(x) under appropriate conditions. A naïve approach would assume µ0,c2(x) =
µ0,c1 (x), that is, the control response of the units exposed to c1 at x would have been
the same as the control response of the units exposed to c2 at x. This assumption would
ignore any systematic differences between both subpopulations; it would be valid if, for
example, the cutoffs were randomly assigned to units. In the absence of ignorable cutoff
assignment, an alternative assumption is that any pre-existing differences between the two
subpopulations are constant for all values of the score. Then, we can use the “bias” or
difference B(c1 ) ≡ µ0,c2 (c1 ) − µ0,c1 (c1 ), which is estimable from the data, to calculate the
difference B(x) ≡ µ0,c2(x) − µ0,c1(x) for c1 < x ≤ c2, because this difference is assumed constant, B(c1) = B(x). The assumption, which is illustrated in Figure 5.3, is analogous to the “parallel trends” assumption in the difference-in-differences design. With this assumption,
we obtain the identification result
τc1 (x) = µ1,c1 (x) − {µ0,c2 (x) + µ0,c1 (c1 ) − µ0,c2 (c1 )} , x ∈ (c1 , c2 ],
where estimation and inference for the four conditional expectations can be conducted with
the local polynomial methods discussed in Foundations at boundary and interior points
(for example, c1 is a boundary point for estimation of µ0,c1 (c1 ) but an interior point for
estimation of µ0,c2 (c1 )). An analogous assumption can be invoked in a window around the
cutoff to identify, estimate and conduct inference on extrapolation RD treatment effects
using local randomization methods.
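To make the construction concrete, the following sketch estimates τc1(x) at a point xeval in (c1, c2] using simple local linear fits within a hypothetical bandwidth. It only illustrates the formula above and is not the recommended implementation, which would rely on the local polynomial methods and bandwidth selectors discussed in Foundations; the vectors y, x, ci (the cutoff faced by each unit), and t (the treatment indicator) are placeholders.

h <- 5   # assumed bandwidth, for illustration only
locfit <- function(y, x, eval) {
  keep <- abs(x - eval) <= h   # local neighborhood around the evaluation point
  unname(predict(lm(y ~ x, subset = keep), newdata = data.frame(x = eval)))
}
mu1_c1_x  <- locfit(y[ci == c1 & t == 1], x[ci == c1 & t == 1], xeval)  # treated response at xeval, cutoff c1
mu0_c2_x  <- locfit(y[ci == c2 & t == 0], x[ci == c2 & t == 0], xeval)  # control response at xeval, cutoff c2
mu0_c1_c1 <- locfit(y[ci == c1 & t == 0], x[ci == c1 & t == 0], c1)     # control response at c1, cutoff c1
mu0_c2_c1 <- locfit(y[ci == c2 & t == 0], x[ci == c2 & t == 0], c1)     # control response at c1, cutoff c2
tau_c1_x  <- mu1_c1_x - (mu0_c2_x + mu0_c1_c1 - mu0_c2_c1)              # extrapolated effect at xeval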
[Figure 5.3: Extrapolation in a Multi-Cutoff RD design under the constant bias assumption. The regression functions µ1,c1(x), µ0,c1(x), µ1,c2(x), and µ0,c2(x) are plotted against the score, together with the effects τc1(c1) and τc1(x) and the bias terms B(c1) and B(x). Vertical axis: Outcome; horizontal axis: Score (x).]
cohort whose SABER 11 score is above the merit cutoff, which results in an RD design with
a single score—the SISBEN wealth index. Recall that SPP program eligibility was assigned
according to three different cutoffs that varied with the student’s area of residence: 40.75 in
rural areas, 57.21 in the fourteen main metropolitan areas, and 56.32 in the rest of the urban
areas. For simplicity of exposition, we ignore compliance issues and focus on intention-to-
treat effects, using a Sharp Multi-Cutoff RD design where the running variable is the SISBEN
wealth score, the cutoff varies by region, and the treatment is being eligible to receive the
SPP program. As before, the outcome of interest is an indicator equal to one if the student
enrolled in a HEI immediately after program eligibility.
Table 5.1 shows the three different cutoff values, the number of observations exposed to
each cutoff, and the maximum and minimum SISBEN score for the subpopulation exposed
to each cutoff.
Because SPP eligibility is given to students with wealth below the cutoffs, we multiply all
cutoffs and all scores by −1 to follow the convention that the active treatment is assigned to
units above the cutoff. The analysis can be implemented with rdrobust after subsetting the
data accordingly, or with rdmulti, which employs rdrobust to perform estimation, inference
and plotting in Multi-Cutoff RD designs. We begin by plotting the data using rdplot for
the cutoff Ci = 57.21 and rdmcplot for all three cutoffs. Figure 5.4a employs a global linear (p = 1) polynomial fit to avoid Runge's phenomenon near the cutoff and for comparability with the accompanying Figure 5.4b, which reports all cutoffs in a single plot, also employing a global linear fit (this figure uses half of the mimicking-variance optimal numbers of bins, J−^MV/2 and J+^MV/2, to reduce clutter and improve visualization).
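The plots described above can be produced with rdplot and rdmcplot. The following is a minimal sketch rather than the authors' replication code; it assumes the rdrobust and rdmulti packages are installed, that the data frame data contains the variables used throughout this section, and that the pvec argument of rdmcplot sets the polynomial order used for each cutoff.

library(rdrobust)
library(rdmulti)

# Panel (a): global linear fit for the subpopulation facing the cutoff 57.21
# (recall that scores and cutoffs were multiplied by -1)
sub <- data$cutoff == -57.21
rdplot(data$spadies_any[sub], data$sisben_score[sub], c = -57.21, p = 1)

# Panel (b): all three cutoffs in a single plot, also with global linear fits
rdmcplot(data$spadies_any, data$sisben_score, data$cutoff, pvec = c(1, 1, 1))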
We now use rdrobust to conduct cutoff-specific and normalizing-and-pooling analyses.
We first estimate the RD effect of SPP eligibility on HEI enrollment for the subpopulation
of students exposed to the highest cutoff of 57.21 (SISBEN Area 1), using rdrobust with
default specifications (local linear, common MSE bandwidth, and triangular weights) only
on the subset of observations exposed to this cutoff.
[Figure 5.4: RD plots of HEI enrollment against the SISBEN score. (a) Cutoff Ci = 57.21, global linear fit (p = 1) with J−^MV and J+^MV bins. (b) All cutoffs, global linear fits (p = 1) with J−^MV/2 and J+^MV/2 bins.]
R Snippet 5.1
> out <- rdrobust(data$spadies_any[data$cutoff == -57.21], data$sisben_score[data$cutoff ==
+ -57.21], c = -57.21)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.346 0.040 8.582 0.000 [0.267 , 0.426]
Robust - - 7.706 0.000 [0.269 , 0.452]
=============================================================================
In this subpopulation, students who are barely eligible to receive the SPP subsidy are
34.6 percentage points more likely to enroll in a HEI than students who are barely ineligible.
The effect is statistically distinguishable from zero, with a robust 95% confidence interval
of approximately [0.269, 0.452]. Although this analysis can be repeated for each cutoff us-
ing rdrobust for each subpopulation, the analysis can be conducted more succinctly using
rdmulti.
R Snippet 5.2
> out <- rdmc(data$spadies_any, data$sisben_score, data$cutoff)
The first three rows show the cutoff-specific effects, which can be directly reproduced
by using rdrobust on the subset of the observations exposed to each individual cutoff. The
results in the first row therefore coincide with the rdrobust output for the subpopulation
of students in Area 1 just shown. The last two rows show two different versions of the
normalized-and-pooled Multi-Cutoff RD effect. The Pooled row displays the normalizing-
and-pooling effect, which can also be implemented with rdrobust by first creating the nor-
malized score and then using it with a cutoff of zero. We illustrate this next.
R Snippet 5.3
> data$xnorm <- NA
> data$xnorm[data$sisben_area == "Main metro area"] <- data$sisben_score[data$sisben_area ==
+ "Main metro area"] + 57.21
> data$xnorm[data$sisben_area == "Other urban area"] <- data$sisben_score[data$sisben_area ==
+ "Other urban area"] + 56.32
> data$xnorm[data$sisben_area == "Rural area"] <- data$sisben_score[data$sisben_area ==
+ "Rural area"] + 40.75
> out <- rdrobust(data$spadies_any, data$xnorm, c = 0)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.269 0.023 11.709 0.000 [0.224 , 0.314]
Robust - - 10.047 0.000 [0.221 , 0.328]
=============================================================================
The Weighted row in the rdmulti output multiplies each cutoff-specific effect by the estimated weights, implemented as ŵ(c) = P̂(Ci = c | X̃i = 0) = Σ_{i=1}^{n} 1(Ci = c, −h < X̃i < h) / Σ_{i=1}^{n} 1(−h < X̃i < h), for bandwidth h > 0. In other words, in the row labeled Weighted, the weights that are implicitly imposed by normalizing and pooling are directly estimated and then used to calculate the estimated pooled effect by multiplying each cutoff-specific effect by its corresponding weight—this explains why the point estimates in the Weighted and Pooled rows are so similar to each other. The point estimate in the Weighted row can thus be obtained by multiplying each cutoff-specific effect by the estimated weights, both of which are returned by rdmc:
R Snippet 5.4
> out <- rdmc(data$spadies_any, data$sisben_score, data$cutoff)
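The weights can also be recomputed by hand from the formula above. The following is a minimal sketch, not the authors' code; it assumes data$xnorm is the normalized score created in R Snippet 5.3, data$cutoff stores the cutoff faced by each student, and h is the bandwidth used for the pooled estimation (the value below is only a placeholder).

h <- 10                                               # assumed bandwidth; replace with the one actually used
near <- !is.na(data$xnorm) & abs(data$xnorm) < h      # observations within the bandwidth of the pooled cutoff
w_hat <- tapply(near, data$cutoff, sum) / sum(near)   # estimated weight w_hat(c) for each cutoff
w_hat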
The implied weights in the normalizing-and-pooling approach are 0.3840234 for Area 1,
0.53424658 for Area 2, and 0.08173003 for Area 3. The relatively lower weight for Area 3 is
expected, as this comprises all rural areas where the number of observations is much smaller
than in the urban areas (see Table 5.1). Although the other two areas have similar numbers
of total observations, Area 2’s estimated weight is 0.53, larger than Area 1’s 0.38 weight.
The difference arises because, compared to Area 1, Area 2 has relatively more students with
SISBEN wealth scores near the cutoff.
Both the cutoff-specific analysis and the normalizing-and-pooling analysis lead to similar
conclusions: eligibility for the SPP program leads to a 20 to 30 percentage-point increase in
the probability of enrolling in a HEI. Finally, we can formally test whether the effects at
each specific cutoff are different from each other.
R Snippet 5.5
> out <- rdmc(data$spadies_any, data$sisben_score, data$cutoff)
The SPP eligibility effect is roughly 34.6 percentage points in Area 1, and this is sta-
tistically significantly different from the effect in Area 2, which is roughly 20 percentage
points. (The difference shown in the output above is 16.3 percentage points, larger than
34.6 − 20 = 14.6, because the point estimates used to construct the z-statistic are the bias-corrected estimates, not the conventional point estimates reported in the printed output.) In
contrast, the effects for the smallest two cutoffs (Areas 2 and 3) are indistinguishable from
each other.
Another extension of the canonical single-cutoff RD design occurs when two or more running
variables determine the treatment assignment. For example, a grant or scholarship may be
given to students who receive a grade above a given cutoff in both a mathematics and a
language exam. This leads to two running variables—the student’s grade in the mathematics
exam and her grade in the language exam—and thus two (often different) cutoffs for the
running variables. To formalize the Multi-Score RD Design, we assume each unit’s score is
a bivariate vector (instead of a scalar as before) denoted by Xi = (X1i, X2i)′. Then, the treatment assignment is Ti = a(Xi) for some assignment function a : R2 → {0, 1}. A simple treatment assignment rule is to require that both scores be above a cutoff to assign treatment, which leads to a(Xi) = 1(X1i ≥ b1) · 1(X2i ≥ b2), where b1 and b2 denote the cutoff points for each dimension. More complex treatment assignment rules would have a varying boundary on the plane, as occurs in Geographic RD designs.
Continuing with our education example, assume that the scholarship is given to all stu-
dents who score at least 80 in the language exam and at least 60 in the mathematics exam,
letting X1i denote the language score and X2i the math score, and b1 = 80 and b2 = 60 be
the respective cutoffs. According to this hypothetical treatment assignment rule, a student
with score xi = (80, 59.9) is assigned to the control condition, since 1(80 ≥ 80) · 1(59.9 ≥
60) = 1 · 0 = 0, and misses the treatment only barely—had she scored an additional 1/10 of
a point in the mathematics exam, she would have received the scholarship. Without a doubt,
this student is close to the cutoff criteria for receiving the treatment. However, scoring very
close to both cutoffs is not the only way for a student to be barely assigned to treatment
or control. A student with a perfect 100 score in language would still be barely assigned to
control if he scored 59.9 in the mathematics exam, and a student with a perfect math score
would be barely assigned to control if she got 79.9 points the language exam. Thus, with
multiple running variables, there is no longer a single cutoff value at which the treatment
status of units changes from control to treated; instead, the discontinuity in the treatment
assignment occurs along a boundary of points. This is illustrated graphically in Figure 5.5a.
In Multi-Score RD designs, the boundary where treatment assignment changes discon-
tinuously is B = {x ∈ R2 : x ∈ (bd(A1 ) ∩ bd(A0 ))} with A1 = {x ∈ R2 : a(x) = 1}
and A0 = {x ∈ R2 : a(x) = 0} denoting the treated and control areas, respectively, and where bd(A) denotes the frontier or boundary of the set A, defined as the set's closure minus its interior (bd(A) ≡ cl(A) \ int(A)). In the education example depicted in Figure 5.5a, the assignment boundary takes the simple form B = {x = (x1, x2)′ : (x1 ≥ 80 and x2 = 60) or (x1 = 80 and x2 ≥ 60)}.

[Figure 5.5: Treatment assignment in Multi-Score RD designs. (a) Education example, with the Language Cutoff marked on the language-score axis. (b) Hypothetical Geographic RD design with geographic coordinates (Latitude shown as X2) and a boundary separating the treated and control areas.]
An important special case of the Multi-Score RD design is the Geographic RD design,
which often arises when adjacent administrative units such as counties, municipalities or states
are assigned opposite treatment status. In this case, the boundary B at which the treatment
assignment changes discontinuously is the border that separates the adjacent administrative
units—in other words, B separates a geographic treated area from a geographic control
area. For example, in the 2010 primary election in Colorado in the United States, some
counties had all-mail elections where voting could only be conducted by mail and in-person
voting was not allowed, while other counties had traditional in-person voting. Where the
two types of counties are adjacent, the administrative border between the counties induces a
discontinuous treatment assignment between in-person and all-mail voting, and a Geographic
RD design can be used to estimate the effect of adopting all-mail elections on voter turnout.
A hypothetical geographic assignment is shown in Figure 5.5b.
In the Geographic RD design, the score Xi = (X1i, X2i)′ is a vector of two coordinates
such as latitude and longitude that determine the exact geographic location of unit i. In
practice, this score is calculated using Geographic Information Systems (GIS) software, which
allows researchers to obtain the coordinates corresponding to the geographic location of each
unit in the study, as well as to locate the entire treated and control areas, and all points
on the boundary between them. The assignment function a(x) is thus determined by the administrative or other geographic boundary, which is often not as simple as the Leontief function in the education example shown in Figure 5.5a. In the upcoming sections, we present two empirical illustrations: one has an assignment boundary similar to Figure 5.5a, while the other has an assignment boundary similar to Figure 5.5b.
Analogously to the Multi-Cutoff RD design, the parameters of interest in the Multi-Score
RD design also change because there is no longer a single cutoff at which the probability
of treatment assignment changes discontinuously but rather an often uncountable collection
of locations along the boundary B induced by the treatment assignment function a(x). One
approach is to consider different RD treatment effects for location-specific points on the
boundary B, analogous to cutoff-specific effects in the Multi-Cutoff RD design. Another
approach is to consider a single treatment effect along the boundary B by normalizing-and-
pooling. Below we discuss both approaches to identification, estimation and inference when
implemented by either considering the multi-dimensional score Xi directly or reducing the
score dimension to scalar score via a distance function. As in the Multi-Cutoff RD design,
extrapolation treatment effects can also be defined in the Multi-Score RD design; we omit
them from our discussion to conserve space.
We assume perfect compliance and no spillovers to simplify the exposition, and thus focus
on a sharp RD setting with multiple scores. The continuity-based parameter of interest in
the Multi-Score RD design is a generalization of the standard Sharp RD design parameter,
where the average treatment effect is calculated at all points along the boundary B where
the treatment assignment changes discontinuously from zero to one:

τSMS(b) ≡ E[Yi(1) − Yi(0) | Xi = b] = lim_{x→b: x∈A1} E[Yi | Xi = x] − lim_{x→b: x∈A0} E[Yi | Xi = x],   b ∈ B,

with the only difference being that the limits are now taken along two dimensions. In words, for each
two-dimensional cutoff point b along the boundary B, the treatment effect at that point is
identifiable by the limits of the observed bivariate regression functions for the treated and
control groups. The important distinction with respect to the one-dimensional score case is
that the Multi-Score RD design generates a family or curve of treatment effects τSMS (b), one
for each boundary point b ∈ B. For instance, in the context of the example in Figure 5.5, two
different sharp RD treatment effects are τSMS (80, 70) and τSMS (90, 60). Treatment effects and
related methods within the local randomization framework can also be defined and applied
following the analogy with the single-score case.
For implementation, a simple approach is to choose a grid of points in B and estimate
treatment effects for each point on the grid, which effectively maps (via discretization of B)
the problem to a Multi-Cutoff RD design; see Section 5.3 for more discussion. The bivariate
score Xi is often reduced to a univariate score via a distance function to the boundary,
which in turn allows for the deployment of all the methods in Foundations and the previous
sections of this manuscript. We follow this approach for the analysis of both location-specific
and normalized-and-pooled treatment effects.
Location-specific Treatment Effects
For estimation of treatment effects at specific locations along the boundary B, the bi-
variate score Xi is reduced to a scalar score by computing the distance of each unit’s multi-
dimensional score to the desired location on the boundary. To formalize this approach, sup-
pose b = (b1, b2)′ ∈ B is the location-specific point. Define

Di(b) ≡ d(Xi, b) a(Xi) − d(Xi, b) (1 − a(Xi)),

that is, the distance from each unit's score to b, signed to be positive for treated units and negative for control units, for each unit i = 1, 2, . . . , n, where d : R2 × R2 → R+ denotes a distance metric. The choice of distance metric depends on the particular application. For non-geographic applications, this is typically the Euclidean distance, d(Xi, b) = √((X1i − b1)^2 + (X2i − b2)^2). For geographic
applications, the Euclidean distance may not be appropriate if calculated over a large geo-
graphic area, because it fails to account for the approximately spherical shape of Earth. In
this case, the geodetic distance (the shortest great-arc distance between points that lie on
a spherical surface) or the chordal distance (the distance of the chord joining two points on
a sphere), may be more appropriate. In some geographic RD designs researchers might also
be interested in other measures of distance to the boundary, such as the driving or walking
distance, or the distance along paved roads; these distances will require more geographic in-
formation in addition to the unit’s geographic coordinates, and are typically calculated with
GIS software. Banerjee (2005) discusses different metrics appropriate for measuring distance
between geographic locations on Earth.
Given (Yi , Di (b)), i = 1, 2, . . . , n, one-dimensional RD analysis for all observations to-
gether can be used to identify and estimate the location-specific treatment effect at the point
b ∈ B, with Xi = Di (b) as the scalar running variable and c = 0 as the cutoff, employing
either the continuity-based or local randomization frameworks.
In the continuity-based case,
τSMS(b) = lim_{x↓0} E[Yi | Di(b) = x] − lim_{x↑0} E[Yi | Di(b) = x],
and hence the analysis of the location-specific continuity-based treatment effects requires
analogous assumptions to the one-dimensional RD design for each boundary point studied,
and can be implemented using the methods discussed in Foundations.
Similarly, in the local randomization case,
θSRD(b) = (1/NWb) Σ_{i: Di(b) ∈ Wb} EWb[ Ti Yi / PWb[Ti = 1] ] − (1/NWb) Σ_{i: Di(b) ∈ Wb} EWb[ (1 − Ti) Yi / (1 − PWb[Ti = 1]) ],
where NWb , PWb and EWb are defined as in Section 2 for each Wb = [−w, w], w > 0, and
b ∈ B. A local randomization analysis of location-specific effects can be implemented using
the methods discussed in Section 2. In practice, both approaches are implemented for a finite
collection of evaluation points along the boundary B.
Normalized-and-Pooled Treatment Effects
In the Multi-Score RD design, it is also possible to analyze all boundary points simul-
taneously by considering the normalized-and-pooled treatment effect. Instead of performing
estimation and inference for multiple treatment effects located at a specific point on the
boundary, this approach considers the effects at all boundary points simultaneously by using
as the running variable the shortest distance to the boundary, and then pooling all observa-
tions in a one-dimensional RD analysis.
Formally, the score for each unit is set to be Di = d(Xi) a(Xi) − d(Xi) (1 − a(Xi)) with d(Xi) = min_{b∈B} d(Xi, b), and the cutoff is c = 0. This approach is analogous to the
normalizing-and-pooling approach in the Multi-Cutoff RD design. Because the resulting score
is a scalar, estimation and inference uses the methods discussed for scalar, single-score RD
designs.
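When the boundary does not have a closed form, the shortest distance can be computed against a fine grid of boundary points. The sketch below is a generic illustration, not tied to the commands used in the empirical analysis; it assumes bpoints is a two-column matrix of boundary points, xmat is the n-by-2 matrix of scores, and a is the 0/1 indicator of being in the treated area.

dist_to_boundary <- apply(xmat, 1, function(xi) {
  min(sqrt((bpoints[, 1] - xi[1])^2 + (bpoints[, 2] - xi[2])^2))   # shortest Euclidean distance to the grid
})
D <- dist_to_boundary * (2 * a - 1)   # signed distance: positive in the treated area, negative in the control area
# D can now be used as the scalar score with cutoff 0 in rdrobust or rdrandinf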
For continuity-based estimation and inference at each of the points b1 , b2 and b3 , we use
the rdms command. When we call the command, we include the outcome (spadies any),
the two normalized scores (running sisben and running saber11), a variable that indicates
which observations are assigned to treatment versus control (tr), and the coordinates of each
boundary point (which must be passed via the C and C2 options, corresponding to the two
dimensions).
R Snippet 5.6
> cvec <- c(0, 30, 0)
> cvec2 <- c(0, 0, 50)
> out <- rdms(Y = data$spadies_any, X = data$running_sisben, X2 = data$running_saber11,
+ zvar = data$tr, C = cvec, C2 = cvec2)
================================================================================
Cutoff Coef. P-value 95% CI hl hr Nh
================================================================================
(0.00,0.00) 0.323 0.000 0.293 0.379 30.701 30.701 41771
(30.00,0.00) 0.315 0.000 0.286 0.356 42.582 42.582 71579
(0.00,50.00) 0.229 0.000 0.144 0.351 27.762 27.762 5057
================================================================================
The results indicate, once again, that students who are barely eligible for the SPP pro-
gram enroll in HEI at much higher rates than students who are barely ineligible, with some
heterogeneity across boundary points. While the effects at b1 = (0, 0) and b2 = (30, 0) are
similar, at roughly 32 percentage points, the effect at b3 = (0, 50) is substantially
smaller (around 23 percentage points), suggesting that the effects are greater for students
who are marginal in terms of need than for poorer students who are marginal in terms of
merit (students at b3 are further from and above the wealth cutoff and are thus poorer than
students at b1 and b2 ).
We can replicate the results from rdms by following the steps outlined above: first calculate the Euclidean distance of every observation to the chosen boundary point, and then use that distance as the score in a one-dimensional RD analysis. Below we show how to implement this for b2 = (30, 0).
R Snippet 5.7
> pdim1 <- 30
> pdim2 <- 0
> data$dist <- sqrt((data$running_sisben - pdim1)^2 + (data$running_saber11 -
+ pdim2)^2)
> data$dist <- data$dist * (2 * data$tr - 1)
> out <- rdrobust(data$spadies_any, data$dist)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.315 0.013 24.231 0.000 [0.290 , 0.341]
Robust - - 17.933 0.000 [0.286 , 0.356]
=============================================================================
which leads to the same results given by rdms for this boundary point.
Table 5.2: Shortest distance to the assignment boundary (d(Xi ))—SPP data
For the analysis, we use the information in Table 5.2 to calculate, for every observation, the shortest distance to the boundary defined before, d(Xi), which gives the scalar normalized score Di = d(Xi)(2a(Xi) − 1) with associated cutoff c = 0.
R Snippet 5.8
> data2 <- data[is.na(data$running_sisben) == FALSE & is.na(data$running_saber11) ==
+ FALSE, ]
> data2$aux1 <- abs(data2$running_sisben)
> data2$aux2 <- abs(data2$running_saber11)
> data2$case <- NA
> data2$case[data2$running_sisben >= 0 & data2$running_saber11 >=
+ 0] <- 1
> data2$case[data2$running_sisben <= 0 & data2$running_saber11 >=
+ 0] <- 2
> data2$case[data2$running_sisben >= 0 & data2$running_saber11 <=
+ 0] <- 3
> data2$case[data2$running_sisben <= 0 & data2$running_saber11 <=
+ 0] <- 4
> data2$xnorm <- NA
> data2$xnorm[data2$case == 1] <- apply(data2[data2$case == 1,
+ c("aux1", "aux2")], 1, FUN = min)
> data2$xnorm[data2$case == 2] <- data2$aux1[data2$case == 2]
> data2$xnorm[data2$case == 3] <- data2$aux2[data2$case == 3]
> data2$xnorm[data2$case == 4] <- sqrt(data2$aux1[data2$case ==
+ 4]^2 + data2$aux2[data2$case == 4]^2)
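The variable xnorm constructed above is an unsigned distance to the boundary; before it can be used as an RD score with a cutoff of zero, it must be signed so that treated observations take positive values and control observations negative values, as was done with the distance in R Snippet 5.7. A minimal sketch of this step, assuming the treatment-area indicator tr from the previous snippets is also available in data2:

data2$xnorm <- data2$xnorm * (2 * data2$tr - 1)   # positive for treated, negative for control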
And we run a one-dimensional RD analysis using this modified distance measure (Di =
d(Xi )(2a(Xi ) − 1)) as the score.
R Snippet 5.10
> out <- rdrobust(data2$spadies_any, data2$xnorm)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.264 0.014 19.435 0.000 [0.238 , 0.291]
Robust - - 16.556 0.000 [0.229 , 0.290]
=============================================================================
Finally, the same result can be obtained with rdms, passing Di as an argument with the
option xnorm.
R Snippet 5.11
> out <- rdms(Y = data2$spadies_any, X = data2$running_sisben,
+ X2 = data2$running_saber11, zvar = data2$tr, C = cvec, C2 = cvec2,
+ xnorm = data2$xnorm)
================================================================================
Cutoff Coef. P-value 95% CI hl hr Nh
================================================================================
(0.00,0.00) 0.323 0.000 0.293 0.379 30.701 30.701 41771
(30.00,0.00) 0.315 0.000 0.286 0.356 42.582 42.582 71579
(0.00,50.00) 0.229 0.000 0.144 0.351 27.762 27.762 5057
--------------------------------------------------------------------------------
Pooled 0.264 0.000 0.229 0.290 10.815 10.815 22824
================================================================================
an average of 177 presidential campaign ads in the two months before the election, while
New Jersey residents in the New York DMA saw no ads in the same period. The geographic
RD design compares two adjacent areas: the part of the WWP school district contained
within the Philadelphia DMA, and the part of this district contained within the New York
DMA. From the New Jersey voter registration file, the authors collected the list of citizens
in the WWP school district who were registered to vote by 2008, including an indicator of
whether each person voted in the 2008 presidential election, the main outcome of interest.
This file also contained the residential address of each registered citizen, which allowed the
authors to geolocate each person. After geolocation, each person in the registration file was
associated with two geographic coordinates (latitude and longitude) which together indicate
the person’s residential address. Descriptive statistics for the main variables are presented
in Table 5.3.
In this geographic RD design, the unit of analysis is the individual who appears in
the 2008 registration file and lives in the WWP school district, the score is the latitude-
longitude vector that stores the geographic coordinates of the individual’s residence, Xi =
(latitudei , longitudei ), the treatment of interest is political television advertisements, and
the treatment assignment rule is a(Xi ) = 1((latitudei , longitudei ) ∈ APAdma ), where the set
APAdma collects all the geographic coordinates corresponding to locations inside the Philadel-
phia DMA. The data has been pre-processed using GIS software to include following infor-
mation:
• Distance between each observation’s coordinates and the closest point on the boundary
(i.e., perpendicular distance, for estimation of the normalized and pooled effect).
Figure 5.7 shows the raw scatter plot of longitude against latitude for all observations
in the replication dataset; the plot also shows the boundary that separates the treated
and control areas. Compared to the example in Figure 5.5a, Figure 5.7 depicts a real data
version of Figure 5.5b, where the assignment boundary is irregular. In contrast to non-
geographic applications of the Multi-Score RD Design, the boundary that separates treated
and control areas in a Geographic RD design does not typically have a closed-form expression.
The boundaries between geographic units (counties, school districts, DMAs) are typically
decided by governmental or other administrative units; their precise location is given via a
collection of files, sometimes referred to collectively as shape files, that store the location
(and also attributes such as elevation) of geographic features (points, lines, and polygons).
These files are analyzed using geographic information systems (GIS) software. Thus, instead
of deriving the set B from the treatment assignment rule, as we did in the SPP example,
obtaining the boundary in a Geographic RD design requires external information. In this
case, the authors obtained the shape files with the polygons representing the New York
and Philadelphia DMAs and the West Windsor-Plainsboro school district, and using GIS software they obtained a set of 80 latitude-longitude points that are on the border between the New York and Philadelphia DMAs and inside the West Windsor-Plainsboro district.
Adding these 80 points as a line in the scatter plot in Figure 5.7 produces the boundary.
[Figure 5.7: Scatter plot of the geographic coordinates of all observations (Latitude on the horizontal axis, Longitude on the vertical axis), showing the Control and Treated areas, the Boundary between the New York and Philadelphia DMAs, and the boundary points b1, b2, and b3.]
Using GIS software, we compute the latitude-longitude coordinates of three location-specific points on the boundary at which the RD treatment effects will be estimated: b1 = (40.32489, −74.61789), b2 = (40.32037, −74.60335), and b3 = (40.31497, −74.59191). These
points are represented by the solid circles in Figure 5.7. The points b1 and b3 were chosen
to split the boundary into three equal segments roughly 2.3 kilometers long; the point b2 is
the midpoint between b1 and b3 .
We illustrate how to estimate point-specific effects by analyzing the middle point, b2 =
(40.32037, −74.60335). Using the latitude and longitude coordinates of every observation as
inputs, we start by calculating the distance from each observation to b2. Three typical distance measures are the geodetic distance, the chordal distance, and the Euclidean distance; see Banerjee (2005) for more discussion. For specificity, we consider the chordal distance: given any two points on the Earth's surface, the chordal distance between them is the length of the straight-line chord that connects them, passing through the Earth's interior. Figure 5.8
shows the histogram of the chordal distance to boundary point b2 measured in kilometers,
separately for treated and control observations. The density of observations near the bound-
ary is low, which is typical in geographic RD applications where the boundary splits less
populated areas. We also see that the distances do not get all the way to zero. For example,
the minimum distance in the treatment group is 0.30689 km and the minimum distance in
the control group is 0.4239642 km. It is important for researchers to check the density of the distance measure: very few or no observations with distances near zero may indicate that the areas surrounding the boundary are not sufficiently populated, which will result in excessive extrapolation when estimating RD effects.
[Figure 5.8: Histograms of the chordal distance (in kilometers) to the boundary point b2, shown separately for control and treated observations.]
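Before turning to estimation, we sketch how the chordal distance to b2, and the minimum distances reported above, could be computed in R. The spherical-Earth radius and the construction of a signed score (positive for treated, negative for control observations) are illustrative choices of ours; the exact calculation used in the analysis is in the replication materials.

# Chordal distance (in km) from each residence to b2 = (40.32037, -74.60335),
# approximating the Earth by a sphere of radius 6371 km (an illustrative assumption)
deg2rad <- function(d) d * pi / 180
R.earth <- 6371
lat  <- deg2rad(data$latitude);  lon  <- deg2rad(data$longitude)
lat0 <- deg2rad(40.32037);       lon0 <- deg2rad(-74.60335)
x  <- R.earth * cos(lat)  * cos(lon);  y  <- R.earth * cos(lat)  * sin(lon);  z  <- R.earth * sin(lat)
x0 <- R.earth * cos(lat0) * cos(lon0); y0 <- R.earth * cos(lat0) * sin(lon0); z0 <- R.earth * sin(lat0)
chord <- sqrt((x - x0)^2 + (y - y0)^2 + (z - z0)^2)
# Minimum distance to b2 within each group, to check how close observations get to the boundary point
min(chord[data$treat == 1]); min(chord[data$treat == 0])
# Signed score for RD estimation: positive inside the Philadelphia DMA, negative otherwise
data$dist2 <- ifelse(data$treat == 1, chord, -chord)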
The outcome of interest is an indicator equal to one if the person voted in the 2008
general election. Recall that the treatment assignment indicator is equal to one if the person’s
residence is located in the Philadelphia DMA, where political TV ads were plentiful, and
zero if the person’s residence is in the New York DMA, where the volume of ads was very
low. Under appropriate assumptions, the RD effect thus captures the effect of a high volume
of political TV ads on voter turnout, for voters in the WWP school district who reside right
at the boundary between the Philadelphia and New York DMAs.
We estimate the RD effect at the boundary point b2 = (40.32037, −74.60335) with local
polynomials implemented with rdrobust, using the chordal distance from each person’s
residence to this point as the score. (The exact code that calculates the chordal distance is omitted to conserve space and provided in the accompanying replication files; a simplified sketch appears above.)
R Snippet 5.12
> out <- rdrobust(data$e2008g, data$dist2)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional -0.002 0.074 -0.027 0.978 [-0.148 , 0.144]
Robust - - -0.205 0.838 [-0.197 , 0.160]
=============================================================================
The local linear point estimate is very small, −0.002, with a robust p-value of 0.838 and a robust confidence interval roughly centered around zero. Thus, we find no evidence of an effect of residing in the Philadelphia DMA on voter turnout in 2008. The effects at the other two boundary
points, b1 and b3 , can be estimated analogously.
The default implementation with rdrobust used above chooses the bandwidth optimally;
when the analysis is performed for each boundary point, this strategy may result in some
observations being included in the analysis of more than one point. If researchers want
to avoid reusing observations between boundary points, they can choose the bandwidth
manually. For example, in this application, the distance between b2 and each of the boundary
points b1 and b3 is roughly 1.15 km; thus, to ensure that each observation enters the analysis of at most one boundary point, researchers could set the bandwidth manually to 0.575 km (assuming that there are enough observations within that bandwidth). We omit this analysis because of space considerations, but illustrate the call below.
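The call itself, whose output we do not report, simply fixes the bandwidth through the h option of rdrobust; this is a minimal sketch assuming the signed chordal distance to b2 is stored in data$dist2 as before.

# RD estimation at b2 with a manually chosen bandwidth of 0.575 km on each side,
# so that no observation can fall within the bandwidths of two adjacent boundary points
out <- rdrobust(data$e2008g, data$dist2, h = 0.575)
summary(out)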
We can also use the rdms command for analysis, but this will only be useful if the
researcher is interested in using Euclidean distance, as rdms only uses this metric to calculate
the distances between each observation’s location and each boundary point.
R Snippet 5.13
> out <- rdms(data$e2008g, data$latitude, data$lat_cutoff[1:3],
+ data$longitude, data$treat, data$long_cutoff[1:3])
================================================================================
Cutoff Coef. P-value 95% CI hl hr Nh
================================================================================
(40.32,-74.62) -0.031 0.682 -0.224 0.147 0.020 0.020 2843
(40.32,-74.60) 0.034 0.900 -0.197 0.224 0.014 0.014 1737
(40.31,-74.59) 0.035 0.711 -0.183 0.269 0.019 0.019 2307
================================================================================
The middle row corresponds to the effect for the boundary point b2, which is 0.034. This estimate differs from the point estimate of −0.002 obtained above for the same point. The discrepancy occurs because rdms uses the Euclidean metric to calculate the distances between the raw latitude and longitude inputs passed to the function and the boundary point where the analysis is being performed. In contrast, our result above obtained with rdrobust used the chordal distance as the score, which we calculated manually. Despite the difference in point estimates, the conclusions are the same: both robust p-values are similar (between 0.8 and 0.9), and both robust confidence intervals are roughly symmetric around zero, indicating the absence of a detectable treatment effect.
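To make the role of the metric concrete, the signed Euclidean distance (in degrees of latitude and longitude) can be constructed manually and passed to rdrobust. Under this reading of how rdms computes distances, the following sketch should produce essentially the same estimate as the middle row of the rdms output above; the variable name dist2.euc is illustrative.

# Signed Euclidean distance, in degrees, from each residence to b2 = (40.32037, -74.60335)
euc <- sqrt((data$latitude - 40.32037)^2 + (data$longitude - (-74.60335))^2)
data$dist2.euc <- ifelse(data$treat == 1, euc, -euc)
out <- rdrobust(data$e2008g, data$dist2.euc)
summary(out)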
Finally, we also calculate the pooled RD effect for all observations together, using the perpendicular distance to the boundary as the score (this distance was calculated by Keele and Titiunik using GIS software). We do so with both rdms, via its xnorm option, and rdrobust directly.
R Snippet 5.14
> out <- rdms(data$e2008g, data$latitude, data$lat_cutoff[1:3],
+ data$longitude, data$treat, data$long_cutoff[1:3], xnorm = data$perp_dist)
================================================================================
Cutoff Coef. P-value 95% CI hl hr Nh
================================================================================
(40.32,-74.62) -0.031 0.682 -0.224 0.147 0.020 0.020 2843
(40.32,-74.60) 0.034 0.900 -0.197 0.224 0.014 0.014 1737
(40.31,-74.59) 0.035 0.711 -0.183 0.269 0.019 0.019 2307
--------------------------------------------------------------------------------
Pooled 0.050 0.396 -0.083 0.211 0.803 0.803 2086
================================================================================
R Snippet 5.15
> out <- rdrobust(data$e2008g, data$perp_dist)
> summary(out)
Sharp RD estimates using local polynomial regression.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.050 0.064 0.777 0.437 [-0.075 , 0.174]
Robust - - 0.848 0.396 [-0.083 , 0.211]
=============================================================================
The pooled local linear estimate is 0.050, with a robust p-value of 0.396 and a robust confidence interval that contains zero; pooling all observations along the boundary therefore also yields no evidence of an effect on turnout.

There is an important connection between RD designs with multiple running variables and RD designs with multiple cutoffs. In the Multi-Cutoff RD design, we considered a discrete set of J cutoff points, C = {c1 , c2 , . . . , cJ }, and defined the cutoff-specific RD effects as the effects for the subset of observations exposed to each of those cutoffs. For implementation, we suggested keeping all observations exposed to cutoff cj and then performing standard single-cutoff RD analysis using either the raw score and cutoff, or the distance to the cutoff with a normalized cutoff of c = 0.
Similarly, in our discussion of Multi-Score RD designs, we described how to estimate location-specific treatment effects by selecting a grid of G two-dimensional points on the boundary B, denoted Bgrid = {b1 , b2 , . . . , bG }, and then performing RD analysis one point at a time. Because the score is two-dimensional, we also discussed how to reduce the dimensionality of the problem by considering the distance of each unit to the specific boundary point under consideration. After calculating the distance between each observation's score and the boundary point, standard single-cutoff RD methods can be applied directly by using the distance measure as the scalar score variable and a normalized cutoff of c = 0.
Putting the ideas above together, we can assign each boundary point on the grid Bgrid
to a cutoff in C = {c1 , c2 , . . . , cJ }, with J = G. Then, for each observation in the dataset we
can assign a running variable relative to each cutoff (boundary point), where this running
variable is equal to the distance between the observation’s score and the cutoff. With these
modifications, any Multi-Score RD design can be analyzed as a Multi-Cutoff RD design over finitely many cutoff points on the boundary, which highlights the close connection between the two multi-dimensional RD designs; a sketch of this mapping for the application above is given below.
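The sketch below illustrates this mapping for the three boundary points used in the Keele and Titiunik application: each point plays the role of a cutoff, a signed Euclidean distance to it serves as the cutoff-specific running variable, and standard single-cutoff estimation is applied with the cutoff normalized to zero. The chordal distance could be used instead, and variable names beyond those already in the dataset are illustrative.

# Each boundary point is treated as a cutoff: build a distance-based running
# variable relative to that point and run a standard single-cutoff RD analysis at c = 0
bpoints <- rbind(c(40.32489, -74.61789),   # b1
                 c(40.32037, -74.60335),   # b2
                 c(40.31497, -74.59191))   # b3
for (j in 1:nrow(bpoints)) {
  d <- sqrt((data$latitude - bpoints[j, 1])^2 + (data$longitude - bpoints[j, 2])^2)
  score <- ifelse(data$treat == 1, d, -d)  # signed: positive on the treated side
  out <- rdrobust(data$e2008g, score)      # cutoff normalized to c = 0
  summary(out)                             # prints the estimates for boundary point j
}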
distinction between cumulative and non-cumulative cutoffs. They also provide analogous
results for Fuzzy and Kink RD designs, and further discuss the connections between Multi-
Score and Multi-Cutoff RD designs in the supplemental appendix. Cattaneo, Keele, Titiunik,
and Vazquez-Bare (2021) propose using a Multi-Cutoff RD framework for extrapolation of treatment effects; they present multi-cutoff extrapolation for both the continuity-based and the local randomization approaches (the latter is covered in the supplemental appendix).
Papay, Willett, and Murnane (2011), Reardon and Robinson (2012), and Wong, Steiner, and
Cook (2013) discuss generic Multi-Score RD settings, and Keele and Titiunik (2015) discuss
a generic geographic continuity-based RD design. Keele and Titiunik (2018) discuss the
application of a Geographic RD design to the study of all-mail voting in Colorado, considering
the possibility of spillovers between treated and control areas. Cattaneo, Titiunik, and Yu
(2023) develop uniform inference methods for boundary discontinuity designs. See Cattaneo
and Titiunik (2022) for more references.
6 Final Remarks
This monograph continues the practical discussion of RD analysis that we started in Founda-
tions. In that first volume, we focused on the canonical RD setup where the running variable
has a single dimension and is continuously distributed so that all observations have a dif-
ferent value, there is only one cutoff, compliance with the treatment assignment is perfect,
and all effects are defined at the cutoff and estimated via local polynomials based on extrap-
olation and continuity assumptions. This volume explores the implications of relaxing these assumptions.
Section 2 presented the local randomization framework as an alternative way of analyz-
ing and interpreting RD designs. Instead of focusing on the cutoff at which the assignment
switches from control to treatment, this approach defines a window around the cutoff and de-
ploys assumptions akin to those in a randomized controlled experiment to define and analyze
treatment effects. Because these assumptions are stronger than the standard RD continuity
assumptions, we presented this alternative approach as a complement to the continuity-based
methods in Foundations. Still, local randomization methods are an important part of the RD
toolkit because it is common to justify the RD assumptions by invoking a similarity between
the RD treatment assignment and the way treatment is assigned in a true randomized ex-
periment. This similarity was invoked by Thistlethwaite and Campbell (1960) to justify the RD design in their foundational paper. We hope that our discussion of the advantages
and limitations of the local randomization approach is useful to clarify the analogy between
RD designs and randomized experiments that is so often invoked in practice.
Section 3 focused on Fuzzy RD designs and discussed best practices for analysis when
compliance with treatment is imperfect. This situation is relevant to many real-world applications of the RD design. For example, in social programs and other policies that are assigned via RD rules, individuals whose score is above the cutoff are typically encouraged, rather than compelled, to take the treatment. We introduced and discussed several treatment effects in the context of Fuzzy RD designs, and explained how continuity-based and local randomization methods can be effectively deployed in that context. We also
highlighted the role of validation and falsification methods.
Section 4 discussed the features and limitations of continuity-based and local randomiza-
tion methods when the running variable is discrete and thus multiple observations share the same value of the score. Continuity-based methods are not applicable to the analysis of RD
designs with discrete running variables without further assumptions allowing extrapolation,
while local randomization methods are often more appropriate. Our discussion illustrated
how local randomization concepts and methods can be effectively deployed to estimate useful
treatment effects in this case, focusing on both standard parameters and new parameters arising from the fact that the score is discretely distributed.
Finally, Section 5 studied Multi-dimensional RD designs, covering Multi-Cutoff, Multi-Score, and Geographic RD designs. We discussed settings and examples in which RD designs exhibit multiple cutoffs or multiple scores for the assignment of treatment. Although most of the main conceptual distinctions and assumptions remain unchanged, the introduction of multiple dimensions leads to many different parameters of potential interest. Our discussion outlined how to define, analyze, and interpret such parameters. We also offered several empirical illustrations highlighting some of the nuances of each multi-dimensional RD design.
We hope that the combination of Foundations and Extensions provides a useful practical
guide for empirical researchers, and contributes to the transparency and replicability of RD
analysis across all disciplines.
Bibliography
Abadie, A., and M. D. Cattaneo (2018): “Econometric Methods for Program Evaluation,” Annual Review of Economics, 10, 465–503.
Arai, Y., Y. Hsu, T. Kitagawa, I. Mourifié, and Y. Wan (2022): “Testing Identifying
Assumptions in Fuzzy Regression Discontinuity Designs,” Quantitative Economics, 13(1),
1–28.
Calonico, S., M. D. Cattaneo, and M. H. Farrell (2018): “On the Effect of Bias
Estimation on Coverage Accuracy in Nonparametric Inference,” Journal of the American
Statistical Association, 113(522), 767–779.
(2020): “Optimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs,” Econometrics Journal, 23(2), 192–210.
(2022): “Coverage Error Optimal Confidence Intervals for Local Polynomial Regression,” Bernoulli, 28(4), 2998–3022.
Dong, Y. (2015): “Regression Discontinuity Applications with Rounding Errors in the Running Variable,” Journal of Applied Econometrics, 30(3), 422–446.
Feir, D., T. Lemieux, and V. Marmer (2016): “Weak Identification in Fuzzy Regression Discontinuity Designs,” Journal of Business & Economic Statistics, 34(2), 185–196.
Hahn, J., P. Todd, and W. van der Klaauw (2001): “Identification and Estimation of
Treatment Effects with a Regression-Discontinuity Design,” Econometrica, 69(1), 201–209.
Imbens, G., and D. B. Rubin (2015): Causal Inference in Statistics, Social, and Biomedical
Sciences. Cambridge University Press.
Keele, L., and R. Titiunik (2018): “Geographic Natural Experiments with Interference: The Effect of All-Mail Voting on Turnout in Colorado,” CESifo Economic Studies, 64(2), 127–149.
Lee, D. S., and D. Card (2008): “Regression Discontinuity Inference with Specification Error,” Journal of Econometrics, 142(2), 655–674.
Lindo, J. M., N. J. Sanders, and P. Oreopoulos (2010): “Ability, Gender, and Performance Standards: Evidence from Academic Probation,” American Economic Journal: Applied Economics, 2(2), 95–117.