Project M and E New
Evaluation
October, 2021
Discussion point
• What did you do this morning after you got dressed?
• Did you shave/put on make-up this morning?
• If yes, did you do so while looking in a mirror?
• What is the difference between shaving/putting on make-
up with and without looking in the mirror?
• Relate this to M and E.
2
Monitoring and Evaluation
The Current Context:
increasing call for results & value for money
shrinking budgets
yet increasing complexity in program/project settings
So, what does M&E have to do with these issues?
We need monitoring & evaluation to:
identify projects that better meet needs & lead to
improvements in targeted social, economic &
environmental conditions
improve decision-making
achieve better outcomes
tell stories clearly in an evidence-based manner
enhance organizational learning
3
Monitoring
Monitoring is an internal project activity
It is the continuous assessment of project
implementation in relation to:
schedules
input use
infrastructure
services
Monitoring:
provides continuous feedback
identifies actual or potential successes/ problems
4
Monitoring
Monitoring is very important in project planning and
implementation. It is like watching where you are going
while riding a bicycle; you can adjust as you go along and
ensure that you are on the right track.
5
Why M&E ….
• Tracking resources
• Feedback on progress
• Improving project effectiveness
• Informing decisions
• Promoting accountability
• Demonstrating impact
• Identifying lessons learned
Why M&E ….
Monitoring provides information that will be useful in:
Exploring the situation in the community;
Determining whether the inputs in the project are well
utilized;
Identifying problems facing the community or project and
finding solutions;
Ensuring all activities are carried out properly by the right
people and on time;
Applying lessons from one project experience to another;
and
Determining whether the way the project was planned is
the most appropriate way of solving the problem at hand.
8
Earned Value Analysis as a tool of Monitoring
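The slide graphic is not reproduced here; as a minimal worked illustration of the standard earned-value quantities (the figures are hypothetical):

  PV (planned value) = 100,000;  EV (earned value) = 80,000;  AC (actual cost) = 90,000
  Schedule variance SV = EV - PV = -20,000 (behind schedule)
  Cost variance     CV = EV - AC = -10,000 (over budget)
  SPI = EV/PV = 0.80;  CPI = EV/AC ≈ 0.89 (values below 1 signal schedule and cost trouble)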
9
Evaluation
Evaluation: to establish the merit, significance, worth or
value of something.
It is a systematic, objective, deliberate, purposeful, critical,
trustworthy & ethical assessment of a project/program
It assesses: relevance, coherence, efficiency, effectiveness,
impact & sustainability
Interim evaluations: first review of progress, prognosis
of likely effects, and a way to identify necessary
adjustments in project design
Terminal evaluations: evaluate the project's effects and
potential sustainability (done at the end for project
completion reports)
10
Evaluation
• Evaluation is: “A periodic assessment of the relevance,
performance, efficiency, effectiveness, impact and
sustainability of a project in the context of stated
objectives.
• It is usually undertaken as an independent examination
with a view to drawing lessons that may guide future
decision-making” (European Commission).
11
Monitoring vs Evaluation
Monitoring: What are we doing?
Tracking inputs and outputs to assess whether programs
are performing according to plans
12
Monitoring vs Evaluation
Differences between monitoring and evaluation
Frequency. Monitoring: regular. Evaluation: episodic.
Main action. Monitoring: keeping track, oversight. Evaluation: assessment, more analytical.
Basic purpose. Monitoring: improve efficiency, adjust the work plan. Evaluation: improve effectiveness, impact, future programming.
Focus. Monitoring: inputs, outputs, process outcomes, work plans. Evaluation: effectiveness, relevance, impact, cost-effectiveness.
Information sources. Monitoring: routine systems, field observations, progress reports, rapid assessments. Evaluation: same as for monitoring, as well as surveys and studies.
Undertaken by. Monitoring: program managers, community workers, community (beneficiaries), supervisors, funders. Evaluation: program managers, supervisors, funders, external evaluators, community (beneficiaries).
16
Participatory Monitoring & Evaluation
• “It is a process of collaborative problem-solving through the
generation and use of knowledge. It is a process that leads to
corrective action by involving all levels of stakeholders in shared
decision-making.”
• It is a collaborative process that involves stakeholders at
different levels working together to assess a project or policy,
and take any corrective measure required
• Key Principles:
– Local people are active participants, not just sources of info
– Stakeholders evaluate, outsiders facilitate
– Focus on building stakeholder capacity for analysis and
problem solving
– Process builds commitment to implementing any
recommended corrective actions
Participatory Monitoring and Evaluation
Who conducts it? Conventional M&E: external experts. Participatory M&E: stakeholders, including communities and project staff; outsiders facilitate.
19
Who conducts M&E….?
A participatory process involving the project implementer, stakeholders and beneficiaries.
Who needs and uses M&E information?
24
M and E
• What do we mean by?
– M and E System
– M and E Plan
– M and E Framework
• In most cases an M&E system refers to all the indicators, tools
and processes that you will use to measure if a program has
been implemented according to the plan (monitoring) and is
having the desired result (evaluation).
• An M&E system is often described in a document called an
M&E plan.
• An M&E framework is one part of an M&E plan,
which describes how the whole M&E system for the program
works.
25
Components of M&E Plan
• M and E plan includes:
– the what and how (clear & measurable objective
statements, indicators).
– M and E framework,
– roles & responsibilities (institutional arrangements),
– indicators & indicator reference sheets,
– budget, and
– M and E activity and data collection plan (annual, bi-
annual or quarterly), and proposal on how findings will be
an input for decisions.
26
M and E Plan
• Seven Steps for Developing an M&E Work Plan
1) Identify program goals and objectives
2) Determine M&E questions, indicators and their feasibility
3) Determine M&E methodology for monitoring the process
and evaluating the effects
4) Resolve implementation issues – who will conduct the
monitoring and evaluation? How will existing M&E data and
data from past evaluation studies be used?
5) Identify internal and external M&E resources and capacity
6) Develop an M&E work plan and timeline
7) Develop a plan to disseminate and use evaluation findings
27
M and E Framework
• Steps in Developing an M and E Framework
1. Getting started: part of the Logframe
2. Choosing indicators
3. Defining each indicator
4. Identifying the data source, frequency, responsibility and
reporting. This helps keep the consistency when indicators are
measured at different times.
5. Measuring the baseline and setting the target
28
M and E Framework
29
Results-based M&E
30
Guiding Principles of M & E
M&E is guided by the following key principles:
1. Systematic Inquiry – Staff conduct site-based inquiries
that gather both quantitative and qualitative data in a
systematic and high-quality manner
2. Honesty/Integrity – Staff display honesty and integrity in
their own behavior and contribute to the honesty and
integrity of the entire M&E process
3. Respect for People – Staff respect the security, dignity,
and self-worth of respondents, program participants,
clients, and other M&E stakeholders
4. Responsibilities to Stakeholders – Staff members
articulate and take into account the diversity of different
stakeholders’ interests and values that are relevant to
project M&E activities
31
Theory of Change vs Logframe
32
Theory of Change
33
Theory of Change
• ToC provides a comprehensive description of how and why a
desired change is expected to happen in a certain context.
• ToC takes the complexity and dynamic nature of a context into
account.
• It analyses the many facets involved and examines the
relationships between them.
• As such, it doesn’t force you to simplify the road to change in
a limited number of steps or to see it as a singular path of
cause and effect.
• ToC can help set up a dialogue between stakeholders.
Once the ToC has been formulated (diagram and/or text) it
can be used to communicate your work clearly to other
colleagues, partners, donors, etc.
34
Theory of Change
• ToC gives you a better understanding of the links between
activities (or projects) and the change you want to achieve
(the goals).
• It allows you to understand what is necessary on top of / next
to your own efforts and how other factors can strengthen or
hinder your activities.
• As such, it fills in the gap (sometimes called the Missing
Middle) between a program or change initiative (your
activities) and the ultimate goals in terms of (societal) change.
• ToC makes assumptions and other unsaid things explicit.
• By explicitly dealing with long-held assumptions, Theory of
Change thinking can also support innovation and ‘out of the
box’ thinking
35
Theory of Change
• Because of this clearer understanding of the complete
picture, you will be able to do better planning – although ToC
doesn’t give you a ready-made planning method.
• ToC allows for better evaluation, because progress is
expressed in terms of the realization of the different
outcomes that are necessary to achieve the goals (rather than
just monitoring the progress on outputs of activities).
• ToC describes the story of how change is expected to happen
and because this is clearly described it is a good base to
assess any long-term change.
• However, ToC doesn't provide you with clear instructions on
how to do this impact assessment: you will still have to
identify the appropriate impact assessment/evaluation
methodologies.
36
Theory of Change
• Gives the big picture, including issues related to the
environment or context that you can’t control.
• Shows all the different pathways that might lead to change,
even if those pathways are not related to your program.
• Describes how and why you think change happens.
• Could be used to complete the sentence “if we do X then Y
will change because…”.
• Is presented as a diagram with narrative text.
37
Theory of Change
38
Logframe
• Gives a detailed description of the program showing how the
program activities will lead to the immediate outputs, and
how these will lead to the outcomes and goal (the
terminology used varies by organization).
• Could be used to complete the sentence “we plan to do
X which will give Y result”.
• Is normally shown as a matrix, called a logframe. It can also be
shown as a flow chart, which is sometimes called a logic
model.
• Is linear, which means that all activities lead to outputs which
lead to outcomes and the goal – there are no cyclical
processes or feedback loops.
• Includes space for risks and assumptions, although these are
usually only basic. Doesn’t include evidence for why you think
one thing will lead to another.
• Is mainly used as a tool for monitoring and evaluation.
39
Logframe
41
Basic Logic Model
What should an M&E System Measure?
[Logic model example, reconstructed from the slide graphic:]
Inputs: spending on primary education
Activities: building of schools; distribution of textbooks, etc.
Outputs: number of schools built; textbooks, etc.
Outcomes: school enrollment rates
44
Logframe
45
Setting Objectives & Developing Indicators
53
Impact pathways evaluation
• Based on program theory evaluation and the logframe
– An explicit theory or model of how a project will bring, or
has brought, about impact
– Consists of a sequenced hierarchy of outcomes
– Represents a set of hypotheses about what needs to
happen for the project output to be transformed over time
into impact on highly aggregated development indicators
– Can be highly complementary to conventional assessments
• Advantages of this approach
– Consideration of wider impact helps achieve impact
– Complements conventional economic assessment
54
Impact pathways evaluation
Two main phases in impact pathway evaluation
1st phase: using program theory evaluation to guide self-
monitoring and self-evaluation to establish the direct
benefits of the project outputs in its pilot site(s).
• Task: to develop a theory or model of how the project
sees itself achieving impact (called an impact pathway)
• Identifies steps the project should take to scale out and
scale up
– Scale-out: innovation spread from farmer to farmer or
from household to household within same stakeholder
groups
– Scale-up: an institutional expansion from grassroots
organizations to policymakers, donors, development
institutions, and other stakeholders to build an
enabling environment for change
55
Impact pathways evaluation
56
Impact pathways evaluation
2nd phase in impact pathway evaluation: An
independent ex-post impact assessment is carried out
some time (normally several years) after the project
has finished
57
Impact pathways evaluation
• Answers to the following questions are recorded in a
matrix for each identified outcome in the impact pathway:
– What would success look like?
– What are the factors that influence the achievement
of each outcome?
– Which of these can be influenced by the project?
– What is the program currently doing to address these
factors to bring about this outcome?
– What performance information should we collect?
– How can we gather this information?
58
Classification axes: the indicator axis
• Conceptual framework for helping guide the design of
an evaluation from Habicht et al. (1999)
• An evaluation may be aimed at one or more categories of
decision makers, so the design must take into account
their different needs
• First axis refers to the indicators: whether one is
evaluating the performance of the intervention delivery
or its impact on indicators
• Second axis refers to the type of inference to be
drawn: how confident the decision maker can be that
any observed effects were due to the intervention
59
Classification axes: the indicator axis
• Indicators of provision, utilization, coverage and
impact: what is to be evaluated? What type of information
is to be sought?
• Outcomes of interest (“indicators”):
1. Provision: services must be provided (available and
accessible to the target pop. and of adequate quality)
2. Utilization: the population must accept and make use
of the services
3. Coverage: utilization will result in a given population
coverage, which represents the interface between
service delivery and outreach to the population
4. Impact: coverage may lead to an impact
• Choose indicators based on decision makers and cost
60
Classification axes: the indicator axis
• If a weak link is discovered, investigate why
• An impact can be expected only when the correct service is
provided in a timely manner and it is properly utilized by a
sufficiently large number of beneficiaries
• Example: project offering loans to smallholders with the
objective of increasing fertilizer use:
1. Provision: measure the availability of the loans to
smallholders,
2. Utilization: measure the disbursement of the loans to
smallholders,
3. Coverage: measure the proportion of smallholders that
have been able to take out a new loan, and
4. Impact: measure the impact of the project on fertilizer
use.
61
Types of inference
• Second classification axis: how confident decision
makers need to be that any observed effects are due to the
project or program,
• Both performance and impact evaluations may include
adequacy, plausibility or probability assessments as the
types of inference.
62
Types of inference
• There are 3 types of statements reflecting different
degrees of confidence end-users may require from the
evaluation results:
1) Adequacy assessment: did the expected changes occur?
63
Types of inference
2) Plausibility assessment: did the program seem to have an effect above and beyond other external influences?
64
Types of inference
3) Probability assessment: did the program have an effect with a known level of statistical confidence?
65
Types of inference: Probability
H0: no effect; H1: effect
Reject H0 if the p-value < alpha (e.g., 0.05)
Fail to reject H0 if the p-value ≥ alpha
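As a minimal Stata sketch of such a probability assessment (the variable names outcome and treated are illustrative, not from the course data):

  * Two-sample t-test of H0 (no effect), comparing mean outcomes by treatment status
  ttest outcome, by(treated)
  * Reject H0 if the reported p-value is below alpha = 0.05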
66
Internal and External Validity
Internal Validity
• Internal validity: the credibility or reliability of an estimate of
project impact conditional on the context in which it was
carried out.
It means a strong justification that causally links the
independent variables to the dependent variables with the
ability to eliminate confounding variables within the study
Laboratory “true experiments” have high internal validity,
but may have weak external validity
Focus: whether observed changes are attributed to the
program and not to other possible causes.
67
Internal Validity
• Threats to internal validity
History: Did a current event affect the change in Y?
o Ex. In a short experiment designed to investigate the effect of
computer-based instruction, students missed some
instruction because of a power failure at the school.
Maturation: Were changes in Y due to normal development
process?
o The performance of first graders in a learning experiment
begins decreasing after 45 minutes because of fatigue.
Statistical regression: Did selecting subjects on extreme scores
lead to regression toward the mean?
o In an experiment involving reading instruction, subjects
grouped because of poor pre-test reading scores show
considerably greater gain than do the groups who scored
average and high on the pre-test.
68
Internal Validity
• Threats to internal validity
Selection: refers to selecting participants for the various groups in
the study. Are the groups equivalent at the beginning of the study?
Were subjects self-selected?
o The experimental group in an instructional experiment
consisted of a high-ability class, while the comparison group
was an average-ability class.
Experimental mortality: Did some subjects drop out?
o In a health experiment designed to determine the effect of
various exercises, those subjects who find the exercise most
difficult stop participating.
Testing: Did a pre-test affect the post-test?
o In an experiment in which performance on a logical reasoning
test is the dependent variable, a pre-test clues the subjects
about the post-test.
69
Internal Validity
• Threats to internal validity
Instrumentation: did the measurement method change
during the research?
o Two examiners for an instructional experiment
administered the post-test with different instructions and
procedures.
Design contamination: did the control group find out about
the experimental treatment, or was it otherwise influenced?
o In an expectancy experiment, students in the experimental
and comparison groups “compare notes” about what they
were told to expect.
70
External Validity
• External validity: credibility or reliability of an estimate of
project impact when applied to a context different from
the one in which the evaluation was carried out.
It means the ability to generalize the study to other
areas,
Participatory methods have little, if any, external
validity,
• Inferences about a causal relation between two variables
have external validity if they may be generalized from the
unique and idiosyncratic settings, procedures and
participants to other populations and conditions,
• Measures taken to ensure internal validity, such as using
volunteers from a single geographic location, may limit the
external validity of the findings
71
External Validity
Three frequently occurring issues that threaten the validity of
a randomized experiment or a quasi-experiment are:
Attrition
When some members of the treatment and/or control
group drop out from the sample
Spillover
When the program impact is not confined to the
program participants
Especially a concern when the program impacts a lot
of people or when the program provides a public good
Noncompliance
When some members of the treatment group do not
receive treatment or receive treatment improperly
72
Effectiveness Evaluation
Baseline and endline
Baseline: This is the measurement of the initial conditions
(appropriate indicators) before the start of a
project/program.
Using baseline data is very common
E.g. recording your weight prior to a diet to monitor your
progress & later determine whether your diet made any
difference.
Baseline and endline
Baseline data provides a historical point of reference to:
1) Inform program/project planning, such as target
setting, and
2) Monitor and evaluate change for implementation &
impact assessment
Midterm evaluation and/or reviews:
Descriptive analysis
81
Inferential statistics
Inferential statistics in Stata: ttest; tabulate var1 var2, chi2 exact
Use bweight data
The HR manager wants to know if a particular training
program had any impact in increasing the motivation level
of the employees.
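A hedged Stata sketch of the tests named above; the variable names motivation, trained and motivated are assumptions for illustration, not necessarily those in the bweight data:

  use bweight, clear
  ttest motivation, by(trained)            // compare mean motivation scores across the two groups
  tabulate trained motivated, chi2 exact   // chi-squared and Fisher's exact tests of association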
82
Impact Evaluation
Why are development programs and policies designed
and implemented?
Because change is needed in livelihood outcomes
To check whether these developmental outcomes are
achieved, we should do impact evaluation
The common practice at the project or program level is to
monitor and assess inputs and immediate outputs of a
program
But for evidence-based policy making, rigorous impact
evaluation is needed
So, the current move is towards measuring outcomes and
impacts in addition to inputs and processes
83
The evaluation problem
86
The logframe/ZOPP approach
• Indicators need to be structured to match the analysis of
problems the project is trying to overcome
• Logical framework/logframe/ZOPP approach:
– Is used to define inputs, outputs, timetables,
success assumptions and performance indicators
– Postulates a hierarchy of objectives for which
indicators are required
– Identifies problems the project cannot deal with
directly (risks)
87
Attribution Gap
• Objective: increase growth through MSEs
• Indicator: employment in MSEs
91
Problems in Impact Evaluation
92
Problems in Impact Evaluation
94
Treatment and selection effects
95
Estimating the Counterfactual
• On a conceptual level, solving the counterfactual problem
requires the evaluator to identify a “perfect clone” for
each program participant
96
Estimating the Counterfactual
Specifically, the treatment & comparison groups must
be the same in at least 3 ways:
[Figure: outcomes for Type B households over years Y1–Y4, showing the observed difference across the treatment period.]
Basic Problem of Impact Evaluation: Selection Bias
Participants are often different from non-participants.
[Figure: malaria rates for Type B households over years Y1–Y4; the observed difference includes selection bias.]
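In symbols (the standard decomposition, stated here for clarity rather than reproduced from the slide):

  E[Y | T=1] - E[Y | T=0] = E[Y(1) - Y(0) | T=1] + ( E[Y(0) | T=1] - E[Y(0) | T=0] )
                          = effect on the treated + selection bias

The observed difference misstates the true effect whenever participants would have differed from non-participants even without the program.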
Problems in Impact Evaluation: effect heterogeneity
102
$$\hat{\Delta}^{\text{naive}} = \frac{1}{N_1}\sum_{i=1}^{N} T_i Y_i \;-\; \frac{1}{N_0}\sum_{i=1}^{N} (1-T_i)\,Y_i$$
where N1 is the number of treated units and N0 = N − N1: the naive estimator simply compares mean outcomes of treated and untreated units.
Problems in Impact Evaluation
103
107
Randomized Control Trials
108
Experimental and quasi-experimental
designs
• Randomization: use randomization to obtain the
counterfactual; called “the gold standard” by some:
Eligible participants are randomly assigned to a treatment
group who will receive program benefits while the control
group consists of people who will not receive program
benefits,
The treatment and control groups are identical at the
outset of the project, except for participation in the
project.
• Quasi-experimental designs: use statistical/non-experimental
research designs to construct the counterfactual.
109
(a) Experimental designs: 4 methods of randomization
in Randomized Controlled Trials (RCTs):
1) Oversubscription method:
Units are randomly assigned to the treatment and control groups
and everyone has an equal chance,
Appropriate when there is no reason to discriminate among
applicants and when there are limited resources or
implementation capacity (demand > supply of program),
Ex.: in Colombia in the mid-1990s, a lottery design was used
to distribute government subsidies (vouchers partially
covering the cost of private secondary school) to eligible
students.
2) Randomized order of phase-in:
Randomize the timing of receiving the program,
Appropriate when the program is designed to cover the entire
eligible population but there are budget or administrative
constraints.
110
(a) Experimental designs: 4 methods of randomization in
Randomized Controlled Trials (RCTs):
3) Within group randomization:
Provide the program to some subgroups in each area,
One of its problems is that it increases the likelihood that
the comparison group is contaminated.
4) Encouragement design:
Offer incentives to a randomly selected group of people,
Appropriate when everyone is eligible and there is enough funding,
The remaining population without the incentives is used as
the control group,
Challenge: encouragement (or its absence) does not set the
probability of participating to exactly 1 or 0.
111
(a) Experimental designs
Why RCTS?
To analyze whether an intervention had an impact, a
counterfactual is needed, yet it is hard to ask
counterfactual questions (Ravallion, 2008),
Randomization guarantees statistical independence of the
intervention from preferences (observed and
unobserved)
Overcome selection bias of individuals receiving the
intervention.
Internal validity is high,
Involves less rigorous econometric approaches,
Led by MIT Poverty Action Lab and World Bank,
It is criticized by others (see Rodrik, 2008),
112
(a) Experimental designs
• Potential drawbacks of RCTs:
Evaluator must be present at a very early stage,
Intended random assignments can be compromised,
External validity,
Political influences on where to have the intervention,
Site effects (aspects of a program’s setting, such as
geographic or institutional aspects, interact with the
treatment),
Tendency to estimate abstract efficacy,
Impracticality of maintaining treatment and control groups,
Are not possible for some policy questions.
113
(a) Experimental designs
Conclusions on RCTs:
RCTs can provide evidence before expanding programs, but
need to weigh pros and cons,
114
Field experiments
115
Key assumptions: randomized
assignment
116
Randomized Assignment & Treatment
Effects
117
Estimating Impact under Randomized Assignment
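The slide body is not reproduced; a minimal Stata sketch of impact estimation under randomized assignment (the variable names y and treatment are hypothetical):

  * With successful randomization, impact is the simple difference in means:
  ttest y, by(treatment)
  * Equivalently, regress the outcome on the treatment dummy:
  regress y treatment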
118
Example: PROGRESA comparison groups
                          Treatment villages   Control villages
Eligible households               A                   B
Non-eligible households           C                   D
119
ATE, ITT & TOT in Experimental Designs
120
ATE, ITT & TOT (ATET) in Experimental Designs
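The slide bodies are not reproduced here; for reference, the standard definitions are:

  ATE = E[Y(1) - Y(0)]   (average effect over the whole eligible population)
  ITT = E[Y | assigned to treatment] - E[Y | assigned to control]   (effect of the offer, regardless of take-up)
  TOT (ATET) = E[Y(1) - Y(0) | T = 1]   (average effect on those actually treated)

With imperfect compliance, the ITT is typically smaller in magnitude than the TOT.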
121
(b) Quasi-experimental designs
Impact assessment techniques:
Propensity score matching: Match program participants
with non-participants, typically using individual observable
characteristics,
Difference-in-differences/double difference: Observed
changes in outcome before and after for a sample of
participants and non-participants,
Regression discontinuity design: Individuals just on the other
side of the cut-off point serve as the counterfactual,
Instrumental variables: When program placement is
correlated with participants' characteristics, need to correct
bias by replacing the variable characterizing the program
placement with another variable.
122
A) Propensity Score Matching
The evaluation problem: recap
Evaluation question: what is the effect of a programme?
124
The evaluation problem: recap
125
e.g., for an unemployed person undertaking counselling, the outcome
Y1 is realized and is thus the factual outcome. The counterfactual
outcome for the same individual is the labour market state Y0
s/he would have had, had s/he not undertaken counselling.
127
Unconfounded assignment
128
Identification
129
Intuition of Matching: The Perfect Clone
[Diagram: each beneficiary paired with a matched “clone”.]
130
Intuition of Matching: The Perfect Clone
[Diagram: the treatment group paired with a matched comparison group.]
131
Principle of Matching
132
Matching treated to controls
133
Propensity score matching (1)
134
Propensity score matching (2)
135
Propensity score matching: overview
average
136
Estimating the propensity score
137
Range of common support
138
Range of common support
139
Range of common support
140
Matching methods
141
Nearest neighbour matching (1)
142
Nearest neighbour matching (2)
143
Which neighbour?
We can match to more than one neighbour
5 nearest neighbours? Or more?
Radius matching: all neighbours within a specific range
Kernel matching: all neighbours, but close neighbours get a larger
weight than distant ones. A sketch of these options follows this list.
Best approach?
Look at sensitivity to choice of approach
How many neighbours?
Using more information reduces bias
Using more control units than treated increases precision
But using control units more than once decreases precision
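A hedged sketch of these choices using the community-contributed psmatch2 command (ssc install psmatch2); the variable names T, x1, x2 and y are illustrative:

  psmatch2 T x1 x2, outcome(y) neighbor(5)           // 5 nearest neighbours
  psmatch2 T x1 x2, outcome(y) radius caliper(0.05)  // radius matching within 0.05
  psmatch2 T x1 x2, outcome(y) kernel                // kernel matching, closer neighbours weighted more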
144
Example: effect of health insurance on health spending
[Diagram: from a similar initial sample, insured units form the treatment group and uninsured units the control group; propensity scores are used to compare out-of-pocket (OOP) budget shares.]
145
Nearest Neighbour Matching
146
Caliper matching
147
Kernel matching
148
Weighting by the propensity score
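The slide body is not reproduced; a minimal sketch of inverse-probability weighting in Stata, assuming a propensity score ps has already been estimated (variable names illustrative):

  gen w = T/ps + (1 - T)/(1 - ps)   // ATE weights: treated weighted by 1/ps, controls by 1/(1-ps)
  regress y T [pweight = w]         // weighted regression recovers the ATE under the CIA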
149
Propensity Score Matching
What are propensity scores for?
We want to know the effect of something
We do not have random assignment
We do have data on the pre-project characteristics that
determined whether or not the individuals received the
treatment
Example
An NGO has built clinics in several villages
Villages were not selected randomly
We have data on village characteristics before the project
was implemented
What is the effect of the project on infant mortality?
150
Propensity Score Matching
What is the effect of the project on infant mortality?
T        imrate
treated    10
treated    15
treated    22
treated    19
control    25
control    19
control     4
control     8
control     6
The easiest and most straightforward answer to this question is to
compare average mortality rates in the two groups:
(10+15+22+19)/4 - (25+19+4+8+6)/5 = 4.1
What does this mean? Does it mean that clinics have increased
infant mortality rates?
NO! The pre-project characteristics of the two groups are very
important for answering this question.
151
Propensity Score Matching
T imrate povrate pcdocs
treated 10 0.5 0.01
treated 15 0.6 0.02
treated 22 0.7 0.01
treated 19 0.6 0.02
control 25 0.6 0.01
control 19 0.5 0.02
control 4 0.1 0.04
control 8 0.3 0.05
control 6 0.2 0.04
How similar are the treated and control groups?
On average, the treated group has a higher poverty rate and fewer doctors per capita
152
Propensity Score Matching
The Basic Idea
1. Create a new control group
For each observation in the treatment group, select the
control observation that looks most like it based on the
selection variables (aka background characteristics)
2. Compute the treatment effect
Compare the average outcome in the treated group with
the average outcome in the control group
153
Propensity Score Matching
S. No   T        imrate   povrate   pcdocs   Match using povrate   Match using pcdocs
1 treated 10 0.5 0.01
2 treated 15 0.6 0.02
3 treated 22 0.7 0.01
4 treated 19 0.6 0.02
5 control 25 0.6 0.01
6 control 19 0.5 0.02
7 control 4 0.1 0.04
8 control 8 0.3 0.05
9 control 6 0.2 0.04
• Take povrate and pcdocs one at a time to match the treated group
with that of the control one
• Then take the two at a time. What do you observe?
154
Propensity Score Matching
Predicting Selection
What is propensity score?
The propensity score is the conditional probability that an individual
receives the treatment, given observed characteristics
Which model do we use to estimate pscores?
[Figure: the binary treatment indicator Ti, taking only the values 0 and 1, plotted against fitted scores.]
• A linear model cannot be fitted because the predicted score can
fall above 1 or below 0
• So, we use a limited dependent variable model: logit or probit, as
indicated in the graph
• We consider two conditions: the CIA and the propensity score theorem
• CIA: outcomes are independent of treatment assignment given xi
• The propensity score theorem states that outcomes are independent of
treatment assignment given the propensity score, i.e., p(xi)
155
Propensity Score Matching
Predicting Selection
How do we actually match treatment observations to
control groups?
In Stata, we use logistic or probit regression to predict:
Prob(T = 1 | X1, X2, …, Xk)
In our example, the X variables are povrate and pcdocs
So, we run a logistic regression and save the predicted
probability of treatment
We call this the propensity score
The commands are:
logistic T povrate pcdocs
predict ps1   (or any name you want the propensity score to have)
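Building on these commands, a fuller runnable sketch; using teffects for the matching step is our assumption about tooling, not prescribed by the slides:

  logistic T povrate pcdocs                                   // model selection into treatment
  predict ps1, pr                                             // propensity score = predicted probability
  teffects psmatch (imrate) (T povrate pcdocs, logit), atet   // ATT via nearest-neighbour matching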
156
Propensity Score Matching
ps1 = predicted probability of treatment (the propensity score)
S. No   T        imrate   povrate   pcdocs   ps1         Match
1 treated 10 0.5 0.01 0.4165713
2 treated 15 0.6 0.02 0.7358171
3 treated 22 0.7 0.01 0.9284516
4 treated 19 0.6 0.02 0.7358171
5 control 25 0.6 0.01 0.752714
6 control 19 0.5 0.02 0.395162
7 control 4 0.1 0.04 0.0016534
8 control 8 0.3 0.05 0.026803
9 control 6 0.2 0.04 0.0070107
Exercise: Use the propensity score to match the treated group with the control
one
Find out the average treatment effect on the treated
((10+15+22+19)/4)-((19+25+25+25)/4)=-7
157
Propensity Score Matching
How do we know how well matching worked?
1. Look at covariate balance between the treated and the
new control groups. They should be similar.
2. Compare distributions of propensity scores in the treated
and control groups. They should be similar
3. Compare distributions of the propensity scores in the
treated and original control groups
If the two do not overlap much, then matching might not
work very well.
158
Propensity Score Matching
• Go to Stata and let us do the exercise
• Use psm exercise 2.dta
159
Summarize: how to do PSM
160
Final comments on PSM and OLS
161
Difference-in-Difference
Methods
Difference methods
Key assumption: parallel trends
Difference-in-difference
Fixed effects
163
Identifying Assumption
• Whatever happened to the control group over
time is what would have happened to the
treatment group in the absence of the
program.
[Figure: outcomes for treatment and control groups, pre and post intervention.]
165
Identifying Assumption
• Whatever happened to the control group over
time is what would have happened to the
treatment group in the absence of the
program.
[Figure annotation: the effect of the program estimated using only the
T vs C comparison post-intervention, ignoring pre-existing differences
between the T and C groups.]
169
Differences-in-Differences
First application DID: John Snow (1855)
Cholera epidemic in London in mid-nineteenth century
Prevailing theory: “bad air”
Snow’s hypothesis: contaminated drinking water
Compared death rates from cholera in districts served by
two water companies
In 1849 both companies obtained water from the dirty
Thames
In 1852, one of them moved water works upriver to an area
free of sewage
Death rates fell sharply in districts served by this water
company!
170
What does panel data add?
171
Example: effect of health insurance on health spending
[Figure: out-of-pocket (OOP) budget shares before and after the intervention, for treatment and control groups.]
172
Observed and potential outcomes
173
Difference-in-Differences
174
Difference-in-Differences
The DID estimate is clearly presented in a 3 x 3 table
175
Example: effect of health insurance on health spending
[Diagram: from a similar initial sample, insured units form the treatment group and uninsured units the control group; propensity scores are used to compare OOP budget shares.]
177
Difference-in-Difference
When can we use diff-in-diff?
We want to evaluate a program, project, intervention
We have treatment and control groups
We observe them before and after treatment
But:
Treatment is not random
Other things are happening while the project is in effect
We cannot control for all the potential confounders
Key Assumptions:
Trend in control group approximates what would have
happened in the treatment group in the absence of the
treatment
178
Difference-in-Difference
179
DiD using stata
The use of the DiD method can remove bias from the
unmeasured pre-program covariates, assuming that the
comparison groups exhibit the same trend over time in the
absence of the program
Difference-in-Difference
Assume that there was a free lunch program in Place A.
The free lunch was assumed to improve student outcomes
[Figure: outcomes in Place A in 2008 and 2010, with the changes D1–D4 marked.]
D1 = Dprogram + Dtrend
D3 = Dprogram + Ddifference due to factors other than the program
181
Difference-in-Difference
Y            Pre (2008)   Post (2010)   Diff
Treated (A)      20            90         70
Control (B)      30            70         40
Diff            -10            20         30
DiD estimate = 70 - 40 = 30
[Figure: line chart of scores (20 to 100) for Treated (A) and Control (B) in 2008 and 2010.]
182
Difference-in-Difference: data
Name Y (score) Dtr Dpost
1 40 0 0
2 80 1 1
3 20 0 0
4 100 1 1
5 30 0 0
6 0 1 0
7 60 0 1
8 40 1 0
9 60 0 1
10 90 0 1
Dtr= dummy variable with a value of 1 if individuals are in the treated group (A)
Dpost= time dummy variable with a value of 1 if individuals take the test in 2010
(post)
183
Difference-in-Difference: data
Y (score)           Dpost = 0 (pre; 2008)   Dpost = 1 (post; 2010)   Difference
Dtr = 0 (Control)   β0                      β0 + β1                  β1
Dtr = 1 (Treated)   β0 + β2                 β0 + β1 + β2 + β3        β1 + β3
Difference          β2                      β2 + β3                  β3 (DiD)
184
Difference-in-Difference: using Stata
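The slide content is not reproduced; a minimal Stata sketch of the regression behind the table above (variable names follow the data shown earlier):

  gen DtrXDpost = Dtr * Dpost    // interaction term
  regress Y Dtr Dpost DtrXDpost  // the coefficient on DtrXDpost is β3, the DiD estimate
  * Equivalently, with factor-variable notation:
  regress Y i.Dtr##i.Dpost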
185