
Project Monitoring and Evaluation

Yom Postgraduate College

Addis Ababa, Ethiopia

October, 2021
Discussion point
• What did you do this morning after you got dressed?
• Did you shave/put on make-up this morning?
• If yes, did you do so while looking in a mirror?
• What is the difference between shaving/putting on make-up with and without looking in the mirror?
• Relate this to M and E.

2
Monitoring and Evaluation
 The Current Context:
 increasing call for results & value for money
 shrinking budgets
 yet increasing complexity in program/project settings
 So, what do M&E have to do with these issues?
 We need monitoring & evaluation to:
 identify projects that better meet needs & lead to
improvements in targeted social, economic &
environmental conditions
 improve decision-making
 achieve more outcomes
 tell stories clearly in an evidence-based manner
 enhance organizational learning
3
Monitoring
 Monitoring is an internal project activity
 It is the continuous assessment of project
implementation in relation to:
 schedules
 input use
 infrastructure
 services
 Monitoring:
 provides continuous feedback
 identifies actual or potential successes/ problems

4
Monitoring
 Monitoring is very important in project planning and
implementation. It is like watching where you are going
while riding a bicycle; you can adjust as you go along and
ensure that you are on the right track.

5
Why M&E….

6
Why M&E ….
• Tracking resources
• Feedback on progress
• Improving project effectiveness
• Informing decisions
• Promoting accountability
• Demonstrating impact
• Identifying lessons learned
Why M&E ….
 Monitoring provides information that will be useful in:
Exploring the situation in the community;
Determining whether the inputs in the project are well
utilized;
Identifying problems facing the community or project and
finding solutions;
Ensuring all activities are carried out properly by the right people and on time;
Applying lessons from one project experience to another; and
Determining whether the way the project was planned is
the most appropriate way of solving the problem at hand.

8
Earned Value Analysis as a tool of Monitoring
 Knowing where you are on schedule
 Knowing where you are on budget
 Knowing where you are on work accomplished

9
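The three questions above map onto the standard earned value metrics. A minimal sketch in Python, assuming the conventional definitions of planned value (PV), earned value (EV) and actual cost (AC); the figures are illustrative, not from the course:

```python
# Earned value analysis: schedule and cost status from PV, EV, AC.
planned_value = 100_000  # PV: budgeted cost of work scheduled to date
earned_value = 80_000    # EV: budgeted cost of work actually completed
actual_cost = 90_000     # AC: actual cost of the completed work

schedule_variance = earned_value - planned_value  # SV = EV - PV (< 0: behind schedule)
cost_variance = earned_value - actual_cost        # CV = EV - AC (< 0: over budget)
spi = earned_value / planned_value                # schedule performance index
cpi = earned_value / actual_cost                  # cost performance index

print(f"SV={schedule_variance}, CV={cost_variance}, SPI={spi:.2f}, CPI={cpi:.2f}")
```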
Evaluation
 Evaluation: to establish the merit, significance, worth or
value of something.
 It is a systematic, objective, deliberate, purposeful, critical,
trustworthy & ethical assessment of a project/program
 It assesses: relevance, coherence, efficiency, effectiveness,
impact & sustainability
 Interim evaluations: first review of progress, prognosis
of likely effects, and a way to identify necessary
adjustments in project design
 Terminal evaluations: evaluate the project’s effects and potential sustainability (done at the end for project completion reports)

10
Evaluation
• Evaluation is: “A periodic assessment of the relevance, performance, efficiency, effectiveness, impact and sustainability of a project in the context of stated objectives.
• It is usually undertaken as an independent examination with a view to drawing lessons that may guide future decision-making” (European Commission).

11
Monitoring vs Evaluation
Monitoring: What are we doing?
Tracking inputs and outputs to assess whether programs are performing according to plans

Evaluation: What have we achieved?
Attributing changes in outcomes to a particular program/intervention requires one to rule out all other possible explanations.

12
Monitoring vs Evaluation
Differences between monitoring and evaluation:

Aspect                Monitoring                                Evaluation
Frequency             Regular                                   Episodic
Main action           Keeping track, oversight                  Assessment, more analytical
Basic purpose         Improve efficiency, adjust work plan      Improve effectiveness, impact,
                                                                future programming
Focus                 Inputs, outputs, process, outcomes,       Effectiveness, relevance, impact,
                      work plans                                cost-effectiveness
Information sources   Routine systems, field observations,      Same as for monitoring, as well as
                      progress reports, rapid assessments       surveys and studies
Undertaken by         Program managers, community workers,      Program managers, supervisors,
                      community (beneficiaries), supervisors,   funders, external evaluators,
                      funders                                   community (beneficiaries)
Reporting to          Program managers, community workers,      Same as for monitoring, as well as
                      community (beneficiaries), supervisors,   policy makers
                      funders

13
Monitoring vs Evaluation
Evaluation means information on:
Strategy
 Whether we are doing the right things
– Rationale/justification
– Clear theory of change
Operation
 Whether we are doing things right
– Effectiveness in achieving expected outcomes
– Efficiency in optimizing resources
– Client satisfaction
Learning
• Whether there are better ways of doing it
– Alternatives
– Best practices
– Lessons learned
15
Planning, Monitoring & Evaluation

16
Participatory Monitoring & Evaluation
• “It is a process of collaborative problem-solving through the
generation and use of knowledge. It is a process that leads to
corrective action by involving all levels of stakeholders in shared
decision-making.”
• It is a collaborative process that involves stakeholders at
different levels working together to assess a project or policy,
and take any corrective measure required
• Key Principles:
– Local people are active participants, not just sources of information
– Stakeholders evaluate, outsiders facilitate
– Focus on building stakeholder capacity for analysis and
problem solving
– Process builds commitment to implementing any
recommended corrective actions
Participatory Monitoring and Evaluation

        Conventional M&E                           Participatory M&E
Who?    External experts                           Stakeholders, including communities and
                                                   project staff; outsiders facilitate
What?   Predetermined indicators, to measure       Indicators identified by stakeholders, to
        inputs and outputs                         measure process as well as outputs or
                                                   outcomes
How?    Questionnaire surveys, by outside          Simple, qualitative or quantitative
        “neutral” evaluators, distanced from       methods, by stakeholders themselves
        project
Why?    To make project and staff accountable      To empower stakeholders to take corrective
        to funding agency                          action; mutual accountability

But mutual accountability doesn’t have to be like this!

19
Who conducts M&E….?

 Project implementer
 Stakeholders
 Beneficiaries

Participatory process
Who needs and uses M&E information?
 To improve program implementation
 To inform and improve future programs
 To inform stakeholders
Users include program managers, donors, governments, technocrats, communities and beneficiaries.
Types of Evaluations & Questions
Operational vs. Impact Evaluation
 Operational/process evaluation relates to ensuring effective
implementation of a program in accordance with the
program’s initial objectives
 Impact evaluation (IE) is an effort to understand whether the changes in well-being are indeed due to project or program intervention
 Specifically, IE tries to determine whether it is possible to identify the program effect &
 to what extent the measured effect can be attributed to the program & not to some other causes

Operational vs Impact Evaluation…
 IE is not imperative for each and every project
 IE is time & resource intensive & should therefore be applied selectively
 Policy makers may decide whether to carry out an IE on the basis of the following criteria:
 The program/project intervention is innovative & of strategic importance
 The impact evaluation exercise contributes to filling the knowledge gap of what works and what does not
Impact Evaluation
 Two categories: prospective and retrospective.
 Prospective evaluations: developed at the same time as the
program is being designed and are built into program
implementation
 Baseline data are collected prior to program implementation
for both treatment and comparison groups
 Retrospective evaluations: done to assess program impact
after the program has been implemented, generating
treatment and comparison groups

24
M and E
• What do we mean by?
– M and E System
– M and E Plan
– M and E Framework
• In most cases an M&E system refers to all the indicators, tools
and processes that you will use to measure if a program has
been implemented according to the plan (monitoring) and is
having the desired result (evaluation).
• An M&E system is often described in a document called an
M&E plan.
• An M&E framework is one part of an M&E plan,
which describes how the whole M&E system for the program
works.

25
Components of M&E Plan
• M and E plan includes:
– the what and how (clear & measurable objective
statements, indicators).
– M and E framework,
– roles & responsibilities (institutional arrangements),
– indicators & indicator reference sheets,
– budget, and
– M and E activity and data collection plan (annual, biannual or quarterly), and a proposal on how findings will be an input for decisions.

26
M and E Plan
• Seven Steps for Developing an M&E Work Plan
1) Identify program goals and objectives
2) Determine M&E questions, indicators and their feasibility
3) Determine M&E methodology for monitoring the process
and evaluating the effects
4) Resolve implementation issues – who will conduct the
monitoring and evaluation? How will existing M&E data and
data from past evaluation studies be used?
5) Identify internal and external M&E resources and capacity
6) Develop an M&E work plan and timeline
7) Develop a plan to disseminate and use evaluation findings

27
M and E Framework
• Steps in Developing an M and E Framework
1. Getting started: part of the Logframe

2. Choosing indicators
3. Defining each indicator
4. Identifying the data source, frequency, responsibility and reporting. This helps keep measurement consistent when indicators are measured at different times.
5. Measuring the baseline and setting the target

28
M and E Framework

29
Results-based M&E

30
Guiding Principles of M & E
M&E is guided by the following key principles:
1. Systematic Inquiry – Staff conduct site-based inquiries
that gather both quantitative and qualitative data in a
systematic and high-quality manner
2. Honesty/Integrity – Staff display honesty and integrity in
their own behavior and contribute to the honesty and
integrity of the entire M&E process
3. Respect for People – Staff respect the security, dignity,
and self-worth of respondents, program participants,
clients, and other M&E stakeholders
4. Responsibilities to Stakeholders – Staff members
articulate and take into account the diversity of different
stakeholders’ interests and values that are relevant to
project M&E activities

31
Theory of Change vs Logframe

32
Theory of Change

33
Theory of Change
• ToC provides a comprehensive description of how and why a
desired change is expected to happen in a certain context.
• ToC takes the complexity and dynamic nature of a context into
account.
• It analyses the many facets involved and examines the
relationships between them.
• As such, it doesn’t force you to simplify the road to change in
a limited number of steps or to see it as a singular path of
cause and effect.
• ToC can help set up a dialogue between stakeholders.
Once the ToC has been formulated (diagram and/or text) it
can be used to communicate your work clearly to other
colleagues, partners, donors, etc.

34
Theory of Change
• ToC gives you a better understanding of the links between
activities (or projects) and the change you want to achieve
(the goals).
• It allows you to understand what is necessary on top of / next
to your own efforts and how other factors can strengthen or
hinder your activities.
• As such, it fills in the gap (sometimes called the Missing
Middle) between a program or change initiative (your
activities) and the ultimate goals in terms of (societal) change.
• ToC makes assumptions and other unsaid things explicit.
• By explicitly dealing with long-held assumptions, Theory of
Change thinking can also support innovation and ‘out of the
box’ thinking

35
Theory of Change
• Because of this clearer understanding of the complete
picture, you will be able to do better planning – although ToC
doesn’t give you a ready-made planning method.
• ToC allows for better evaluation, because progress is
expressed in terms of the realization of the different
outcomes that are necessary to achieve the goals (rather than
just monitoring the progress on outputs of activities).
• ToC describes the story of how change is expected to happen
and because this is clearly described it is a good base to
assess any long-term change.
• However, ToC doesn’t provide you with clear instructions on
how to do this impact assessment: you will still have to
identify the appropriate impact assessment/evaluation
methodologies.

36
Theory of Change
• Gives the big picture, including issues related to the
environment or context that you can’t control.
• Shows all the different pathways that might lead to change,
even if those pathways are not related to your program.
• Describes how and why you think change happens.
• Could be used to complete the sentence “if we do X then Y
will change because…”.
• Is presented as a diagram with narrative text.

37
Theory of Change

38
Logframe
• Gives a detailed description of the program showing how the
program activities will lead to the immediate outputs, and
how these will lead to the outcomes and goal (the
terminology used varies by organization).
• Could be used to complete the sentence “we plan to do
X which will give Y result”.
• Is normally shown as a matrix, called a logframe. It can also be
shown as a flow chart, which is sometimes called a logic
model.
• Is linear, which means that all activities lead to outputs which
lead to outcomes and the goal – there are no cyclical
processes or feedback loops.
• Includes space for risks and assumptions, although these are
usually only basic. Doesn’t include evidence for why you think
one thing will lead to another.
• Is mainly used as a tool for monitoring and evaluation
39
Logframe
• A logframe shows the conceptual foundation upon which the project’s M&E system is built
• The logframe is a matrix that specifies what the project is intended to achieve (objectives) & how this achievement will be measured (indicators)
• It is essential to understand the differences between project inputs, outputs, outcomes, & impact!
• The indicators to be measured under the M&E system reflect this hierarchy.
Indicators and targets are used at each point along the project hierarchy as illustrated below:

41
Basic Logic Model
What should an M&E system measure? The results chain, with an indicative example:
 Impact: improved literacy
 Outcome: school enrollment rates
 Outputs: number of schools built; textbooks, etc.
 Activities: building of schools; distribution of textbooks, etc.
 Inputs: spending on primary education

Source: Adapted from ADB (2006) Introduction to Results Management, p. 7 and World Bank (2001) PRSP Sourcebook, p. 108.
Logframe: questions
• What are we trying to achieve and why?
• How will we measure success? (OVIs and MOVs)
• What other conditions must exist? (Assumptions)
• How do we get there?
• N.B.: Do not start driving before knowing where you want to go.

44
Logframe

45
Setting Objectives & Developing Indicators
• An indicator is a variable that measures one aspect of a program/project
• Indicators tell us what we want to measure → units of measure
• An appropriate set of indicators includes at least one indicator per significant element of the program or project (input, output, outcome, impact).
• An indicator measures the value of the change in units that are significant for the management of the program & comparable to past & future units & values
• Indicators enable reducing a large amount of data down to its simplest form (e.g. % of clients who tested after receiving pre-test counselling; prevalence rate; stunting rate …)
• When related to targets, indicators can:
• signal the need for corrective management action,
• help evaluate the effectiveness of various management actions,
• provide evidence as to whether objectives are being achieved.
An indicator can be a:
• Number
• Ratio
• Percentage
• Average
• Index (composite of indicators)
Steps in Selecting Indicators

Step 1: Clarify the Results Statements
• Identify what needs to be measured
• Good indicators start with good results statements. Start with the overall objective or goal and work backwards.

Step 2: Develop a List of Possible Indicators
• Brainstorm indicators at each level of results, using:
• Internal brainstorming (involvement)
• Consultation with references (experts, documents)
• Experience of other similar organizations
Step 3: Assess Each Possible Indicator
1) Measurable (can be quantified & measured by some scale)
2) Practical (data can be collected on a timely basis & at reasonable cost)
3) Reliable (can be measured repeatedly with precision by different people)
4) Relevant/Attributable (the extent to which a result is caused by YOUR activities)
5) Management Useful (project staff & audiences feel the information provided by the measure is critical to decision-making)
Step 3: Assess Each Possible Indicator …..
6) Direct (the indicator closely tracks the result it is intended to measure)
7) Sensitive (serves as an early warning of changing conditions)
8) Capable of being disaggregated (data can be broken down by gender, age, location, or other dimension where appropriate)
Step 4: Select the “Best” Indicators
• Based on your analysis, narrow the list to the final indicators that will be used in the monitoring system
• They should be the optimum set that meets management needs at a reasonable cost
• Limit the number of indicators used to track each objective or result to a few (two or three)
• Remember your target audiences (information users)


Self-check Exercise
• Assume that an organization firmly believes that the
livelihood of the poor is improved through enhancing their
capacity through trainings and provision of seed money to
help them get involved in Income-Generating Activities (IGAs).
Based on this information, develop a theory of change & a logframe
Hierarchy Summary OVI MoVs Assumptions
Goal/Impact
Outcome
Output
Activities
Inputs

53
Impact pathways evaluation
• Based on program theory evaluation and the logframe
– An explicit theory or model of how a project will or has
brought about impact
– Consists of a sequenced hierarchy of outcomes
– Represents a set of hypotheses about what needs to happen for the project output to be transformed over time into impact on highly aggregated development indicators
– Can be highly complementary to conventional assessments
• Advantages of this approach
– Consideration of wider impact helps achieve impact
– Complements conventional economic assessment

54
Impact pathways evaluation
Two main phases in impact pathway evaluation
1st phase: using program theory evaluation to guide self-
monitoring and self-evaluation to establish the direct
benefits of the project outputs in its pilot site(s).
• Task: to develop a theory or model of how the project
sees itself achieving impact (called an impact pathway)
• Identifies steps the project should take to scale-out and scale-up
– Scale-out: innovation spread from farmer to farmer or
from household to household within same stakeholder
groups
– Scale-up: an institutional expansion from grassroots organizations to policymakers, donors, development institutions, and other stakeholders to build an enabling environment for change
55
Impact pathways evaluation

56
Impact pathways evaluation
 2nd phase in impact pathway evaluation: An independent ex-post impact assessment is carried out some time (normally several years) after the project has finished
– Begins by establishing the extent to which the impact pathway was valid in the pilot site(s) and the extent to which scaling occurred
– An attempt to bridge the attribution gap, using phase 1 results as a foundation

57
Impact pathways evaluation
• Answers to the following questions are recorded in a
matrix for each identified outcome in the impact pathway:
– What would success look like?
– What are the factors that influence the achievement
of each outcome?
– Which of these can be influenced by the project?
– What is the program currently doing to address these
factors to bring about this outcome?
– What performance information should we collect?
– How can we gather this information?

58
Classification axes: the indicator axis
• Conceptual framework for helping guide the design of
an evaluation from Habicht et al. (1999)
• An evaluation may be aimed at one or more categories of decision makers, so the design must take into account their different needs
• First axis refers to the indicators: whether one is
evaluating the performance of the intervention delivery
or its impact on indicators
• Second axis refers to the type of inference to be
drawn: the confidence level of the decision maker that
any observed effects were due to intervention

59
Classification axes: the indicator axis
• Indicators of provision, utilization, coverage and
impact… what is to be evaluated? What types of information are to be sought?
• Outcomes of interest (“indicators”):
1. Provision: services must be provided (available and
accessible to the target pop. and of adequate quality)
2. Utilization: the population must accept and make use
of the services
3. Coverage: utilization will result in a given population
coverage, which represents the interface between
service delivery and outreach to the population
4. Impact: coverage may lead to an impact
• Choose indicators based on decision makers and cost

60
Classification axes: the indicator axis
• If a weak link is discovered, investigate why
• An impact can be expected only when the correct service is
provided in a timely manner and it is properly utilized by a
sufficiently large number of beneficiaries
• Example: project offering loans to smallholders with the
objective of increasing fertilizer use:
1. Provision: measure the availability of the loans to
smallholders,
2. Utilization: measure the disbursement of the loans to
smallholders,
3. Coverage: measure the proportion of smallholders that
have been able to take out a new loan, and
4. Impact: measure the impact of the project on fertilizer use.

61
Types of inference
• Second classification axis: how confident decision
makers need to be that any observed effects are due to the
project or program,
• Both performance and impact evaluations may include
adequacy, plausibility or probability assessments as the
types of inference.

62
Types of inference
• There are 3 types of statements reflecting different degrees of confidence end-users may require from the evaluation results:
1) Adequacy assessment:
 Determines whether some outcome occurred, as expected,
 This assessment is relevant for evaluating process indicators (provision, utilization, coverage),
 For this, no control group is needed.

63
Types of inference
2) Plausibility assessment:
 Permits determination of whether change can be attributed to the project,
 Here a control group, internal or external, needs to be used,
 Note: selection bias (control groups often do not exhibit identical characteristics to the beneficiary group).

64
Types of inference
3) Probability assessment:
 Ensures there is a small, known probability that differences between project and control areas were due to confounding, systematic bias or chance.
65
Types of inference: Probability
 H0: no effect; H1: effect
 Reject H0 if the p-value is < alpha (0.05)
 Fail to reject H0 if the p-value is ≥ alpha (0.05)

                       We reject H0                     We fail to reject H0
H0 in reality true     Type I error (chance is alpha)   No error made
H0 in reality false    No error made                    Type II error (chance is beta)
66
Internal and External Validity
Internal Validity
• Internal validity: the credibility or reliability of an estimate of
project impact conditional on the context in which it was
carried out.
 It means a strong justification that causally links the
independent variables to the dependent variables with the
ability to eliminate confounding variables within the study
 Laboratory “true experiments” have high internal validity,
but may have weak external validity
 Focus: whether observed changes are attributed to the
program and not to other possible causes.

67
Internal Validity
• Threats to internal validity
 History: Did a current event affect the change in Y?
o Ex. In a short experiment designed to investigate the effect of computer-based instruction, students missed some instruction because of a power failure at the school.
 Maturation: Were changes in Y due to normal development
process?
o The performance of first graders in a learning experiment
begins decreasing after 45 minutes because of fatigue.
 Statistical regression: Were differences between the two groups
that could influence Y controlled for?
o In an experiment involving reading instruction, subjects
grouped because of poor pre-test reading scores show
considerably greater gain than do the groups who scored
average and high on the pre-test.
68
Internal Validity
• Threats to internal validity
 Selection: refers to selecting participants for the various groups in
the study. Are the groups equivalent at the beginning of the study?
Were subjects self-selected?
o The experimental group in an instructional experiment
consisted of a high-ability class, while the comparison group was an average-ability class.
 Experimental mortality: Did some subjects drop out?
o In a health experiment designed to determine the effect of
various exercises, those subjects who find the exercise most
difficult stop participating.
 Testing: Did a pre-test affect the post-test?
o In an experiment in which performance on a logical reasoning
test is the dependent variable, a pre-test clues the subjects
about the post-test.
69
Internal Validity
• Threats to internal validity
 Instrumentation: did the measurement method change
during the research?
o Two examiners for an instructional experiment
administered the post-test with different instructions and
procedures.
 Design contamination: did the control group find out about the experimental treatment, or was it otherwise influenced?
o In an expectancy experiment, students in the experimental
and comparison groups “compare notes” about what they
were told to expect.

70
External Validity
• External validity: credibility or reliability of an estimate of
project impact when applied to a context different from
the one in which the evaluation was carried out.
 It means the ability to generalize the study to other
areas,
 Participatory methods have little, if any, external
validity,
• Inferences about a causal relation between two variables
have external validity if they may be generalized from the
unique and idiosyncratic settings, procedures and
participants to other populations and conditions,
• Factors to control internal validity may limit external
validity of the findings as well as using people from a
single geographic location who are volunteers

71
External Validity
 Three often-occurring issues that threaten the validity of a randomized experiment or a quasi-experiment are:
 Attrition
 When some members of the treatment and/or control
group drop out from the sample
 Spillover
 When the program impact is not confined to the
program participants
 Especially a concern when the program impacts a lot
of people or when the program provides a public good
 Noncompliance
 When some members of the treatment group do not
receive treatment or receive treatment improperly

72
Effectiveness Evaluation
Baseline.. End line
Baseline: This is the measurement of the initial conditions
(appropriate indicators) before the start of a
project/program.
Using baseline data is very common
E.g. recording your weight prior to a diet to monitor your
progress & later determine whether your diet made any
difference.
Baseline.. End line…
Baseline data provides a historical point of reference to:
1) Inform program/project planning, such as target
setting, and
2) Monitor and evaluate change for implementation &
impact assessment
Midterm evaluation and/or reviews:

These are important reflection events to assess and


inform ongoing project/program implementation
Baseline.. End line…
Final/terminal/end line evaluation:
This occurs after project/program completion to assess how
well the project/program achieved its intended objectives and
what difference this has made
Key Evaluation Issues: Evaluation can focus on any of the following issues:
• Programme relevance: Is the intervention still relevant in relation to the original problem? If yes, how? And if not, why not?
• Programme design and effectiveness: To what extent has the intervention met the stated objectives? What are the internal workings of the project?
• Programme efficiency: How cost-effective were the various activities that were implemented? Could there be an alternative strategy to implement the intervention? If yes, how? And if not, why not?
• Programme accountability: How well did the intervention adhere to established guidelines, procedures and policies during the implementation of the activities?
• Programme impact: Did the intervention result in changing the circumstances of the beneficiary group in a substantial way? If yes, how? And if not, why not?
• Catalytic effect: Did the programme result in changes that were not anticipated?
• Programme sustainability: Can the intervention survive when funding ceases? If yes, how? If not, why not? Does the community support the programme?
76
Comparing Results to Program Targets
• Number of advocacy workshops conducted
• Number of training courses conducted
• Number of service visits by young adults

Descriptive analysis

Go to excel data sample (desktop)

And next slide


80
Effectiveness Analysis

Indicators (%)                            Baseline   Endline
Have dish rack                              29         78
Have garbage pit                            23         65.4
Garbage disposal practices
  Within compound                           38.6       31.3
  Road side                                 36.4       8.9
  Anywhere                                  25         10.7
Where do households take a shower?
  Inside home compound                      66         39.7
  River                                     25.6       25.9
  Inside house                              7.8        23.5
Source of information about hygiene
  Health post, health workers & media       77.5       93.8

(The Target and Change columns of the original table are blank, to be completed as an exercise.)
81
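To fill the Change column, a minimal pandas sketch, assuming the baseline/endline percentages transcribed above (two indicators shown; the rest follow the same pattern):

```python
# Percentage-point change between baseline and endline for each indicator.
import pandas as pd

df = pd.DataFrame({
    "indicator": ["Have dish rack", "Have garbage pit"],
    "baseline":  [29.0, 23.0],   # % of households at baseline
    "endline":   [78.0, 65.4],   # % of households at endline
})
df["change_pp"] = df["endline"] - df["baseline"]  # change in percentage points
print(df)
```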
Inferential statistics
Inferential statistics (t-test and chi-square test; in Stata: ttest, and tabulate var1 var2, chi2 exact)
Use bweight data
The HR manager wants to know if a particular training
program had any impact in increasing the motivation level
of the employees.

Use Employee data

82
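A minimal Python analogue of the Stata commands above, assuming a hypothetical employee dataset with motivation scores measured before and after the training; the variable names and numbers are illustrative, not from the course files:

```python
# Paired t-test: did mean motivation change after the training program?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
before = rng.normal(60, 10, size=50)        # motivation scores before training
after = before + rng.normal(5, 8, size=50)  # scores after (simulated gain)

t_stat, p_value = stats.ttest_rel(after, before)  # H0: no change in the mean
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Reject H0 at alpha = 0.05 when p_value < 0.05 (cf. the inference table above)
```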
Impact Evaluation
 Why are development programs and policies designed and implemented?
 Because change is needed in livelihood outcomes
 To check whether these developmental outcomes are
achieved, we should do impact evaluation
 The common practice at the project or program level is to
monitor and assess inputs and immediate outputs of a
program
 But for evidence-based policy making, rigorous impact
evaluation is needed
 So, the current move is towards measuring outcomes and
impacts in addition to inputs and processes

83
The evaluation problem
 Impact evaluation helps policy makers decide whether programs/projects are generating intended effects: evidence-based policy
 Evaluation seeks to prove that changes in targets are due only to the specific interventions undertaken
 The key question: does a treatment (participation in a program/project) have any causal effect on the observed outcome of the population?
The evaluation problem
 Do improved roads increase access to labor markets & raise
households’ income, and if so, by how much?
 Does class size influence student achievement, and if it does,
by how much?
 What is the impact on access to health care of contracting
out primary care to private providers?

 Along with some ad hoc examples, we will consider a running example in order to clarify:
(i) the impact evaluation problems and
(ii) ways to respond to them (the so-called identification of the effect)
Impact Evaluation
 What about a simple before and after analysis?
 Assume: nobody had a job before but everyone got a job after a training
 Can we attribute jobs to the training program? Why or why not?
 What if we compare people who enrolled in the program with people who
did not?
 The main challenge across different types of impact evaluation is to find a
good counterfactual—namely, the situation a participating subject would
have experienced had he or she not been exposed to the program.

86
The logframe/ZOPP approach
• Indicators need to be structured to match the analysis of
problems the project is trying to overcome
• Logical framework/logframe/ZOPP approach:
– Is used to define inputs, outputs, timetables,
success assumptions and performance indicators
– Postulates a hierarchy of objectives for which
indicators are required
– Identifies problems the project cannot deal with
directly (risks)

87
Attribution Gap
• Objective: increase growth through MSEs
• Indicator: employment in MSEs

Can this change be attributed to the project?


88
The logframe/ZOPP approach
• GTZ (GIZ) Impact model
• Attribution gap:
– Caused by the existence of too many other significant factors
– Cannot be plausibly spanned using a linear, causal bridge

Source: Douthwaite et al. (2003: 250)


89
Problems in Impact Evaluation
 Causal Inference & the Problem of the Counterfactual
 Whether “X” (an intervention) causes “Y” (an outcome variable) is very difficult to determine
 The main challenge is to determine what would have happened to the beneficiaries if the intervention had not existed
Evaluation question: what is the effect of a project?
Effect = Outcome A (with programme) − Outcome B (without programme)
Problem: we only observe individuals that participate (we see A but not B) or do not participate (we see B but not A)
... but never A and B for everyone!
90
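In potential-outcomes notation (notation assumed here, consistent with the naive estimator later in this deck), the effect we want is

$$\text{Effect} = \mathbb{E}[Y_1 \mid T = 1] - \mathbb{E}[Y_0 \mid T = 1],$$

where $Y_1$ is the outcome with the programme, $Y_0$ the outcome without it, and $T = 1$ marks participants; the second term is the unobservable counterfactual.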
Definition of treatment effects

91
Problems in Impact Evaluation
92
Problems in Impact Evaluation

94
Treatment and selection effects
 Here, we subtract and add the non-treated outcome for the treated group

95
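That subtract-and-add step yields the standard decomposition of the naive comparison (notation assumed, matching the naive estimator below):

$$\mathbb{E}[Y_1 \mid T=1] - \mathbb{E}[Y_0 \mid T=0] = \underbrace{\mathbb{E}[Y_1 \mid T=1] - \mathbb{E}[Y_0 \mid T=1]}_{\text{effect on the treated (ATT)}} + \underbrace{\mathbb{E}[Y_0 \mid T=1] - \mathbb{E}[Y_0 \mid T=0]}_{\text{selection bias}}$$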
Estimating the Counterfactual
• On a conceptual level, solving the counterfactual problem requires the evaluator to identify a “perfect clone” for each program participant
• Although no perfect clone exists for a single individual, statistical tools exist
• These tools will be used to generate 2 groups of individuals that, if their numbers are large enough, are statistically indistinguishable from each other

96
Estimating the Counterfactual
 Specifically, the treatment & comparison groups must be the same in at least 3 ways:
1. TG & CG must be identical in the absence of the program. On average the characteristics of T & C groups should be the same. E.g. the average age in the TG should be the same as the average age in the CG
2. TG & CG should react to the program in the same way. E.g. the incomes of units in the TG should be as likely to benefit from training as the incomes of the CG
3. TG & CG cannot be differentially exposed to other interventions during the evaluation period.
98
Basic Problem of Impact Evaluation: Selection Bias
(Figure: malaria rates over years Y1–Y4, comparing project beneficiaries (Type A households, with project) to neighbors (Type B households); the gap at the end of the treatment period is the observed difference.)
Basic Problem of Impact Evaluation: Selection Bias
(Figure: participants are often different from non-participants; the observed difference between Type B households and Type A households (with project) combines the true impact with selection bias.)
Problems in Impact Evaluation: effect heterogeneity

The naive estimator compares the mean outcome of the treated with the mean outcome of the non-treated:

$$\hat{\Delta}^{\text{naive}} = \frac{\sum_{i=1}^{N} T_i Y_i}{\sum_{i=1}^{N} T_i} - \frac{\sum_{i=1}^{N} (1 - T_i) Y_i}{\sum_{i=1}^{N} (1 - T_i)}$$

102
Problems in Impact Evaluation
(Figure: the ATE under effect heterogeneity.)

103
Basic Problem of Impact Evaluation: Program Placement
(Figure: malaria rate in a high-rainfall treatment region, 2001–2004; the treatment group moves from A to E over the treatment period, and the true impact is E − D.)

Basic Problem of Impact Evaluation: Program Placement
(Figure: the impact is underestimated when Region B, which is affected less by rainfall, is used as the comparison group: E − A > C − B. True impact: E − D.)

Methods of ex post Impact Evaluation: building the counterfactual (CF)
1. Field experiments/randomized evaluations
2. Matching methods, specifically propensity score matching (PSM)
3. Double-difference (DID) methods
4. Instrumental variable (IV) methods
5. Regression discontinuity (RD) design
6. Endogenous switching regression
Field Experiments/Randomized
evaluations

107
Randomized Control Trials

108
Experimental and quasi-experimental
designs
• Randomization: use randomization to obtain the counterfactual; considered “the gold standard” by some:
 Eligible participants are randomly assigned to a treatment
group who will receive program benefits while the control
group consists of people who will not receive program
benefits,
 The treatment and control groups are identical at the
outset of the project, except for participation in the
project.
• Quasi-experimental designs: use statistical/non-experimental
research designs to construct the counterfactual.

109
(a) Experimental designs: 4 methods of randomization
in Randomized Controlled Trials (RCTs):

1) Oversubscription method:
 Units are randomly assigned to the treatment and control groups
and everyone has an equal chance,
 Appropriate when there is no reason to discriminate among
applicants and when there are limited resources or
implementation capacity (demand > supply of program),
 Ex.: In Colombia in the mid-1990s, a lottery design was used to distribute government subsidies, which were vouchers to partially cover the cost of private secondary school for eligible students.
2) Randomized order of phase-in:
 Randomize the timing of receiving the program,
 Appropriate when the program is designed to cover the entire eligible population and there are budget/administrative constraints.
110
(a) Experimental designs: 4 methods of randomization in
Randomized Controlled Trials (RCTs):
3) Within group randomization:
 Provide the program to some subgroups in each area,
 One of its problems is that it increases the likelihood that
the comparison group is contaminated.
4) Encouragement design:
 Offer incentives to a randomly selected group of people,
 Appropriate when everyone is eligible and enough funding,
 The remaining population without the incentives is used as
the control group,
 Challenge: encouragement (or its absence) changes the probability of participating, but does not set it to 1 or 0.
111
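A minimal sketch of the oversubscription (lottery) method from the list above, assuming a hypothetical pool of 200 eligible applicants and 80 program slots:

```python
# Randomly assign oversubscribed applicants to treatment; the rest form the control group.
import numpy as np

rng = np.random.default_rng(2021)
applicants = np.arange(1, 201)  # hypothetical applicant IDs (demand > supply)
n_slots = 80                    # program capacity

treatment = rng.choice(applicants, size=n_slots, replace=False)
control = np.setdiff1d(applicants, treatment)
print(len(treatment), len(control))  # 80 treated, 120 controls
```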
(a) Experimental designs
 Why RCTS?
 To analyze whether an intervention had an impact, a counterfactual is needed, because it is hard to ask counterfactual questions (Ravallion, 2008),
 Randomization guarantees statistical independence of the
intervention from preferences (observed and
unobserved)
 Overcome selection bias of individuals receiving the
intervention.
 Internal validity is high,
 Involves less rigorous econometric approaches,
 Led by MIT Poverty Action Lab and World Bank,
 It is criticized by others (see Rodrik, 2008),
112
(a) Experimental designs
• Potential drawbacks of RCTs:
 Evaluator must be present at a very early stage,
 Intended random assignments can be compromised,
 External validity,
 Political influences on where to have the intervention,
 Site effects (aspects of a program’s setting, such as
geographic or institutional aspects, interact with the
treatment),
 Tendency to estimate abstract efficacy,
 Impracticality of maintaining treatment and control groups,
 Are not possible for some policy questions.

113
(a) Experimental designs
Conclusions on RCTs:
 RCTs can provide evidence before expanding programs, but need to weigh pros and cons,
 Sample size needs to be considered,
 Understanding the context beforehand is critical,
 Was the intervention the right one for the problem?
 Best to combine RCTs with other methods.

114
Field experiments
• Key assumptions: randomized assignment
• Treatment effects: individual and intention-to-treat
• Internal and external validity
• Examples and exercises

115
115
Key assumptions: randomized
assignment

116
Randomized Assignment & Treatment
Effects

117
Estimating Impact under Randomized Assignment
(Figure.) Source: Gertler et al. (2011:61)

118
Example: PROGRESA comparison groups

                          Treatment villages   Control villages
Eligible households               A                   B
Non-eligible households           C                   D

• Impact effect: A − B
• Spill-over effects: C − D

119
ATE, ITT & TOT in Experimental Designs

120
ATE, ITT & TOT (ATET) in Experimental Designs

121
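The figures on these two slides did not survive extraction; the standard definitions (assumed, not taken from the deck) are

$$\text{ATE} = \mathbb{E}[Y_1 - Y_0], \qquad \text{TOT (ATET)} = \mathbb{E}[Y_1 - Y_0 \mid T = 1],$$

while the intention-to-treat effect (ITT) compares mean outcomes by random assignment rather than by actual treatment receipt, which matters under noncompliance.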
(b) Quasi-experimental designs
Impact assessment techniques:
 Propensity score matching: Match program participants
with non-participants, typically using individual observable
characteristics,
 Difference-in-differences/double difference: Observed
changes in outcome before and after for a sample of
participants and non-participants,
 Regression discontinuity design: Individuals just on the other side of the cut-off point serve as the counterfactual,
 Instrumental variables: When program placement is
correlated with participants' characteristics, need to correct
bias by replacing the variable characterizing the program
placement with another variable.

122
A) Propensity Score Matching
The evaluation problem: recap
Evaluation question: what is the effect of a programme?
Effect = Outcome A (with programme) − Outcome B (without programme)
Problem: we only observe individuals that participate (A) or do not participate (B)
... but never A and B for everyone!

123
The evaluation problem: recap

125
e.g. For an unemployed person undertaking counselling, the outcome variable Y1 is realized, and is thus the factual outcome. The counterfactual outcome for the same individual represents the labour market state Y0 s/he would have had, had s/he not undertaken counselling.

Hence, the identity that relates Yi, the outcome that is observed for each population member, to the potential outcomes of the same member is:
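(The equation itself did not survive extraction; the standard form of the identity, assumed here, is)

$$Y_i = T_i\,Y_{1i} + (1 - T_i)\,Y_{0i},$$

so that we observe $Y_{1i}$ for participants ($T_i = 1$) and $Y_{0i}$ for non-participants ($T_i = 0$), never both.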

Source: Trivellato (2012)


Single difference estimation
 In this section we will look at single difference methods
that rely on the assumption of unconfounded assignment
Linear regression (OLS)
Matching methods
 Typically applied to cross section data
 Two crucial assumptions:
1. We observe the factors that determine selection
2. The intervention causes no spill-over effects

127
Unconfounded assignment

128
Identification

129
Intuition of Matching:The Perfect Clone

Beneficiary Clone

130
Intuition of Matching:The Perfect Clone
Treatment Comparison

 Matching identifies a control group that is as similar as


possible to the treatment group!

131
Principle of Matching

132
Matching treated to controls

133
Propensity score matching (1)

134
Propensity score matching (2)

135
Propensity score matching: overview

average

136
Estimating the propensity score

137
Range of common support

138
Range of common support

139
Range of common support
(Figure: histogram of the propensity score from 0 to 0.875, showing untreated units, treated units on support, and treated units off support.)

140
Matching methods

141
Nearest neighbour matching (1)

142
Nearest neighbour matching (2)

143
Which neighbour?
 We can match to more than one neighbour
 5 nearest neighbours? Or more?
 Radius matching: all neighbours within specific range
 Kernel matching: all neighbours, but close neighbours have larger
weight than far neighbours.
 Best approach?
 Look at sensitivity to choice of approach
 How many neighbours?
 Using more information reduces bias
 Using more control units than treated increases precision
 But using control units more than once decreases precision

144
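A minimal sketch of one-to-one nearest-neighbour matching (with replacement) on the propensity score, using rounded values from the small clinic example that appears later in this deck; the arrays are assumed to be ordered with the four treated villages first:

```python
# 1-NN matching on the propensity score, then the ATT as a difference in means.
import numpy as np

ps = np.array([0.42, 0.74, 0.93, 0.74, 0.75, 0.40, 0.002, 0.027, 0.007])
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
y = np.array([10, 15, 22, 19, 25, 19, 4, 8, 6])  # infant mortality rates

controls = np.where(~treated)[0]
matches = [controls[np.argmin(np.abs(ps[controls] - ps[i]))]
           for i in np.where(treated)[0]]         # nearest control for each treated unit

att = y[treated].mean() - y[matches].mean()       # average treatment effect on the treated
print(att)  # reproduces the -7 from the worked PSM exercise below
```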
Example: effect of health insurance on health spending
(Figure: from a propensity-score sample with similar initial out-of-pocket (OOP) budget shares, the insured form the treatment group and the not-insured form the control group.)

145
Nearest Neighbour Matching

146
Caliper matching

147
Kernel matching

148
Weighting by the propensity score

149
Propensity Score Matching
What are propensity scores for?
We want to know the effect of something
We do not have random assignment
We do have data on the pre-project characteristics that determined whether or not the individuals received the treatment
Example
An NGO has built clinics in several villages
Villages were not selected randomly
We have data on village characteristics before the project
was implemented
What is the effect of the project on infant mortality?

150
Propensity Score Matching
What is the effect of the project on infant mortality?

T        imrate
treated    10
treated    15
treated    22
treated    19
control    25
control    19
control     4
control     8
control     6

The easiest and most straightforward answer to this question is to compare average mortality rates in the two groups:
(10+15+22+19)/4 − (25+19+4+8+6)/5 = 4.1
What does this mean? Does it mean that clinics have increased infant mortality rates? NO! The pre-project characteristics of the two groups are very important for answering this question.

151
Propensity Score Matching
T imrate povrate pcdocs
treated 10 0.5 0.01
treated 15 0.6 0.02
treated 22 0.7 0.01
treated 19 0.6 0.02
control 25 0.6 0.01
control 19 0.5 0.02
control 4 0.1 0.04
control 8 0.3 0.05
control 6 0.2 0.04
How similar are the treated and control groups?
On average, the treated group has a higher poverty rate and fewer doctors per capita
152
Propensity Score Matching
 The Basic Idea
1. Create a new control group
 For each observation in the treatment group, select the
control observation that looks most like it based on the
selection variables (aka background characteristics)
2. Compute the treatment effect
 Compare the average outcome in the treated group with
the average outcome in the control group

153
Propensity Score Matching
                                              Match using   Match using
S. No   T        imrate  povrate  pcdocs      povrate       pcdocs
1 treated 10 0.5 0.01
2 treated 15 0.6 0.02
3 treated 22 0.7 0.01
4 treated 19 0.6 0.02
5 control 25 0.6 0.01
6 control 19 0.5 0.02
7 control 4 0.1 0.04
8 control 8 0.3 0.05
9 control 6 0.2 0.04
• Take povrate and pcdocs one at a time to match the treated group
with that of the control one
• Then take the two at a time. What do you observe?
154
Propensity Score Matching
 Predicting Selection
 What is propensity score?
Propensity score is the conditional probability that an individual
chooses the treatment
 Which model do we use to estimate pscores?
• A linear probability model cannot be used, because the predicted score could be greater than 1 or less than 0 (see the sketch of the logit curve, with Ti equal to 0 or 1)
• So, we use a limited dependent variable model: logit or probit
• We consider two conditions: the CIA and the propensity score theorem
• CIA (conditional independence assumption): outcomes are independent of treatment assignment given xi
• The propensity score theorem states that outcomes are independent of treatment assignment given the propensity score, i.e., p(xi)
155
Propensity Score Matching
Predicting Selection
How do we actually match treatment observations to control groups?
In Stata, we use logistic or probit regression to predict:
Prob(T=1 | X1, X2, …, Xk)
In our example, the X variables are povrate and pcdocs
So, we run a logistic regression and save the predicted probability of treatment
We call this the propensity score
The commands are:
logistic T povrate pcdocs
predict ps1 (or any name you want the propensity score to have)

156
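A minimal Python analogue of those two Stata commands, assuming the nine-village clinic data from the earlier slide; the predicted values should reproduce the ps1 column on the next slide:

```python
# Estimate propensity scores with a logit of treatment on povrate and pcdocs.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "T":       [1, 1, 1, 1, 0, 0, 0, 0, 0],
    "povrate": [0.5, 0.6, 0.7, 0.6, 0.6, 0.5, 0.1, 0.3, 0.2],
    "pcdocs":  [0.01, 0.02, 0.01, 0.02, 0.01, 0.02, 0.04, 0.05, 0.04],
})

X = sm.add_constant(df[["povrate", "pcdocs"]])
fit = sm.Logit(df["T"], X).fit(disp=0)
df["ps1"] = fit.predict(X)  # predicted probability of treatment = propensity score
print(df)
```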
Propensity Score Matching
(ps1 = predicted probability of treatment, i.e. the propensity score)

S. No   T        imrate  povrate  pcdocs  ps1         Match
1 treated 10 0.5 0.01 0.4165713
2 treated 15 0.6 0.02 0.7358171
3 treated 22 0.7 0.01 0.9284516
4 treated 19 0.6 0.02 0.7358171
5 control 25 0.6 0.01 0.752714
6 control 19 0.5 0.02 0.395162
7 control 4 0.1 0.04 0.0016534
8 control 8 0.3 0.05 0.026803
9 control 6 0.2 0.04 0.0070107
Exercise: Use the propensity score to match the treated group with the control
one
Find out the average treatment effect on the treated
((10+15+22+19)/4)-((19+25+25+25)/4)=-7
157
Propensity Score Matching
 How do we know how well matching worked?
1. Look at covariate balance between the treated and the
new control groups. They should be similar.
2. Compare distributions of propensity scores in the treated
and control groups. They should be similar
3. Compare distributions of the propensity scores in the
treated and original control groups
 If the two do not overlap very much, then matching might not work very well.

158
Propensity Score Matching
• Go to Stata and let us do the exercise
• Use psm exercise 2.dta

159
Summarize: how to do PSM
160
Final comments on PSM and OLS

161
Difference-in-Difference
Methods
Difference methods
 Key assumption: parallel trends
 Difference-in-difference
 Fixed effects

163
Identifying Assumption
• Whatever happened to the control group over
time is what would have happened to the
treatment group in the absence of the
program.

Pre Post 164


Identifying Assumption
• Whatever happened to the control group over
time is what would have happened to the
treatment group in the absence of the
program. Effect of program using
only pre- & post- data
from T group (ignoring
general time trend).

Pre Post

165
Identifying Assumption
• Whatever happened to the control group over
time is what would have happened to the
treatment group in the absence of the
program.
Effect of program using
only T & C comparison
from post-intervention
(ignoring pre-existing
differences between T &
C groups).

Pre Post 166


Identifying Assumption
• Whatever happened to the control group over
time is what would have happened to the
treatment group in the absence of the
program.

Pre Post 167


Identifying Assumption
• Whatever happened to the control group over
time is what would have happened to the
treatment group in the absence of the
program.
Effect of program
difference-in-difference
(taking into account pre-
existing differences
between T & C and
general time trend).

Pre Post 168


Differences-in-differences

169
Differences-in-Differences
 First application DID: John Snow (1855)
 Cholera epidemic in London in mid-nineteenth century
 Prevailing theory: “bad air”
 Snow’s hypothesis: contaminated drinking water
 Compared death rates from cholera in districts served by
two water companies
 In 1849 both companies obtained water from the dirty
Thames
 In 1852, one of them moved water works upriver to an area
free of sewage
 Death rates fell sharply in districts served by this water
company!

170
What does panel data add?

171
Example: effect of health insurance on health spending
(Figure: out-of-pocket (OOP) budget share before vs after insurance.)

172
Observed and potential outcomes

173
Difference-in-Differences

174
Difference-in-Differences
 The DID estimate is clearly presented in a 3 x 3 table

175
Difference-in-Differences
 The DID estimate is clearly presented in a 3 x 3 table
 All we need to estimate is 4 averages
◦ Non-parametric regression
◦ Take differences and double differences

176
Example: effect of health insurance on health spending
(Figure: as on slide 145, a propensity-score sample with similar initial OOP budget shares splits the insured (treatment group) from the not-insured (control group).)

177
Difference-in-Difference
When can we use diff-in-diff?
We want to evaluate a program, project, intervention
We have treatment and control groups
We observe them before and after treatment
But:
Treatment is not random
Other things are happening while the project is in effect
We cannot control for all the potential confounders

Key Assumptions:
Trend in control group approximates what would have
happened in the treatment group in the absence of the
treatment

178
Difference-in-Difference

179
DiD using Stata
 The use of the DID method can remove bias from unmeasured pre-program covariates, assuming that the comparison groups exhibit the same trend over time in the absence of the program
Difference-in-Difference
 Assume that there was a free lunch program in Place A.
 The free lunch was assumed to improve student outcomes.
(Figure: scores in Place A and Place B, 2008 and 2010, where
 D1 = change in Place A between 2008 and 2010, with D1 = Dprogram + Dtrend;
 D2 = change in Place B between 2008 and 2010;
 D3 = difference between Place A and Place B in 2010, with D3 = Dprogram + Ddifference due to factors other than the program;
 D4 = difference between Place A and Place B in 2008.)
DID = D1 − D2 or D3 − D4
181
Difference-in-Difference

Y             Pre (2008)   Post (2010)   Diff
Treated (A)       20            90         70
Control (B)       30            70         40
Diff             -10            20         DID = 70 − 40 = 30

(Figure: scores for Treated (A) and Control (B), pre and post.)

182
Difference-in-Difference: data
Name Y (score) Dtr Dpost
1 40 0 0
2 80 1 1
3 20 0 0
4 100 1 1
5 30 0 0
6 0 1 0
7 60 0 1
8 40 1 0
9 60 0 1
10 90 0 1

Dtr= dummy variable with a value of 1 if individuals are in the treated group (A)

Dpost= time dummy variable with a value of 1 if individuals take the test in 2010
(post)
183
Difference-in-Difference: data

Y (score)           Dpost = 0 (pre; 2008)   Dpost = 1 (post; 2010)
Dtr = 0 (Control)   β0                      β0 + β1
Dtr = 1 (Treated)   β0 + β2                 β0 + β1 + β2 + β3

Difference-in-difference: β3

DiD with regression:

Y = β0 + β1 Dpost + β2 Dtr + β3 (Dpost × Dtr) [+ β4 X] + ε

184
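A minimal Python analogue of the regression above, assuming the ten-observation score data from the previous slide; with this saturated specification the interaction coefficient is exactly the double difference of 30:

```python
# DiD as OLS with a treatment-by-period interaction term.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "score": [40, 80, 20, 100, 30, 0, 60, 40, 60, 90],
    "Dtr":   [0, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "Dpost": [0, 1, 0, 1, 0, 0, 1, 0, 1, 1],
})

# Y = b0 + b1*Dpost + b2*Dtr + b3*(Dpost x Dtr) + e
fit = smf.ols("score ~ Dpost + Dtr + Dpost:Dtr", data=df).fit()
print(fit.params)  # the Dpost:Dtr coefficient is the DID estimate (30)
```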
Difference-in-Difference: using Stata
• Go to the Stata exercise and use diff-in-diff exercise.dta
• diff score, t(Dtr) p(Dpost)

185