1 Introduction
The tenet of theory-driven evaluation is that the design and application of evaluation need to be guided by a conceptual framework called program theory (Chen 1990, 2005). Program theory is defined as a set of explicit or implicit assumptions held by stakeholders about what action is required to solve a social, educational or health problem and why the problem will respond to this action. The purpose of theory-driven evaluation is not only to assess whether an intervention works, but also to assess how and why it does so. This information is essential for stakeholders seeking to improve their existing or future programs.
Theory-driven evaluation differs sharply from another type of evaluation, called black-box evaluation. Black-box evaluation mainly assesses whether an intervention has an impact on outcomes; it takes no interest in the transformation process between the intervention and the outcomes. Similarly, theory-driven
Table 1: The normative (planned) program compared with the actual program as implemented

Goal/outcome
  Normative: Reduction of student drug use, to be verified through urinalysis
  Actual: Reduction of drug use, but the urinalysis collection environment was not controlled

Treatment
  Normative: Primary: provide quality counseling to abusers. Secondary: basic drug education
  Actual: Primary: counseling mainly involved the use of threats, admonishment, and/or encouragement not to use. Secondary: basic drug education

Implementation environment

Target group
  Normative: All drug-abusing students
  Actual: Only those drug-abusing students who were easy to reach

Implementors
  Normative: Teachers provided with adequate drug treatment training and information
  Actual: Teachers lacked adequate drug treatment skills and information

Mode of delivery
  Normative: Compulsory individual counseling
  Actual: Compulsory individual counseling, but with problems such as a lack of plan, format and objectives

Implementing organisation
  Normative: All schools that can adequately implement the program
  Actual: Smaller schools had difficulties implementing the program

Inter-organisational procedures
  Normative: Effective centralized school system
  Actual: Communication gap and mistrust between the Ministry of Education and the schools

Micro-context
  Normative: Eliminate video game arcades
  Actual: Video game arcades still exist

Macro-context
  Normative: Strong public support
  Actual: Strong public support, but a problematic education system (elitism)
The program plan entailed mixing research methods, both quantitative and qualitative, to collect data. For example, quantitative methods were applied to rate teachers' satisfaction with a workshop on drug counseling skills sponsored by the education ministry, whereas qualitative methods were used to probe contextual issues in the teachers' opinions of the workshop. The right side of Table 1 displays empirical findings on the program's real-world implementation; comparing the program theory with the implementation reveals large discrepancies. The program had been carried out, but the quality of services and the system of implementation were far from impressive. The discrepancies between plan and implementation resulted from a lack of appropriate counseling training, the overburdening of teachers with counseling work on top of their usual teaching responsibilities, and a lack of communication as well as mistrust between an authoritarian ministry and the teachers. The evaluation results cast doubt on how a program without strong implementation could have achieved a 96 % decrease in drug abuse in schools.
Kristal and colleagues found that the intervention did enhance predisposing fac-
tors as well as the likelihood of entering and remaining in the subsequent stages
of change. They also found that the intervention did not affect enabling factors.
The program fell short because the intervention failed to activate one of the three determinants.
Multiple-Determinant Model With Sequential Order. The model containing
two or more determinants aligned in a causal order is a multiple-determinant
model with a sequential order. That is, certain determinants affect others in a
particular sequential order. An example of this kind of linear model is found in
an evaluation of a school-based antismoking campaign (Chen, Quane, & Garland
1988). The intervention contained components such as an antismoking comic
book, discussions of the health messages the comic book delivered, and parental
notification about the intervention program. The determinants of the model, in
sequence, were the number of times the comic book was read, and knowledge of
the comic book’s story and characters. The sequential order indicates that re-
peated reading of the comic book changed the extent of knowledge about the plot
and characters. The sequence is illustrated in Figure 4.
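To make the logic of such a sequential model concrete, the sketch below shows how each link in the hypothesized chain could be probed with a separate regression. It is a minimal illustration using simulated data and hypothetical variable names (times_read, knowledge, attitude); it does not reproduce the analysis reported in Chen, Quane and Garland (1988).

```python
# Illustrative sketch of probing a sequential determinant model:
# intervention -> times the comic book was read -> knowledge of the story -> outcome.
# Simulated data; variable names are hypothetical, not from the original study.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
intervention = rng.integers(0, 2, n)                 # 1 = received the comic-book program
times_read = 0.8 * intervention + rng.normal(0, 1, n)
knowledge = 0.6 * times_read + rng.normal(0, 1, n)
attitude = 0.5 * knowledge + rng.normal(0, 1, n)     # hypothetical downstream outcome

def fit(y, predictors):
    """OLS of y on the given predictor columns plus an intercept."""
    X = sm.add_constant(np.column_stack(predictors))
    return sm.OLS(y, X).fit()

# One regression per link in the hypothesized causal chain.
link1 = fit(times_read, [intervention])                # intervention -> exposure
link2 = fit(knowledge, [times_read, intervention])     # exposure -> knowledge, net of intervention
link3 = fit(attitude, [knowledge, times_read])         # knowledge -> outcome, net of exposure

for name, model in [("link1", link1), ("link2", link2), ("link3", link3)]:
    print(name, model.params.round(2))
```

A sequential model is supported to the extent that each link in the chain shows the hypothesized effect; a broken link points to the specific determinant the intervention failed to activate.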
model and the change model, so that it helps evaluators achieve a balanced view
from which to assess the worth of a program.
Comprehensiveness of information needed to improve programs: A theory-
driven evaluation that examines how a program’s structure, implementation
procedure, and causal mechanisms actually work in the field will provide useful
information to stakeholders for program improvements.
Balance of scientific and practical concerns: Researchers are greatly concerned about the scientific rigor of an evaluation, while stakeholders desire an evaluation that addresses service issues. Since the conceptual framework of program theory addresses service issues through the action model and rigor through the change model, it has the potential to foster greater dialogue and collaboration between academic and practical communities and to narrow the gap between scientific and service communities.
Advancement of evaluation theory and methodology: Theory-driven evaluation has been applied to scientific and service issues for a few decades. Lessons learned from these applications can be used to further advance evaluation theory and methodology. The rest of this chapter introduces recent developments in theory-driven evaluation, such as the integrative validity model and the bottom-up evaluation approach.
Stakeholders are the clients and users of evaluation results, so evaluators must understand and address their views and needs in an evaluation. Because it works intensively with stakeholders, theory-driven evaluation recognizes that stakeholders have a great interest in intervention programs that are capable of accomplishing two functions: goal attainment and system integration. Goal attainment means that an intervention can activate causal mechanisms for attaining its prescribed goals, as illustrated in the change model. System integration refers to an intervention being compatible, or even synergistic, with other components in a system. These components include organizational mission and capacity, service delivery routines, implementers' capabilities, relationships with partners, clients' acceptance, and community norms, as discussed in the action model. Stakeholders value goal attainment, but they are equally or even more interested in system integration because they are responsible for delivering services in the real world. Note also that although goal attainment and system integration are related outcomes attributable to an intervention, they do not necessarily go hand in hand. That an intervention is efficacious or effective does not mean that it is suitable for a community-based organization to implement, or vice versa (Chen 2010; Chen & Garbe 2011).
Stakeholders are keenly interested in evaluative evidence on both system integration and goal attainment, but this interest has often not been satisfactorily met in evaluations. Traditionally, evaluators have applied the Campbellian validity typology (Campbell & Stanley 1963; Cook & Campbell 1979; Shadish, Cook, & Campbell 2002) for outcome evaluation. It is essential to note that the Campbellian validity typology was developed for research rather than evaluation purposes (Chen, Donaldson, & Mark 2011). Its primary aim is to help researchers provide credible evidence when examining causal relationships among variables. Evaluators have found the typology very useful for outcome evaluation as well and have applied it intensively in addressing goal attainment issues. The typology has made a great contribution to program evaluation. However, its application as the major framework or standard for outcome evaluation has contributed to evaluators' neglect of system integration issues. Since it is neither within the scope nor the intention of the Campbellian validity typology to be used for designing well-balanced evaluations that meet stakeholders' evaluation needs, it is up to evaluators to develop a more comprehensive perspective for systematically addressing both goal attainment and system integration issues.
Theory-driven evaluation proposes an integrative validity model (Chen 2010; Chen & Garbe 2011) to take on this challenge. Building on Campbell and Stanley's (1963) distinction between internal and external validity, the integrative validity model proposes three types of validity for evaluation:
effectual, viable, and transferable.
Effectual validity is the extent to which an evaluation provides credible evi-
dence that an intervention causally affects specified outcomes. This validity is
similar to the concept of internal validity proposed by Campbell and Stanley
(1963). According to the Campbellian validity typology, randomized experiments are the strongest design for enhancing effectual validity, followed by quasi-experimental designs. Effectual validity is crucial for addressing goal attainment issues.
The integrative validity model proposes viable validity to address stake-
holders’ interest in system integration. Viable validity is the extent to which an
intervention is successful in the real world. Here, viable validity refers to stake-
holders’ views and experiences regarding whether an intervention program is
practical, affordable, suitable, evaluable, and helpful in the real world. More
specifically, viable validity means that ordinary practitioners – rather than re-
search staff – can implement an intervention program adequately, and that the
intervention program is suitable for coordination or management by a service
delivery organization such as a community clinic or a community-based organization.
Transferable validity is the extent to which evaluation findings of an intervention can be transferred from one real-world setting to another targeted setting. This definition stresses that transferability for program evaluation has a boundary: the real world.
Evaluation approaches with strong effectual validity tend to be low in trans-
ferable validity. For example, efficacy evaluation provides the most rigorous
evidence on effectual validity, but it maximizes effectual validity at the expense
of transferable validity. Efficacy evaluation applies randomized controlled trials (RCTs) that create an ideal and controlled environment in order to assess the intervention effect rigorously. The manipulation and control used to maximize effectual validity greatly reduce the transferable validity of the results to the real world. For example, to maximize effectual validity, RCTs usually use highly qualified and enthusiastic counselors as well as homogeneous and motivated clients, conditions that hardly resemble real-world operations. Stakeholders may regard the evidence provided by an efficacy evaluation as irrelevant to what they are doing.
Effectiveness evaluation is superior to efficacy evaluation for addressing
transferable validity issues. Effectiveness evaluation estimates intervention ef-
fects in ordinary patients in real-world, clinical practice environments. To reflect
the real world, recruitment and eligibility criteria are loosely defined to create a
heterogeneous and representative sample of the targeted populations. Interven-
tion delivery and patient adherence are less tightly monitored and controlled than
in efficacy evaluations. The central idea is that to enhance transferability, effec-
tiveness studies must resemble real-world environments. RCTs that require intensive manipulation of the setting are not suitable for effectiveness evaluation; evaluators often need to resort to non-RCT methods. By sacrificing some level of effectual validity, effectiveness evaluation enhances transferable validity.
Theory-driven evaluation argues that effectiveness evaluation's transferable validity can be further enhanced by incorporating into the assessment the contextual factors and causal mechanisms described in the action model-change model framework (Chen 1990, 2005). In addition, theory-driven evaluation proposes the concepts of exhibited and targeted generalization to help evaluators address transferability issues (Chen 2010). With exhibited generalization, the evaluation itself provides sufficient information on the contextual factors required for an intervention to be effective in real-world applications. Potential users can examine the information on the intervention's effectiveness together with these contextual factors, assess its generalization potential with regard to their own populations and settings, and decide whether to apply the intervention in their communities. Exhibited generalization can be achieved through the action model-change model framework in the theory-driven approach (Chen 1990, 2005), as previously discussed. Stakeholders sometimes have a particular real-world target population or setting to which they want to transfer the evaluation results. This is targeted generalization; that is, the extent to which evaluation results can be transferred to a specific population and real-world setting. Targeted generalization is achieved through
methods such as sampling (Shadish et al. 2002), Cronbach’s UTOS approach
(Cronbach 1982), or the dimension test (Chen 1990). Thus through exhibited or
targeted generalization, transferable validity adds a workable evaluation concept
to program evaluation.
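As a rough illustration of what targeted generalization asks of the data (offered as an assumption-laden sketch, not as the specific procedure behind sampling, the UTOS approach, or the dimension test), one could test whether an estimated intervention effect holds along the dimension that characterizes the targeted population or setting, for example with a treatment-by-setting interaction. The data and variable names below are hypothetical.

```python
# Hedged illustration: probing whether an intervention effect transfers across a
# targeted setting dimension by testing a treatment-by-setting interaction.
# Data and variable names (outcome, treated, urban) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "urban": rng.integers(0, 2, n),   # 1 = urban setting, the targeted dimension
})
# Simulate an effect that is weaker in the targeted (urban) settings.
df["outcome"] = (0.6 * df["treated"]
                 - 0.3 * df["treated"] * df["urban"]
                 + rng.normal(0, 1, n))

model = smf.ols("outcome ~ treated * urban", data=df).fit()
# A substantial treated:urban interaction cautions against transferring the overall
# effect estimate to the targeted setting without further adjustment.
print(model.params.round(2))
```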
Furthermore, it is important to stress that transferable validity can mean either transferability of effectuality or transferability of viability. Transferability of effectuality has been the focus of the literature on external validity and generalizability. Transferability of viability, however, is an emerging concept that asks to what extent evaluation findings on an intervention's viability can be transferred from one real-world setting to another targeted setting. The distinction is important: that an intervention's effectuality might transfer to another setting does not guarantee that its viability will transfer as well.
The bottom-up approach has a number of advantages over the top-down ap-
proach:
Assure intervention’s usefulness to stakeholders and avoid wasting money.
The traditional top-down approach usually begins with an expensive and time-
consuming efficacy evaluation to assess an innovative intervention. After mil-
lions of dollars are spent on an efficacy evaluation, it might be found that the
efficacious intervention is very difficult to implement in the real world, not of
interest to stakeholders, or may not be real-world effective. This kind of ap-
proach tends to waste money.
By contrast, the bottom-up approach starts from viability evaluation. This
first assesses the viability of an intervention as proposed by researchers or stake-
holders. Because interventions with low viability are screened out in the begin-
ning, this approach could save funding agencies considerable money and re-
sources. The bottom-up approach encourages funding agencies to fund many
viability evaluations and to select highly viable interventions for further rigorous
studies.
Provide an opportunity to revise and improve an intervention in the real world before its finalization. One limitation of the top-down approach is that the intervention protocol or package is finalized before or during the efficacy evaluation; the protocol is not supposed to change after the evaluation. When an intervention protocol is finalized at such an early stage, the intervention cannot benefit from feedback from real-world implementation or from stakeholders' input for improvement. This approach seriously restricts an intervention's generalizability to the real world.
By contrast, the bottom-up approach affords an opportunity to improve an intervention during the viability evaluation. Intervention protocols refined through stakeholder input and implementation experience gain real-world relevance and make a greater contribution.
Provide an alternative perspective for funding. In theory, funding agencies are interested in both scientific and viability issues. They want their funded projects to be successful in communities and to have the capability of solving real-world problems. In practice, however, many agencies tend to heavily emphasize scientific factors, such as the use of RCTs or other randomized experiments, as a qualification criterion for grant applications (Donaldson, Christie, & Mark 2008; Huffman & Lawrenz 2006), while paying insufficient attention to viability issues. As discussed previously, if funding policy excessively stresses internal validity issues, it could waste money on projects that might be rigorous and innovative but have little practical value. The bottom-up approach provides an alternative perspective for funding agencies to address scientific and viability issues in the funding process. This perspective suggests three levels of funding:
Funding for viability evaluation: This funding level provides funds for assessing the viability of existing or innovative interventions. It formally recognizes stakeholders' contributions to developing real-world programs. Researchers can also submit their innovative interventions for viability testing. In doing so, however, they will have to collaborate with stakeholders in addressing practical issues.
Funding for effectiveness evaluation: The second level of funding is an ef-
fectiveness evaluation for viable and popular interventions. Ideally, these evalua-
tions should address both effectual and transferable validity issues.
Funding for efficacy evaluation: The third level of funding is efficacy eval-
uation for those interventions proven viable, effective, and transferable in the
real world. Efficacy evaluation provides the strongest evidence of an interven-
tion’s precise effect, with practical value as an added benefit.
These three levels of funding will promote collaborations between stake-
holders and researchers and ensure that evaluation results meet both scientific
and practical demands.
After the original mission of finding a route to India was replaced with the new
mission of discovering a new world and tasks were adjusted accordingly, the
team and many others judged the expedition an enormous success. Fluid com-
plexity makes an important contribution by bringing evaluators’ attention to
environmental influences and the dynamics of program processes. This approach
may be useful for program planning and management, but in its current form, it
has limitations in evaluation. Not many existing quantitative methods or statisti-
cal models are capable of analyzing such complicated transformation processes
and interaction effects. Whether qualitative methods could meet the challenge
remains to be seen. Furthermore, if a program is extremely complex and dynamic, it lacks a stable entity for meaningful evaluation. In this case, consultants are more suitable than evaluators for offering opinions on how to address problems or issues generated by such a constantly fluid and ever-changing system.
The theory-driven evaluation’s view on a program represents a synthesis of
reductionism and fluid complexity. Theory-driven evaluation postulates that a
program must address both change and stability forces as described by these two
contrasting views. On the one hand, a program’s political, social, or economic
environment can create uncertainties that pressure the program for making
changes. On the other hand, a program has to maintain some level of stability in
order to provide a venue for transforming an intervention for desirable outcomes.
Many programs address these opposite forces through taking proactive measures
to reduce or even managing uncertainties. The action model and change models
discussed previously provide a conceptual framework for understanding where
proactive measures take place. For example, program managers and staff can
build partnerships to buffer political pressure, strengthen organizational ties with
funding agencies to increase chances to get funds, provide implementers training
and incentive to reduce turnover, mobilize its community bases to generate
community support for reducing criticisms, select a robust intervention for re-
ducing potential implementation problems, and so on. A problem can be solved
by reducing uncertainties and manipulating components as specified in the action
and change models.
By synthesizing reductionism and fluid complexity, theory-driven evaluation may offer the benefits of both worlds. It agrees with fluid complexity on the influence of uncertainties on a program, but argues that uncertainties can be reduced through anticipatory action such as better planning and information feedback. In addressing change and stability forces, the theory-driven evaluation's view of a program, as expressed in program theory, is more complicated than reductionism's view, but its scope is manageable and analyzable within the capability of existing quantitative and qualitative methods. There are programs for which either reductionism or fluid complexity is suitable, but for the majority of intervention programs the theory-driven evaluation's program view may be more applicable. Theory-driven evaluation provides an alternative for assessing these programs.
9 Discussion
Program evaluation is a young applied science. In its infancy, it borrowed heavily from mature sciences: concepts, methods, approaches, and theories. These imported methodologies and theories have been applied in evaluation and have proved useful, and they will continue to contribute to program evaluation in the future. However, since they were not developed for evaluation, I believe there are limits to how far they can help to advance the field. To further advance program evaluation, we may need more home-grown evaluation theories and methodologies, dedicated mainly to evaluation purposes, to energize the field. The development of theory-driven evaluation as demonstrated in this chapter represents an endeavour in this direction.
References
Bickman, L. (Ed.). (1987). Using program theory in evaluation. San Francisco: Jossey-Bass.
Bickman, L. (Ed.). (1990). Advances in program theory. San Francisco: Jossey-Bass.
Campbell, D. T./Stanley, J. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Chen, H. T./Donaldson, S. L./Mark, M. M. (2011). Validity frameworks for outcome evaluation. In H. T. Chen, S. L. Donaldson & M. M. Mark (Eds.), Advancing validity in outcome evaluation: Theory and practice (forthcoming). San Francisco: Jossey-Bass.
Chen, H. T. (1988). Validity in evaluation research: A critical assessment of current issues. Policy and Politics, 16(1), 1-16.
Chen, H. T. (1990). Theory-driven evaluations. Thousand Oaks, CA: Sage.
Chen, H. T. (1997). Normative evaluation of an anti-drug abuse program. Evaluation and Program Planning, 20(2), 195-204.
Chen, H. T. (2005). Practical program evaluation: Assessing and improving planning, implementation, and effectiveness. Thousand Oaks, CA: Sage.
Chen, H. T. (2010). The bottom-up approach to integrative validity: A new perspective for program evaluation. Evaluation and Program Planning, 33(3), 205-214. doi:10.1016/j.evalprogplan.2009.10.002
Chen, H. T./Garbe, P. (2011). Assessing program outcomes from the bottom-up approach: An innovative perspective to outcome evaluation. In H. T. Chen, S. L. Donaldson & M. M. Mark (Eds.), Advancing validity in outcome evaluation: Theory and practice (forthcoming). San Francisco: Jossey-Bass.
Chen, H. T./Quane, J./Garland, T. N. (1988). Evaluating an antismoking program. Evaluation and the Health Professions, 11(4), 441-464.
Chen, H. T./Rossi, P. H. (1983). The theory-driven approach to validity. Evaluation and Program Planning, 10, 95-103.
Connell, J. P./Kubisch, A. C./Schorr, L. B./Weiss, C. H. (1995). New approaches to evaluating community initiatives: Concepts, methods and contexts. Washington, DC: Aspen Institute.
Cook, T. D./Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Coryn, C. L. S./Noakes, L. A./Westine, C. D./Schröter, D. C. (2011). A systematic review of theory-driven evaluation practice from 1990 to 2009. American Journal of Evaluation, 32(2), 199-226.
Coxe, S./West, S. G./Aiken, L. S. (2009). The analysis of count data: A gentle introduction to Poisson regression and its alternatives. Journal of Personality Assessment, 91(2), 121-136. doi:10.1080/00223890802634175
Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San Francisco: Jossey-Bass.
Donaldson, S. L./Christie, C. A./Mark, M. M. (Eds.). (2008). What counts as credible evidence in applied research and evaluation practice? Newbury Park, CA: Sage.
Fulbright-Anderson, K./Kubisch, A. C./Connell, J. P. (Eds.). (1998). New approaches to evaluating community initiatives. Vol. 2: Theory, measurement and analysis. Washington, DC: Aspen Institute.
Glasgow, R. E./Lichtenstein, E./Marcus, A. C. (2003). Why don't we see more translation of health promotion research to practice? Rethinking the efficacy-to-effectiveness transition. American Journal of Public Health, 93(8), 1261-1267.
Greene, J./Caracelli, V. J. (Eds.). (1997). Advances in mixed-method evaluation: The challenges and benefits of integrating diverse paradigms (Vol. 74). San Francisco: Jossey-Bass.
Hansen, M. B./Vedung, E. (2010). Theory-based stakeholder evaluation. American Journal of Evaluation, 31(3), 295-313. doi:10.1177/1098214010366174
Huffman, D./Lawrenz, F. (Eds.). (2006). Critical issues in STEM evaluation. San Francisco: Jossey-Bass.
Kristal, A. R./Glanz, K./Tilley, B. C./Li, S. (2000). Mediating factors in dietary change: Understanding the impact of a worksite nutrition intervention. Health Education & Behavior, 27(1), 112-125.
Miller, W. R./Toscova, R. T./Miller, J. H./Sanchez, V. (2000). A theory-based motivational approach for reducing alcohol/drug problems in college. Health Education & Behavior, 27(6), 744-759.
Patton, M. Q. (1997). Utilization-focused evaluation (3rd ed.). Thousand Oaks, CA: Sage.
Pawson, R./Tilley, N. (1997). Realistic evaluation. Thousand Oaks, CA: Sage.
Posavac, E. J./Carey, R. G. (2007). Program evaluation: Methods and case studies. Upper Saddle River, NJ: Pearson Prentice Hall.
Rogers, P. J./Hacsi, T. A./Petrosino, A./Huebner, T. A. (Eds.). (2000). Program theory in evaluation: Challenges and opportunities (Vol. 87). San Francisco: Jossey-Bass.
Rossi, P. H./Lipsey, M. W./Freeman, H. E. (2004). Evaluation: A systematic approach. Thousand Oaks, CA: Sage.
Shadish, W. R./Cook, T. D./Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Tashakkori, A./Teddlie, C. (Eds.). (2003). Handbook of mixed methods in social and behavioral research. Thousand Oaks, CA: Sage.
Weiss, C. (1998). Evaluation (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.
Wholey, J. S. (Ed.). (1987). Using program theory in evaluation (Vol. 33). San Francisco: Jossey-Bass.