Meta-Analysis for Psychologists
Richard Cooke
School of Health, Education, Policing and Sciences
University of Staffordshire
Stoke-on-Trent, Staffordshire, UK
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
Preface
I would like to thank Gina Halliwell, Emma Davies, Julie Bayley, Sarah Rose,
Helen McEwan, Joel Crawford, and Andy Jones for their helpful feedback on draft
chapters. I also want to thank Stefanie Williams for allowing me to reproduce a
figure from one of her papers.
I would like to thank Palgrave Macmillan for their consistent support and
patience throughout the production of this book. Particular thanks to Beth Farrow,
Bhavya Rattan, Esther Rani, and Grace Jackson for seeing this book from inception
to publication.
Finally, and most importantly, I want to thank Professor Paschal Sheeran for his
guidance on how to conduct meta-analyses more than 25 years ago. I learned so
much from Paschal in three years that it has taken me almost 25 years to appreciate all he taught me. Without Paschal, this book would not exist.
I have made every effort to trace copyright holders in the production of this book.
If, however, any have been overlooked, the publishers will be willing to make the
required arrangements to address this at the earliest opportunity.
1 Introduction to Meta-Analysis for Psychologists
Meta-analysis is the name for a set of statistical techniques used to pool (synthesise)
results from a set of studies on the same topic. For example, in Cooke and Sheeran
(2004), we extracted correlations reported by studies testing relationships from
Ajzen’s (1991) Theory of Planned Behaviour, for example, the correlation between
attitudes and intentions and the correlation between intentions and behaviour. We
used meta-analysis to pool results to provide a precise estimate of the magnitude
(size) of these correlations. Such estimates can be used in many ways. We used them
to test research questions in a later study (Cooke & Sheeran, 2013). Results from a
meta-analysis can also be used to inform study design. Once you know the magni-
tude of a correlation you can use this information in combination with other infor-
mation, including significance level and power, to determine the sample size needed
for future studies. Pooled results can be compared to other pooled results to test
theoretical questions too.
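To give a flavour of how a pooled correlation feeds into sample size planning, here is a minimal sketch using the standard Fisher r-to-z approximation; the pooled correlation of r = 0.30, the significance level, and the power used below are illustrative assumptions rather than values taken from any particular meta-analysis.

```python
# A rough sample size calculation for detecting a correlation, using the
# Fisher r-to-z approximation. The values below are illustrative only.
from math import atanh, ceil
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation r (two-tailed)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = .05
    z_power = norm.ppf(power)           # 0.84 for 80% power
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)

print(n_for_correlation(0.30))  # roughly 85 participants
```

The same logic works in reverse: the more precise the pooled estimate from a meta-analysis, the less guesswork is involved in planning the next study.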
This book is also suitable for postgraduate students and researchers in other
social science disciplines, such as criminology, economics, or geography who have
had teaching on research methods and statistics as part of an undergraduate degree.
Any discipline that uses survey data to correlate variables, or that compares groups on an outcome, will be able to follow the examples in this book.
The book is organised into four parts. Part I, ‘Introduction to meta-analysis’, intro-
duces the basic ideas behind meta-analysis and comprises Chaps. 2 and 3. Chapter
2 outlines what meta-analysis is, what it involves, and why you should run a meta-
analysis. Chapter 3 goes into more detail about effect sizes as understanding them
is critical when conducting and interpreting meta-analyses. Part II, ‘Preparing to
conduct a meta-analysis’, comprises Chaps. 4, 5, 6, and 7. Chapter 4 provides a
brief overview of systematic review essentials around setting a research question,
inclusion/exclusion criteria, and searching and screening. Chapter 5 covers data
extraction for meta-analysis, with an emphasis on extracting effect sizes. Chapter 6
introduces quality appraisal for meta-analysis. Chapter 7 covers the idea of data
synthesis, which is synonymous with meta-analysis. Part III, ‘Conducting meta-
analysis in jamovi’, covers the steps you need to follow to conduct a meta-analysis
in the open-source software jamovi and comprises Chaps. 8, 9, and 10. Chapter 8
introduces you to jamovi, outlining the steps you need to follow to be ready for
meta-analysis. Chapter 9 provides an example of how to run a meta-analysis of cor-
relations in jamovi and Chap. 10 provides an example of how to conduct a meta-
analysis of effect size differences. Part IV, ‘Further issues in meta-analysis’, covers
additional issues in meta-analysis. Chapter 11 compares fixed-effect versus ran-
dom-effects in meta-analyses. Chapter 12 covers moderator (sub-group) analyses.
Chapter 13 discusses publication bias. Chapter 14 covers extensions to meta-analy-
sis. Finally, Chapter 15 provides tips on how to write up your meta-analysis.
References
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision
Processes, 50, 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Cooke, R., & Sheeran, P. (2004). Moderation of cognition-intention and cognition-behav-
iour relations: A meta-analysis of properties of variables from the theory of planned
behaviour. The British Journal of Social Psychology, 43(Pt 2), 159–186. https://doi.
org/10.1348/0144666041501688
Cooke, R., & Sheeran, P. (2013). Properties of intention: Component structure and consequences
for behavior, information processing, and resistance. Journal of Applied Social Psychology,
43(4), 749–760. https://doi.org/10.1111/jasp.12003
Part I
Introduction to Meta-Analysis
2 What Is a Meta-Analysis and Why Should I Run One?
What Is a Meta-Analysis?
The first meta-analysis I read was authored by my PhD supervisor, Paschal Sheeran,
and Sheina Orbell (Sheeran & Orbell, 1998). This meta-analysis estimated the mag-
nitude of the intention–behaviour relationship for condom use, that is, how big is the
correlation between people’s intentions to use a condom and their self-reported con-
dom use across a set of studies? Using data from 28 samples identified following a
systematic review, Paschal and Sheina conducted a meta-analysis that reported a
sample-weighted average correlation of r+ = 0.44. Following Cohen’s (1992) guide-
lines for interpreting the magnitude of correlations (see Chap. 3), this correlation is
interpreted as medium-sized.
What does this result tell us? It shows that across almost 30 samples, there is a
significant, positive correlation between intentions and self-reported condom use;
those who report intending to use a condom are more likely to report using a con-
dom in the future than those who report they do not intend to use a condom. As
psychology studies are littered with correlations between variables and outcomes
that are inconsistent, Paschal and Sheina’s meta-analysis increases our confidence
that intentions are a reliable correlate of later condom use. Such results are useful
from both an applied and a theoretical perspective. Showing intentions consistently
correlate with self-reported condom use gives interventionists seeking to increase
condom use a target to modify in interventions. Results from meta-analysis also
provide support for theorising, increasing confidence that intentions are a key pre-
dictor of behaviour as has been proposed for over 50 years (Ajzen & Fishbein,
1973) and continues to be proposed today (Michie et al., 2011).
The next meta-analysis I read was Chris Armitage and Mark Conner’s (Armitage
& Conner, 2001) meta-analysis of theory of planned behaviour studies. It is the
dictionary definition of a highly cited paper, with more than 15,000 citations at the
time of writing. The sheer scale of this meta-analysis still blows me away; Chris and
Mark meta-analysed data from 185 studies—185! I’ve never meta-analysed more
than 40 studies, and that took ages. It remains an impressive achievement and a use-
ful resource for researchers looking for an overview of the theory of planned behav-
iour, although it is outdated now because many more studies have been published
since this paper came out. Over time, meta-analyses need updating to ensure they
reflect the results from a literature.
So good was Armitage and Conner’s (2001) meta-analysis, I continued to cite it
until the publication of Rosie McEachan and colleagues’ (McEachan et al., 2011)
meta-analysis. Rosie’s meta-analysis focused on health behaviours, which meant
that researchers could determine theory of planned behaviour relationships with
reference to specific behaviours, like dietary behaviours, physical activity, smoking,
et cetera. This is a key improvement over Chris and Mark’s meta-analysis because
they pooled results from across a range of different behaviours, which means their
results provide a general picture of theory relationships rather than answering ques-
tions about the magnitude of relationships for specific behaviours.
As a health psychologist, I find it helpful to know about specific relationships, for
example, “How well do intentions predict drinking behaviour?” “How well do
intentions predict dietary behaviours?” because it is these specific questions I am
looking to answer in my own primary research studies. You must not assume that
relationships are the same across behaviours. Indeed, Rosie’s meta-analysis clearly
highlights that intention–behaviour correlations vary between health behaviours
(see Table 2 in her paper). This extra layer of specificity allows researchers to dis-
cern more clearly what the most important relationships are for different behaviours
and estimate the effects they are likely to find in their own studies. A key goal of
meta-analysis is to provide a precise estimate of effect sizes. A meta-analysis of
studies of a single behaviour gives a more specific estimate for that behaviour than one that pools across multiple behaviours.
Rosie’s meta-analysis remains the best health psychology meta-analysis I’ve
read. It’s a brilliant piece of work, providing behaviour-specific results that inspired
my meta-analysis of theory of planned behaviour alcohol studies (Cooke et al.,
2016). Rosie was only able to identify a small number of alcohol studies, so she
pooled results for these studies with results for other substance use behaviours.
There are certain key concepts or ideas that you need to know to understand what
happens in a meta-analysis. I’m going to introduce you to four concepts: (1) effect
sizes; (2) pooling (synthesising); (3) sample-weighting; (4) publication bias. These
concepts are important regardless of the type of meta-analysis you want to run and
we’ll return to them in greater detail later in the book.
Effect sizes are the building blocks, currency, or units of meta-analyses. They are
the result of a statistical analysis, like a correlation between variables, or an effect
size difference testing the effectiveness of an intervention on an outcome or an
experimental manipulation on a variable. Much of the time, meta-analyses are run
using effect sizes reported by authors in a published paper. Sometimes, effect sizes
are accessed from unpublished sources, including PhD theses, reports, or directly
from the authors of the original studies. There are also occasions where the meta-
analyst calculates the effect size themselves. In most psychological meta-analyses,
the effect sizes are either correlations between psychological variables and out-
comes, or effect size differences comparing an outcome between a control (com-
parison) group and an experimental/intervention group. I’ve already reported
multiple effect sizes in this chapter including Sheeran and Orbell’s sample-weighted
average correlation of r+ = 0.44 between intentions and condom use, and Ashford
et al.’s effect size difference d+ = 0.16 for interventions aiming to increase physical
activity self-efficacy.
The choice of study designs you search for when you set out to conduct a meta-analysis means you will almost inevitably find studies containing effect sizes or the statistical information needed to calculate them—the only time this is not the case is when you cannot find any studies! So, a natural consequence of searching for studies that have correlated intentions with condom use is that such studies are highly
likely to have reported the correlations you need for meta-analysis. Similarly,
searching for studies using an experimental design or testing a behaviour change
intervention means looking for studies that are likely to report the descriptive statis-
tics, like means and standard deviations, you need to calculate effect size differ-
ences. So, if you set out to search for study designs associated with statistical
information, that is, correlations are commonly reported in survey studies, descrip-
tive statistics are commonly reported in experimental/intervention studies, you are
well on your way to running a meta-analysis. Chapter 3 contains more information
about effect sizes and Chaps. 4, 5, 6, and 7 cover all aspects of systematic searching,
screening, data extraction, quality appraisal, and pooling results in meta-analysis.
In a meta-analysis, the 'sample size' is the number of studies (samples) you have, although you still report the total number of participants when writing up
your results. Meta-analysts differentiate how many studies they included from how
many participants were recruited into those studies by using different letters. You
will be familiar with the idea of using N to report the total number of participants,
and n to report a sub-sample, for example, how many young people we recruited in
our total sample. In meta-analysis, we use K for the total number of samples, and k
for sub-samples.
For example, in a meta-analysis, you might write: K = 17 studies were found that
investigated the effects of behaviour change interventions seeking to promote healthy eating in primary-age (4–11) school children. Studies were conducted in several
countries, including England (k = 4), Scotland (k = 3), USA (k = 3), the Netherlands
(k = 3), Norway (k = 2), and Slovakia (k = 2).
Pooling effect sizes achieves several important goals. Most obviously, it tells you
the magnitude of the effect size across studies. Almost always, effect sizes vary
between studies, and it is also typically the case that the pooled effect size falls
between the highest and lowest effect sizes you find. Visualising all the effect sizes,
typically by using a forest plot, that is, a graph showing the effect sizes from all
samples with the pooled effect size at the bottom, helps us to spot outliers. That
shiny finding, which is typically reported in a high impact factor journal, and often
generates media buzz, will stand out like a sore thumb if all other samples find small
or null effects. Running a meta-analysis will help you to avoid being too influenced
by eye-catching results in particular samples, when evaluating your research litera-
ture. As we will see below when we talk about publication bias, it’s very easy to be
dazzled by a finding showing a huge effect of an intervention on an important out-
come. Sadly, this bias pervades journals and psychologists are particularly prone to
it (Chambers, 2017). Pooling results using meta-analysis is part of the solution to
publication bias; if we pool results across studies, we can accurately assess the true
pattern of findings from across the literature.
Pooling also increases the precision with which we can conduct future research
studies. We can use the pooled effect size to estimate the sample size we need for
future studies, leading to studies with greater power, that increase our confidence in
results. We can use our pooled results to clarify to non-academic audiences what is
going on and counter the sensationalism that sometimes surrounds research results.
When we pool results, we can either treat them as equally important or assign
greater weight (influence) to some results and lesser weight (influence) to others. To
treat all results as equally important is straightforward. Extract the correlations from
studies you identify in a systematic review, then use any software package to aver-
age the correlations. As discussed by Borenstein and colleagues (Borenstein et al.,
2009, 2021), such an approach has several disadvantages.
Meta-analysis is based on the idea of assigning greater weight to effect sizes with
larger samples. This is called sample-weighting and is one of the key strengths of
meta-analysis, helping you to see the wood for the trees. It's based on a straightforward idea from statistics: the larger your sample size, the more likely your effect size is to generalise to the population effect size.
Here’s a simple example. I often run studies to predict binge-drinking in English
university students using psychological variables drawn from Ajzen’s (1991) theory
of planned behaviour. In these studies, I’m looking to correlate psychological vari-
ables with binge-drinking intentions. Imagine I have three research assistants help-
ing me out with data collection. Each research assistant uses different methods to
recruit students. Lee asks his mates to complete questionnaires and ends up with a
sample of ten students. Debbie decides to advertise her study on social media and
recruits a sample of 100 students. Ivy decides to ask her friends, advertise on social
media, reach out to influencers, nag the chair of the university sports teams, and use
every recruitment trick in the book to end up with a sample of 1000 students.
So, we have three samples that can be used to compute the correlation between
drinking intentions and drinking behaviour.
Which correlation should we view as most likely to generalise to the wider popu-
lation of English university students? Based on the idea that the larger the sample
the more representative it is of the population, we should trust Ivy’s results more
than Debbie’s or Lee’s, because Ivy’s effect size (r = 0.50) is based on data from
1000 students.
Meta-analysis takes this principle to heart, which is why you end up with a sam-
ple-weighted average correlation rather than an average correlation. In practice, this
means that each sample included in your meta-analysis is not treated equally; larger
samples are given greater weight (influence) over the overall effect size relative to
smaller samples. We’ll return to this issue in later chapters, but a useful heuristic is
that studies with larger sample sizes are assigned greater weight over the overall
effect size calculated in a meta-analysis relative to studies with smaller sample sizes.
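To make the weighting idea concrete, here is a minimal sketch that contrasts a simple average with a sample-weighted average for the three samples described above; only Ivy's correlation (r = 0.50) is given in the example, so the values for Lee and Debbie are invented purely for illustration.

```python
# Hypothetical correlations between drinking intentions and drinking behaviour;
# only Ivy's r = 0.50 appears in the example above, the other two are invented.
samples = {"Lee": (10, 0.20), "Debbie": (100, 0.30), "Ivy": (1000, 0.50)}

simple_average = sum(r for n, r in samples.values()) / len(samples)
sample_weighted = (sum(n * r for n, r in samples.values())
                   / sum(n for n, r in samples.values()))

print(f"Simple average:          r = {simple_average:.2f}")   # 0.33
print(f"Sample-weighted average: r = {sample_weighted:.2f}")  # about 0.48
```

Because Ivy's sample is a hundred times larger than Lee's, the weighted estimate sits close to her correlation, which is exactly what the sample-weighting principle is designed to achieve.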
emerged from our meta-analysis. We did not set out to generate this question; it resulted from our analyses.
Alternatively, in Cooke et al. (2016), we found that adolescents reported smaller
attitudes–intention and subjective norm–intention correlations compared with
adults. A recent paper by Kyrrestad et al. (2020), independently confirmed that atti-
tudes and subjective norms have small-sized effects on intentions in a large sample
of Norwegian adolescents. Once again, a result of a meta-analysis set up a primary
research question. While you might not change the world with your own meta-
analysis, at the end of the process you will be much better informed than when you
started. I think that is a good goal for any research study.
The simple answer to the question of how many studies you need to run a meta-
analysis is two; otherwise, you cannot pool results. However, I agree with Martin
Hagger (Hagger, 2022) that unless both studies used robust study designs, there is
not much value in pooling results together. I have been told a meta-analysis based
on over 30 studies was ‘premature’ and received feedback on other meta-analyses
saying that results would be more convincing if based on more studies. Not very
helpful, having spent two years searching for studies!
I would say that you should aim to include effect sizes from between 30 and 40
studies (samples) for a meta-analysis, though this reflects the number of studies I
have typically included in my meta-analyses rather than a definitive figure. One
source of support for this claim comes from researchers using a moderator tech-
nique called meta-CART (see Chap. 14), who argue that to test moderation using
this technique you need at least 40 effect sizes and that results work best with at
least 120. In my experience, it’s unlikely you will find 120 effect sizes, unless you
are looking to update an existing meta-analysis, so you will likely have to make do with the studies you can find. The more, the better, though.
Summary
In this chapter, I’ve introduced you to what meta-analysis is, what it involves, and
why you should run a meta-analysis. In the next chapter, I’ll discuss effect sizes in
more detail as these are the building blocks of meta-analysis.
References
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision
Processes, 50, 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Ajzen, I., & Fishbein, M. (1973). Attitudinal and normative variables as predictors of specific
behaviors. Journal of Personality and Social Psychology, 27, 41–57.
Armitage, C. J., & Conner, M. (2001). Efficacy of the theory of planned behaviour: A
meta-analytic review. British Journal of Social Psychology, 40(4), 471–499. https://doi.
org/10.1348/014466601164939
Ashford, S., Edmunds, J., & French, D. P. (2010). What is the best way to change self-efficacy
to promote lifestyle and recreational physical activity? A systematic review with meta-
analysis. British Journal of Health Psychology, 15(2), 265–288. https://doi.org/10.134
8/135910709X461752
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (Eds.). (2009). Introduction to meta-
analysis (1st ed.). Wiley.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2021). Introduction to meta-anal-
ysis (2nd ed.). Wiley.
Chambers, C. (2017). The seven deadly sins of psychology: A manifesto for reforming the culture of scientific practice. Princeton University Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Epton, T., Harris, P. R., Kane, R., van Koningsbruggen, G. M., & Sheeran, P. (2015). The impact
of self-affirmation on health-behavior change: A meta-analysis. Health Psychology, 34(3),
187–196. https://doi.org/10.1037/hea0000116
Gollwitzer, P. M., & Sheeran, P. (2006). Implementation intentions and goal achievement: A meta-
analysis of effects and processes. Advances in Experimental Social Psychology, 38, 69–119.
https://doi.org/10.1016/S0065-2601(06)38002-1
Hagger, M. S. (2022). Meta-analysis. International Review of Sport and Exercise Psychology,
15(1), 120–151. https://doi.org/10.1080/1750984X.2021.1966824
Kyrrestad, H., Mabille, G., Adolfsen, F., Koposov, R., & Martinussen, M. (2020). Gender differ-
ences in alcohol onset and drinking frequency in adolescents: An application of the theory
of planned behavior. Drugs: Education, Prevention and Policy, 1–11. https://doi.org/10.108
0/09687637.2020.1865271
McEachan, R. R. C., Conner, M., Taylor, N. J., & Lawton, R. J. (2011). Prospective prediction
of health-related behaviours with the Theory of Planned Behaviour: A meta-analysis. Health
Psychology Review, 5, 97–144. https://doi.org/10.1080/17437199.2010.521684
Michie, S., Van Stralen, M. M., & West, R. (2011). The behaviour change wheel: A new method for
characterising and designing behaviour change interventions. Implementation Science, 6(1),
42. https://doi.org/10.1186/1748-5908-6-42
Sheeran, P., Listrom, O., & Gollwitzer, P. M. (2024). The when and how of planning: Meta-analysis
of the scope and components of implementation intentions in 642 tests. European Review of
Social Psychology, 1–33. https://doi.org/10.1080/10463283.2024.2334563
Sheeran, P., & Orbell, S. (1998). Do intentions predict condom use? Meta-analysis and examination
of six moderator variables. British Journal of Social Psychology, 37(2), 231–250. https://doi.
org/10.1111/j.2044-8309.1998.tb01167.x
3 Identifying Your Effect Size
A key question to answer when writing the protocol for the systematic review that
will inform your meta-analysis is “What effect size will be used to pool results from
included studies?” Effect sizes are summary statistics reported in primary studies,
like the correlation between two variables (r) or the effect size difference (d) between
two groups in an outcome. Meta-analysis involves pooling (synthesising) effect
sizes, so, until you identify your effect size, you won’t be able to conduct a
meta-analysis.
Part of the process of identifying your effect size is knowing what statistics go
hand-in-hand with different study designs. In many papers, the study design used by
researchers determines the effect size reported; observational (survey) designs usu-
ally report correlations between variables, whereas experimental (intervention/trial)
designs usually report descriptive statistics (mean, standard deviation) for an out-
come. The two most reported effect sizes that psychologists use in meta-analyses
are correlations between variables and effect size differences in an outcome. I’ll use
two of my meta-analyses as examples to help you start thinking about your
effect size:
ies testing the theory of planned behaviour’s ability to predict drinking intentions
or drinking behaviour, we extracted correlations between variables and drinking
intentions or drinking behaviour before using meta-analysis to pool correlations
into overall effect sizes for each theory relationship.
• Cooke et al. (2023) is a meta-analysis of studies testing the impact of forming
implementation intentions—if-then plans specifying how you plan to change
your behaviour (Gollwitzer, 1999)—on future alcohol consumption. Researchers
used prospective designs to test effects of forming (versus not forming) imple-
mentation intentions on future behaviour. Participants report baseline perfor-
mance of behaviour before being assigned to form (or not form) an implementation
intention, creating intervention and control groups. The impact of forming (or
not forming) implementation intentions is assessed by measuring behaviour at
some point in the future, typically between one week and six weeks after forming
implementation intentions. Our goal was to conduct a meta-analysis of the effect
of forming implementation intentions on studies measuring alcohol use out-
comes, for example, weekly drinking or heavy episodic drinking (also known as
binge drinking). After systematically searching for studies that tested the effect
of forming implementation intentions on alcohol use outcomes, we extracted
means and standard deviations for intervention and control groups to allow us to
calculate effect size differences between the two groups in alcohol use outcomes
before using meta-analysis to pool effect size differences.
in what they are doing. I’ll do my best in this book to help increase your confidence
in using effect sizes as it is essential you understand them if you want to run a meta-
analysis. The next section will take a slight detour from our focus on effect sizes to
illustrate how significance is only one of three key statistical dimensions when it
comes to interpreting the results of any statistical test.
• What is the direction of the effect size? Is it a positive, negative, or null relation-
ship between two variables of interest? Which group has the higher/lower mean
score on the outcome you’re interested in, or are the means of the two groups
similar?
• What is the magnitude of the effect size? How large is the correlation between
the two variables? How large is the difference in the outcome between two groups?
• What is the significance of the effect size? How likely is it that your result
occurred by chance?
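If it helps to see these three dimensions attached to a familiar statistic, the sketch below computes a correlation on made-up data and reads off its direction (the sign), its magnitude (judged against Cohen's 1992 guidelines of 0.10, 0.30, and 0.50 for small, medium, and large correlations), and its significance (the p value); the data and variable names are invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Invented data: intentions built to correlate positively with attitudes
rng = np.random.default_rng(42)
attitudes = rng.normal(size=200)
intentions = 0.4 * attitudes + rng.normal(scale=0.9, size=200)

r, p = pearsonr(attitudes, intentions)
direction = "positive" if r > 0 else "negative" if r < 0 else "null"
magnitude = ("negligible" if abs(r) < 0.10 else
             "small" if abs(r) < 0.30 else
             "medium" if abs(r) < 0.50 else "large")   # Cohen (1992)

print(f"r = {r:.2f} ({direction}, {magnitude}), p = {p:.3f}")
```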
In meta-analysis, there is more emphasis on the direction and magnitude of effects and less emphasis on significance. I would say that the significance of a result tends to be the least commented-on dimension of a statistic in a
meta-analysis. What matters more is direction and magnitude.
The direction of an effect size statistic can be reported using one of three categories:
positive, negative, or null. These are reported slightly differently for correlations
and effect size differences. For instance, a correlation’s direction can be described
as a (1) positive correlation, for example, r = 0.45, as attitudes towards binge drink-
ing get more positive so do drinking intentions; (2) negative correlation, for exam-
ple, r = −0.45, as intentions to limit binge drinking episodes increase, the frequency
of binge drinking episodes decreases; or (3) null correlation, r = 0.00, there is no
linear relationship between perceptions of control and binge drinking episodes.
Alternatively, a positive effect size difference is where the intervention group does better on the outcome than the control group; for example, d = 0.35 could mean intervention participants drank less alcohol than control participants. A negative effect size difference is where the control group does better on the outcome than the intervention group; for example, d = −0.35 could mean control participants drank less alcohol than intervention participants. A null effect size difference, for example, d = 0.00, means that control and intervention participants drank similar amounts of alcohol. The
direction of an effect size is not always made explicit in psychology papers. Perhaps
this is because some see it as obvious. I must admit that, outside of writing up meta-
analysis results, I rarely focus on the direction of an effect size unless it is unex-
pected, for example, when you expect a positive correlation and find a negative one
instead. Regardless of reporting of direction in primary papers, in meta-analyses,
the direction of effect sizes is important to report as it helps you infer what is hap-
pening across studies. I will talk more about direction when we discuss data synthe-
sis in Chap. 7.
The magnitude (size) of an effect size is the most important statistical dimension in
meta-analysis; the main reason for running any meta-analysis is to precisely deter-
mine the overall effect size (sometimes called a point estimate) for your set of stud-
ies (samples). Calculating the effect size across studies enables you to speak with
greater confidence about the effect size you are interested in than the authors of
primary papers testing the effect size because by pooling results from multiple
papers, your results are based on a meta-analysis of multiple studies (samples). This
gives you greater authority about the effect size.
Magnitude is usually thought of in terms of three categories: small, medium, or
large. Values for these categories differ depending on the effect size you are inter-
ested in; go back to Table 1 in Cohen (1992) to see what I mean. So, while it is true
results across studies were heterogeneous and were almost as dispersed as those
reported by Ashford et al. (2010) (see Fig. 2.1).
Box 3.1 Why Are You Unlikely to Find Papers Reporting Perfect Negative, Perfect Positive, or Null Relationships in Peer-Reviewed Journal Articles?
Although it is possible for psychologists to publish papers
reporting perfect negative (r = −1) or perfect positive (r = 1) correlations,
it is unlikely you will find correlations like these in the published
literature. One reason is that most psychologists who conduct
correlational analyses are aware of the principle of multicollinearity,
which is where two variables are so highly correlated they are essentially
measuring the same thing. This makes the variables redundant as
predictor variables in a regression model because they account for the
same variance in the outcome as each other. If you include either in your
model you will get similar results. Indeed, having both variables in your
model reduces model fit, because you are adding two variables that
account for the same variance in the outcome variable, and models fit
better when predictors are (relatively) independent of one another.
Statistical textbooks recommend a cut-off of r = 0.80 for multicollinearity
when conducting regression analyses. Due to this, psychology papers
rarely report correlations that exceed r = 0.80, because authors know they will
be criticised by reviewers for analyses that are multicollinear and hard
to interpret as a result. This makes reporting perfect negative (r = −1) or
perfect positive (r = 1) correlations extremely rare in the psychological literature.
Null relationships are not necessarily as rare as perfect negative or
perfect positive relations, but would appear to reflect a failure in
research design—why would you want to report results of a study that
shows no linear relationship between two variables you believed (before
conducting the study) to be linearly related? Most of the studies I have
included in meta-analyses of correlations have reported non-null
relationships, although my meta-analysis of the relationships between
perceived control and drinking intentions (Cooke et al., 2016) did
include several studies that reported null relationships. Null relationships
are likely to be rare in your literature of interest because researchers are
more likely to report results that show significant linear relationships.
Reporting null relationships is unlikely to help your paper get published
(see Chap. 13), although the move to publish results on the Open
Science Framework should help address this issue somewhat.
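As a quick illustration of the r = 0.80 rule of thumb from Box 3.1, the short sketch below flags any pair of predictors whose correlation exceeds that cut-off before they are entered into a regression model; the variable names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
attitude = rng.normal(size=150)
# Built to be almost identical to 'attitude', so this pair should be flagged
instrumental_attitude = attitude + rng.normal(scale=0.2, size=150)
perceived_control = rng.normal(size=150)

predictors = {"attitude": attitude,
              "instrumental_attitude": instrumental_attitude,
              "perceived_control": perceived_control}

names = list(predictors)
corr = np.corrcoef(list(predictors.values()))

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) > 0.80:  # common multicollinearity cut-off
            print(f"{names[i]} and {names[j]}: r = {corr[i, j]:.2f} (multicollinear)")
```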
It is worth noting that the sign (+ or −) before your correlation makes no differ-
ence to the magnitude of the correlation; correlations of −0.20 and 0.20 are both
small sized. Box 3.2 provides an example of how to interpret a correlation from a
primary study to show how to work out the direction and magnitude of individual
correlations.
Often, the result reported in Box 3.2 would be used to inform further primary
studies. For instance, you might want to increase physical activity intentions and
believe that if you make attitudes more positive, you will increase intentions—this
reasoning underlies Ajzen’s (1991) theory of planned behaviour and other health
psychology models of behaviour. So you design an intervention providing informa-
tion to make physical activity attitudes more positive. Alternatively, the result could
be the first step of a meta-analysis, a secondary study. By computing the correlation,
you’ve taken your first step towards conducting a meta-analysis of studies that cor-
relate physical activity attitudes with physical activity intentions. You have your
first effect size needed for a meta-analysis, that is, the correlation r = 0.35. Your next
step would be to systematically search for similar correlations reported by other
researchers in academic journals, grey literature, and open science portals like the
Open Science Framework (see Chap. 4). After completing the search, you identify
included studies, extract the correlations and sample sizes (see Chap. 5), quality-
appraise studies (see Chap. 6) before pooling results using meta-analysis (see
Chap. 7).
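Looking ahead to the pooling step (covered properly in Chap. 7), here is a minimal sketch of one common way correlations are combined: each r is converted to Fisher's z, weighted by n − 3 (the inverse of the variance of z), averaged, and back-transformed; the correlations and sample sizes below are hypothetical, apart from the r = 0.35 carried over from the example above.

```python
from math import atanh, tanh

# Hypothetical attitude-intention correlations (r, n); the first study is the
# r = 0.35 result from the example above, the rest are invented.
studies = [(0.35, 120), (0.28, 250), (0.42, 80), (0.31, 400)]

weights = [n - 3 for _, n in studies]   # inverse variance of Fisher's z
zs = [atanh(r) for r, _ in studies]     # each r converted to Fisher's z

pooled_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
pooled_r = tanh(pooled_z)               # back to the correlation metric

print(f"Pooled correlation: r+ = {pooled_r:.2f}")  # about 0.32
```

The Fisher transformation is used because it stabilises the variance of correlations; jamovi performs this step for you, but seeing it laid out helps explain where a sample-weighted average correlation comes from.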
The aim of this section was to outline how to interpret the direction and magni-
tude of correlations—the same process for interpreting a single correlation in terms
of direction and size is used when interpreting a correlational meta-analysis. Once
you have computed your sample-weighted average correlation, you interpret this
with reference to Cohen’s (1992) guidelines covered on the previous page. Thus,
how you interpret a correlation from a single paper is identical to how you interpret
26 3 Identifying Your Effect Size
the result of a meta-analysis of correlations (see Chap. 7). Having covered a statistic
you are familiar with I’ll next move on to one you are probably less familiar with—
the effect size difference (d).
Like correlations, you can have positive and negative effect size differences, but, because effect size differences are unbounded, there is no such thing as a perfect positive or perfect negative effect size difference.
A positive effect size difference usually means intervention participants have per-
formed better on the outcome than control participants; for example, d = 0.35 could mean intervention participants self-reported more physical activity three months after receiving the intervention compared to control participants. A negative effect size difference often means that scores on the outcome are better in the control group than the intervention group; for example, d = −0.35 could mean control participants self-reported more physical activity at the three-month follow-up than intervention participants. This may not be expected, but it is
possible—maybe control participants were offered a free gym membership during
the study.
Unlike correlations, where the sign almost always means the same direction,
with effect size differences the sign depends on the order in which you enter the
mean values for the two groups. For example, if you hypothesise there will be a
positive effect size difference on physical activity of receiving the intervention, you
are saying you expect that intervention participants will increase their physical
activity at follow-up more than control participants. If this is the case, enter your
intervention mean first, which has the effect of producing a positive result because
the control mean will be subtracted from the intervention mean like this:
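For illustration, using hypothetical means and a hypothetical pooled standard deviation for the outcome:

d = (M_intervention − M_control) / SD_pooled = (4.5 − 4.0) / 1.43 ≈ 0.35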
There are also times when you expect to calculate negative effect size differ-
ences, for example, if you hypothesise a greater reduction in an outcome in the
intervention group than the control (e.g. the intervention group report drinking less
alcohol than the control group). For example, in Cooke et al. (2023) we expected the intervention to reduce drinking behaviour more in participants who
formed implementation intentions. Here is an example of what happens when you
do this:
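Again using hypothetical values, this time where the intervention group reports drinking fewer units per week than the control group:

d = (M_intervention − M_control) / SD_pooled = (12.0 − 15.0) / 8.6 ≈ −0.35

The negative sign simply reflects that the intervention mean is lower than the control mean, which, for a behaviour you are trying to reduce, is the result you were hoping to see.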
The key is to decide, prior to conducting your meta-analysis, in which direction you want to compute effect size differences when you report results, and then set up your
analysis to be consistent when conducting the meta-analysis. Always ensure you
enter data in a consistent way; otherwise, it will be hard to interpret your
meta-analysis.
Cohen’s (1992) guidelines for interpreting effect size differences are as follows:
• Effect size differences between d = 0.20 and d = 0.49 are SMALL sized.
• Effect size differences between d = 0.50 and d = 0.79 are MEDIUM sized.
• Effect size differences of d = 0.80 or higher are LARGE sized.
It is worth noting that the sign (+ or −) before your effect size difference makes
no difference to the magnitude of the effect size difference; d = −0.30 and d = 0.30
are both small sized. However, as discussed above, be mindful of how you decided
to calculate your effect size differences because unlike correlations, the sign (+ or
−) you decide on can affect the direction you report (see above). Box 3.3 provides
an example of how you can interpret the direction and size of an individual effect
size difference from a primary study.
Box 3.3 provides lots of useful information. We can see that the intervention had
a positive effect on intentions because d = 0.25; after receiving the intervention,
people in the intervention group reported higher physical activity intentions com-
pared to people in the control group. We can also see the effect size difference is
small sized, meaning that our researcher’s intervention did not produce large
changes in intentions. This might make our researcher decide they need to refine
their intervention to make it more effective.
The result from Box 3.3 could also provide the start of a meta-analysis, including
this effect size difference alongside other published examples of interventions tar-
geting physical activity intentions. Your next step would be to systematically search
for studies that also test the effects of interventions aiming to change intentions by
making attitudes towards physical activity more positive published in academic
journals, grey literature, and open science portals like the Open Science Framework
(see Chap. 4). After completing the search, you identify included studies, extract the
means, standard deviations, and sample sizes (see Chap. 5) for both intervention and
control groups, quality-appraise the studies (see Chap. 6) before pooling the results
using meta-analysis (see Chap. 7).
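Looking ahead once more to the pooling step (Chap. 7), here is a minimal sketch of how effect size differences might be computed from group means and standard deviations and then combined using inverse-variance weights; every value below is hypothetical, and in practice jamovi (Chaps. 8, 9, and 10) does this arithmetic for you.

```python
from math import sqrt

def cohens_d(m_int, sd_int, n_int, m_ctrl, sd_ctrl, n_ctrl):
    """Effect size difference (intervention minus control) using the pooled SD."""
    sd_pooled = sqrt(((n_int - 1) * sd_int**2 + (n_ctrl - 1) * sd_ctrl**2)
                     / (n_int + n_ctrl - 2))
    return (m_int - m_ctrl) / sd_pooled

def variance_of_d(d, n_int, n_ctrl):
    """Approximate sampling variance of d (see Borenstein et al., 2009)."""
    return (n_int + n_ctrl) / (n_int * n_ctrl) + d**2 / (2 * (n_int + n_ctrl))

# Hypothetical studies: (mean, SD, n) for the intervention group, then the control group
studies = [
    (4.6, 1.4, 60, 4.2, 1.5, 58),
    (4.9, 1.2, 110, 4.5, 1.3, 105),
    (4.3, 1.6, 45, 4.2, 1.4, 47),
]

ds = [cohens_d(*study) for study in studies]
weights = [1 / variance_of_d(d, study[2], study[5]) for d, study in zip(ds, studies)]
pooled_d = sum(w * d for w, d in zip(weights, ds)) / sum(weights)

print(f"Pooled effect size difference: d+ = {pooled_d:.2f}")  # about 0.25
```

Larger studies have smaller sampling variances and so receive bigger weights, pulling the pooled estimate towards their results.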
The aim of this section was to outline how to interpret the direction and magni-
tude of effect size differences—the same process for interpreting a single effect size
difference in terms of direction and size is used when interpreting a meta-analysis
of effect size differences. Once you have your sample-weighted average effect size
difference, you interpret this with reference to Cohen’s (1992) guidelines mentioned
above. Thus, how you interpret an effect size difference from a single paper is iden-
tical to how you interpret the result of a meta-analysis of effect size differences (see
Chap. 7). I will end this section by covering a common error in using Cohen’s
guidelines.
Summary
The goal of this chapter was to introduce you to effect sizes, clarifying information
you already know about correlations and helping introduce effect size differences.
Because effect sizes are so instrumental to understanding meta-analysis, we will
return to them throughout the book. You may want to revisit the material covered in
this chapter as it often takes me a few go’s reading through material on new statisti-
cal information for the penny to fully drop. The tasks on the next page should also
help you to develop your knowledge and confidence about effect sizes.
Having introduced effect sizes in this chapter, we are next going to move on to
cover systematic review essentials in Chap. 4, followed by chapters on data extrac-
tion (Chap. 5), quality appraisal (Chap. 6), and data synthesis (Chap. 7).
Tasks
Complete these tasks to reinforce your learning of the principles covered in this
chapter.
1. What are the three statistical dimensions you can use to interpret the result of a
statistical test?
2. A correlation of r = 0.55 is what magnitude according to Cohen’s (1992)
guidelines?
3. You conduct a meta-analysis of studies assessing the magnitude of the correla-
tion between intentions and physical activity, which produces a result of r = 0.38.
Describe this effect size in terms of direction and magnitude.
4. An effect size difference of d = 0.55 is what magnitude according to Cohen’s
(1992) guidelines?
5. You conduct a meta-analysis of studies testing the impact of receiving a goal-
setting intervention to increase physical activity. The meta-analysis produces a
value of d = 0.35. Describe this effect size in terms of direction and magnitude.
6. Is it easier to find a large-sized correlation or a large-sized effect size difference?
References
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision
Processes, 50, 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Ashford, S., Edmunds, J., & French, D. P. (2010). What is the best way to change self-efficacy
to promote lifestyle and recreational physical activity? A systematic review with meta-
analysis. British Journal of Health Psychology, 15(2), 265–288. https://doi.org/10.134
8/135910709X461752
Black, N., Mullan, B., & Sharpe, L. (2016). Computer-delivered interventions for reducing alcohol
consumption: Meta-analysis and meta-regression using behaviour change techniques and the-
ory. Health Psychology Review, 10, 341–357. https://doi.org/10.1080/17437199.2016.1168268
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Cooke, R., & Sheeran, P. (2004). Moderation of cognition-intention and cognition-behav-
iour relations: A meta-analysis of properties of variables from the theory of planned
behaviour. The British Journal of Social Psychology, 43(Pt 2), 159–186. https://doi.
org/10.1348/0144666041501688
4 Systematic Review Essentials
All meta-analyses should be based on the results of a systematic review of the litera-
ture, which means it is important I cover some essential information about what
conducting a systematic review involves. It is beyond the scope of this book to do
more than introduce these ideas. For more information on systematic reviewing, I
recommend reading one of the many excellent books on this topic (e.g. Boland
et al., 2017). My focus in this chapter and the remainder of this section of the book
(i.e. Chaps. 5, 6 and 7) is to provide a brief guide to systematic reviewing when
conducting a meta-analysis. This chapter will emphasise the similarities between
meta-analysis and systematic reviewing, with Chaps. 5, 6 and 7 focusing on differ-
ences between meta-analysis and systematic reviewing when thinking about Data
Extraction (Chap. 5), Quality Appraisal (Chap. 6), and Data Synthesis (Chap. 7).
When I teach postgraduate psychology students about systematic reviewing, I
begin by outlining the six steps I follow when conducting a systematic review:
To complete a meta-analysis, you also need to follow these steps, although how
you complete them sometimes differs between meta-analysis and systematic review-
ing. The main aim of this chapter is to talk you through the first three steps as they
apply to conducting a meta-analysis. While much of this advice also applies to con-
ducting a systematic review, I wanted to give explicit examples for those running a
meta-analysis following these steps. If you are familiar with these steps, you are
welcome to head to Chap. 5 on Data Extraction.
As is the case with any research study, it is important to specify a research question
in advance of conducting the study. For meta-analyses (and systematic reviews) I
think of these as review questions, which are the equivalent of research questions in
primary papers when you are conducting a secondary analysis. Clearly specifying
your review question helps you to screen out irrelevant papers. Below are review
questions from my meta-analyses of correlations:
• Cooke and Sheeran (2004) ‘The main aim of the present study is to provide the
first quantitative review of the properties of cognitions as moderators of cogni-
tion-intention and cognition-behaviour relations.’
• Cooke and French (2008) ‘The present study examines the strength of five rela-
tionships within the TRA/TPB—attitude-intention, subjective norm-intention,
PBC-intention, intention-behaviour, PBC-behaviour—in the context of individu-
als attending a health screening programme’.
• Cooke et al. (2016). ‘The present study examines the size of nine relationships
within the TPB in the context of alcohol consumption: attitude-intention, subjec-
tive norm-intention, PBC-intention, SE-intention, PC-intention, intention-
behaviour, PBC-behaviour, SE-behaviour, PC-behaviour’.
As you can see, there is some consistency in how these questions are phrased—
all three refer to relations or relationships, explicitly telling the reader we meta-
analysed correlations, because correlations are used to test relationships between
variables. The 2008 and 2016 metas (short for meta-analysis) both specify a behav-
iour type, health screening programme, alcohol consumption, respectively, to nar-
row the focus of the meta-analysis. As the aim of the 2004 meta was to meta-analyse
properties of cognition (e.g. how accessible attitudes were in memory, how stable
intentions were over time) as moderators of cognition–intention (e.g. attitude–inten-
tion) and cognition–behaviour (intention–behaviour) relations, we did not specify a
behaviour type in this question. We wanted to include all studies we could find.
Reflecting on these questions, when constructing a review question for a correla-
tional meta-analysis, I would include the word relation or relationship or association
in the question. You may also specify a behaviour type(s) for correlations you are
interested in, like in the 2008 and 2016 papers. If you are interested in meta-
analysing results from studies testing a theory, you could mention this in the ques-
tion, like we did in the 2008 meta.
Below are examples of review questions from meta-analyses of experimental
studies I authored or co-authored:
• Newby et al. (2021). ‘What is the overall effect of digital automated behaviour
change interventions on self-efficacy?’
• Cooke et al. (2023). ‘The primary aim of the present systematic review and meta-
analysis is to estimate the effect of forming implementation intentions on weekly
alcohol consumption’.
Both questions mention 'the effect of…', which speaks to including data from studies testing an experimental manipulation or evaluating an intervention designed to cause a change in an outcome of interest; meta-analysts will recognise this as a meta-analysis of effect size differences (see Chap. 3). Both questions mention
an intervention type: digital automated behaviour change; implementation inten-
tions. A difference between the questions is in the focus (or not) on a behaviour type.
These questions broadly map onto the PICO (Population, Intervention,
Comparator, Outcome) framework (described in more detail in ‘Step 2. Defining
your inclusion criteria’). They both refer to an Intervention type (digital behaviour
change; implementation intentions) and an Outcome (self-efficacy; alcohol con-
sumption). Moreover, intervention studies often include a Comparator (control/
comparison) group. No Population though! PICO was developed for studies testing
RCTs, which by design have an Intervention group, at least one Comparator (con-
trol/comparison) group, an Outcome, and often, a target Population. I think PICO is
more helpful for psychologists running meta-analyses of effect size differences than
meta-analyses of correlations for several reasons: meta-analyses of correlations include data from samples who did not receive an intervention (or control) materials, and there is no obvious outcome, as you are interested in pooling results from a correlation, not an effect size difference based on an outcome.
This is not to say you cannot use PICO in crafting correlational meta-analysis ques-
tions, only that it is not easy to do so.
• Cooke et al. (2016). ‘The second aim is to assess the extent to which several
moderator variables affect the size of TPB relationships: (a) pattern of consump-
tion, (b) gender of participants and (c) age of participants’.
• Newby et al. (2021). ‘Does the overall effect of automated digital behaviour
change interventions on self-efficacy vary as a function of the behaviour being
addressed?’
• Cooke et al. (2023). ‘The secondary aim is to investigate the impact of sample
type, mode of delivery, intervention format and time frame as moderators of
effect size differences’.
In each case, we are telling the reader we believe that the effect sizes of included
studies may differ due to a moderator variable(s) and that we need to test these
effects. This is common practice in meta-analysis, as it is almost always the case that there is significant heterogeneity in the effect sizes of the studies contributing to the overall effect size. When completing pre-registration of the systematic review that informs
your meta-analysis, for example, by pre-registering your review protocol with
PROSPERO, it is a good idea to mention moderators too. I will discuss pre-
registration at the end of this chapter.
• ‘Studies had to report the sample size for both control and intervention groups
and the mean and SD (standard deviation) for the outcome variable(s)…to allow
for calculation of the effect size difference (d)’.
Specifying a statistical inclusion criterion tells the reader that a lack of statistical
information is grounds to exclude a study from a meta-analysis. When statistical
information is not reported in a paper that meets all other inclusion criteria, my
next step is to email the authors to request the information. I first had to do this when
completing Cooke and Sheeran (2004) and while it felt quite intimidating as a PhD
student to contact academics about their work, my experience of doing so has been
overwhelmingly positive. In one case, an author went and dug out data from a set of
old files! In other cases, authors report how happy they are that someone has shown
interest in their research—I’m always delighted to be asked to provide data from
one of my studies and do my best to respond to all requests I get.
How Many Inclusion Criteria Should I Have in a Meta-Analysis?
The meta-analyses I have authored or co-authored have had between three and six
inclusion criteria, apart from my first (Cooke & Sheeran, 2004). There is one crite-
rion I almost always use: all my metas refer to measurement of an outcome (Cooke
et al., 2023; Newby et al., 2021) or a relationship between two variables (Cooke &
French 2008; Cooke & Sheeran, 2004; Cooke et al., 2016). I would argue that this
will be the case with your meta-analysis too—I find it hard to imagine how you
could conduct a meta-analysis without referring to measurement in the inclusion
criteria because measurement, either of outcomes measured following exposure to
I use Table 4.1 when introducing PICO and acknowledge that it can be a really
useful way to think about a systematic review, especially if you are reviewing a lit-
erature that uses RCTs or experiments to evaluate effects on an outcome.
I also think that PICO can be useful for meta-analyses of experimental/interven-
tion studies, with a slight modification. I teach my students that I prefer PICOS to
PICO because when I conduct systematic reviews, it is likely there will be hetero-
geneity in study designs. In PICO, it is assumed that all studies will use an RCT
study design, which makes sense if you are looking at things from a public health
perspective—why would you want to include studies that use weaker study designs
when you are evaluating important public health issues? As humble psychologists,
however, we must accept that there aren't likely to be as many RCTs in our
field. Instead, we are likely to find a range of different study designs, including
factorial designs with two or more groups (control, experimental/intervention), pre-
post designs, quasi-experiments, and some RCTs. Replacing PICO with PICOS, where the S stands for Study design, acknowledges this mix of designs from the outset.

Search Strategy
A search strategy is your search terms plus your search sources. Ideally, your strat-
egy should use terms that mainly access relevant papers for your meta-analysis, but
in reality, you often end up with a lot of irrelevant studies. I’ll start by discussing
search sources and then move on to search terms.
Search Sources
Search sources are the places you search to identify papers to include in your meta-
analysis, for example, bibliographic databases, journals, websites that publish
reports, et cetera. The choice of sources determines what you will find. Most sys-
tematic reviews and meta-analyses search from a rather restrictive pool of data-
bases, with apparently little justification or rationale for this decision. I suspect that
many researchers do what I do, which is to use databases we are familiar with or
those that have been used in previously published review articles. There is some
merit in this idea, because to my knowledge no one has set down any criteria for
which databases one should search or how many one should search. Without crite-
ria, researchers tend to do what they think makes sense, with an eye to existing
papers as a guide.
Looking back over my meta-analyses, I searched between two and six databases.
I always searched Web of Science, because this was the database I was taught how
to use during my PhD. Web of Science always seemed to produce a set of relevant
results from the psychological literature. Since Cooke and French (2008), I’ve also
always searched PubMed (sometimes via its MEDLINE subset). This reflects my move
from social to health psychology; PubMed is a good database for checking medicine
and public health journals and is free to use because it is maintained by the US National
Library of Medicine. Searching PubMed and Web of Science using the same search
terms is a good way to compare what two databases that index different journals return
in response. If, after your search, you find a paper in both databases, then you know they
both index that journal. If you find a relevant reference in one database but not the other,
you know that the other database does not index that journal. This is a key reason to
search more than one database. If you only search one,
you might miss relevant papers for your meta.
In addition to these two mainstays of my search sources, I often searched another
database that I was confident would index psychology articles, like PsycARTICLES or
PsycINFO. This was done to check that my Web of Science results were not missing
relevant studies published in journals not indexed by Web of Science. I’ve also con-
tributed to searches where we have used Scopus, and that seems like a good option
for psychologists, if you don’t have access to Web of Science. In Cooke et al. (2023),
because we were searching for intervention studies, I also searched the Cochrane
database. If you are doing a meta-analysis of effect size differences, it is worth
searching the Cochrane database to ensure someone else has not already done the
meta you are planning to do! Other databases that might be relevant for your meta-
analysis include CINAHL (a database of nursing research) and EMBASE (a database of
medical research that requires a subscription). There are probably other databases
relevant for educational, organisational, forensic, and other psychological sub-
disciplines, but, as I’ve not done reviews in these areas, I do not know what they are.
I have also searched Google Scholar when conducting a meta-analysis and while
I think it is ok to include this as one of several search sources, I would not exclu-
sively search Google Scholar when conducting your search because it is unclear
whether all papers have been peer reviewed. Nevertheless, searching Google Scholar
can identify papers missed by other databases, and it indexes citations in languages
other than English, so it is worth considering as one of your sources.
In Cooke and Sheeran (2004) and Cooke et al. (2023), I also searched grey litera-
ture, also known as unpublished literature. In both cases, I searched databases that
index PhD theses, and this was done to ensure that I found as many papers as possible
because there were few published studies. In the UK, the database of PhD theses is
called EThOS, maintained by the British Library. It is open access, so feel free to search
it for completeness or if you find a lack of papers.
Sometimes I am asked, 'How many sources should I search?' I have seen some
reviews where teams have searched more than ten databases, which seems excessive
to me. Searching is a resource-intensive activity, and while searching ten databases
might seem better than searching four or five, my question is always how many additional,
unique papers the extra databases generate (and whether that is worth the effort).
The law of diminishing returns would suggest you are better off searching
fewer databases that are likely to contain the most relevant studies for your
meta-analysis.
I’ll end with some advice about which sources to search. First, always search a
source that indexes the psychological literature: Scopus, Web of Science, or any
database with the word Psych in the title are good options to meet this criterion.
Second, always search a source that indexes a related, but relevant, area. As a Health
Psychologist, PubMed fulfils this criterion for me. If you are an educational (or
developmental) psychologist, then look for databases of educational research to
search. If you are unsure how to find these, look at published meta-analyses or sys-
tematic reviews from your area to see which databases were searched. Third, always
search at least two databases—this protects you against missing papers due to
indexing issues with the databases. No database indexes all journals. Finally, think
carefully about whether to search the grey literature or not. If you are conducting a
search for studies on a popular topic, for example, interventions to promote healthy
eating in children, you will likely find enough studies for a meta-analysis without
searching the grey literature. Similarly, if you are interested in RCTs, you are
unlikely to find too many of these in the grey literature. If, on the other hand, you
are searching for studies on more obscure topics, like in Cooke and Sheeran (2004),
it makes sense to look at the grey literature, especially PhD theses, because this
might help increase your set of studies. I think the key question is to ask yourself
‘Why should I search the grey literature?’ and if you can come up with any plausible
answer, then go ahead and search!
Search Terms
Search terms are the words or phrases used to identify relevant studies when search-
ing an electronic database. While your set of search terms can end up being really
lengthy, as you come up with all the synonyms you can think of, I usually generate
my list of terms by going back to the review question. Let’s use Cooke and French
(2008) as an example. Here’s the review question again:
• Review Question ‘The present study examines the strength of five relationships
within the TRA/TPB—attitude-intention, subjective norm-intention, PBC-
intention, intention-behaviour, PBC-behaviour—in the context of individuals
attending a health screening programme.’
• Search Terms
• 'theory of reasoned action', 'theory of planned behavio*'1, 'screening', 'mammograph', 'cervical', 'health check/screening' and 'attend'.
1 The asterisk is known as a wildcard. It means that any paper containing the letters before the * will be included in the search results, regardless of what follows them. In this case, we inserted the * because we wanted to make sure our output contained papers using both the English and American spellings of the word behaviour (behaviour, behavior). Locating the * after the o, the last letter that is common to both spellings, achieved that aim.
This set of search terms probably seems very short, but it captures the essentials
of the review question—we searched for papers on the theory of reasoned action
and theory of planned behaviour in screening contexts. Other terms (mammograph,
cervical, health check/screening) were included in case a more specific type of
screening (e.g. cervical) was mentioned in the abstract. Because we were interested
in papers that tested theories, we ended up with a small set of search terms. If either
theory was not mentioned in the abstract, then we excluded the paper. This is an
example of how jargon can be advantageous. You can also see we did not make any
reference to study type in our search terms. As most studies testing these theories
used correlational designs, there was no need to specify a design. Please note this
may not be the case with your meta-analysis, so, be mindful of study design when
thinking about your own search terms.
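To make this concrete, the terms above might be combined into a single Boolean string along the lines of ("theory of reasoned action" OR "theory of planned behavio*") AND (screening OR mammograph OR cervical OR "health check" OR attend). Treat this purely as a sketch rather than the exact string we ran: phrase searching, wildcard, and field-tag syntax (e.g. restricting to title and abstract) differ between Web of Science, PubMed, and other databases, so check each database's help pages before running your own search.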
Let’s next look at the review question and search terms for Cooke et al. (2023):
• Review Question: ‘The primary aim of the present systematic review and meta-
analysis is to estimate the effect of forming implementation intentions on weekly
alcohol consumption.’
• Search Terms
• ‘implementation intentions’, ‘alcohol’, ‘binge drink*’.
Even fewer search terms than the ones above!!! Once again, we took advantage
of jargon, specifically, the phrase implementation intentions. We were able to
exclude many papers because they did not mention implementation intentions. The
meta-analyses I have co-authored (Birdi et al., 2020; Newby et al., 2021) have typically
used broader search terms, so, I am not saying you must be as minimalist as I
am. However, there can be a cost to having too many search terms: finding 1000s (or
10,000s) of irrelevant papers. I’ve read abstracts where researchers report results of
a systematic review that screened 30,000 or 40,000 results. That seems wasteful to
me, especially as these comprehensive reviews have not yielded many more papers
for inclusion than my approach. There is a balance to strike between being compre-
hensive in your searching, by using broad terms that capture all the relevant papers
but also a lot of irrelevant papers, and more specific searching that may miss a few
relevant papers but also excludes most of the irrelevant papers you would have
excluded anyway. In a meta-analysis, I would always favour more specific search
terms because your goal is to pool results testing an effect size. Unless papers report
a test of a specific effect size, you are probably not going to be able to include
the paper.
Another thing to bear in mind when doing a meta-analysis, as opposed to a sys-
tematic review, is that the meta-analysis requires the study to report statistical infor-
mation to be included, whereas a systematic review does not. Based on my
experience of searching for papers to include in meta-analyses, you should expect
to find a relatively small pool of studies to full-text screen. I typically find between
150 and 300 papers to screen. I would say, if you have more than 1000 hits after
removing duplicates for a meta-analysis, you should probably go back to your
search terms and see if there is a way to reduce this number. While I think it's reasonable
to have more than 1000 hits for a systematic review, I would be wary of
screening more than 1000 hits for a meta-analysis because I think it's highly unlikely
that enough of them will have included the statistical information needed for
your meta. To quote Hagger et al.’s exceptionally impressive meta-analysis of the
common sense model of illness (2017):
The literature search identified 333 articles that met inclusion criteria on initial screening…A substantial proportion of eligible articles (k = 172) did not report the necessary data for analysis.
PROSPERO
PROSPERO is the International Prospective Register of Systematic Reviews, where researchers register review protocols before starting work. It is worth searching the register early on: if a registered review already exists on your topic, or a closely related topic, you might want to rethink
your review area.
I first used PROSPERO to pre-register Cooke et al. (2023) and found it to be a
simple process: you complete an online form, outlining your review questions,
review team, inclusion criteria, et cetera. It is set up for traditional systematic
reviews of RCTs, and maps onto PICO, but it is fairly easy to navigate. For meta-
analyses, the PROSPERO team are keen on you providing precise information on
how you plan to meta-analyse your data: What is your effect size? What method of
pooling do you plan to use? How will you report heterogeneity? How will you iden-
tify publication bias? I will cover all these issues in detail in Chap. 7, when we cover
data synthesis.
Three more things to say about PROSPERO. First, registration is an active pro-
cess. The PROSPERO team will ask you to make changes to your submission if they
do not feel it is sufficiently clear. While this can be frustrating it usually yields a
better protocol. Second, when writing up your findings for submission to a journal,
go back to your protocol for handy information, like What was your review ques-
tion??! Or your inclusion criteria? The process of screening papers, data extraction,
quality appraisal, and data synthesis takes time, so, it is worth going back to your
protocol to remind yourself what the plan was at the start, as in my experience, it is
all too easy to get side-tracked from what you originally proposed. Also, include
your PROSPERO registration number in your publication as this is required by
many journals. You may have to blind this information if you submit your meta to a
journal that uses blind peer review, otherwise, a reviewer could look up who did the
review! Finally, when you have finished your review, go back to PROSPERO and
update the status of your review. This can seem daunting—admitting you are nearly
finished with your review means losing some control over it—but it is essential to
provide this update because it keeps the register current. PROSPERO contains
‘ghost’ registrations, for reviews that have been started but not published—don’t
add to their number. It is really satisfying to update your registration to published,
so, don’t forget to do this either. I recommend all meta-analyses are pre-registered
as this increases the transparency of reviews.
PRISMA
Once you have decided on your final set of studies, you can complete your PRISMA
(Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart, a
template for which can be downloaded from the PRISMA website, because at this point
you know how many studies were found in your searches and how many you have
included/excluded at each step of screening. The flowchart illustrates this
information in a really clear way.
The second part of the PRISMA process is to complete the PRISMA checklist,
which is done right before submission of the paper to a journal. You do this last
because the checklist asks you to specify on which page numbers you have reported
various aspects of your meta-analysis. Doing this at the end means you know which
page(s) in the final version of the paper contain these pieces of information. It won’t
take you more than 30 minutes to complete the checklist. We’ll talk more about
completing PRISMA processes when discussing writing up meta-analysis later in
Chap. 15.
Summary
In this chapter, I have covered the essentials of the first three steps of conducting a
systematic review as part of completing a meta-analysis. I have used some of my
past review questions, inclusion/exclusion criteria and search terms and sources to
show how I have completed systematic reviews as part of my meta-analyses, and I
hope that offering these examples will help you complete the searching and screen-
ing part of your own systematic review prior to completing your meta-analysis. The
next chapter will cover data extraction when conducting a meta-analysis.
References
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision
Processes, 50, 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Ajzen, I., & Fishbein, M. (1973). Attitudinal and normative variables as predictors of specific
behaviors. Journal of Personality and Social Psychology, 27, 41–57.
Birdi, G., Cooke, R., & Knibb, R. C. (2020). Impact of atopic dermatitis on quality of life in adults:
A systematic review and meta-analysis. International Journal of Dermatology, 59(4). https://
doi.org/10.1111/ijd.14763
Boland, A., Cherry, G., & Dickson, R. (2017). Doing a systematic review: A student’s guide (2nd
ed.). SAGE Publications.
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., & French, D. P. (2008). How well do the theory of reasoned action and theory of
planned behaviour predict intentions and attendance at screening programmes? A meta-analy-
sis. Psychology & Health, 23(7), 745–765. https://doi.org/10.1080/08870440701544437
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Cooke, R., & Sheeran, P. (2004). Moderation of cognition-intention and cognition-behaviour rela-
tions: A meta-analysis of properties of variables from the theory of planned behaviour. The British
Journal of Social Psychology, 43(Pt 2), 159–186. https://doi.org/10.1348/0144666041501688
Hagger, M. S., Koch, S., Chatzisarantis, N. L. D., & Orbell, S. (2017). The common sense model
of self-regulation: Meta-analysis and test of a process model. Psychological Bulletin, 143(11),
1117–1154. https://doi.org/10.1037/bul0000118
Hagger, M. S., & Orbell, S. (2003). A meta-analytic review of the common-sense
model of illness representations. Psychology & Health, 18(2), 141–184. https://doi.
org/10.1080/088704403100081321
Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., Clarke,
M., Devereaux, P. J., Kleijnen, J., & Moher, D. (2009). The PRISMA statement for report-
ing systematic reviews and meta-analyses of studies that evaluate health care interventions:
Explanation and elaboration. Journal of Clinical Epidemiology, 62, e1–e34. https://doi.
org/10.1016/j.jclinepi.2009.06.006
Newby, K., Teah, G., Cooke, R., Li, X., Brown, K., Salisbury-Finch, B., Kwah, K., Bartle, N.,
Curtis, K., Fulton, E., Parsons, J., Dusseldorp, E., & Williams, S. L. (2021). Do automated digi-
tal health behaviour change interventions have a positive effect on self-efficacy? A systematic
review and meta-analysis. Health Psychology Review, 15(1), 140–158. https://doi.org/10.108
0/17437199.2019.1705873
Todd, J., Kothe, E., Mullan, B., & Monds, L. (2016). Reasoned versus reactive prediction of behav-
iour: A meta-analysis of the prototype willingness model. Health Psychology Review, 10(1),
1–24. https://doi.org/10.1080/17437199.2014.922895
van Lettow, B., de Vries, H., Burdorf, A., & van Empelen, P. (2016). Quantifying the strength
of the associations of prototype perceptions with behaviour, behavioural willingness and
intentions: A meta-analysis. Health Psychology Review, 10(1), 25–43. https://doi.org/10.108
0/17437199.2014.941997
Data Extraction for Meta-Analysis
5
A key difference between meta-analysis and systematic review is that when con-
ducting a meta you are particularly interested in the statistical information reported
by studies because, if a study does not report statistical information, it cannot easily
be included in a meta-analysis. This difference between meta-analysis and system-
atic review has an important impact on how you think about applying your inclusion
criteria during data extraction. In a systematic review, you typically specify an out-
come of interest—blood pressure, physical activity, condom use, educational per-
formance. As long as this outcome is measured in some way, you can often include
the study in your systematic review, assuming other inclusion criteria are also met.
In contrast, in a meta-analysis, your focus shifts from the outcome being present in
some form in the paper to being used as part of a statistical test, that is, the outcome
being correlated with a predictor of interest, or the outcome being compared
between two groups at follow-up, or the same group over time. This shift in focus
means that when you full-text screen papers for a meta-analysis, you must be able
to extract statistical information to enable you to pool effect sizes in a meta-analysis.
Failure to report statistical information is a valid reason for excluding a paper from
a meta-analysis.
Imagine that you have completed your systematic review and identified papers to
include in your meta-analysis, that is, you’ve completed the first three steps of a
systematic review as outlined in Chap. 4. The next step is to create a study charac-
teristics table with the author names and years for the included studies in the
leftmost column and additional (blank) columns for the information you will extract.
I also recommend you create a data extraction form at this stage to help consistently
extract information from the included studies as discussed in the next section.
I would start by creating a data extraction form that contains fields for key informa-
tion. In almost all forms, there are common fields: authors and year of publication,
country of study, sample details. Inclusion of other fields depends on the study
design and effect size. For instance, in a meta-analysis of effect size differences, you
need to extract study design information, including the design (RCT, quasi-
experiment, pre-post), follow-up time frame as well as information about the exper-
imental manipulation/intervention content. I also like to code information about the
content of the control condition as such information can often help you interpret the
results of your meta-analysis. For the meta-analysis, you’ll need to extract either (1)
the means and standard deviations for the outcome(s), plus the sample sizes, for
both the experimental/intervention and control/comparison groups or (2) the effect
size and a measure of variance (standard error or study variance), if the authors have
reported these. Either set of statistics is required to conduct the meta-analysis (see
Chaps. 7 and 8). Alternatively, a form for a meta-analysis of correlations needs to
extract the overall (total) sample size and at least one, and often multiple,
correlations; in Cooke et al. (2016), we reported tests of eight theoretical relation-
ships based on independent correlations. In the next two sections, I am going to
show you how I extracted data from one study included in a correlational meta-
analysis (Cooke et al., 2016), and separately, how I extracted data from one study
included in an experimental meta-analysis (Cooke et al., 2023). I’ll use the tem-
plates available on my Open Science Framework website https://osf.io/4zs5k.
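If you prefer to keep your extraction in a structured, machine-readable format alongside the form, a single record might look something like the following minimal sketch in Python (used here purely for illustration; the field names and values are hypothetical and only loosely based on the fields described above, not the exact headings on my templates):

# One illustrative extraction record for a correlational meta-analysis.
# Field names and values are hypothetical; adapt them to your own form.
record = {
    "study": "Author et al. (Year)",
    "country": "UK",
    "sample_type": "Undergraduate students",
    "total_n": 120,
    "follow_up_weeks": 4,
    "correlations": {
        "attitude-intention": 0.50,   # made-up value
        "intention-behaviour": 0.45,  # made-up value
    },
    "notes": "Check whether sample sizes vary between correlations",
}
print(record["correlations"]["intention-behaviour"])

Keeping records in this kind of format makes it easy to check consistency across studies and to feed the values straight into your analysis later.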
Data Extraction for a Correlational Meta-Analysis
Table 5.1 shows you the completed data form for Paul Norman and Mark Conner's
(2006) paper which was included in Cooke et al.’s (2016) meta-analysis of TPB
alcohol studies.
Extracting study author names and year of study is an easy place to start—they
are listed on the first page of the paper! It’s sometimes harder to determine the coun-
try of study—it’s amazing how many papers do not report this information in the
methods section, with this paper being no exception. I think the main reason for this
is to ensure that the study is blinded for peer review. In this case, I assumed that as
both authors are affiliated with English universities, the study was conducted in
England. When doing data extraction for meta-analyses, you often must make edu-
cated guesses like this. The rest of the information you need to extract is reported in the
methods and results sections of the paper. My tendency is to look for the
correlation—that's it!!! So, make sure you extract the sample sizes, being mindful
that sometimes sample sizes vary between relationships—check your paper care-
fully, especially notes below the correlation matrix.
After extracting correlations for the theoretical relationships—literally the most
important information in this meta-analysis—I finished off the form by extracting
information about the sample type (Undergraduate psychology students), sample
demographics reported by authors (mean age (and standard deviation) or age range,
number of male and female participants) and made a note about recruitment method.
Some psychology papers talk about recruitment but not all do so; you may have to
leave this blank. I also extracted the follow-up period, which is important because
this meta-analysis was looking at studies using prospective designs, that is, where
behaviour is measured at a later time point than the psychological predictors. There's a
nice paper by Hagger and Hamilton (2023) on this issue that I recommend you read
if you want to know more about such designs. My PhD was focused on the stability
of cognitions, such as attitudes and intentions, over time, so I have usually looked at
the time frame between measurement of constructs or behaviour and you may end
up using this as a moderator variable (see Chap. 12).
Data Extraction for an Experimental Meta-Analysis
Table 5.2 shows the completed data extraction form from Chris Armitage's (2009)
paper, which was included in Cooke et al. (2023). Like the correlational example,
the first field contains the Study Authors and Publication Year. Chris also reports the
study location in the method section (North of England) and information on the
sample representativeness, which is quite rare in psychology papers. The fourth
field—Intervention description—is really important when doing data extraction for
a meta-analysis of effect size differences. Include as much information as you need
here as this is a field you are likely to use when interpreting the results of your
meta-analysis.
The fifth field—Control description—is also important in my opinion. Sometimes
what the control groups did (or did not do) can impact on interpretation of the
results of your meta-analysis (see De Bruin et al., 2021; Kraiss et al., 2023, for more).
The next three fields are quite specific to this meta-analysis and may not be
needed in your meta-analysis. First, in many papers in this literature, researchers
combined implementation intention interventions with other interventions, usually
focused on increasing motivation, so, it was useful to code if the intervention was a
stand-alone or part of a combined intervention. Coding this information allowed us
to run an exploratory analysis to see if receiving the intervention in combination or
as a stand-alone intervention affected effect size differences, which it did not.
Second, when we pre-registered our protocol on PROSPERO, we thought that mode
of delivery (i.e. face-to-face, online, paper) and follow-up time point(s) might both
moderate the overall effect size. So, we recorded information about these factors on
our form. I’ll talk more about moderators in Chap. 12.
Table 5.2 Data extraction form for meta-analysis of effect size differences
Study label (authors + publication year): Armitage (2009)
Study location (country where data was collected): England
Sample characteristics and recruitment: Shopping malls and working environments; 18–74 years old (M = 38.4; SD = 15.46); 125/113 women (53%) to men; 92% white (educational qualifications also noted)
Intervention description: If-then plans: self-generated implementation intention; experimenter-generated implementation intention
Control description: Two control conditions. Passive control: mere measurement. Active control: asked to plan how to reduce their alcohol consumption but not provided with any guidance on how to do this
Standalone/combined: Stand-alone
Mode of delivery: Face-to-face
Follow-up timepoint(s): One month
Outcome(s): Weekly drinking
Control sample size: 24* (passive control); 21 (active control)
Intervention sample size: 18* (self-generated implementation intention); 16 (experimenter-generated implementation intention)
Statistics reported in paper: Means + SDs
Control mean/SD for BD: 5.49 (2.94)
Intervention mean/SD for BD: 4.42 (2.70)
Notes:
• Samples were split into low- and high-risk groups based on past drinking. Effect sizes were only calculated for the high-risk groups as there was no effect of the intervention in low-risk groups
• We decided to compute the effect size difference between the self-generated implementation intention and passive control because this is the comparison done in most other included studies
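Chaps. 7 and 8 cover effect size calculation properly, but as a preview of why these particular cells matter, here is a minimal sketch in Python (used purely for illustration) of the standard pooled-SD formula applied to the self-generated implementation intention and passive control cells flagged in the note above. The published analysis may have used a corrected estimator such as Hedges' g, so treat the number as illustrative.

from math import sqrt

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Values extracted in Table 5.2: passive control (n = 24) vs
# self-generated implementation intention (n = 18)
d = cohens_d(5.49, 2.94, 24, 4.42, 2.70, 18)
print(round(d, 2))  # about 0.38 with these inputs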
As there are already data extraction forms you can download and use, you might
wonder why I have not recommended using them. It's mainly because I'm familiar
with my own forms and, because I created them, they fit my needs. That's not to say
available forms are bad or should not be used. If you find a form you like, then use
it. Alternatively, use my forms. Or create your own. What really matters is that you
are consistent in how you extract data, across studies, and I’m not sure there is any
form that will do that for you!
The final section of this chapter focuses on what happens when the information you
are looking to extract is not reported in the paper, or online; sometimes researchers
upload information that they cannot fit into the word count of the paper onto the
Open Science Framework. Assuming the information you want is not available,
what are your options?
The simplest option is to contact the authors and request the information you
need. I have had several lovely email exchanges with authors about data from their
research—I get the feeling that they rarely hear from the wider world about their
work and are happy when they receive emails requesting information. I’ve had
researchers post me chapters of their PhD (hard copy, air mailed!), dig out old data
from long-completed studies and send me SPSS data files of additional analyses. I
always thank them for their efforts by email and make sure I note this generous
activity in the published meta-analysis in the acknowledgments section (see Chap.
15). I also see it as a point of principle that if I am asked for information I always
try and provide it. Reciprocity makes the world a better place. I would argue that the
action of contacting authors to request data does not necessarily undermine the
systematic nature of your meta-analysis, although, if not all authors respond to your
request, you do run the risk of potentially biasing the result of your meta-analysis
via a form of reporting bias (see Chap. 6). This form of reporting bias is bias in the
statistical information you have available to pool due to decisions made by authors
about what they report in the paper. There’s not too much you can do about this, but
you should be aware of it as a potential issue if not all authors you contact respond.
Ultimately, you can exclude papers that do not report the statistics you need,
although I believe this drastic move requires some justification. One scenario that
would justify such an approach is when you are under time pressure to complete
your meta-analysis and do not have time to wait to hear from authors. A further justification
for exclusion is that, in my experience, you only end up with a few studies that
do not report the information you require. So, you could argue that excluding the
papers that do not report statistics is unlikely to affect the pooled effect sizes gener-
ated during meta-analyses too much. That is quite a judgment call to make, and I
would rather try to get the information from authors, although I do acknowledge
that this approach won’t work every time. Another option is to convert information
that is reported into an effect size you can use, for example, in Cooke and Sheeran
(2004), we calculated a phi coefficient, a correlation based on frequency data, to
enable us to retain a paper within our analysis. Borenstein et al. (2021) Chap. 7 has
further information on how to convert effect sizes.
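To give a flavour of what such a conversion can look like, here is a minimal sketch in Python of the phi coefficient computed from a 2 x 2 frequency table. The cell counts are made up for illustration; this is not the calculation from Cooke and Sheeran (2004).

from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 frequency table with cells [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: rows = intended / did not intend, columns = acted / did not act
print(round(phi_coefficient(40, 10, 15, 35), 2))  # prints 0.5 for these made-up counts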
Summary
The aim of this chapter was to get you thinking about data extraction for meta-
analyses by providing examples of completed data extraction forms for you to see
how I conducted data extraction. I’ll end with a couple of top tips for data extraction.
• Read and re-read your papers multiple times to check you have the correct statis-
tical and methodological information.
• Ask a review buddy to data extract at least 10% of included studies to check you
agree on information (especially things like sample sizes, correlations, and
descriptive statistics). It’s essential to extract the correct statistical information as
making errors means redoing the meta-analyses.
• Keep a folder of data extraction forms for easy access to information when you
are interpreting results of your meta-analysis (especially moderator analyses).
Tasks
Task 1: Complete data extraction using the blank form for Norman’s (2011) paper
(see reference list) for inclusion in a meta-analysis of correlations.
Task 2: Complete data extraction using the blank form for Hagger et al.’s (2012)
paper (see reference list) for inclusion in a meta-analysis of effect size
differences.
References
Armitage, C. J. (2009). Effectiveness of experimenter-provided and self-generated implementation
intentions to reduce alcohol consumption in a sample of the general population: A randomized
exploratory trial. Health Psychology, 28, 545–553. https://doi.org/10.1037/a0015984
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Cooke, R., & Sheeran, P. (2004). Moderation of cognition-intention and cognition-behav-
iour relations: A meta-analysis of properties of variables from the theory of planned
behaviour. The British Journal of Social Psychology, 43(Pt 2), 159–186. https://doi.
org/10.1348/0144666041501688
De Bruin, M., Black, N., Javornik, N., Viechtbauer, W., Eisma, M. C., Hartman-Boyce, J.,
Williams, A. J., West, R., Michie, S., & Johnston, M. (2021). Underreporting of the active
content of behavioural interventions: A systematic review and meta-analysis of randomised
trials of smoking cessation interventions. Health Psychology Review, 15(2), 195–213. https://
doi.org/10.1080/17437199.2019.1709098
Hagger, M. S., & Hamilton, K. (2023). Longitudinal tests of the theory of planned behaviour: A
meta-analysis. European Review of Social Psychology, 1–57. https://doi.org/10.1080/1046328
3.2023.2225897
Hagger, M. S., Lonsdale, A., Koka, A., Hein, V., Pasi, H., Lintunen, T., & Chatzisarantis,
N. L. D. (2012). An intervention to reduce alcohol consumption in undergraduate students
using implementation intentions and mental simulations: A cross-national study. International
Journal of Behavioral Medicine, 19, 82–96. https://doi.org/10.1007/s12529-011-9163-8
Kraiss, J., Viechtbauer, W., Black, N., Johnston, M., Hartmann-Boyce, J., Eisma, M., Javornik,
N., Bricca, A., Michie, S., West, R., & De Bruin, M. (2023). Estimating the true effectiveness
of smoking cessation interventions under variable comparator conditions: A systematic review
and meta-regression. Addiction, 118(10), 1835–1850. https://doi.org/10.1111/add.16222
Norman, P. (2011). The theory of planned behavior and binge drinking among undergraduate stu-
dents: Assessing the impact of habit strength. Addictive Behaviors, 36(5), 502–507. https://doi.
org/10.1016/j.addbeh.2011.01.025
Norman, P., & Conner, M. (2006). The theory of planned behaviour and binge drinking: Assessing
the moderating role of past behaviour within the theory of planned behaviour. British Journal
of Health Psychology, 11(Pt 1), 55–70. https://doi.org/10.1348/135910705X43741
Quality Appraisal for Meta-Analysis
6
Quality appraisal is the process used to assess the quality of research studies
included in a systematic review. High quality studies use stronger research designs
that reduce the likelihood of biases influencing interpretation of results (see
Fig. 6.1). For example, if you randomly allocate participants to condition, your
study is of higher quality than a similar study that does not do so because randomi-
sation ensures that factors which may affect performance in each condition, like
participants’ motivation to change their behaviour, the extent of their impairment,
or the time they take to engage with the intervention, are less likely to influence
scores on the outcome. Randomisation addresses selection bias, which is one of
several biases we assess when we quality appraise studies included in our
meta-analysis.
This chapter will begin by introducing biases assessed in quality appraisal to
prime you to understand the tools we use for quality appraisal, which are mainly
focused on identifying the presence or absence of various biases. After describing
each bias, I have included text to outline methods researchers can use to address
biases; by telling you what methods researchers use to address biases, you’ll find it
easier to spot them in your included studies when conducting quality appraisal for
your meta-analysis. I have taken the decision to begin by focusing on experimental
study designs because the principles of quality appraisal for experimental studies
are more clearly developed than equivalent principles for correlational study
designs, which have only recently received attention.
Fig. 6.1 The hierarchy of evidence, with meta-analysis at the top, above systematic reviews, randomized controlled trials (prospective; tests treatment), and cohort studies (prospective; an exposed cohort is observed for an outcome)
Selection Bias
Selection bias occurs when the groups being compared differ systematically at the start of a study. Comparing baseline scores can reassure you that the groups do not differ at baseline; they do not differ on variables you measured,
but they may differ on variables you didn’t! Let us say you do not measure how
motivated people are to change their behaviour. If you have not randomly allocated
participants to condition then you could end up accidentally assigning all the people
who are really motivated to change their behaviour to receive the intervention, while
everyone who isn't really motivated ends up in the control group. You may well find
that your intervention ‘successfully’ changes your outcome in the desired direction
at follow-up. You may well attribute this outcome to your brilliant intervention,
when the reality is that you were lucky that the more motivated participants all
received the intervention, and the less motivated participants received the control
materials. If allocation had been the other way round, perhaps the results would not
have been so impressive.
Random allocation to group neatly solves this issue. You may still be unaware of
participants’ motivation, but random allocation to condition will almost certainly
spread the more motivated participants to the intervention and control groups evenly,
making it more likely that a ‘successful’ intervention result, that is, a change in an
outcome at follow-up, is due to the brilliance of your intervention rather than factors
that differ between the groups. Randomising participants so that each person has an equal
chance of receiving the intervention or experimental manipulation versus the control
condition, with a large enough sample, protects against selection bias. Random allocation
drastically reduces the odds of individual differences affecting your interpretation
of the effect of your intervention or manipulation. It's all about minimising
the probability that something other than your intervention or manipulation explains your results.
In the past, researchers used random number tables, or random telephone dialling
methods, to randomly allocate participants to condition. Nowadays, researchers
tend to use websites that generate random number sequences such as www.random.
org, or the brilliantly named The Sealed Envelope, or randomisation functions
within survey software—Qualtrics can randomly allocate participants to condition.
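To show how little is needed to randomise in software, here is a minimal sketch in Python (purely for illustration, not a feature of any of the tools named above) that gives each participant an equal chance of ending up in either condition:

import random

conditions = ["control", "intervention"]

# Allocate ten hypothetical participants with equal probability per condition;
# with a large enough sample this spreads unmeasured differences evenly across groups.
allocations = {f"P{i:02d}": random.choice(conditions) for i in range(1, 11)}
print(allocations)

In a real trial you would usually add block randomisation (or hand the job to a service like The Sealed Envelope) to keep group sizes balanced, but the principle is the same.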
When I first started using the Cochrane Risk of Bias tool to quality appraise studies, it
took me some time to get my head around the difference between allocation conceal-
ment and blinding of participants and personnel (see Performance Bias section); how
is allocation concealment different to blinding? In brief, allocation concealment
relates to selection bias—who ends up in which condition—whereas blinding is about
performance bias—how participants and personnel act if they know which group they
are in. Allocation concealment and blinding are related processes that address differ-
ent biases. The point of judging allocation concealment is to provide a complementary check on selection bias, alongside your judgement of how the random sequence was generated.
One of the best ways to conceal allocation to condition from participants is to have
someone outside of the research team allocate participants to groups. In Randomized
Controlled Trials (RCTs), this is usually done by a member of the clinical trials unit,
who are responsible for running the trial, but not part of the main project team. This
provides an extra layer of secrecy to the project to help reduce selection bias. Many
psychology studies are not RCTs, however, and as a result, the researchers may have
limited resources to employ someone to independently allocate participants to con-
dition. In this case, researchers can use computer software to allocate to condition.
Programmes like Gorilla and survey packages like Qualtrics contain functions for
random allocation to condition. Using these methods helps reduce selection bias.
Performance Bias
Psychologists are taught a lot about performance bias as part of their training in
experimental research methodology as undergraduates; as a discipline we are acutely
aware that if participants know they are in the experimental/intervention group this
can affect their performance. They might try harder to complete a puzzle or pay more
attention when receiving an intervention. Alternatively, knowing you are in the con-
trol group can lead to reduced persistence on a task or disinterest when completing
measures. Psychologists are also aware that researchers may intentionally or unin-
tentionally influence participants’ performance. Overall, I would say that psycholo-
gists are aware of the importance of blinding participants to group and aware that
blinding personnel, where practical, can be a good idea too. In sum, failing to blind
participants and personnel to group allocation can introduce performance bias.
Psychologists generally have a good understanding of ways in which you can blind
participants to condition, which include options such as (1) using a sealed envelope
to conceal from researchers and students which group participants are in or (2) delivering the study by computer (e.g. via survey software) so that participants cannot easily tell which condition they have received.
Detection Bias
Because psychologists run their own statistical analyses, they typically have
unblinded access to the data, indicating which participants are in which groups,
when running statistical tests. This opens the door to detection bias and p-hacking
(see Box 6.2).
The major giveaway that psychologists have a blind spot about detection bias is
the rarity with which blinding of outcome assessors is mentioned in psychology
studies. Indeed, it was only when I completed quality appraisal for Cooke et al.
(2023) that I thought about it at all and noticed that all of my included studies were
rated as high risk of bias for detection bias. Because psychologists analyse their own
data or get their students/researchers to do it for them, we need to be aware of detec-
tion bias.
Much psychological research is also conducted without any funding, which is another key difference from other disciplines. Until
psychologists conduct research studies with funds to employ an outcome assessor
to provide an independent statistical evaluation of results, I believe that detection
bias will remain an almost ever-present risk in psychological research. Even when
you have money to employ an outcome assessor to run the analyses, there will likely
be psychologists who enjoy analysing data—like me!—and will prefer to run their
own analyses. Perhaps these individuals can be given the dataset after someone
independently confirms the results, to reduce the risk of detection bias.
Attrition Bias
Attrition bias occurs when there is a difference between the sample you recruit and the
sample you analyse. For instance, imagine you recruit 100 students into a study about
binge drinking, but are only able to retain 50 at follow-up—you have lost half of your
sample between baseline and follow-up, which has the potential to cause attrition bias.
In this case, you only have responses from 50% of the original sample, which will
limit generalizability of findings because the 50% of your baseline sample you retained may
differ on variables you are interested in from the 50% of the sample you have lost.
In Radtke et al. (2017), we found that those who dropped out of our intervention
study at follow-up drank more alcohol at baseline relative to those who we retained
in the study, meaning our intervention evaluation was undermined because we did
not know what happened to consumption in those we lost. Attrition bias can also
arise from how you handle incomplete data, either statistically, by imputing missing
values, for example, or through how you choose to include/exclude cases in your analysis.
To calculate the attrition rate in your study, (1) subtract the follow-up sample size
from the baseline sample size [telling you how many participants you lost], then (2)
divide that value by the baseline sample size, and finally (3) multiply by 100 to get the
percentage of the original sample you lost. Here are some numbers to show this
calculation in action:
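Suppose, purely for illustration, that you recruited 250 participants at baseline and retained 179 at follow-up (these figures are invented for the example; any pair will do). A minimal sketch in Python of the three steps:

def attrition_rate(baseline_n, followup_n):
    """Percentage of the baseline sample lost by follow-up."""
    lost = baseline_n - followup_n           # step 1: how many participants you lost
    return lost / baseline_n * 100           # steps 2 and 3: proportion of baseline, as a percentage

print(round(attrition_rate(250, 179), 1))        # 28.4
print(round(100 - attrition_rate(250, 179), 1))  # retention rate: 71.6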
In this example, the attrition rate was 28.4%. You can also work out your reten-
tion rate (how many people completed both measures) by subtracting the attrition
rate from 100. The retention rate is 71.6%.
I don’t recall too many conversations about what a good/bad/average/poor attri-
tion or retention rate is, although I do recall having a paper rejected for having a
retention rate of 48%. While I was cross about the rejection at the time, I now appre-
ciate the editor was correct to challenge the attrition in that study. I think most
psychologists rely on heuristics to satisfy their desire to get their work published,
although, I have not systematically researched this issue.
Other disciplines are much stricter on attrition rates. When I first read the
Cochrane Guidelines for Risk of Bias, I was surprised that they were recommending
that an attrition rate of 5% was desirable and evidence of low risk of bias. 5%!!!
Apart from some of my early studies, which students completed in return for
research credit, I rarely achieve a 5% attrition rate. When you do lots of prospective,
and longitudinal, studies, with student samples, you come to expect attrition, with
participants dropping out of follow-up surveys for a variety of reasons. Reading the
guidelines further, the implication was that while between 6 and 10% attrition could
be considered unclear risk of bias, anything more than 10% was clear evidence of
high risk of bias. In other words, unless you retained 90% of your original sample,
your study was at high risk of attrition bias.
When we are told something that challenges our world view, or that we don't like, our
natural tendency is to try and undermine it, because reconciling the information
with what we know and accept may not be possible. As we shall see when I talk
about using the Cochrane Risk of Bias tool in the next section, when I applied the
10% attrition threshold to the studies in Cooke et al. (2023), most did not
meet it. Interestingly, most of the study authors did not comment on attrition being
an issue; where it was, analyses were often run using statistical methods to deal
with missing data, like multiple imputation or intention-to-treat analyses.
The issue of what psychologists consider acceptable levels of attrition is not one
that is widely discussed. I believe this issue needs addressing in research methodol-
ogy training and would benefit from psychologists, as a discipline, getting together
to talk about this issue. I think that, because much psychological research is unfunded,
we should expect higher levels of attrition than are found in RCTs, which
potentially set a high benchmark because participants may be more invested in tak-
ing part in a trial, for example, when it relates to a treatment for a health condition,
than they would be for a typical psychology survey study. Perhaps a good first step
would be for psychologists to routinely comment on the power of their study based
on the analysed (final) dataset. Power calculations are becoming more common in
primary papers, but these sometimes talk about power based on baseline rather than
final sample sizes, which can be misleading.
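To illustrate why reporting power on the final rather than the baseline sample matters, here is a minimal sketch using Python and scipy (purely for illustration, with invented numbers): a two-group comparison powered on 100 participants per group looks rather different once roughly 28% of them are lost, assuming a medium effect size of d = 0.5 and a two-tailed alpha of .05.

import math
from scipy.stats import nct, t as t_dist

def two_sample_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-tailed independent-samples t-test for effect size d."""
    df = n1 + n2 - 2
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))      # noncentrality parameter
    t_crit = t_dist.ppf(1 - alpha / 2, df)
    return (1 - nct.cdf(t_crit, df, ncp)) + nct.cdf(-t_crit, df, ncp)

print(round(two_sample_power(0.5, 100, 100), 2))  # about 0.94 at the baseline sample size
print(round(two_sample_power(0.5, 72, 72), 2))    # about 0.85 after ~28% attrition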
Reporting Bias
Reporting bias is sometimes called selective reporting. In the Cochrane Risk of Bias
tool, it mentions selective outcome reporting, which makes me think of examples
where authors have switched outcome after running statistical analyses. Box 6.3
gives an example of what I mean.
Box 6.3 Switching Outcomes After Running Statistical Analyses You are
interested in testing how well a school-based intervention affects
children's fruit and vegetable intake, attitudes towards fruit and
vegetables, and knowledge of the 5-a-day message. Before the study,
your focus is on evaluating the effect of the intervention on fruit and
vegetable intake as you decide this is the most important variable to
show a change in. You designate fruit and vegetable intake as your
primary outcome and attitudes and knowledge as secondary outcomes.
When you run the analysis, as is often the case, the intervention has not
changed fruit and vegetable intake (d = 0.00). On the other hand, the
intervention led to positive changes in attitudes (d = 0.30) and knowledge
(d = 0.75). Despite knowing that it is typically easier to show an
intervention can change knowledge or attitudes than intake, you decide
to switch the focus of your paper to emphasise the changes in knowledge
and attitudes and de-emphasise the lack of change in intake. Doing this
is an example of reporting bias and HARKing—Hypothesising After
the Results are Known (Chambers, 2017).
When pre-registering the review protocol on PROSPERO for what became Cooke
et al. (2023), we opted to use the Cochrane Risk of Bias tool. This tool seemed a
good fit for the studies we were appraising. My co-author Helen McEwan and I used
it to independently judge the risk of bias in the included studies. The form we used
has been updated and is available online at https://sites.google.com/site/
riskofbiastool/.
For readers unfamiliar with this tool, you judge each included study on seven
criteria: Random sequence generation; Allocation concealment; Blinding of partici-
pants and personnel; Blinding of outcome assessment; Incomplete outcome data;
Selective reporting; and Other bias. These criteria map onto the biases covered in
the previous section.1 For each criterion, you judge the study as being low, unclear,
or high risk of bias. Low risk of bias suggests you think that the bias you are rating
is unlikely. Unclear risk of bias suggests you are unsure about the extent of bias,
based on how the study was reported. Finally, high risk of bias suggests that there is
a good chance that the study suffered from this bias. Next, I will talk through how I
used the form for one of the studies included in Cooke et al. (2023).
I’ve included the completed worksheet for Wittleder et al.’s (2019) study as
Table 6.1. I’ve selected this paper because it was simple to quality appraise in some
ways but not others.
1 Other bias is a catch-all category for biases not covered in the rest of the form.
Example Risk of Bias Form—Wittleder et al. (2019)
Wittleder et al. (2019) randomised participants to condition, so I judged the risk of bias for random sequence generation to be low; it seems unlikely that selection bias occurred. I reserved the high risk of bias rating for papers that do not
report randomisation to condition. Authors conducting experiments/evaluating
interventions should aim to limit possible explanations of their results. Randomisation
is one way to do this, and is often under the control of the authors, so failure to ran-
domise merits a high risk of bias rating.
I judged the risk of bias for allocation concealment to be low for Wittleder et al.’s
paper. With Qualtrics randomising to condition, it seems unlikely that there was bias
in allocation to condition. It’s worth noting that the authors made no reference to
allocation concealment, so I did have to infer this when judging the paper as low
risk of bias. Helen McEwan, the second author of the paper, who independently
rated each paper using the same form, and I decided that studies using computer-
administered allocation to condition are likely to be low risk of bias. We reserved
unclear risk of bias for studies that did not clearly report their method of allocation.
We did not judge any of included studies to be high risk of bias.
I judged the risk of performance bias to be low in Wittleder et al.’s study. With
Qualtrics randomly allocating participants to condition it is hard to see how partici-
pants could know which condition they were in, and this judgment was reported by
the authors too. I think that the authors were blind to condition until data analyses
were conducted. Other papers were rated as unclear for blinding—this was gener-
ally where no information was provided about how participants were blinded to
condition, but it was not immediately obvious that participants and researchers were
aware of conditions. As noted above, I believe that psychologists are acutely aware
of blinding, so it is perhaps not too surprising that we judged all the studies in our
meta-analysis as low or unclear for performance bias.
In contrast to previous criteria, I rated Wittleder et al.’s study as high risk of detection
bias. The authors made no mention of blinding outcome assessors, so, I had to
assume that they ran analyses themselves. Indeed, we rated all the papers included in
Cooke et al. (2023) as high risk of detection bias as none of them made any reference
to blinding outcome assessors. When all papers in a meta-analysis suffer the same
negative assessment, it is worth discussing the suitability of the tool for psychologi-
cal research, which we did at the end of the paper. There is a broader point to be made
about the suitability of tools designed for RCTs being used to evaluate experimental
psychology studies. Although both study designs are experiments, the way research
is reported is quite different. Perhaps psychologists should create their own tools.
Wittleder et al. (2019) reported an attrition rate of 45%. As noted above, this falls far
short of the retention needed for an unclear (90%) or low (95%) risk of bias rating under the
Cochrane guidance. In the limitations section of the paper, the authors note that the
attrition rate in their study was higher than typically found in clinical trials. There is
not much the authors could do about this, beyond noting that attrition did not differ
by condition, which is a positive finding. Most of the studies included in Cooke
et al. (2023) reported high risk for attrition bias, although there were a couple of
studies that reported zero or low levels of attrition indicating that it is possible for
psychology papers to be rated as low risk of bias for incomplete outcome data.
Helen and I judged that all included studies were low risk for both reporting bias
and other bias. This is not to say either criterion won’t be an issue with quality
appraisal for your meta-analysis, just that it was not an issue for our studies. One
final thing to note is that none of our included studies published a protocol paper
outlining their plans prior to conducting the study. When you are quality appraising
a literature where publication of protocols is accepted practice, this makes the job of judging reporting bias potentially easier, as you can compare the primary outcome mentioned in the protocol to the outcome reported in the paper. So, our quality appraisal is a qualified one on reporting bias; we did not find any evidence of it, but the absence of protocols made it hard for us to check, given the way the studies were conducted and reported.
The final section of this chapter will discuss quality appraisal of correlational
studies.
Quality Appraising Correlational Studies
If you had asked me to rate the study quality of the papers in my first three meta-analyses (Cooke et al., 2016; Cooke & French, 2008; Cooke & Sheeran, 2004), I would not have known where to start, although, in Cooke et al. (2016), we did exclude a paper because we judged it to use an invalid measure of a construct.
Since completing Cooke et al. (2016), Protogerou and Hagger’s (2020) Quality
of Survey Studies in Psychology (Q-SSP) tool has been published. This tool can be
used to judge the quality of survey studies. My Professional Doctorate in Health
Psychology student, Amina Saadi, used this tool in her systematic review of predic-
tors of influenza vaccine uptake in hospital-based healthcare workers, and found it
to be a really effective way to judge the quality of correlational studies. As with the
Cochrane Risk of Bias tool criteria, Amina found in her included studies that there
were several criteria that authors did not routinely address: reporting of sample
demographics, psychometric properties of measures, and operational definitions of
the focal behaviour (vaccine uptake). Hopefully, over time psychologists will become more familiar with this tool, which should improve both research design (using the tool to ensure that correlational studies follow best practice in study design) and the reporting of information. I will use this tool the next time I run a meta-analysis of correlations.
Summary
The aim of this chapter was to get you thinking about quality appraisal for meta-
analysis. Initially, I did this by talking about different biases that affect perceptions
of study quality. I then provided an example of how to quality appraise a study using
an experimental study design and offered advice for correlational study designs. I’ll
end with a couple of top tips for quality appraisal:
• Read and re-read your papers multiple times to check you have the correct meth-
odological information when quality appraising your studies. Relative to other
disciplines, psychology studies often fail to provide as much detail on criteria,
including allocation concealment and selective reporting, so you are going to
have to infer what happened (helpfully, psychologists are generally pretty awe-
some at inference, as we do it all the time).
• Ask a review buddy to independently quality appraise all included studies. This
might sound like a lot of work, but it is a good way to ensure you’ve both reached
the same inference.
• Keep a folder of quality appraisal forms for easy access to information when you
are interpreting results of your meta-analysis (especially moderator analyses).
References
Chambers, C. (2017). The seven deadly sins of psychology: A manifesto for reforming the culture
of scientific practice. Princeton University Press.
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., & French, D. P. (2008). How well do the theory of reasoned action and theory of
planned behaviour predict intentions and attendance at screening programmes? A meta-analy-
sis. Psychology & Health, 23(7), 745–765. https://doi.org/10.1080/08870440701544437
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Cooke, R., & Sheeran, P. (2004). Moderation of cognition-intention and cognition-behav-
iour relations: A meta-analysis of properties of variables from the theory of planned
behaviour. The British Journal of Social Psychology, 43(Pt 2), 159–186. https://doi.
org/10.1348/0144666041501688
Cooke, R., Trebachzyk, H., Harris, P., & Wright, A. J. (2014). Self-affirmation promotes physical
activity. Journal of Sport and Exercise Psychology, 36(2), 217–223. https://doi.org/10.1123/
jsep.2013-0041
Higgins, J. P. T., & Green, S. (2011). Cochrane handbook for systematic reviews of interventions.
The Cochrane. Collaboration.
Protogerou, C., & Hagger, M. S. (2020). A checklist to assess the quality of survey studies in
psychology. Methods in Psychology, 3, 100031. https://doi.org/10.1016/j.metip.2020.100031
Radtke, T., Ostergaard, M., Cooke, R., & Scholz, U. (2017). Web-based alcohol intervention:
Study of systematic attrition of heavy drinkers. Journal of Medical Internet Research, 19(6),
e217. https://doi.org/10.2196/jmir.6780
Wittleder, S., Kappes, A., Oettingen, G., Gollwitzer, P. M., Jay, M., & Morgenstern, J. (2019).
Mental contrasting with implementation intentions reduces drinking when drinking is hazard-
ous: An online self-regulation intervention. Health Education & Behavior, 46(4), 666–676.
https://doi.org/10.1177/1090198119826284
Data Synthesis for Meta-Analysis
7
A key part of any meta-analysis is interpreting effect sizes from included studies.
You need to reflect on the direction and magnitude of each effect size because
understanding the results for each included study will help you interpret the overall
effect size produced by meta-analysis. The more you practise interpretation of effect
sizes in terms of direction and magnitude, the easier you will find it to interpret
meta-analytic results, which involves the same task based on pooling across studies.
I’ve included Table 7.1 to help you practise these ideas. This table contains data
from ten imaginary studies that have correlated drinking intentions (i.e. plans to
drink in the future) and drinking behaviour (i.e. self-reported alcohol consumption)
measured between one week and four weeks later (i.e. using a prospective design).
You can infer quite a lot of important information from each row without running a
meta-analysis: You can note their directions (positive; negative; null) and their mag-
nitudes (small; medium; large) using Cohen’s (1992) guidelines (see Chap. 3).
Start with a simple question—“Are all the correlations in the same direction?” In
our case the answer is “yes” —they are all positive correlations; having higher
drinking intentions is correlated with more self-reported drinking behaviour. This
uniformity of direction means that when we run the meta-analysis, we should expect
to find a positive overall correlation (i.e. the correlation based on pooling results
across studies); if we don’t, it’s likely we’ve done something wrong!
We can also think about what these positive correlations mean and whether this
fits in with what you expect in terms of theory. Results match Cooke et al. (2016),
where we had positive correlations between drinking intentions and behaviour for
19 studies using prospective designs. When running statistical tests, always remem-
ber that statistics are there to help you answer a question—is what I expect to hap-
pen happening? Interpreting results can be challenging, so I recommend that, before you start, you remind yourself what you are doing, for example, which variables you are correlating with one another and why that is interesting to you. Thinking about these
non-statistical questions will help you when it comes to interpreting or inferring
results because they prime you to know what to look for.
How about magnitude? “Do all the correlations have the same magnitude?” In
our case, the answer to the question is “no”—correlations vary in magnitude. Using
Cohen’s (1992) guidelines (see Chap. 3), we have one small correlation, four
medium correlations, and five large correlations. The lack of uniformity in magni-
tude means we are uncertain about the size of the overall correlation, but you can
make an educated guess that because 9/10 correlations are either medium or large-
sized then the overall correlation is likely to be either medium or large-sized.
Answering these two questions about direction and magnitude of individual
study effect sizes primes you for your meta-analysis because we already know that
we need to focus more on magnitude and less on direction when interpreting our
results; all correlations are positive, meaning that as intentions to drink increase, so
does drinking behaviour. Having covered inference of direction and magnitude for
a set of correlations, let’s do the same for effect size differences.
The same process of inference of direction and magnitude is also possible for effect
size differences. Table 7.2 includes statistics from ten imaginary studies reporting
effect size differences for adolescents receiving vs not receiving a behaviour change
intervention to reduce screen time. As before, you can note down the direction and
magnitude of these effect sizes before you run any data synthesis.
Let’s think about direction first: “Are all the effect size differences in the same
direction?” In this case the answer is “yes”. In each study, there is a positive effect
size difference, that is, where the intervention group reduced their screen time more
than the control group. So, there is uniformity in direction for this set of effect size
differences. How about magnitude? Using Cohen’s (1992) guidelines, we have five
medium effect size differences and five small effect size differences. The skills
you’ve just practised are the same ones you will need when interpreting the output of your meta-analysis. The only difference is that you will be interpreting
the direction and magnitude of the overall effect size rather than the individual
effect sizes shown in Tables 7.1 and 7.2.
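If you want to check your answers to these direction and magnitude questions, a few lines of R will apply Cohen’s (1992) cut-offs for you. This is just a sketch of my own, not part of jamovi or MAJOR; the function names and example values are made up, and the cut-offs (r = 0.10/0.30/0.50 and d = 0.20/0.50/0.80) are the guidelines referred to in Chap. 3.

```r
# Classify effect sizes using Cohen's (1992) guidelines
# r: 0.10 = small, 0.30 = medium, 0.50 = large
# d: 0.20 = small, 0.50 = medium, 0.80 = large
classify_r <- function(r) {
  size <- abs(r)
  ifelse(size >= 0.50, "large",
         ifelse(size >= 0.30, "medium",
                ifelse(size >= 0.10, "small", "trivial/null")))
}

classify_d <- function(d) {
  size <- abs(d)
  ifelse(size >= 0.80, "large",
         ifelse(size >= 0.50, "medium",
                ifelse(size >= 0.20, "small", "trivial/null")))
}

# Example with a handful of made-up correlations
r_values <- c(0.25, 0.54, -0.02, 0.70)
data.frame(r         = r_values,
           direction = ifelse(r_values > 0, "positive",
                              ifelse(r_values < 0, "negative", "null")),
           magnitude = classify_r(r_values))
```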
What Statistical Information Does Meta-Analysis Produce?
Meta-analysis outputs statistical information that relates to three key points of inter-
est. First, meta-analysis produces a sample-weighted average effect size, that is, a
sample-weighted average correlation between two variables or a sample-weighted
average effect size difference between two groups on an outcome. Meta-analysis
provides the average effect size (correlation or effect size difference) across included
studies after weighting each individual effect size by sample size; studies with
larger sample sizes are given greater weight, hence the phrase sample-weighted.
Meta-analysis will also output statistics that you can use to infer the significance of
the sample-weighted statistic you are interested in. I will talk more about these val-
ues in Chaps. 9 and 10 and weighting in Chap. 11. Second, meta-analysis produces
statistics that indicate the extent of heterogeneity among the effect sizes in your
included studies. Meta-analysis provides statistics that tell you if the studies’ effect
sizes differ from one another (heterogeneity) or if they are similar (homogeneity).
We’ll talk about ways to deal with this issue using moderation in Chap. 12. Third,
meta-analysis outputs statistics to help clarify if there is evidence of publication bias
in the included studies. Meta-analysis allows researchers to compare effect sizes
from included studies to see if they represent the full range of potential values or if
they only represent a narrow range of potential values, usually positive, that suggest
that only positive effects are published. Chapter 13 will go over publication bias in
more detail. Let’s work through each of these ideas—sample-weighting, heteroge-
neity, publication bias—one by one.
What Does Sample-Weighting Mean?
The first point I mentioned in the previous paragraph was about sample-weighting.
Table 7.3 contains the same imaginary studies as Table 7.1 but with sample sizes
added, which range from 50 to 2000. In Chap. 3, we discussed the idea that we
should put more weight (importance) on effect sizes from studies based on larger
samples, because these results are more likely to reflect the population effect size
than equivalent effect sizes from smaller samples. In effect, meta-analysis outputs
an estimate of the population effect size—this is the sample-weighted average cor-
relation or the sample-weighted average effect size difference.
You can ask the question of studies in Table 7.3—“Which study is most likely to
represent the population effect size (e.g. correlation between drinking intentions
and self-reported drinking behaviour)?” The answer is Cole et al. (2015). Why?
This correlation is based on the largest sample size (N = 2000) and so is most likely
to reflect the population correlation between drinking intentions and behaviour. You
can turn the question around and ask, “Which study is least likely to represent the
population effect size?” This time the answer is Jacobi and Jordan (2014). A sample
of N = 50 does not inspire much confidence that results reflect the population cor-
relation. In your own meta-analysis, the sample sizes might vary from this example,
but the logic underpinning my reasoning will always be the same; the study with the
largest sample size (even if it does not look very large!) will ALWAYS be given the
Table 7.3 Example table of correlations between drinking intentions and behaviour with sam-
ple sizes
Study authors + year Correlation (r) Sample size (N)
Arking and Jones (2010) 0.25 100
Biggs and Smith (2002) 0.54 200
Cole et al. (2015) 0.45 2000
David et al. (2018) 0.35 150
Erasmus et al. (2009) 0.70 75
Feely and Touchy (2007) 0.65 400
Gent et al. (2020) 0.30 475
Horseham and Smooth (2021) 0.40 150
Illy et al. (2013) 0.60 125
Jacobi and Jordan (2014) 0.65 50
most weight in a meta-analysis. Equally, the study with the smallest sample size (even
if it does not look very small!) will ALWAYS be given the least weight in a
meta-analysis.
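You can see the logic of sample-weighting for yourself with a few lines of R. The sketch below uses the standard Fisher r-to-z approach, where the sampling variance of a transformed correlation is 1/(N − 3), so the fixed-effect weight is simply N − 3; random-effects weights also involve τ² (see Chap. 11), so these percentages will not match MAJOR’s output exactly, but the ordering of the studies is the point.

```r
# Approximate (fixed-effect) study weights for the Table 7.3 correlations
# Fisher r-to-z: variance of z = 1/(N - 3), so the inverse-variance weight = N - 3
study <- c("Arking and Jones", "Biggs and Smith", "Cole et al.", "David et al.",
           "Erasmus et al.", "Feely and Touchy", "Gent et al.",
           "Horseham and Smooth", "Illy et al.", "Jacobi and Jordan")
n <- c(100, 200, 2000, 150, 75, 400, 475, 150, 125, 50)

w <- n - 3                                          # inverse-variance weights
data.frame(study, n, weight = round(100 * w / sum(w), 2))
# Cole et al. (N = 2000) gets by far the most weight; Jacobi and Jordan (N = 50) the least
```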
So, even before you run a meta-analysis, you can make the following inferences
about the correlations and sample sizes reported in this set of studies:
• Correlations are all in the same direction (positive).
• Correlations vary in magnitude (from small (r = 0.25) to large (r = 0.70)).
• Sample sizes vary from N = 50 to N = 2000.
There’s one final inference you could potentially make; it’s highly likely that all
these correlations are significantly different from zero (i.e. a correlation of r = 0.00).
Because the smallest correlation is r = 0.25, which is quite far from r = 0.00, I think
it’s a fairly safe bet that all our correlations are significantly different from zero.
However, the beauty of meta-analysis is that you don’t need to take that bet in igno-
rance. By running the meta-analysis of these studies (see Chap. 9) you’ll know
for sure!
I’ve added sample sizes for the imaginary studies from Table 7.2 to create Table 7.4.
A key difference here is that you have separate sample sizes for control and inter-
vention groups, rather than the overall sample you get with a correlation. This is
because with correlations, you are running a test of association, and average results
across the whole sample, whereas with an effect size difference, you are running a
test of difference and want to maintain separation between the groups. The idea,
nevertheless, remains the same; effect size differences based on more people are
assigned greater weight (importance) in meta-analysis relative to studies based on
fewer people. Adapting our question about confidence in effect sizes to the studies
in Table 7.4, based on sample size, we should have greatest confidence that results from Peeps et al. (2015), with a total sample size of N = 950, represent the population effect size difference, and least confidence that Quest et al.’s (2016) results, with a total sample size of N = 35, do so. The
rule of thumb is more people = more confidence, although this does depend some-
what on the numbers of participants in each group.
I will finish off this section by reminding you of the inferences you can make
about studies in Table 7.4:
• Effect size differences are in the same direction (favour experimental group).
• Effect size differences vary in magnitude (from small (d = 0.20) to medium (d = 0.78)).
• (Total) Sample sizes vary from N = 35 to N = 950.
Table 7.4 Example table of effect size differences for a behaviour change intervention to reduce screen time with sample sizes
Authors  Effect size difference (d)  Control group sample size (N)  Experimental group sample size (N)
Keane et al. (2002)  0.20  50  50
Linus et al. (1999)  0.59  75  75
Mimms et al. (1977)  0.70  25  25
Noone et al. (1985)  0.64  125  120
Owen (2023)  0.75  250  200
Peeps et al. (2015)  0.39  500  450
Quest et al. (2016)  0.22  15  20
Ricki et al. (2007)  0.47  115  120
Sopp et al. (2012)  0.44  55  45
Tapp et al. (2003)  0.78  30  20

Unlike the correlations, I am not prepared to make the inference that all these effect size differences are significantly different from zero. Quest et al.’s (2016) results, which combine one of the smallest effect size differences (d = 0.22) with the smallest total sample size (N = 35), make me pause before claiming that result is significantly different from zero. We’ll pick up these ideas in more detail in Chap. 10. Let’s move on to talking about heterogeneity of effect sizes.
As a psychologist, you will have undoubtedly come across the concepts of homoge-
neity and heterogeneity before. Think back to classes on ANOVA or t-tests and the
homogeneity of variances tests you ran. We can use this principle to help us under-
stand the statistics we use in meta-analysis. We’ve already discussed that the effect
sizes in Table 7.1 varied in their magnitude; some correlations were small, some
medium, and some large. The original goal of meta-analysis was to produce an
overall (sample-weighted) average effect size that represents data from a set of stud-
ies that is homogeneous, the idea being that this overall effect size would provide a
sufficiently precise estimate for use by researchers. I’ve yet to run a meta-analysis
where the effect sizes for the overall analysis are homogeneous except when most
studies have null effects!!! I think this is probably a function of meta-analysing
psychology studies, which contain many sources of differences between studies. In
short, we’re not great at standardising our research methods (especially the mea-
sures we use) relative to other disciplines. Psychology studies often recruit small
sample sizes, which can lead to volatility in results between studies. How can meta-
analysis help us to work out the extent of heterogeneity between studies? In two
ways: using statistics to calculate the extent of heterogeneity, and by creating forest
plots to visualise the effect sizes from our included studies.
Two main statistics that meta-analysts report when describing the heterogeneity
of their effect sizes are: the Q test (a Chi-square test) and the I2 index. The Q test is
like many statistical tests. It compares the variability you observe among your effect sizes with the variability you would expect from sampling error alone. This expectation
makes intuitive sense; in meta-analysis, you are aiming to pool together statistics
from studies that have done pretty much the same thing, like correlate drinking
intentions with drinking behaviour using prospective designs or evaluate the effec-
tiveness of an intervention to reduce screen time. Despite this aim, most of the times I’ve run Q tests they have produced significant results, which means that the data are heterogeneous. We’ll return to discuss homogeneity in Chap. 12.
The I2 index (Higgins, 2003) estimates the amount of variability in study results
that is due to real differences between the studies, rather than chance. So, if you
have an I2 value of 50% that means 50% of variation in results is due to differences
between the studies. I2 values of 25%, 50%, and 75% have been proposed by Higgins
(2003) as indicative of low, medium, and high heterogeneity in results between
studies.
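If you are curious where these numbers come from, Q and I² can be calculated by hand from the effect sizes and their sampling variances. Below is a minimal sketch using the Table 7.1/7.3 correlations; the (Q − df)/Q version of I² is the simple formula from Higgins (2003), and note that random-effects software such as metafor/MAJOR estimates I² from τ², so its value can differ somewhat from this hand calculation.

```r
# Hand-calculating Q and I-squared for the Table 7.1/7.3 correlations
r <- c(0.25, 0.54, 0.45, 0.35, 0.70, 0.65, 0.30, 0.40, 0.60, 0.65)
n <- c(100, 200, 2000, 150, 75, 400, 475, 150, 125, 50)

yi <- atanh(r)        # Fisher's r-to-z transformation
vi <- 1 / (n - 3)     # sampling variance of each z
w  <- 1 / vi          # inverse-variance (fixed-effect) weights

y_bar <- sum(w * yi) / sum(w)            # weighted average effect size
Q     <- sum(w * (yi - y_bar)^2)         # heterogeneity statistic
df    <- length(yi) - 1
I2    <- 100 * max(0, (Q - df) / Q)      # Higgins' (2003) I-squared, as a percentage

Q                                   # compare against a chi-square with k - 1 df
pchisq(Q, df, lower.tail = FALSE)   # p value for the Q test
I2
```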
While the Q test and I2 index give you a sense of the heterogeneity of the effect
sizes, I find forest plots are really helpful in unpicking what is causing heterogeneity
in effect sizes. A forest plot (see Figs. 2.1, 9.6, or 10.6) allows the reader to quickly
determine the pattern of results, that is, are studies roughly similar or do they differ
from one another, as well as helping you to spot outliers, whether they are reporting
larger, or smaller, effect sizes than the other studies. We’ll return to this issue in Part
III of the book. Next, we’ll cover publication bias statistics and funnel plots.
When inspecting a funnel plot, you check whether the small studies reporting positive effects at the bottom of the plot are matched by equivalent studies to the left of the overall effect size (negative, or null, effects based on small samples). If you find there are few or zero studies on the left of the plot near the bottom, you might have evidence of publication bias. We’ll
come back to this issue when we run meta-analyses in Chaps. 9 and 10 and in depth
in Chap. 13.
Summary
This chapter has focused on reiterating how to interpret effect sizes as well as prim-
ing you for the three key bits of information you’ll generate in your jamovi output:
the overall effect size; the extent of heterogeneity in effect sizes; evidence of publi-
cation bias. In Chap. 8, we’ll go over the practical aspects of installing jamovi and
MAJOR to run meta-analyses before outlining how to run a meta-analysis of correla-
tions in Chap. 9 and meta-analysis of effect size differences in Chap. 10. The tasks
below are included to help you practise applying the principles of interpreting effect
sizes in terms of direction and magnitude covered in this chapter.
Tasks
1. Complete Table 7.5 by writing in the direction and magnitude of each effect size
(Hint: Use Cohen’s (1992) guidelines, reported in Chap. 3, to infer the
magnitude).
2. Complete Table 7.6 by writing in the direction and magnitude of each effect size
(Hint: Use Cohen’s (1992) guidelines, reported in Chap. 3, to infer the
magnitude).
3. Table 7.7 includes a set of results for eight studies that reported effect size differ-
ences for an experimental manipulation testing the effects of using gain vs loss
frames to encourage physical activity: Gain frames are messages that emphasise
the gains that follow behaviour change (i.e. what you will gain by being more
physically active). Loss frames are messages that emphasise the losses that follow failure to change behaviour (i.e. what you lose by being physically inactive). Complete Table 7.7 by writing the direction and magnitude of each effect size (Hint: Use Cohen’s (1992) guidelines, reported in Chap. 3, to infer the magnitude).

Table 7.5 Correlations between perceived behavioural control over drinking and drinking intentions
Study names  Correlation (r)  N  Direction  Magnitude
Arking and Jones (2010)  0.25  150
Biggs and Smith (2002)  −0.02  200
Cole et al. (2015)  0.10  350
David et al. (2018)  0.35  35
Erasmus et al. (2009)  −0.50  180
Feely and Touchy (2007)  0.45  389
Gent et al. (2020)  0.14  100
Horseham and Smooth (2021)  0.00  80
Illy et al. (2013)  −0.15  125
Jacobi and Jordan (2014)  0.55  165

Table 7.6 Example table of effect size differences for a behaviour change intervention to increase digital resilience skills
Authors  Effect size difference (d)  Direction  Magnitude
Keane et al. (2002)  0.50
Linus et al. (1999)  0.35
Mimms et al. (1977)  0.80
Noone et al. (1985)  0.00
Owen (2023)  0.15
Peeps et al. (2015)  0.20
Quest et al. (2016)  −0.45
Ricki et al. (2007)  0.75
Sopp et al. (2012)  0.15
Tapp et al. (2003)  0.25

Table 7.7 Example table of effect size differences for gain versus loss frame messages to increase physical activity
Authors  Effect size difference (d)  Direction  Magnitude
Erikson (2002)  −0.22
Maldini (1999)  0.37
Gower et al. (1977)  −0.80
Gatting (1985)  0.00
Montana and Young (2023)  0.12
Rodgers (2015)  0.29
Smith (2016)  −0.54
Backley (2007)  −0.75
Maresca et al. (2024)  0.05
Frank (2003)  −0.08
References
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Duval, S., & Tweedie, R. (2000a). A nonparametric ‘trim and fill’ method of accounting for publi-
cation bias in meta-analysis. Journal of the American Statistical Association, 95, 89–98.
Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot-based method of testing and
adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463.
Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315, 629–634.
Higgins, J. P. T. (2003). Measuring inconsistency in meta-analyses. BMJ, 327(7414), 557–560.
https://doi.org/10.1136/bmj.327.7414.557
Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics,
8(2), 157–159. https://doi.org/10.3102/10769986008002157
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Part III
Conducting Meta-Analysis in Jamovi
Using jamovi to Conduct Meta-Analyses
8
them to run analyses in SPSS, which they knew from their undergraduate degree,
and then again in jamovi, which was new to them. The feedback from the students
was unlike anything I experienced teaching stats before—they thanked me for intro-
ducing them to jamovi! One student even asked why we still taught statistical analy-
sis using SPSS??!
Discovering that both jamovi and JASP had an option to run meta-analysis piqued
my interest as, throughout my career, I had always used stand-alone packages to
complete meta-analyses; I was taught meta-analysis using Ralf Schwarzer’s (1988)
Meta programme, a DOS programme that did a pretty good job before Windows
Vista killed it off. You can still download Ralf’s software at http://userpage.fu-
berlin.de/~health/meta_e.htm, including the manual which has some useful tips for
running meta-analysis. I moved on to using Comprehensive Meta-Analysis
(Borenstein et al., 2005) produced by Michael Borenstein and colleagues. CMA is
a nice programme with two downsides. First, as licensed software, you must pay for
it. Second, CMA is not Mac-friendly. Even if you don’t use CMA for your meta-
analysis, I recommend reading Borenstein et al.’s (Borenstein et al., 2021) excellent
textbook that accompanies the software. It provides in-depth explanation of how
meta-analysis works.
So, my choice for this book was between jamovi and JASP. Both are free and
Mac-friendly. The main reason I went for jamovi is that it allows you to run meta-
analysis by entering data for effect size differences using two methods: (1) entering
the mean, standard deviation and sample sizes for the control and experiment/inter-
vention groups or (2) entering the effect sizes and standard errors. In JASP, you
must use method (2). I wanted to use a package that let me calculate effect size
differences based on method (1) because I know that many psychology papers fail
to report effect sizes. By teaching you how to use method (1), I’m aiming to make
your life a bit easier as you only need to enter the data reported by the authors—
most authors report means and standard deviations, although sometimes authors do
report adjusted means, so watch out for that! In sum, jamovi’s greater flexibility in
how you enter data for meta-analysis persuaded me to use it and this is why I am
recommending it for your meta-analysis. The next section will outline how to down-
load and install jamovi.
First search for ‘jamovi download’ using any search engine. Then download the
version for your type of computer, Windows, Mac, Linux, Chrome. There’s a cloud
version too, but I am going to stick to the desktop version as this is what I have used
for conducting meta-analysis. After downloading, installing, and opening jamovi, it
will look like Fig. 8.1.
Fig. 8.1 What jamovi looks like when you open it for the first time
You have a data window, where you enter your variables (left-hand side of
screenshot, with column headings A B C). Unlike SPSS, jamovi displays the output
window alongside the data window (the right-hand side of the screenshot, currently
empty because we have not run any analyses). The analysis options on the top bar—
Exploration, T-tests, ANOVA, Regression, Frequencies, and Factor—are pre-
installed packages. Exploration has descriptive statistics and scatterplots, T-tests
contains independent group, paired, one sample. ANOVA has parametric (One way,
ANOVA (= multiple factors), Repeated Measures, ANCOVA, and MANCOVA) and
non-parametric (Kruskal-Wallis, Friedman) tests. Regression has correlations,
partial correlations, linear, logistic regression. Frequencies has binomial test, chi-
square goodness of fit and test of association, McNemar test, and log-linear regres-
sion. Factor has reliability analysis, exploratory factor analysis, principal components
analysis, and confirmatory factor analysis.
Meta-analysis is not included as one of the pre-installed options in jamovi. To
run meta-analysis, go to the Modules option (top right of screenshot where giant +
symbol is).
Modules—A Library of Extensions
A key strength of R is that because it is open source, researchers are free to create
updates for it. By installing jamovi, you gain (partial) access to this activity, with
new extensions being added to jamovi on a regular basis. If you want to add an
extension to jamovi, click on the Modules (+ symbol) at the top right of your file and
scroll through the library until you find what you are after. For meta-analysis, we
need to find MAJOR.
Installing MAJOR
The MAJOR extension was written by W. Kyle Hamilton to run the metafor pack-
age reported in Viechtbauer (2010); metafor provides a set of methods for conduct-
ing meta-analysis in R, and MAJOR is the jamovi version of metafor. Although it
does not have all the functionality of metafor, it provides enough for us to learn the
principles of meta-analysis without all the fun of learning to code (that comes
later!). I recommend reading Viechtbauer’s (2010) paper as it will help explain how the soft-
ware runs and is also a good introduction to principles of meta-analysis. It’s another
good resource to have as it explains various ways of running meta-analysis and can
help expand your knowledge when you’ve learned the basics. So, your next step is
to scroll through the modules until you find MAJOR (see Fig. 8.2).
When you’ve found it, click on install and you’ll notice that MAJOR is now
available in your analyses toolbar (see Fig. 8.3).
To run a meta-analysis in any software package, you must create a dataset that con-
tains a specific set of variables. In all meta-analytic datasets, you will need a study
label/authors variable. For example, in Cooke et al. (2016), we identified included
studies based on author and year of publication, for example, Cooke & French
(2011). For most studies, this worked fine, but for some we needed to add extra
information. For papers that had multiple studies, we labelled these Conner et al.
(1999) Study 1, Conner et al. (1999) Study 2, Conner et al. (1999) Study 3.
Alternatively, Zimmermann and Sieverding (2011) reported correlations sepa-
rately for men and women, so we labelled the results in our dataset as Zimmermann
and Sieverding (2011) male and Zimmermann and Sieverding (2011) female.
Regardless of the type of meta-analysis you run, MAJOR expects you to have cre-
ated a variable that serves this function. So, a study label/authors variable is com-
mon to all meta-analytic datasets. Other variables included in the dataset depend on
whether you are running a meta-analysis of correlations or a meta-analysis of effect
size differences.
In any meta-analysis of correlations, you need to create a correlation variable (r) and
a sample size variable (N): The correlation from each included study is your effect
size, and the sample size is used to weight the studies when running the meta-
analysis (see Chaps. 3 and 7). Studies with larger sample sizes receive more weight-
ing in sample-weighted average correlations. In sum, to run a meta-analysis of
correlations, you need to create three variables: (1) study label/authors; (2) correla-
tion; and (3) sample size.
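If you prefer to prepare the file outside jamovi, those three variables are all you need in a spreadsheet or .csv file that jamovi can then open. A minimal sketch in R, using a few of the made-up studies from Table 7.3 (the file name is just a placeholder of mine):

```r
# One row per study: study label, correlation (r) and sample size (N)
dat <- data.frame(
  study = c("Arking and Jones (2010)", "Biggs and Smith (2002)", "Cole et al. (2015)"),
  r     = c(0.25, 0.54, 0.45),
  N     = c(100, 200, 2000)
)

# Save as a .csv file that can be opened directly in jamovi
write.csv(dat, "correlations_meta.csv", row.names = FALSE)
```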
There are two methods to create a dataset for meta-analysis of effect size differ-
ences. Method (1) involves creating a study label/name variable and then entering
the sample size, mean, and standard deviation for each group (i.e. control; experi-
mental/intervention). The means and standard deviations are used to generate the
effect size difference (d), which are weighted using the sample sizes. Using this
method is best when authors of included studies have NOT reported the effect size
differences in their paper. In sum, to run a meta-analysis of effect size differences
using Method (1), you need to create seven variables: (1) study label/authors; (2)
experimental/intervention group mean; (3) experimental/intervention group stan-
dard deviation; (4) experimental/intervention group sample size; (5) control group
mean; (6) control group standard deviation; and (7) control group sample size.
Method (2) is best used when most of the authors of included studies have
reported effect size differences. Using method (2), you only need to create three
variables to run a meta-analysis of effect size differences: (1) study label/authors;
(2) effect size difference; and (3) standard error or sample variance. You already
have the effect size differences so all you need is the standard errors, or sample vari-
ances, to allow MAJOR to weight the effect sizes.
I’ve never been able to run a meta-analysis of effect size differences based on
method (2) because the literatures I’ve meta-analysed to date have failed to include
effect size differences in the papers. I thought I should mention it in case you get
lucky with your own meta-analysis. As an aside, if most of your studies have
reported effect size differences but a couple of studies have not, then I would advise
you use either one of the effect size calculators available on the Internet (see Chap.
3) or if you are comfortable with R, download metafor and use the escalc function
(see Viechtbauer, 2010, for more information). Either option can be used to calcu-
late the missing d values for you to add to jamovi.
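For anyone comfortable opening R, here is roughly what that looks like with metafor’s escalc function. This is a sketch with made-up numbers (borrowed from Keane et al. in Table 10.1); which group you enter first determines the sign of the effect size, and metafor applies a small-sample bias correction by default, so the value can differ slightly from a raw Cohen’s d.

```r
# install.packages("metafor")   # if not already installed
library(metafor)

# Standardised mean difference for a study that did not report one
es <- escalc(measure = "SMD",
             m1i = 13.50, sd1i = 4.90, n1i = 50,   # control group mean, SD, N
             m2i = 12.50, sd2i = 4.88, n2i = 50)   # intervention group mean, SD, N
es   # yi = effect size difference (roughly d = 0.20 here), vi = its sampling variance
```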
Summary
In this chapter, I have provided guidance on how to download and install jamovi,
how to install MAJOR, as well as information about how to prepare the dataset for
meta-analysis. In Chaps. 9 and 10, I will describe how to perform meta-analysis
using MAJOR in jamovi. The two tasks below are designed to help prepare you for this.
Tasks
References
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2005). Comprehensive meta-
analysis (Version 2) [Computer Software]. Biostat.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2021). Introduction to meta-anal-
ysis (2nd ed.). Wiley.
Conner, M., Warren, R., Close, S., & Sparks, P. (1999). Alcohol consumption and the theory of
planned behavior: An examination of the cognitive mediation of past behavior. Journal of
Applied Social Psychology, 29(8), 1676–1704. https://doi.org/10.1111/j.1559-1816.1999.
tb02046.x
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Newby, K., Teah, G., Cooke, R., Li, X., Brown, K., Salisbury-Finch, B., Kwah, K., Bartle, N.,
Curtis, K., Fulton, E., Parsons, J., Dusseldorp, E., & Williams, S. L. (2021). Do automated digi-
tal health behaviour change interventions have a positive effect on self-efficacy? A systematic
review and meta-analysis. Health Psychology Review, 15(1), 140–158. https://doi.org/10.108
0/17437199.2019.1705873
Schwarzer, R. (1988). Meta: Programs for secondary data analysis. [Computer Software].
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03
Zimmermann, F., & Sieverding, M. (2011). Young adults’ images of abstaining and drinking:
Prototype dimensions, correlates and assessment methods. Journal of Health Psychology,
16(3), 410–420. https://doi.org/10.1177/1359105310373412
How to Conduct a Meta-Analysis
of Correlations 9
First, open jamovi and create a new file with three variables by clicking on the let-
ters (A, B, C) above the dataset: Study Name; Sample Size; Correlation. Study
Name is used to identify study authors/study label; it is a nominal measure type and
text data type. Sample Size is the sample size for each study; it is a continuous mea-
sure type, and an integer data type. Correlation is the correlation between the vari-
ables; it is a continuous measure type, and a decimal data type. After creating the
variables, enter data1 to recreate Fig. 9.1.
You are now ready to run a meta-analysis of correlations!
Clicking on MAJOR will open a drop-down window in your dataset as in
Fig. 9.2.
Select the first option—Correlation Coefficients (r, N) —to tell MAJOR you
want to run a meta-analysis using correlations (r) and sample sizes (N) from a set of
studies. When you click on this option, you will enter the analysis window (see
Fig. 9.3) where you enter variables to run your meta-analysis of correlations.
Use the arrows in the middle of the display to match the variable names with the
relevant boxes:
• Correlation➔Correlations
• Sample Size➔Sample Sizes
• Study Name➔Study Label
1 This data is from Table 7.1 and is also available as a .csv file on my Open Science page for you to download and open in jamovi.
After you do this, your output window in jamovi will populate with information
about your meta-analysis. I will slowly go through each part of this output in the
next section.
How Do I Interpret the Output?
The output is split into three sections: (1) Main output table (Fig. 9.4), which shows
the overall effect size for your meta-analysis plus various statistics; (2) Tests of
heterogeneity table (Fig. 9.5) and forest plot (Fig. 9.6); (3) Tests of publication bias
table (Fig. 9.8) and funnel plot (Fig. 9.9). I will start by explaining the Main out-
put table.
Main Output
The main output table contains the key statistical information from your meta-
analysis. (see Fig. 9.4). The text in blue confirms the type of meta-analysis you have
run in MAJOR—Correlation Coefficients using correlations and sample sizes.
Immediately above the table it says a random effects model was used—I will explain
this idea in Chap. 11.
First, look at the estimate, which in this case is the sample-weighted average cor-
relation between drinking intentions and drinking behaviour, based on the ten stud-
ies displayed in Table 7.1. According to Cohen (1992), correlation coefficients that
equal or exceed r = 0.50 can be interpreted as large-sized. Thus, we can say our
estimate shows we have a large-sized correlation between drinking intentions and
behaviour because r = 0.546, or 0.55 if you round up.2 This is the most important
information within your meta-analytic output; it tells the reader what the correlation
is between the two variables after they have been sample-weighted, averaged, and
pooled across the ten studies.
The remainder of the table contains several other pieces of statistical informa-
tion: se is the standard error of the effect size; Z is a test of whether the effect size
estimate is significantly different from zero; the p value is a test of the significance
of the Z test. What the Z test and p value are telling you is the likelihood that your
effect size is significantly different from zero, that is, that there actually is a correla-
tion between drinking intentions and drinking behaviour which is not null.
Where meta-analysis differs from usual practice about reporting statistics in psy-
chology, however, is that most of the time we’re not that interested in a significant
result in terms of a p value. Instead we use other statistics, confidence intervals, to
infer significance in a meta-analysis. Because meta-analysis is about pooling results
across studies, we are interested in the range of possible values the effect size esti-
mate could take, using the data from the studies we have collected to provide this
information. If your experience of statistics is focused on analysing data from a
single dataset, you’ve probably not stopped to think about the range of values an
effect size, like a correlation, could take, across a set of studies. And yet, this is a
good question to ask! Every time we do a study, we are collecting data to inform
ourselves about something we are interested in; in this example, the correlation
between drinking intentions and drinking behaviour. A key reason to run a meta-
analysis is to compare correlations from multiple studies to get a sense of the range
2 Reporting results to two or three (or more) decimal places often generates animated arguments in
statistics. I was taught to round up to two decimal places, but some journals insist on reporting
three decimal places and rounding up can cause confusion and shady practice when used with p
values. Tread carefully!
of values found for an effect size. Doing this helps us develop a more precise esti-
mate of the range of values our effect size could take. Moving from primary to
secondary analysis means thinking about the accumulation of evidence rather than
the result from one study. This issue is more salient when running secondary
analyses.
We have a lower limit confidence interval of 0.416, which is a medium-sized
correlation, and an upper limit confidence interval of 0.676, which is a large-sized
correlation. The lower and upper confidence limits fall below and above your effect size estimate (r = 0.545), which sits equidistant between them. The lower value of r = 0.416 is 0.13 below the effect size estimate (r = 0.545), and the upper value (r = 0.676) is 0.13 above it. So, the
final thing your main table is telling you is that while the sample-weighted average
correlation for your studies is r = 0.545, the correlation could fall anywhere between
r = 0.416 (the lower confidence interval) and r = 0.676 based on data from the ten
studies you included. From an interpretation point of view this means that our drink-
ing intentions–drinking behaviour correlation is at least medium-sized (the lower
value is medium-sized) and likely to be large-sized (the estimate and upper value
are both large-sized).
I’ll finish this section by making two additional points. First, over time, you
develop an intuitive sense of wide and narrow confidence intervals. To me, these
confidence intervals are narrow, meaning, that the correlation values we have across
studies are similar (common with made up data!). Second, confidence intervals are
easier to interpret than p values, in my opinion. To interpret an effect size estimate
as significant using confidence intervals, all you need to do is check if the signs are
the same: both positive or both negative means you have a significant effect. If one
sign is negative and the other positive, however, that means you have a non-significant effect. Why? Because it means that zero is a potential value for your
effect size, and if zero lies within the range of potential values for your effect size
estimate, you cannot rule out the possibility that it is the ‘true’ effect size.
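Because MAJOR runs the metafor package behind the scenes, you can reproduce this kind of analysis directly in R if you ever want to check the output. A minimal sketch follows, assuming the Table 7.1 correlations and sample sizes entered below; the exact numbers depend on the options chosen (for example, whether correlations are Fisher-transformed and which estimator is used for τ²), so treat it as illustrative rather than a line-by-line reproduction of Fig. 9.4.

```r
library(metafor)

r <- c(0.25, 0.54, 0.45, 0.35, 0.70, 0.65, 0.30, 0.40, 0.60, 0.65)
n <- c(100, 200, 2000, 150, 75, 400, 475, 150, 125, 50)

# Fisher-transform the correlations and compute their sampling variances
dat <- escalc(measure = "ZCOR", ri = r, ni = n)

# Random-effects meta-analysis (REML estimator for tau-squared)
res <- rma(yi, vi, data = dat, method = "REML")
summary(res)                       # estimate, CI, Z, p, Q, I-squared, tau-squared

# Back-transform the pooled estimate from Fisher's z to r
predict(res, transf = transf.ztor)

forest(res)   # forest plot of the included studies
funnel(res)   # funnel plot for checking publication bias
```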
Heterogeneity Statistics
The second table in your output is titled Heterogeneity Statistics (see Fig. 9.5).
I2 is a commonly reported measure of heterogeneity between studies included in
a meta-analysis. As a heuristic, the way to interpret I2 is if the value is below 25%,
you have low heterogeneity, between 26 and 75% is moderate heterogeneity, while
above 75% is high heterogeneity. In our case, the value is 91.04% indicating high
heterogeneity. The Q value (Q = 76.883, p <0.001) indicates significant heterogene-
ity between effect sizes included in the meta. Taken together, these statistics show
that correlations between drinking intentions and behaviour vary considerably from
one study to another. I’ll talk about Tau and Tau2 when I discuss differences between
random effects and fixed-effect meta-analyses in Chap. 11 and provide more infor-
mation on heterogeneity statistics in Chap. 12.
The forest plot (see Fig. 9.6) below the table provides a visual illustration of the
heterogeneity between studies. Each square on the plot represents a correlation for
one study with the arms showing the confidence intervals. There is a useful trick
when it comes to inferring effects within the forest plot—the wider the confidence
intervals, the smaller the sample size, and conversely, the narrower the confidence
intervals, the larger the sample size. The diamond at the bottom of the plot shows
the overall correlation; its edges represent the confidence intervals.
Inspecting the forest plot, you can see that correlations vary from the small-sized correlation reported by Arking and Jones (2010), r = 0.26, to the large-sized correlation of r = 0.87 reported by Erasmus et al. (2009), a value so large it suggests the two measures are close to collinear. Five of the
studies report large-sized correlations, so we should perhaps not be too surprised
that the overall effect size estimate is also large-sized.
We can add to our understanding of effect sizes included by getting MAJOR to
add the weightings for each study. Remember that meta-analysis assumes that larger
samples are more representative of the population effect size (see Chaps. 3 and 7),
which means that of these ten studies, Cole et al. should be more representative of
the population effect size than Jacobi and Jordan. We can confirm our reasoning by
using one of the menus in jamovi. If you click on the plots menu, you can add the
Model fitting weights to the forest plot. These weights tell you how much each
study informs the overall effect size.
Confirming our belief in the importance of Cole et al., we can see in Fig. 9.7 that
this study had 11.75% weight (influence) on the overall effect size. As mentioned
above, Jacobi and Jordan (2014) has the lowest weight (influence) 7.55%, due to
having the smallest sample size of our studies. The takeaway message from this
digression is that the overall correlation depends more on correlations from larger
samples than correlations from smaller samples. This is the genius of meta-analysis;
Jacobi and Jordan’s (2014) large correlation (r = 0.78) is unlikely to be the popula-
tion correlation between drinking intentions and behaviour because it is based on a
small sample size, which are prone to being unreliable. In contrast, a correlation
from a larger sample, like Cole et al.’s (2015) r = 0.48 is more likely to reflect the
population correlation and is a better basis for inference. I will discuss weighting
studies in more detail in Chap. 11.
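If you run the analysis in metafor rather than MAJOR, the same weights can be pulled out directly. A one-line sketch, continuing from the res model fitted in the earlier snippet; the exact percentages depend on the random-effects options, so they may not match Fig. 9.7 precisely.

```r
# Percentage weight each study contributes to the overall effect size
# (continuing from the 'res' random-effects model fitted above)
round(weights(res), 2)
```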
There are a couple of other things you can infer from this forest plot. Each study’s effect size sits to the right of the vertical line, which is set at zero, meaning no correlation between intentions and behaviour. So, all our included stud-
ies reported positive correlations.
As well as checking the direction of each study from the forest plot, we can also
check the magnitude of the effect size for each study. Taking this idea a step further,
the forest plot allows us to identify any effect sizes that are not significantly differ-
ent from zero, by examining the confidence intervals for each study. In this set of
studies, none of the confidence intervals contain zero, meaning that all results are
significant. This does not mean all included studies show a ‘true’ effect, however. It
is possible that some results reflect p-hacking (see Chap. 13). This leads us neatly
on to discuss the table showing Publication Bias Assessment statistics.
Publication Bias
The final table in your output provides information about publication bias, that is,
the tendency for journals to publish papers reporting significant findings (see Chap.
13), like significant correlations between variables. Two methods—statistical esti-
mates of publication (see Fig. 9.8) and funnel plots (see Fig. 9.9)—are reported
following meta-analyses to help identify publication bias in research literatures.
Rosenthal’s (1979) Fail-Safe N statistic tells you how many unpublished studies showing null effect sizes (correlations in this case) would need to exist to render your meta-analytic result non-significant. You contrast Fail-Safe N values with the number of
studies included in your meta-analysis. In this case, you have found ten studies and
the Fail-Safe N value = 2644 studies. Given you spent ages systematically searching
and screening and found ten studies, it seems highly unlikely that you have missed
an additional 2644 studies that ALL report null correlations!!! So, we can infer
confidence in our correlation from fail-safe values.
Begg and Mazumdar’s (1994) Rank Correlation and Egger’s Test (1997) regres-
sion test both estimate the extent of symmetry in effect sizes from included studies.
In a symmetrical distribution of effect sizes, you should have a roughly equal num-
ber of effects above and below the overall estimate. This would indicate that studies
with smaller, null, or negative effects, which are less likely to be significant, are
being published. In an asymmetrical distribution, in contrast, studies reporting
smaller, null, or negative effects, which are less likely to be significant, are missing
from the plot. Hence, there is a lack of symmetry in the distribution of effect sizes.
An additional indicator of publication bias is when your set of studies ONLY includes studies with small sample sizes reporting positive and significant effect sizes and lacks studies with small sample sizes reporting negative (or null) effect sizes.
Assuming publication bias exists, the studies with significant effects are more likely
to be published than the studies with non-significant effect sizes, even if these stud-
ies, when based on small sample sizes, might produce unreliable effects. In our
table, neither statistic is significant, which suggests an absence of publication
bias. We can also look at a funnel plot to check for publication bias.
Interpreting a funnel plot (Fig. 9.9) centres on thinking about included studies’
effects in terms of their magnitude and standard errors, which are an analogue of
sample size. Effect sizes (correlations here) are plotted on the X axis. For instance,
at the bottom of the plot nearest the X axis, Jacobi and Jordan’s r of 0.78; near the
top of the plot, to the left of the vertical line, you can see Cole et al. r = 0.48. The
effect sizes are spread apart in this plot because they vary in sample size and, therefore, standard error, which is plotted on the Y axis.
As discussed in Chap. 7, studies with larger sample sizes necessarily possess smaller standard errors relative to studies with smaller sample sizes. A standard error indexes the precision of an individual effect size: it is the distance you would typically expect between that study’s effect size and the overall (population) effect size, given the study’s sample size, for example, how far Cole et al.’s r = 0.48 would be expected to fall from the overall effect size of r = 0.55.
The reason that Jacobi and Jordan has a larger standard error than Cole et al. is that
it has a smaller sample size. We can use our forest plot to identify effect sizes and
then use the funnel plot to identify which studies have the smallest standard errors.
Small and large are both relative terms here; a small standard error in this Funnel
plot might be a large standard error in another sample of studies.
Although you can use the funnel plot to check results for individual studies, most
of the time we use them to visualize the distribution of effect sizes and see if they
appear symmetrical or asymmetrical; asymmetrical effect sizes suggest publication
bias. Figure 9.9 shows a symmetrical distribution: five effect sizes appear to the
right of the population estimate (r = 0.55) and five appear to the left of it. While there are no studies at the bottom left of the plot (studies reporting smaller correlations that also have large standard errors), only Jacobi and Jordan’s paper sits to the right of the vertical line with a large standard error; this is the
kind of result that suggests an unreliable finding, perhaps even evidence of p-hack-
ing. In a literature where publication bias is present, you would expect there to be
more studies in this sector of the plot.
While you cannot rule out publication bias based on the symmetry of a funnel
plot (Borenstein et al., 2021), the non-significance of Begg and Mazumdar’s Rank
Correlation (1994) and Egger’s Test (1997) regression statistics reported in the table
increases confidence that our set of studies is not too badly affected by publication
bias. Interpreting results from meta-analysis often involves constructing arguments
based on multiple sources of information. If the funnel plot looks symmetrical and
the statistics are non-significant, you can propose a lack of publication bias.
Conversely, an asymmetrical distribution and significant statistics suggest your set
of included studies may suffer from publication bias. I will pick up these issues in
greater depth in Chap. 13.
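For completeness, the publication bias checks reported in MAJOR’s table can also be run in metafor. A sketch, continuing from the dat and res objects created in the earlier snippet; again, the exact values depend on the analysis options.

```r
# Publication bias checks (continuing from the 'dat' and 'res' objects above)
fsn(yi, vi, data = dat, type = "Rosenthal")   # Rosenthal's Fail-Safe N
ranktest(res)                                 # Begg and Mazumdar's rank correlation test
regtest(res)                                  # Egger-style regression test for funnel asymmetry
funnel(res)                                   # funnel plot of effect sizes against standard errors
```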
Summary
You now know how to run a meta-analysis of correlations in jamovi using MAJOR
and how to interpret the output. When writing up the results of your meta-analysis,
always report the overall effect size, confidence intervals, tests of heterogeneity, and
publication bias in the main text and include relevant forest plot(s) as a figure(s) (see
Chap. 15 for more tips on how to write-up your results). You can include a funnel
plot(s) too, but I usually include them as supplementary files, especially if they
show symmetry. To help reinforce your learning of the material covered in this
chapter, I’ve included some tasks to help you practise your skills.
Tasks
Task 1: Report the key information from the main output table of your
meta-analysis.
Task 2: Report the heterogeneity of your meta-analysis.
Task 3: Report the evidence for/against publication bias in your meta-analysis.
Task 4: Go back to your dataset and change the sample sizes of these studies as
follows:
Then re-run your meta-analysis. Compare the results to your answers for Tasks 1–3
and see if any changes have taken place that affect your interpretation of the
results.
Task 5: Copy your original dataset (give it the name: ‘Eight correlations’). Remove
Jacobi and Jordan and Erasmus et al. and see what impact that has on (1) the
main output table, (2) the heterogeneity table, and most critically (3) the fun-
nel plot.
References
Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for pub-
lication bias. Biometrics, 50, 1088–1101.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2021). Introduction to meta-analysis (2nd ed.). Wiley.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315, 629–634.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
How to Conduct a Meta-Analysis
of Effect Size Differences 10
1. Study Name (to identify the study authors, a nominal measure, and text data type)
2. Control N (the sample size for the control group/condition; a continuous mea-
sure and integer data type)
3. Control M (the mean for the outcome in the control group; a continuous measure
and usually a decimal data type)
4. Control SD (the standard deviation for the outcome in the control group; a con-
tinuous measure and usually a decimal data type)
5. Experiment N (the sample size for the experimental or intervention group/condi-
tion; a continuous measure and integer data type)
6. Experiment M (the mean for the outcome in the experimental or intervention
group; a continuous measure and usually a decimal data type)
7. Experiment SD (the standard deviation for the outcome in the experimental or
intervention group; a continuous measure and usually a decimal data type)
Table 10.1 Raw statistics for studies testing interventions to reduce screen time
Authors Control Intervention
M SD N M SD N
Keane et al. 13.50 4.90 50 12.50 4.88 50
Linus et al. 10.00 2.20 75 8.75 1.99 75
Mimms et al. 15.00 4.00 25 12.00 4.40 25
Noone et al. 20.00 5.15 125 17.00 4.20 120
Owen 17.00 3.12 250 13.5 6.10 200
Peeps et al. 14.00 7.88 500 11.50 4.30 450
Quest et al. 25.00 4.50 15 24.00 4.30 20
Ricki et al. 13.00 2.20 115 11.50 3.90 120
Sopp et al. 18.00 3.50 55 15.50 7.50 45
Tapp et al. 30.00 4.00 30 27.00 3.50 20
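If you are comfortable with a little code, you can check how raw statistics like those in Table 10.1 become standardised mean differences before you ever open MAJOR. The short Python sketch below uses the familiar pooled-SD formula for Cohen's d; it is an illustration only, and MAJOR may apply a small-sample correction (Hedges' g), so its values can differ slightly from a hand calculation.

import math

def cohens_d(m_ctrl, sd_ctrl, n_ctrl, m_int, sd_int, n_int):
    # Pooled standard deviation across the two groups
    pooled_var = ((n_ctrl - 1) * sd_ctrl**2 + (n_int - 1) * sd_int**2) / (n_ctrl + n_int - 2)
    # Control minus intervention: a positive d means less screen time after the intervention
    return (m_ctrl - m_int) / math.sqrt(pooled_var)

# Keane et al. row of Table 10.1
print(round(cohens_d(13.50, 4.90, 50, 12.50, 4.88, 50), 2))  # roughly 0.20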
Next enter the data1 from Table 10.1, to recreate Fig. 10.1.
You are now ready to run a meta-analysis of effect size differences!
Clicking on MAJOR will open a drop-down menu in your dataset, as in Fig. 10.2.
Select the fourth option—Mean Differences (n, M, SD) —to tell MAJOR you want
to run a meta-analysis of effect size differences using sample size (n), mean (M),
and standard deviation (SD) for the two groups. When you click on this option, you
will enter the analysis window (see Fig. 10.3) where you enter variables to run your
meta-analysis of effect size differences.
Use the arrows in the middle of the display to match the variables with the rele-
vant boxes:
1 This data is also available on my Open Science Framework page as a csv file you can download and open in jamovi.
Fig. 10.3 MAJOR analysis window for Mean Differences (n, M, SD)
After you do this, your jamovi output window will populate with information
about your meta-analysis. I will slowly go through each part of this output in the
next section.
The output is split into three sections: (1) Main output table (Fig. 10.4), which
shows the overall effect size plus various statistics; (2) Tests of heterogeneity table
(Fig. 10.5) and forest plot (Fig. 10.6); (3) Tests of publication bias table (Fig. 10.8)
and funnel plot (Fig. 10.9). I will start by explaining the Main output table.
Main Output
The main output contains the key statistical information from your meta-analysis
(see Fig. 10.4). The text in blue confirms the type of meta-analysis you have run in
MAJOR—Mean Differences using n (sample sizes), M (means), and SD (standard
deviations)—for both groups. Immediately above the table it says a random effects
model was used—I will explain this idea in Chap. 11. First, look at the estimate,
which is the sample-weighted average effect size difference between the two groups
in the outcome of interest (screen time). According to Cohen (1992), effect size dif-
ferences can be interpreted as medium-sized if they equal or exceed d = 0.50.
Therefore, we can say we have a medium effect size difference in the outcome
between our two groups because the estimate is d = 0.524. This is the most impor-
tant information within your meta-analytic output; it tells the reader what the effect
size difference is, averaged, sample-weighted, and pooled across the studies.
The remainder of the table contains several other pieces of statistical informa-
tion: se is the standard error of this effect; Z is a test of whether the estimate is sig-
nificantly different from zero—in this case, it is, because the p value is <0.001. What the Z test and p value are telling you is the likelihood that your effect size is significantly different from zero; put another way, that there really is an effect size difference in screen time between control and intervention groups rather than a null effect.
Fig. 10.4 Main output table for meta-analysis of effect size differences
Now, a key difference when reporting a meta-analysis is that the preference is to
focus on the confidence intervals, in this case 0.400 and 0.648, rather than Z and p
values. Meta-analyses tend to focus on the confidence intervals, instead of the Z and
p values, because doing so helps with inferences about the range of values reported across
studies, an issue that is more salient when running secondary analyses. We have a
lower limit confidence interval of 0.400, which is a small effect size difference, and
an upper limit confidence interval of 0.648, which is a medium effect size differ-
ence. Your overall effect will always fall equidistant between the lower and upper
limit values; here it is 0.124 above the lower limit and 0.124 below the upper limit. Overall, the
values tell us that the effect of receiving an intervention to reduce screen time in the
included studies produced somewhere between a small-sized and medium-sized
effect size difference in screen time.
I’ll finish this section by making two additional points. First, over time, you
develop an intuitive sense of wide and narrow confidence intervals. To me, these
confidence intervals are narrow, meaning that the effect size difference values we
have across studies are like one another (common with made up data!). Second,
confidence intervals are easier to interpret in terms of significance than p values. To
interpret an effect as significant using confidence intervals, all you need to do is
check if the signs are the same: both positive or both negative means you have a
significant effect. If one sign is negative and the other positive, however, that means
you have a non-significant effect. Why? Because it means that the value zero is a
potential value for your effect, and if zero is a potential value you cannot rule out the
possibility that it is the ‘true’ effect size.
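To make the 'same signs' rule concrete, here is a minimal Python check. It assumes the usual normal approximation, where the 95% limits are the estimate plus and minus 1.96 standard errors; the standard error used below is back-calculated from the interval in Fig. 10.4, so treat the numbers as approximate.

def wald_ci(estimate, se, z=1.96):
    # 95% confidence interval under the normal approximation
    return estimate - z * se, estimate + z * se

def excludes_zero(lower, upper):
    # Significant at p < .05 when both limits share a sign (zero is not a plausible value)
    return (lower > 0 and upper > 0) or (lower < 0 and upper < 0)

low, high = wald_ci(0.524, 0.063)  # approximate se back-calculated from [0.400, 0.648]
print(round(low, 2), round(high, 2), excludes_zero(low, high))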
Heterogeneity Statistics
The second table in your output is titled Heterogeneity Statistics (see Fig. 10.5). I2
is a commonly reported measure of heterogeneity between studies included in a
meta-analysis. As a heuristic, I2 values below 25% indicate low heterogeneity, values between 25% and 75% indicate moderate heterogeneity, and values above 75% indicate high heterogeneity. In our case the value is 40.25%, indicating moderate heterogeneity. The Q value is non-significant (Q = 14.883, p = 0.094), which suggests a lack of significant heterogeneity in effect sizes between studies. Taken together, these statistics show that while effect size differences vary from one study to another, this variance is not particularly large. I'll talk about Tau and Tau2 when I discuss differences between random effects and fixed effect meta-analyses in Chap. 11 and go into the heterogeneity statistics in more detail in Chap. 12.
Fig. 10.5 Heterogeneity statistics table for meta-analysis of effect size differences
The forest plot (Fig. 10.6) provides a visualisation of the effect sizes from
included studies and confirms that the effect size differences do not vary much
between studies. Each square on the plot represents an effect size difference for one
study with the arms showing the confidence intervals. There is a useful trick when
it comes to inferring effects within the forest plot—the wider the confidence inter-
vals, the smaller the sample size, and conversely, the narrower the confidence inter-
vals the larger the sample size. The diamond at the bottom of the plot shows the
overall effect size difference; its edges represent the confidence intervals.
Based on this forest plot, most effect size differences are similar. Sopp et al.
(d = 0.44), Ricki et al. (d = 0.47), Linus et al. (d = 0.59), and Noone et al. (d = 0.64) all fall within the overall confidence interval of [0.40; 0.65], with Peeps et al.
(d = 0.39) just below the lower interval. The other five studies are more variable:
Keane et al. (d = 0.20) and Quest et al. (d = 0.22) are both small-sized; Mimms et al.
(d = 0.70), Owen (d = 0.75) and Tapp et al. (d = 0.78) are all medium-sized. So, our
range of effect size differences is from d = 0.20 (Keane et al.) to d = 0.78 (Tapp
et al.). While this is quite a wide range of values, the fact that four studies lie within
the confidence intervals, with one close to the lower limit, shows there is some
degree of replication of effect sizes in these included studies. The studies are not all reporting identical effect size differences, but they tell a broadly consistent story.
Publication Bias
The final table in your output provides information about publication bias, that is,
the tendency for journals to publish papers reporting significant findings, like sig-
nificant effect size differences that favour the intervention group over the control
(see Chap. 13).
Two methods—statistical estimates of publication bias (see Fig. 10.8) and funnel
plots (see Fig. 10.9)—are reported following meta-analyses to help identify publi-
cation bias.
Rosenthal’s (1979) Fail-Safe N statistic tells you how many studies you would need
to find that all show null effect sizes (effect size differences in this case) to reduce
confidence in your meta-analytic results. You contrast Fail-Safe N values with the
number of studies included in your meta-analysis. In this case, you have found ten
studies and the fail-safe n value = 425. Given you spent ages systematically search-
ing your literature and found ten studies, it seems highly unlikely that you have
missed an additional 425 studies that ALL report null effect size differences!!! So,
we can infer confidence in our overall effect size difference from this statistic by reporting the Fail-Safe N value.
Fig. 10.8 Publication Bias Assessment table for meta-analysis of effect size differences
Begg and Mazumdar’s (1994) Rank Correlation and Egger’s Test (1997) regres-
sion test both estimate the extent of symmetry in effect sizes from included studies.
In a symmetrical distribution of effect sizes you should have a roughly equal num-
ber of effects above and below the overall estimate. This would indicate that studies
with smaller, null, or negative effects that are less likely to be significant are being
published. In an asymmetrical distribution, in contrast, studies reporting smaller,
null, or negative effects, that are less likely to be significant, are missing from the
plot. Hence, there is a lack of symmetry in the distribution of effect sizes. An addi-
tional indicator of publication bias is when your set of studies ONLY includes studies
with small sample sizes reporting positive and significant effect sizes and lacks
studies with small sample sizes with negative (or null) effect sizes. Assuming pub-
lication bias exists, the studies with positive effects are more likely to be published
than the studies with negative effect sizes, even if these studies, when based on
small sample sizes, might produce unreliable effects. In our table, both statistics are
non-significant suggesting a lack of publication bias.
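For readers who like to see what the regression test is doing, Egger's (1997) approach regresses each study's standardised effect (d divided by its standard error) on its precision (1 divided by the standard error) and asks whether the intercept differs from zero; a non-zero intercept signals funnel-plot asymmetry. The Python sketch below is a bare-bones version of that regression, not a reproduction of MAJOR's routine (which, I believe, relies on metafor under the hood); the standard errors are rough values derived from Table 10.1 using a standard approximation.

import numpy as np
from scipy import stats

def egger_test(effects, ses):
    # Regress standardised effects on precision and test the intercept against zero
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    y = effects / ses
    X = np.column_stack([np.ones_like(ses), 1.0 / ses])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    k = len(y)
    cov = (resid @ resid / (k - 2)) * np.linalg.inv(X.T @ X)
    t_val = beta[0] / np.sqrt(cov[0, 0])
    p_val = 2 * stats.t.sf(abs(t_val), df=k - 2)
    return beta[0], t_val, p_val

# ds from the forest plot; SEs approximated from the group sizes in Table 10.1
d = [0.20, 0.59, 0.70, 0.64, 0.75, 0.39, 0.22, 0.47, 0.44, 0.78]
se = [0.20, 0.17, 0.29, 0.13, 0.10, 0.07, 0.34, 0.13, 0.20, 0.30]
print(egger_test(d, se))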
Interpreting a funnel plot (see Fig. 10.9) centres on thinking about included studies’
effects in terms of their magnitude and standard errors, which are an analogue of
sample size. Effect size differences are plotted on the X axis. For instance, at the
bottom of the plot nearest the X axis, you’ll find Quest et al.’s d of 0.22. If you were
to draw a vertical line from this point you would nearly hit Keane et al.’s d = 0.20.
Although both studies have similar magnitudes (d = 0.20 vs d = 0.22) they vary in
their standard error, which is plotted on the Y axis.
As discussed in Chap. 7, studies with larger sample sizes necessarily possess
smaller standard errors relative to studies with smaller sample sizes. Standard errors represent the expected distance between an individual effect size and the overall (population) effect size—for example, how far we expect Keane et al.'s d = 0.20 to fall from the overall effect size of d = 0.52—adjusted by the sample size for the individual effect size.
The reason that Quest et al. has a larger standard error than Keane et al. is that it has
a smaller sample size. We can use our forest plot to identify effect sizes and then use
the funnel plot to identify which studies have the smallest standard errors. Small and
large are both relative terms here; a small standard error in this Funnel plot might be
a large standard error in another sample of studies.
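You can see why Quest et al. sits higher up the funnel than Keane et al., despite the similar ds, by working out each study's standard error from its group sizes. A common textbook approximation for the sampling variance of d is (n1 + n2)/(n1 × n2) + d²/(2(n1 + n2)); this is not necessarily the exact formula MAJOR uses, so the Python sketch below is illustrative.

import math

def se_d(d, n1, n2):
    # Approximate standard error of Cohen's d (large-sample formula)
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

print(round(se_d(0.20, 50, 50), 2))  # Keane et al.: about 0.20
print(round(se_d(0.22, 15, 20), 2))  # Quest et al.: about 0.34 - smaller samples, larger standard error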
Although you can use the funnel plot to check results for individual studies, most
of the time we use them to visualize the distribution of effect sizes to see if they
appear symmetrical or asymmetrical; asymmetrical effect sizes suggest publication
bias. Fig. 10.9 shows a symmetrical distribution: five effect sizes appear to the right
of the population estimate (d = 0.52) and five appear to the left of it. Importantly,
there is one study to the left of the population estimate that also has a large standard
error. This is the type of study we would expect to fall foul of publication bias, so the
fact we’ve found it encourages us that our set of included studies does not suffer
from publication bias.
While you cannot rule out publication bias based on the symmetry of a Funnel
plot (Borenstein et al., 2021), the non-significance of Begg and Mazumdar’s (1994)
Rank Correlation and Egger’s Test (1997) regression statistics reported in the table
increases confidence that our set of studies is not too badly affected by publication
bias. Interpreting results from meta-analysis often involves constructing arguments
based on multiple sources of information. If the funnel plot looks symmetrical and
the statistics are non-significant, you can propose a lack of publication bias.
Summary
You now know how to run a meta-analysis of effect size differences in jamovi using
MAJOR and how to interpret the output. When writing up the results of your meta-
analysis, always report the overall effect size, confidence intervals, tests of hetero-
geneity, and publication bias in the main text and include the relevant forest plot(s) as figures (see Chap. 15 for more tips on how to write up your results). You can
include a funnel plot(s) too, but I usually include them as supplementary files, espe-
cially if they show symmetry. In the next chapter, we’ll discuss a key conceptual
issue—the difference between Random effects and Fixed-effect meta-analytic
methods.
Tasks
Task 1: Report the key information from the main output table of your
meta-analysis.
Task 2: Report the heterogeneity of your meta-analysis.
Task 3: Report evidence for/against publication bias in your meta-analysis.
Task 4: Go back to your dataset and change the sample sizes of these studies as
follows:
Then re-run your meta-analysis. Compare the results to your answers for Tasks
1–3 and see if any changes have taken place that affect your interpretation of the
results.
Task 5: Copy your original jamovi dataset (give it the name: ‘Eight effect size
differences’). Remove Keane et al. and Quest et al. and see what impact that has on
(1) the main output table, (2) the heterogeneity table, and most critically (3) the fun-
nel plot.
References
Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for pub-
lication bias. Biometrics, 50, 1088–1101.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2021). Introduction to meta-analysis (2nd ed.). Wiley.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315, 629–634.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Fixed Effect vs Random Effects Meta-Analyses 11
There are two kinds of meta-analysis: Fixed-effect and Random effects. In a Fixed-
effect meta-analysis, it is assumed there is one ‘true’ (Fixed) effect size, whereas in
a random effects meta-analysis, it is assumed there are a range of ‘true’ effect sizes
that vary between studies. MAJOR’s default is to run a random effects model, using
the Restricted Maximum-Likelihood method, although you can choose between models by clicking on the Model estimator drop-down menu (see Fig. 11.1). I have used the Hunter-Schmidt method (Cooke & French, 2008; Cooke & Sheeran, 2004), the DerSimonian and Laird method (Cooke et al., 2016), and the Restricted Maximum-Likelihood method (Cooke et al., 2023), which are all random effects methods.
Differences between these methods are discussed in Borenstein et al. (2021). I’m
going to explain fixed-effect meta-analysis first as it is simpler to understand.
Studies based on smaller samples provide less precise estimates of the effect size, which means they also have larger sampling errors. Therefore, a fixed-effect meta-
analysis assigns greater weight to studies with larger samples as they contain lower
sampling error. The formula for working out the weighting of each study in a fixed-
effect meta is the inverse of the variance (i.e. 1/variance); because studies with
larger samples have smaller variance, they are given more weight. Table 11.1 con-
tains the variance and weightings for five studies.
In this set of studies, Kubrick has the lowest variance, so they end up with the
highest weighting. In the meta-analysis, these (absolute) values are adjusted to pro-
vide a relative weighting, where all the values add up to 100%. So, a fixed-effect
meta-analysis is quite straightforward: We assume that all studies are providing a
test of a true (Fixed) effect size, and we use that assumption to justify the weighting
of each study being solely based on sample size when we pool results. Things are
more complex in a random-effects meta-analysis.
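Because Table 11.1 simply lists variances and weightings, the same logic can be written in a few lines of Python: weight each study by the inverse of its variance, then express each weight as a percentage of the total. The effect sizes and variances below are invented for illustration and are not the Table 11.1 values.

import numpy as np

# Hypothetical effect sizes and sampling variances for five studies
effects = np.array([0.45, 0.30, 0.55, 0.40, 0.50])
variances = np.array([0.010, 0.040, 0.002, 0.020, 0.008])

weights = 1.0 / variances                           # fixed-effect weight = inverse variance
relative = 100 * weights / weights.sum()            # relative weights that sum to 100%
pooled = np.sum(weights * effects) / weights.sum()  # inverse-variance weighted average

print(np.round(relative, 1), round(pooled, 3))      # the lowest-variance study gets the largest share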
Random effects meta-analysis assumes that sample size is not the only source of
variance between studies included in meta-analysis, embracing the idea that studies
are likely to vary with one another even when they are all reportedly testing the
same effect size. Random effects meta-analysis breaks down the variance between
included studies into two parts: (1) within-study variance and (2) between-study
variance. Within-study variance is based on differences in sample size between
studies: Like a fixed-effect meta-analysis, random effects meta-analysis also weights
studies with larger sample sizes relatively more than studies with smaller sample
sizes. Unlike a fixed-effect meta, random effects meta does NOT assume that all
studies are providing a test of the same ‘true’ (Fixed) effect. This is where the
between-studies variance comes in and is the main reason why studies are weighted
differently in fixed effect versus random effects meta-analysis.
A random effects meta-analysis works differently to a fixed-effect meta-analysis
because it assumes that the ‘true’ effect size may vary between studies following a
normal distribution and any meta-analysis pools only a random selection of tests of
the range of true effects. Included studies only represent a random selection of all
the tests that could have been performed. Such an approach acknowledges that the
effect size in any individual study may vary because of factors like where the study
was conducted, how data was collected (online or in person), or how constructs were measured. Importantly, the Q values, and their associated p values, are identical across the fixed-effect and random effects outputs (see Table 11.2), which tells you that in random effects meta-analysis, the Q test is based on a fixed-effect analysis (see Chap. 12).
Although the effect sizes are both large-sized, they are not identical, which is due to
how studies are weighted in each type of meta-analysis. Hinting at this difference,
you can see that the confidence intervals are narrower for the fixed-effect [0.47;
0.54] vs the random effects [0.42; 0.68] model, and the Z value is almost 4 times
bigger in the fixed effect meta. Tau values are only reported for the random effects
analysis because they are not required in a fixed-effect meta-analysis (see above).
As stated above, a fixed-effect model assumes that the studies are testing the
same effect size. So, the ten studies we meta-analysed mainly differ in how many
people were recruited. A consequence of making this assumption is that you do not,
in statistical terms, need to account for other sources of variance between studies,
which means your analysis is, for want of a better term, more confident 😊. Less
consideration of variation between studies leads you to believe that the range of
possible values (confidence intervals) for the correlation is narrower. I’ll end by not-
ing that in interpretation terms, there’s little difference between the models. Both
estimates are for a large-sized correlation between intentions and behaviour, both
are significant, the confidence intervals are quite similar, and both estimates show
heterogeneity.
What does differentiate a fixed-effect from a random effects model is how each
model weights the included studies. I’ll use Forest plots to explain this point.
Before reading any further, please go back to the Model estimation tab and
change your analysis back to Restricted Maximum Likelihood
In the Plots tab, click on the Model Fitting Weights, and your forest plot for a
Restricted Maximum Likelihood model should look like Fig. 11.4:
Fig. 11.4 Forest plot showing study weightings applying a random-effects (Restricted Maximum-Likelihood) model
This forest plot displays how much each study is weighted in the overall effect size. The weight of each study is reported in percentage terms; because all ten studies are used to generate the overall effect size, the sum of their weightings must add up to 100%. Cole et al. (2015), which has the largest sample
size, also has the largest weighting (11.75%), while Jacobi and Jordan (2014),
which has the smallest sample size, has the smallest weighting (7.55%). Weightings
are quite similar in size, ranging from 7.55% to 11.75%. Now, change your model
estimator to Fixed-Effect. You will end up with Fig. 11.5:
Look at the weightings again: Cole et al. (2015) now has a weighting of 54.05%,
while Jacobi and Jordan (2014) is only 1.27%! In a fixed-effect meta-analysis, sam-
ple size drives weighting. This helps to explain why the overall correlation is lower
for the fixed-effect meta than the random effects meta—Cole et al.’s (2015) sample
size means its correlation exerts greater weight (influence) on the overall correlation
in the fixed-effect vs the random effects model.
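The reason the weightings even out under the random effects model is that each study's weight becomes 1 divided by (its own sampling variance plus Tau2): adding the same between-studies variance to every study dilutes the advantage of a huge sample. Here is a rough Python sketch with invented numbers (the real Tau2 would be estimated from the data, for example by Restricted Maximum Likelihood):

import numpy as np

variances = np.array([0.0005, 0.005, 0.010, 0.013, 0.020])  # hypothetical sampling variances; the first study is very large
tau2 = 0.015                                                # hypothetical between-studies variance

fixed = 1 / variances
random = 1 / (variances + tau2)

print(np.round(100 * fixed / fixed.sum(), 1))    # fixed effect: the very large study dominates
print(np.round(100 * random / random.sum(), 1))  # random effects: weights are far more even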
Hopefully you can see from the weightings in the two screenshots why results
for the overall effect size differ depending on whether you use a random-effects or
fixed-effect method. The remainder of the output is broadly similar for both models;
the publication bias statistics are identical for random effects and fixed effect meta-
analyses because they are based on standard errors that are relative to the overall
effect size (see Chap. 13). The funnel plots do look slightly different, which I believe
reflects the wider funnel in the random effects model.
Fig. 11.5 Forest plot showing study weightings applying a fixed-effect model
Random effects meta-analysis is a methodology that embraces the idea that studies are
likely to vary with one another in both known (measured) and unknown (unmea-
sured) ways, even when they are all, in theory, testing the same effect size. When
studies are conducted, there is often a lack of consensus between research teams on
aspects of study design, measurement, analytic approach, et cetera. So, even in
Cooke et al. (2016), where studies were purportedly all testing the same theory
relationships, and should have used similar designs and measures, we coded a range
of factors that differed between studies.
Some of the differences between studies reflect choices made by research teams,
such as the age of the sample recruited. Most researchers tested theory relation-
ships in young adults, a handful in adolescents. When you have a known (measured)
factor that differs between studies, and enough cases of each level of the factor, you
can conduct a moderator analysis (see Chap. 12) to compare effect sizes at different
levels of this factor, that is, compare attitude–intention relationships for adolescents
and adults as we did. Often, you’ll be aware of these differences before you conduct
the systematic review which informs the meta-analysis, and therefore it's best to specify these factors as moderators in advance.
Summary
References
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2021). Introduction to meta-anal-
ysis (2nd ed.). Wiley.
Cooke, R. (2021). Psychological theories of alcohol consumption. In R. Cooke, D. Conroy,
E. L. Davies, M. S. Hagger, & R. O. de Visser (Eds.), The Palgrave handbook of psychological
perspectives on alcohol consumption (pp. 25–50). Springer International Publishing. https://
doi.org/10.1007/978-3-030-66941-6_2
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., & French, D. P. (2008). How well do the theory of reasoned action and theory of
planned behaviour predict intentions and attendance at screening programmes? A meta-analy-
sis. Psychology & Health, 23(7), 745–765. https://doi.org/10.1080/08870440701544437
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Cooke, R., & Sheeran, P. (2004). Moderation of cognition-intention and cognition-behav-
iour relations: A meta-analysis of properties of variables from the theory of planned
behaviour. The British Journal of Social Psychology, 43(Pt 2), 159–186. https://doi.
org/10.1348/0144666041501688
Moderator (Sub-group) Analyses
12
Tau2 estimates the between-studies variance (see Chap. 11); I2 represents the ratio of true heterogene-
ity to total observed variation.
In a random-effects meta-analysis, we assume that the ‘true’ effect size can vary
between studies and weight studies according to this principle (see Chap. 11). This
means there is naturally going to be variability in effect sizes because unlike with a
fixed-effect meta, you accept the possibility that the true effect size can vary. So,
part of heterogeneity in overall effect sizes is a consequence of this between-study
variance in true effect sizes. There is also heterogeneity due to sampling (within-
study) error, which reflects differences between studies in sample sizes. When
thinking about heterogeneity in overall effect sizes, we seek to partition it into
between-studies and within-studies components. I’ll start by discussing the Q statis-
tic, which helps us understand the within-studies component.
The Q statistic represents the ratio of observed variation to within-study error. To
compute Q, you calculate the deviation of each effect size from the overall effect
size, square it, weight it by the inverse variance (see Table 11.1) and then sum the
results. This produces a weighted sum of squares or Q value. Because you are sub-
tracting each effect size from the overall effect size you are creating a standardised
measure, which means it does not matter what scale your effect size is on. Once we
have computed our Q value, we next work out our expected value under the assump-
tion that all studies share a common effect size. This is simply the degrees of free-
dom, that is, the number of studies (K) − 1. So, if you have ten studies, your degrees
of freedom would be nine.
We now have an observed weighted sum of squares (i.e. the Q value) and an
expected weighted sum of squares (degrees of freedom). If we subtract the degrees
of freedom from the Q value, we get the excess variation, answering the question “Does the observed variation between studies' effect sizes exceed what we would expect?” If your Q value exceeds your degrees of freedom, this tells you that you
have more variation between studies than would be expected. This means that Q is
dependent on the number of studies you include in your meta-analysis. So, a non-
significant Q value indicates a lack of heterogeneity between effect sizes because
the variation of the effect sizes is less than we would expect based on how many
effect sizes are included in the meta; more effect sizes = more expected variation.
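Putting those steps into code may help: Q is the inverse-variance weighted sum of squared deviations from the pooled effect, its expected value under homogeneity is the degrees of freedom (k − 1), and, as discussed below, I2 expresses the excess as a percentage of Q. The effect sizes and variances here are invented for illustration.

import numpy as np

effects = np.array([0.20, 0.35, 0.50, 0.42, 0.65, 0.30])       # hypothetical effect sizes
variances = np.array([0.02, 0.01, 0.015, 0.005, 0.03, 0.01])   # hypothetical sampling variances

w = 1 / variances
pooled = np.sum(w * effects) / w.sum()
Q = np.sum(w * (effects - pooled) ** 2)    # weighted sum of squared deviations
df = len(effects) - 1                      # expected value of Q if all studies share one true effect
I2 = max(0.0, (Q - df) / Q) * 100          # excess variation as a percentage of Q

print(round(Q, 2), df, round(I2, 1))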
When you see a Q value in a meta-analytic output, it is accompanied by a p
value, which follows the tradition common to psychology of being significant at
p < 0.05 (or lower) and non-significant if p > 0.05. For example, the Q value is sig-
nificant for the meta-analysis of correlations in Chap. 9, but not for the meta-analysis
of effect size differences in Chap. 10. The Q test is testing the null hypothesis that
all studies share a common effect size, reasoning that is more in keeping with a
fixed-effect than random-effects meta-analysis (and hence why the Q values are the
same in Table 11.2). So, a significant Q suggests studies vary in their ‘true’ effect
size. However, as Borenstein et al. (2021) note, a non-significant Q value does not
imply that effect sizes are similar (or homogenous). The lack of significance may be
due to low power, which is likely with a small number of studies and/or small sam-
ple sizes.
I2 is a measure of inconsistency across study findings; it does not tell you anything about the
‘true’ effects for studies, however. One difference between I2 and Q is that I2 is not
affected by the number of studies in the way Q is. Borenstein et al. (2021) argue that
I2 can be used to determine what proportion of observed variance is real. Low I2
values suggest observed variance is mostly spurious, whereas moderate and higher
values of I2 imply that there are factors that underlie differences between studies’
effect sizes. This is where moderator (sub-group) analyses come in.
When I first started thinking about meta-analysis, I was encouraged to think about
variables that might moderate the correlations included in my meta-analysis. Indeed,
Cooke and Sheeran (2004) literally used meta-analysis to test moderators of theory
of planned behaviour relationships (e.g. attitude–intention; intention–behaviour)
using properties of cognition (e.g. how accessible attitudes and intentions are in
memory; how stable intentions are over time). Hence, considering moderators when
planning a meta-analysis is so engrained in my thinking that I struggle to conceive
of conducting a meta-analysis without specifying moderators a priori.
I’ll use Sheeran and Orbell’s (1998) meta-analysis as an example of how to iden-
tify factors that might moderate an overall effect size in meta-analysis. Six factors—
sexual orientation; gender; sample age; time interval; intention versus expectation;
steady versus casual partners—were proposed as potential moderators of the size of
the relationship between condom use intentions and condom use. We can split these
into sample and methodological moderators.
Sample moderators are factors that capture differences between studies in sam-
ples recruited. Sexual orientation, gender, sample age, and casual versus steady
partners are all sample moderators as they reflect characteristics of the samples
recruited. Methodological moderators capture methodological differences between
included studies: Time interval (i.e. the gap between measurement of intentions and
measurement of condom use) and intention versus expectation are both method-
ological moderators.
Sampling factors are sometimes outside the control of researchers conducting
primary studies; you might aim to recruit a sample with roughly equal numbers of
younger and older participants but end up with more of one group than the other. In
contrast, methodological factors tend to reflect decisions made by the researchers
conducting primary studies. The two methodological moderators mentioned above
highlight this principle in action; researchers decided how big the time interval
between measures of intentions and condom use was and if they wanted to measure
intentions or expectations.
Sheeran and Orbell (1998) reported mixed evidence for the effect of the six mod-
erators on their overall effect size of r+ = 0.44 between intentions and condom use.
Some moderators did not affect the overall effect size: the sample-weighted average
correlation was r+ = 0.45 for men and r+ = 0.44 for women, while the sample-
weighted average correlations for intention (r+ = 0.44) and expectation (r+ = 0.43)
did not differ from one another either. The lack of difference due to moderators tells
us that these variables do not offer effective explanations for heterogeneity in the
overall effect size. A lack of difference in effect sizes between levels of a categorical
moderator suggests that variation on this factor is not causing heterogeneity in the
overall effect size. Other factors moderated the overall effect size: Adolescents
reported weaker correlations (r+ = 0.25) than older samples (r+ = 0.50); the effect
size was stronger over shorter time intervals (less than 10 weeks, r+ = 0.59; more
than 10 weeks, r+ = 0.33). The former result suggests that younger samples' intentions are less likely to be enacted than older samples', while the latter result implies
that intentions change over time, something I found in my PhD. Finally, the moder-
ating effect of sexual orientation could not be tested due to a lack of studies, an issue
we’ll return to later in the chapter.
I think it’s relatively easy to come up with sample factors a priori—in most meta-
analyses you can assume that gender (or some aspect of sex or classification system)
might influence results, and I’ve already discussed age as another sample modera-
tor. Conversely, sexual orientation and/or casual versus steady partner are not mod-
erators that would be suitable in meta-analyses of other topics. Identifying
methodological factors a priori takes a bit more thought but can still be done—time
interval between measurement is something that could easily affect the size of a
correlation or the effect size difference, so it might be a viable candidate for any
meta-analysis.
Other methodological factors are likely to be specific to your meta-analysis.
Unlike in the late 1990s, we don’t spend much time thinking about behavioural
intentions vs behavioural expectations anymore, but there will be methodological
factors that are relevant to your meta-analysis that are worth considering. For exam-
ple, we knew there were different types of implementation intention interventions—
if-then plans, self-affirmation implementation intentions, volitional help
sheets—when we pre-registered Cooke et al. (2023). During data extraction, we
also found that there were mental contrasting implementation intentions.
Specifying moderators in advance is a balance between knowing what factors are
likely to affect results and reflecting on the suitability of moderators that are tested
in many meta-analyses. There is no penalty for proposing moderator analyses a
priori that prove impossible to conduct. Indeed, when we had finished coding stud-
ies for Cooke et al. (2023), we only had sufficient studies to compare effect size
differences for studies that used if-then plans or volitional help sheets. Box 12.1
contains some tips about methodological moderators.
With dichotomous (binary) moderator analyses, you run separate meta-analyses for
studies depending on which level of the moderator they represent, that is, all studies
that reported expectation–condom use correlations are meta-analysed together, with the intention–condom use correlations separated out and meta-analysed separately. Once you have done this, you can run Fisher's (1921) Z test to see if the
correlations significantly differed from one another. The Z test assesses whether two correlations drawn from independent samples significantly differ from one another.
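The Z test works on Fisher's r-to-z transformed correlations: each r is converted with z = arctanh(r), whose sampling variance is 1/(n − 3), and the difference between the two zs is divided by the pooled standard error. A quick Python version (the sample sizes below are invented; the correlations echo the intention and expectation values from Sheeran and Orbell, 1998):

import math

def fisher_z_test(r1, n1, r2, n2):
    # Compare two independent correlations via Fisher's r-to-z transformation
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se_diff  # compare against +/- 1.96 for p < .05, two-tailed

print(round(fisher_z_test(0.44, 800, 0.43, 600), 2))  # a trivial difference, nowhere near significant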
In Cooke and Sheeran (2004), I computed overall effect sizes for high versus low
levels of properties of cognitions (temporal stability, accessibility, etc.) then ran Z
tests to see if they differed from one another. Alternatively, in Cooke et al. (2023),
we tested the idea that sample type (community, university) moderated the effect
size difference in weekly alcohol consumption. We found a larger effect size differ-
ence in consumption for community samples (d+ = −0.38) compared with university
samples (d+ = −0.04). We used the Q test, which is more typically used with cate-
gorical moderator variables, to confirm that these effect size differences signifi-
cantly differed from one another. As an aside, both community and university effect
size differences were not heterogeneous, suggesting that coding studies based on
the sample recruited created sub-sets of studies that found similar effect sizes.
Any variable with more than two levels is classed as a categorical moderator; these are analysed using the Q test. This statistic seeks to test the idea that there is signifi-
cant (overall) heterogeneity between all sub-groups of studies in your meta-analysis.
If there is significant heterogeneity in this overall test, it means that the effect sizes
for your categories differ from one another.
In Cooke and French (2008), we found evidence of heterogeneity in each of the
five effect sizes representing theory of planned behaviour relationships we meta-
analysed. Type of screening test was a categorical moderator we tested to try and
reduce heterogeneity between studies. This moderator had six levels: cervical smear,
colorectal cancer, genetic test, health check, mammography, and prenatal. Type of
screening test moderated four of the overall effect sizes, suggesting that type of
screening is an important factor to consider when testing theory relationships. When
categorical moderators are significant, you can use pairwise Z tests to identify pairs
of sub-groups where these differences exist. The fifth relationship, between per-
ceived behavioural control and behaviour, however, showed no effect of this mod-
erator indicating that the correlation between control and behaviour was similar
regardless of screening type. Moderators do not always moderate effect sizes!
In each of the significant analyses, you see a difference between the heterogene-
ity statistic (Q) for the overall effect size (i.e. the attitude–intention relationship
across all screening studies) and the moderator results (i.e. the attitude–intention
relationship computed separately for cervical smear studies, colorectal cancer,
genetic test, health check, mammography, and prenatal studies). For example, over-
all, heterogeneity for the attitude–intention relationship was chi-square = 737.96;
after coding studies into type of screening test, chi-square reduced to 345.32. This
suggests that grouping effect sizes by type of screening test has accounted for some
of the heterogeneity between effect sizes. However, you are not formally testing
anything using this method, so be wary about the claims you make. Formal tests of
effects of moderators exist and are discussed in the ‘Testing Multiple Moderators
Simultaneously’ section.
The other thing to note regards the non-significant moderation for the perceived
behavioural control–behaviour relationship. Heterogeneity between studies in the
overall meta for this effect size was significant but also the lowest of the five rela-
tionships (chi-square = 58.13); in the moderated analysis, the value was a non-significant chi-square = 6.86. There are two things to comment on. First, because the overall
heterogeneity was lower for this relationship, relative to the other relationships,
there was less heterogeneity to explain in absolute terms. Second, even though there
was less heterogeneity, type of screening test did not appear to account for this het-
erogeneity, and as a result, this moderator did not help explain the overall heteroge-
neity for this relationship. When this happens, it’s reasonable to look at alternative
moderators to explain overall heterogeneity.
When you have a continuous moderator variable, like time interval between mea-
surements, you can either transform it into a categorical variable, as Sheeran and
Orbell (1998) did by coding studies based on a median split for the intervals, or use
a technique called meta-regression to plot the effects of the moderator on the effect
size. In Cooke et al. (2023), we found evidence that time interval was a significant
moderator of effect sizes, such that studies with shorter intervals between measures
showed larger effect size differences, although this effect was small. In
Comprehensive Meta-Analysis, you can generate meta-regression plots to visualise
these effects.
In Sheeran and Orbell (1998), age and time interval were both significant modera-
tors of the overall effect size, which raises the question of whether the factors might
interact with one another. They dealt with this issue by creating sub-sub-groups by
crossing the levels of each factor with one another: that is, adolescent short-term
and long-term intervals; older short-term and long-term intervals. While an emi-
nently practical solution, this approach relies on there being sufficient papers in each
cell of this design to test differences and does not provide a formal test of each
moderator against the other moderator because you have split the factors into four
groups, reducing the power of your analysis.
An alternative approach with more statistical power is to run a mixed effects
meta-analysis, which is like a multiple regression in a primary paper. After coding
moderators as variables in your dataset (see below), you ask your software package
to test the effects of each moderator simultaneously. For example, in Cooke et al.
(2023), we had three significant moderators: time frame (continuous); sample type
(dichotomous); mode of delivery (dichotomous). The mixed-effect meta-analysis
found that sample type and time interval were both significant moderators and that
the heterogeneity in the (overall) effect size reduced to non-significance in the
presence of these moderators. In other words, when we accounted for the effects of
time interval, sample type, and mode of delivery, the overall effect size was now
homogenous. This suggests that these moderators provide important clues as to why
effect sizes differed in the overall analysis. This analysis generates questions for
future research too.
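Conceptually, a mixed effects meta-analysis is a weighted regression of study effect sizes on the moderators, with an extra between-studies variance term estimated alongside the regression coefficients. The Python sketch below fits only the simpler inverse-variance weighted (fixed-effect) version, which conveys the idea; metafor and Comprehensive Meta-Analysis also estimate Tau2 (for example by REML), so their output will differ. All the numbers here are invented.

import numpy as np

# Invented studies: effect size d, sampling variance, time frame (weeks), sample type (0 = university, 1 = community)
d = np.array([-0.05, -0.10, -0.30, -0.45, -0.20, -0.40])
v = np.array([0.02, 0.03, 0.04, 0.05, 0.02, 0.06])
time_frame = np.array([1, 2, 4, 8, 2, 12])
community = np.array([0, 0, 1, 1, 0, 1])

X = np.column_stack([np.ones_like(d), time_frame, community])  # intercept plus two moderators
W = np.diag(1 / v)                                             # inverse-variance weights

b = np.linalg.solve(X.T @ W @ X, X.T @ W @ d)  # weighted least squares coefficients
print(np.round(b, 3))  # intercept, slope for time frame, shift for community samples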
One of the reasons I think that mode of delivery was not a significant moderator in
the mixed effect meta-analysis is that it was likely confounded with sample type;
almost all our university samples received their intervention via online mode of
delivery, while all of our community samples received their intervention via paper
mode of delivery. Hence, sample type and mode of delivery were confounded
because we cannot disentangle the effect of sample from the mode of delivery. If we
had been able to identify studies that delivered interventions to community samples
online, along with more studies delivering interventions via paper to university sam-
ples, we might have been able to disentangle the effects of the two moderators, but
we would have been in a similar position to Paschal and Sheina (Sheeran & Orbell, 1998) in needing enough
studies for each cell of the comparison.
I think the simplest solution to confounded moderator variables is to conduct
primary studies to test the factors experimentally. Remember that in meta-analysis
we are using the endpoint of a set of research studies—their findings—to pool results together. When we find heterogeneity in effect sizes, and identify methodological or sample variables as putative moderators of this heterogeneity, it can mean a lack of
research attention to an issue, such as a lack of studies delivering implementation
intentions interventions to community samples online. While we could wait for
more studies to address this issue in a future meta-analysis, it strikes me that a better
idea would be to run those studies using the meta-analysis to guide our research
plans. Having thoroughly discussed moderator analyses, I’ll now discuss ways to
run moderator analyses in software packages.
The first step in any moderator analysis comes during data extraction. When extract-
ing key information from study characteristics, for example, author names, country
of study, demographics, you also code included studies for moderator variables.
I’ll give two examples of how I did this for my meta-analyses:
The next stage is to create variables to add the information into your dataset.
Create a new variable, give it a name that reflects what it represents and decide what
type of variable it is going to be: dichotomous; categorical; continuous. Some pro-
grammes, including Comprehensive Meta-Analysis and metafor, are happy for you
to enter either text, to represent the categories, or numbers that represent groups.
Continuous moderators must be entered as numerical information. Jamovi prefers
numerical values.
Before I go any further, I must admit that although MAJOR is a great package for
introducing meta-analysis, it is limited when it comes to running moderator analy-
ses. It does not offer as much flexibility in moderator analyses as either
Comprehensive Meta Analysis (which I used in Cooke et al., 2016) or metafor
(which I used in Cooke et al., 2023). Currently, MAJOR allows you to test the effect
of a single moderator on your overall effect size and this variable can be either cat-
egorical (including dichotomous variables) or continuous. However, there does not
seem to be any way to run separate meta-analyses by level of categorical moderator,
as it is possible to do in Comprehensive Meta Analysis and metafor, other than cre-
ating separate datasets based on the levels of the moderator, that is, a dataset con-
taining data for only the community samples or only the university samples. As a
result, it’s hard to recommend using MAJOR to test moderation in your meta-
analysis currently. I will run a basic analysis to show you what is possible.
We’re going to use the dataset from Chap. 9 as our example because results for this
meta-analysis indicated significant heterogeneity. In brief, we found an overall cor-
relation that was large-sized (r+ = 0.55), with evidence of high heterogeneity
I2 = 91.04, Q = 76.88, p < 0.001. We can test the impact of a moderator on the I2 and
Q values by entering it into the moderator box in the meta-analysis in
MAJOR. Reductions in these values imply that by coding studies using a modera-
tor, we have accounted for some of the heterogeneity between studies in the effect
size. I’ve added the moderator “Time Interval” to the dataset, (see Table 12.1). Copy
this information into your dataset and then add the variable to the Moderator box in
your meta-analysis. Your output will look like Fig. 12.1:
The output shows the effect of time interval as a moderator of correlations
between measures of drinking intentions and drinking behaviour (1 = 6-month gap
between measures; 2 = 3–5-month gap; 3 = less than a 3-month gap) in the top table,
which is a significant result. The bottom table shows the heterogeneity statistics
covered at the beginning of the chapter: Q, I2, and Tau2.
Table 12.1 Correlations between drinking intentions and behaviour with sample sizes and time
interval as a moderator
Study authors + year Correlation (r) Sample size (N) Time interval
Arking and Jones (2010) 0.25 100 1
Biggs and Smith (2002) 0.54 200 2
Cole et al. (2015) 0.45 2000 2
David et al. (2018) 0.35 150 1
Erasmus et al. (2009) 0.70 75 2
Feely and Touchy (2007) 0.65 400 3
Gent et al. (2020) 0.30 475 1
Horseham and Smooth (2021) 0.40 150 2
Illy et al. (2013) 0.60 125 3
Jacobi and Jordan (2014) 0.65 50 3
Relative to the meta-analysis without the moderator, the Q and I2 values have reduced, which suggests that some of
the heterogeneity between studies has been accounted for. The Q value is no longer
significant, but recall that we can only infer a meaningful result from a significant Q
value, not from a non-significant value, so caution is urged here.
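If you want to see the sub-group pattern behind that moderator result outside of jamovi, a quick check is to pool the Table 12.1 correlations separately within each level of time interval, using Fisher's r-to-z transformation and weights of n − 3. This Python sketch is a simplified fixed-effect check, not a reproduction of MAJOR's moderator test.

import numpy as np

# Table 12.1: correlation, sample size, time-interval code (1 = 6-month gap, 2 = 3-5-month gap, 3 = less than 3 months)
r = np.array([0.25, 0.54, 0.45, 0.35, 0.70, 0.65, 0.30, 0.40, 0.60, 0.65])
n = np.array([100, 200, 2000, 150, 75, 400, 475, 150, 125, 50])
interval = np.array([1, 2, 2, 1, 2, 3, 1, 2, 3, 3])

z = np.arctanh(r)   # Fisher's r-to-z
w = n - 3           # inverse of the sampling variance of z, which is 1/(n - 3)

for level in (1, 2, 3):
    mask = interval == level
    pooled_r = np.tanh(np.sum(w[mask] * z[mask]) / np.sum(w[mask]))  # back-transform to r
    print(level, round(float(pooled_r), 2))  # the longest gap (code 1) pools to the smallest correlation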
Some Cautionary Notes About Moderator Analyses
I want to end this chapter with a section covering a few issues for you to consider
when running moderator analyses. A key issue is determining how many papers you
need to run a moderator analysis. As Hagger (2022) succinctly puts it, the answer to
the question of the minimum number of papers needed for any meta-analysis is
simple, it’s two! As Martin goes on to say, however, unless these two studies have
used robust study designs, validated measures, et cetera, you can debate the value in
pooling their results together. Even having more studies is not enough for some
journal editors and reviewers; Cooke and French (2008), which contains meta-
analyses based on K = 33 tests of attitude–intention and K = 19 tests of the inten-
tion–behaviour relationship, was labelled as ‘premature’ by a reviewer from the first
journal we submitted it to, while we had to delay re-submission of Cooke et al.
(2023) after rejection by the first journal, having been told we needed more studies.
As there is no agreed-upon number of studies that is sufficient for a meta-analysis
(see Chap. 2), you need to be cautious when running moderator analyses. In mod-
erator analyses, you are splitting your sample of studies into sub-groups, which
often produces even smaller sets of studies to meta-analyse. While there are no hard and fast rules about the number of studies needed for each level of your moderator variable, it's advisable to only run moderator analyses when you have more than two studies for each level, and I feel more comfortable having more than this number
of studies, usually four or five per level. In Cooke et al. (2023), we decided not to
include self-affirmation implementation intention studies in our test of intervention
type because we only had two of these studies, nor mental contrasting implementa-
tion intentions as we had only one study. We were wary about inferring too much
from meta-analyses based on such numbers of studies. Ideally, you would have at
least five studies for each level of your moderator, but the more the better, as we
know that Q statistics are sensitive to the number of studies included.
Another issue to be aware of is that it is not always possible to code all included
studies for every moderator. For example, in Cooke and French (2008), a paper on
tuberculosis inoculation did not clearly fit within the categories for the type of screen-
ing test moderator. Consequently, we left this paper out of the moderator analyses,
which is a reason to be careful when comparing heterogeneity levels for overall
analyses (with all studies included) with moderator analyses, where studies may be
excluded. A further consequence of not being able to code all studies is that you will
not always be able to include all moderators in your mixed-effects model. In Cooke
et al. (2023), we only tested three of the four moderators in the mixed model because
there were not enough self-affirmation implementation intention studies to include
type of intervention in the analysis. This is akin to missing data for a variable in a
primary analysis; if you have missing data for an individual on one variable, then
that individual’s data is excluded from the analysis. In meta-analysis, the same logic
applies, but it is at the level of the study rather than the individual.
Summary
The aim of this chapter was to discuss running moderator analyses to address het-
erogeneity between effect sizes in meta-analysis. In the next chapter, we’ll discuss
how publication bias can affect meta-analytic results.
References
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interac-
tions. SAGE.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2021). Introduction to meta-anal-
ysis (2nd ed.). Wiley.
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., & French, D. P. (2008). How well do the theory of reasoned action and theory of
planned behaviour predict intentions and attendance at screening programmes? A meta-analy-
sis. Psychology & Health, 23(7), 745–765. https://doi.org/10.1080/08870440701544437
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Cooke, R., & Sheeran, P. (2004). Moderation of cognition-intention and cognition-behav-
iour relations: A meta-analysis of properties of variables from the theory of planned
behaviour. The British Journal of Social Psychology, 43(Pt 2), 159–186. https://doi.
org/10.1348/0144666041501688
Cooke, R., & Sheeran, P. (2013). Properties of intention: Component structure and consequences
for behavior, information processing, and resistance. Journal of Applied Social Psychology,
43(4), 749–760. https://doi.org/10.1111/jasp.12003
Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small
sample. Metron, 1, 3–32.
Hagger, M. S. (2022). Meta-analysis. International Review of Sport and Exercise Psychology,
15(1), 120–151. https://doi.org/10.1080/1750984X.2021.1966824
Higgins, J. P. T. (2003). Measuring inconsistency in meta-analyses. BMJ, 327(7414), 557–560.
https://doi.org/10.1136/bmj.327.7414.557
Sheeran, P., & Orbell, S. (1998). Do intentions predict condom use? Meta-analysis and examina-
tion of six moderator variables. British Journal of Social Psychology, 37(2), 231–250. https://
doi.org/10.1111/j.2044-8309.1998.tb01167.x
Publication Bias
13
Although there is not space here to go into further detail about the concept of publication bias, we are covering this
topic in the book because meta-analysis can play a role in identifying publication
bias within research literatures.
The main reason I run meta-analyses is to find a precise estimate of an effect size
I’m interested in to allow me to answer research questions like “What is the direc-
tion and magnitude of the correlation between drinking intentions and drinking
behaviour?” I want to know the answer to this question for various reasons, includ-
ing to (1) know if the theoretical ideas I’m researching are supported across multi-
ple studies; (2) be able to compare that effect size with effect sizes for other
relationships; and (3) be able to estimate the sample size I need in future studies
testing this effect size.
Because publication bias has the potential to undermine confidence in the results
of meta-analyses, software packages for meta-analysis include statistical and graph-
ical methods to help identify this bias. I covered these issues briefly in Chaps. 9 and
10 when discussing the output generated by MAJOR. In this chapter, I’ll elaborate
on what these measures of publication bias mean.
Fail-Safe N Values
Rosenthal (1979) was one of the first researchers to draw attention to publication
bias, coining the phrase ‘the file-drawer problem’ to cover the situation where non-
significant results end up in a file-drawer. He also proposed a statistical test to esti-
mate the extent of publication bias in a literature: Fail-Safe N values tell you the
number of studies you would need to find, in addition to those you have included in
your meta-analysis, that report null effect sizes (i.e. r = 0.00; d = 0.00) to undermine
confidence in your results. I'll use two examples to illustrate how Fail-Safe N works.
Let’s imagine Terry locates 15 studies testing the correlation between physical
activity attitudes and physical activity intentions. They meta-analyse the correla-
tions generating a sample-weighted average correlation of r+ = 0.35. Their
Publication Bias Assessment table tells them they have a Fail-Safe N value of 100.
This means that Terry would need to find 100 studies, all with null correlations between attitudes and intentions, in addition to the 15 studies they found, before confidence that there is a significant correlation of the magnitude reported above would be undermined. It's like asking 'How many studies that show no effect would I need to find to make my overall effect size non-significant?' In
this case, it would be 100 null studies. It is important to note that you interpret val-
ues relative to the number of studies you found to add context. If Terry finds 15
studies, it is unlikely they have missed 100 additional studies, all of which report null findings.
Alternatively, let’s imagine Alex is interested in the difference between educa-
tional intervention and control groups on mathematical reasoning. They find 25
studies produce a pooled effect size difference of d+ = 0.55, favouring the interven-
tion group participants. Their Publication Bias Assessment table gives a Fail-Safe N
value of 250. As in Terry’s example, the same logic applies ‘How many studies, all
showing null effect size differences, would you need to find to bring my overall
effect size down to zero?” In this case it is 250 studies. This gives Alex confidence
in their findings; having systematically searched the literature and found 25 relevant
studies, it’s unlikely that they’ve failed to locate 250 null findings. This means they
can be confident that their results are not unduly affected by missing results.
MAJOR allows you to choose between different Fail-Safe N Methods, including
Rosenthal’s (1979) and Orwin’s (1983). Rosenthal’s method is the default in
MAJOR and he tells you how many null studies you would need to locate to reduce
your effect size to zero. This method assumes you have a significant effect size, that
could be reduced to non-significance, so, if you have a null result from your meta-
analysis, I’m not sure there’s much point in reporting Rosenthal’s statistic because
there’s no effect to reduce. Borenstein et al. (2009) note several additional weak-
nesses with Rosenthal’s method. First, it focuses on statistical significance, which
tends to be of less interest in a meta-analysis where we are more focused on the
direction and magnitude of the overall effect size (see Chap. 7). Second, the formula
assumes that all missing studies report null effects. Missing studies could in principle report null, negative, or positive effects, so this assumption is questionable at
best; just because a study finds a positive effect does not guarantee it will be pub-
lished and, alternatively, not all negative or null effects are absent from the pub-
lished literature. Finally, the test is based on combining p values across studies, a
common practice when Rosenthal published his work in 1979. It’s more common
nowadays to compute the overall effect size and then compute the p value.
Orwin’s (1983) method tells you how many studies you need to reduce your
effect size difference’s magnitude from one category to another, that is, from a large
effect size to a medium effect size. When evaluating interventions, such statistics can be used as a guide to how much confidence we should place in our results. A large Orwin value tells us that it would take a lot of null results to shift our view that we have, across studies, a medium effect size difference; not impossible, but not likely either. In contrast, a small Orwin value would suggest that our effect size is not as stable as we might like. An important consideration with Orwin's value is the magnitude of effect size you are starting with. In my experience, meta-
analyses of correlations are more likely to be large or medium-sized than meta-
analyses of effect size differences, which are more likely to be medium or
small-sized. I think this reflects a simple idea; that it’s easier to find evidence for a
correlation than show an experiment/intervention changes an outcome.
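If you want to compute these values yourself, rather than read them from MAJOR's Publication Bias Assessment table, the metafor package (the engine behind MAJOR) includes an fsn() function. Below is a minimal sketch, assuming a hypothetical data frame called dat with a correlation (ri) and sample size (ni) for each included study; treat it as an illustration rather than a definitive recipe.

library(metafor)

# Hypothetical data frame 'dat' with one correlation (ri) and sample size (ni) per study;
# convert to Fisher's z so the correlations can be pooled
dat <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat)

# Rosenthal's Fail-Safe N: number of null studies needed to make the pooled effect non-significant
fsn(yi, vi, data = dat, type = "Rosenthal")

# Orwin's Fail-Safe N: number of null studies needed to drag the pooled effect down to a
# target value (here roughly a 'small' correlation, expressed on the z scale)
fsn(yi, vi, data = dat, type = "Orwin", target = 0.10)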
In my opinion, Fail-Safe N statistics have had their day. Most textbooks about meta-analysis, like Borenstein et al.'s (2021), have moved on to report other statistics (see below); I had to dig out information from Borenstein et al. (2009) to complete this section! In addition, Simonsohn et al. (2014b) argue that researchers may not place entire studies in the file-drawer, instead filing away sub-sets of analyses that produce non-significant results while reporting those that reach significance, that is, p-hacking. This means estimates of publication bias based on Fail-Safe N values are unlikely to represent how researchers engage in attempts to overcome a lack of statistical significance. In Chap. 14, I will introduce Simonsohn et al.'s p-curve analysis as a method to account for p-hacking.
MAJOR outputs two statistics that quantify the relationship between sample size
and effect size. Begg and Mazumdar’s (1994) Rank correlation test computes the
rank correlation between effect size and the standard error, while Egger et al. (1997)
proposed a regression test that likewise tests the size of the relationship between
effect size and standard error. Both tests are interpreted the same way; a significant
result suggests there is a relationship between effect size and standard error, hinting
at publication bias. Because larger standard errors indicate smaller sample sizes a
positive correlation between standard error and effect size indicates that studies
with smaller sample sizes are finding larger effect sizes. I find it useful to consult
these statistics when viewing funnel plots.
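Both tests are a one-line request in metafor once a random-effects model has been fitted. A minimal sketch, again assuming the hypothetical dat data frame with yi and vi columns created above:

library(metafor)

# Random-effects model on the (hypothetical) yi and vi columns created by escalc()
res <- rma(yi, vi, data = dat)

ranktest(res)   # Begg and Mazumdar's rank correlation test
regtest(res)    # Egger's regression test (uses the standard error as predictor by default)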
Funnel Plots
A funnel plot depicts effect size on the x axis and standard error on the y axis for a
set of studies included in a meta-analysis. Standard errors represent the amount of
dispersion a data point (like an individual effect size) has around the population
mean. In meta-analysis, we have effectively generated a population mean by creat-
ing a sample-average correlation or effect size difference, so the standard error for
each study tells us how much each effect size differs from the overall effect
size. Standard errors are influenced by the sample size; they get smaller the larger
the sample size. This is because statistics from a study with a larger sample size are
more representative of the population effect size (see Chap. 3), because the sample
is closer to the total population sample, and, as a result, this reduces the standard
error for studies with larger samples relative to those with smaller samples. In meta-
analysis, this plays out in the form of weighting; studies with larger samples are
weighted more heavily than studies with smaller samples. This means that studies with larger samples have smaller standard errors and exert greater influence on the overall effect size than studies with smaller samples, which carry less weight.
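This link between sample size, standard error, and weight is easiest to see in the inverse-variance weighting that meta-analysis uses. The sketch below uses made-up numbers purely for illustration; note how the study with the smallest sampling variance (i.e. the largest sample) takes the lion's share of the weight.

yi <- c(0.40, 0.55, 0.30)      # hypothetical effect sizes
vi <- c(0.010, 0.002, 0.050)   # hypothetical sampling variances (smaller variance = larger sample)

wi <- 1 / vi                       # inverse-variance weights (random-effects models add Tau2 to vi first)
pooled <- sum(wi * yi) / sum(wi)   # weighted average effect size
se_pooled <- sqrt(1 / sum(wi))     # standard error of the weighted average

round(wi / sum(wi), 2)             # share of the total weight given to each study
round(c(pooled = pooled, se = se_pooled), 3)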
A funnel plot can help to identify publication bias because it allows you to see if
there is (a) a roughly even number of studies with effect sizes above and below the
overall effect size (see Fig. 10.9 for an example of this) and (b) if there are any
‘missing’ studies, especially at the bottom of the plot where standard errors are
Using Duval and Tweedie’s Trim and Fill Method to Adjust the Overall Effect Size… 153
greatest (and sample sizes are smallest). In a literature that is not especially affected
by publication bias we would expect there to be similar numbers of studies above
and below the overall effect size, which is an average of all the effect sizes we
included in the meta-analysis. This does not necessarily mean we should expect that
there will be a full range of positive, negative, and null values, because our set of
included studies represents a random selection of all possible tests that could be run.
As mentioned later in the chapter, meta-analyses of correlations frequently report a
range of positive effect sizes, but that does not necessarily mean there is publication
bias, just that the range of possible values is bounded, which may reflect limitations
in methods used to conduct correlational studies or that there really is a positive
relationship between the variables. I would say that it’s more likely you’ll get a
range of positive, negative, and null effect size differences, but even this is not guar-
anteed and a lack of one kind of effect size does not always indicate publication bias.
A better indicator of publication bias is where you get asymmetry between effect
sizes at the bottom of your plot. Recall that the y axis is the standard error of the
effect sizes and that studies with smaller sample sizes have larger standard errors.
Now if you have a funnel plot where you get a cluster of studies with large standard
errors (i.e. small sample sizes) that ALL report positive effects, and your plot lacks
an equivalent cluster of studies with large standard errors that ALL report negative or null effect sizes, then you may have evidence for publication bias. This is the classic pattern of publication bias; studies with large, positive effect sizes are published,
while studies with smaller positive, negative, or null effect sizes are missing from
the published literature. This idea has been taken even further by Uri Simonsohn
and colleagues, who have created software to run p-curve analysis, which specifi-
cally looks for studies that have only just reached conventional thresholds of signifi-
cance, like p = 0.04. I’ll talk more about this in Chap. 14.
Imagine that you have a funnel plot which has four studies at the bottom of the fun-
nel plot that are all to the right of the vertical line (the overall effect size). Further
imagine that there are no equivalent studies that fall to the left of the vertical line
near the bottom of the plot. This pattern, studies with larger positive effects being
present, studies with smaller positive (or negative or null effects) being absent, sug-
gests potential publication bias, based on ‘missing’ studies we would expect if pub-
lication bias either did not exist or was less influential in a literature. Duval and
Tweedie (2000a, 2000b) proposed a method to adjust the overall effect size for
‘missing studies’—the trim and fill method. It’s called trim and fill because the
method first trims the studies responsible for the funnel plot asymmetry, in the same way you can trim a mean by excluding extreme values, which produces an adjusted estimate of the overall effect size. The final step of the method is to fill in the 'missing' studies by adding mirror images of the trimmed studies to the plot. This can be done in several software
packages—my experience of using the technique came when using Comprehensive
Meta-Analysis to complete Cooke et al. (2016). You can also see the overall effect
size adjusted once the filled studies have been added. However, Simonsohn et al.
(2014a) argue that this method has several flaws that make it less effective than
p-curve analysis, which we will discuss in Chap. 14.
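In metafor, the adjustment is a single function call on a fitted random-effects model. A sketch, with the caveat that trim and fill assumes funnel plot asymmetry is caused by publication bias rather than by genuine heterogeneity:

tf <- trimfill(res)   # trims the asymmetric studies and imputes their 'missing' mirror images
tf                    # prints the adjusted overall effect size and the number of studies filled
funnel(tf)            # funnel plot with the imputed studies marked with a different symbol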
It’s one thing to know there is an issue and another to do something about it. In the
previous section, I talked a lot about identifying publication bias but not so much
about addressing it. One option is to make statistical adjustments to your overall
effect size. The Duval and Tweedie trim and fill method does that for you. Applying
this correction has the effect of adjusting the overall effect size, although it may not
change the results too much. While this method is fine for adjusting the effect size
after completing a meta-analysis, there are more direct approaches to publication
bias that the meta-analyst can take that do not involve statistical corrections.
Typically, these take place during your systematic review.
A simple way to address publication bias when running a meta-analysis is to
include unpublished results. Unpublished results tend to be called ‘grey literature’,
reflecting a degree of uncertainty about them. Depending on the resources you have
available to search for studies, it is well worth considering searching the grey litera-
ture. I always tell those I work with on a meta-analysis to check the EThOS data-
base, which is the repository for UK PhD theses. This reflects my experience as a
PhD student where I identified one study that was never written up for publication
but did appear in an American PhD thesis. It's easy to search EThOS, so it's worth considering: the standard of work in PhD theses is generally high, and there are times when students do not have the time or energy to publish results, especially negative or null results.
Other sources of grey literature include (a) mailshots to memberships of aca-
demic bodies, (b) government or charity reports, and (c) the Open Science Framework and pre-print servers. For the last few meta-analyses I've run, I have emailed
academic bodies, in my case the Division of Health Psychology, European Health
Psychology Society, and UK Society for Behavioural Medicine, to request unpub-
lished findings on the topic I am searching for. Responses to these emails tend to be
unpredictable. Sometimes, researchers will send you papers they are working on or
that are under review, other times you get nothing in response. Given the limited
amount of effort required to send a few emails, it’s probably worth doing this if you
want to identify unpublished studies.
In my experience, there’s not been much point in me including data from govern-
ment or charity reports when running a meta-analysis. This is not to say that these
reports are not useful, just that they rarely report correlations between variables I am
interested in or effect size differences in an outcome. Such statistics tend not to be
the focus of these reports, which are more likely to report effects over time, like
changes in the frequency of people reporting a behaviour, and the reports also often
report percentage values that are hard for me to do anything with in a meta-analysis.
Why It’s Important to Publish Null and Negative Effect Sizes 155
I’m going to finish this chapter with an example of the importance of publishing
null and negative effect sizes from one of my meta-analyses, Cooke et al. (2016).
One of the very best things about meta-analysis is its ability to confound your theo-
ries, hunches, and expectations. Prior to running the meta-analyses for Cooke et al.
(2016), I’d only ever found positive relationships between variables within the the-
ory of planned behaviour. Although not explicitly mentioned by Ajzen (1991), it
seemed to me that we should expect positive relationships between variables and
outcomes, and this is what we typically found. For most of my meta-analyses for
Cooke et al. (2016), we again found positive relationships between attitudes and intentions, intentions and behaviour, et cetera. There was one rogue finding where self-efficacy had a negative relationship with drinking intentions, but that was quite easy to explain away: that study had recruited an adolescent sample, most of whom were not yet drinking alcohol, so a negative relationship was perhaps to be expected, especially as all the other studies of self-efficacy and intentions reported positive relationships.
However, when it came time to run the forest plots for the relationships between
perceived control and intentions and separately perceived control and drinking
behaviour, it became apparent that you don’t always find positive relationships!
Forest plots showed a real mixed-bag of results, including positive, negative, and
null correlations. Overall, the sample-weighted average correlations were null, sug-
gesting that perceived control correlated with neither intentions nor drinking
behaviour.
Such results presented a challenge to the theory in that we found no evidence for
either relationship, which contrasted with other meta-analyses testing these rela-
tionships. Running these analyses made me go back and look at the literature for
perceived control and alcohol and it dawned on me that this was not a straightfor-
ward relationship. The first paper I read on perceived control and alcohol by Norman
et al. (1998) should have alerted me to the idea that a lack of control (i.e. a negative
relationship between control and drinking behaviour) can underlie drinking, because
that is exactly what Paul and his colleagues said in their paper. There’s also a nice
paper by Schlegel et al. (1992) which showed that perceived control was an impor-
tant predictor of drinking among those with an alcohol use disorder, for whom
intentions were not significantly associated with drinking, but perceived control did
not predict drinking among those without an alcohol use disorder, matching results
from our meta-analysis.
Later, I worked with Mark Burgess and Emma Davies looking at results from an
open-ended survey of English drinkers (Burgess et al., 2019). Although not specifi-
cally focused on control, we found evidence that while many of our sample wanted
to remain in control when drinking, a minority reported drinking to get out of con-
trol. Of course, one should not get too far ahead of oneself based on results of one
study, but it remains the case that without those unusual negative and null findings
in the meta-analysis, I might have struggled to explain results in Burgess et al.
(2019). Negative and null findings often tell us more than we realise. It’s time we
valued them more.
Summary
The aim of this chapter was to discuss methods to identify and address publication
bias. In the next chapter, I will discuss extensions to meta-analysis to help you
develop your knowledge and expertise further.
References
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision
Processes, 50, 179–211. https://doi.org/10.1016/0749-5978(91)90020-T
Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for pub-
lication bias. Biometrics, 50, 1088–1101.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (Eds.). (2009). Introduction to meta-
analysis (1st ed.). Wiley.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2021). Introduction to meta-anal-
ysis (2nd ed.). Wiley.
Burgess, M., Cooke, R., & Davies, E. L. (2019). My own personal hell: Approaching and exceeding
thresholds of too much alcohol. Psychology & Health, 1–19. https://doi.org/10.1080/0887044
6.2019.1616087
Chambers, C. (2017). The seven deadly sins of psychology: A manifesto for reforming the culture
of scientific practice. Princeton University Press.
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Duval, S., & Tweedie, R. (2000a). A nonparametric “trim and fill” method of accounting for publi-
cation bias in meta-analysis. Journal of the American Statistical Association, 95, 89–98.
Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot-based method of testing and
adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463.
Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315, 629–634.
Norman, P., Bennett, P., & Lewis, H. (1998). Understanding binge drinking among young peo-
ple: An application of the Theory of Planned Behaviour. Health Education Research, 13(2),
163–169. https://doi.org/10.1093/her/13.2.163-a
Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics,
8(2), 157–159. https://doi.org/10.3102/10769986008002157
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Schlegel, R. P., d'Avernas, J. R., Zanna, M. P., DeCourville, N. H., & Manske, S. R. (1992). Problem drinking: A problem for the theory of reasoned action? Journal of Applied Social Psychology, 22(5), 358–385. https://doi.org/10.1111/j.1559-1816.1992.tb01545.x
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014a). p-curve and effect size: Correcting for
publication bias using only significant results. Perspectives on Psychological Science, 9(6),
666–681. https://doi.org/10.1177/1745691614553988
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014b). P-curve: A key to the file-drawer. Journal
of Experimental Psychology: General, 143(2), 534–547. https://doi.org/10.1037/a0033242
Further Methods for Meta-Analysis
14
Extensions of Meta-analysis
In Chap. 13, we covered statistics that can be used to estimate publication bias:
Rosenthal and Orwin’s Fail-Safe N values, Begg and Mazmuder’s Rank Correlation,
Egger’s regression test and Duval and Tweedie’s trim and fill method. Such statis-
tics are commonly reported in meta-analytic papers to inform the reader about the
extent of publication bias in studies included in the meta-analysis. However,
Simonsohn et al. (2014b) argue that because such statistics only address the issue
where non-significant results are put away in the metaphorical file drawer, they tell
us little about what happens when researchers engage in p-hacking.
Imagine three independent studies on the same topic. Study (a) reports a significant positive effect size where p = 0.01; Study (b) reports a positive effect size where p = 0.06, not quite meeting the threshold for statistical significance; Study (c) reports a non-significant positive effect size where p = 0.40. Study (a) is most likely to be submitted for publication, while Study (c) is probably going to find itself in the file drawer or uploaded to the Open Science Framework. Study (b), however, is at risk of p-hacking because its result is close to the magic p < 0.05 threshold: the research team might decide to identify some outliers, whose removal changes the significance level, or collect more data until they get p < 0.05. So, the team
behind study (b) might end up with sub-sets of data analyses—those that show non-
significance are filed, those that show significance are submitted for publication.
Simonsohn et al. argue that p-hacking affects the accuracy of traditional methods
of testing for publication bias because research teams may have few ‘failed’ studies,
those showing non-significance, but multiple analyses of these failed studies. So,
significant results might not have been significant without p-hacking. I believe that
psychology is a discipline at high risk of p-hacking because most of the time psy-
chologists complete statistical analyses themselves (see Chap. 6 for a discussion
about detection bias).
P-curve analysis can help identify issues of publication bias by assessing the
distribution of statistically significant p values for a set of independent findings—
like a set of studies included in a meta-analysis. The first thing to draw your atten-
tion to is that this analysis does NOT focus on non-significant results. Published,
non-significant results are not obvious examples of publication bias! Instead,
p-curve analysis looks at the p values for all the significant studies and determines
how likely this set of studies suffers from publication bias.
The way p-curve analysis works is breathtakingly simple. When you have a lit-
erature without obvious publication bias, you should have a greater number of studies that report highly significant results, for example, results where p is 0.01 or 0.02. Such results are hard to achieve by p-hacking and more likely represent a true effect. In contrast, a literature containing publication bias should have multiple studies that report p values close to the threshold of p < 0.05, for example,
p = 0.04, p = 0.035. Effect sizes with these levels of significance are more likely to
be the result of p-hacking, because it is (relatively) easy to p-hack a result to achieve
a p value in this range. This is not to say all results where p = 0.04 are the result of
p-hacking, just that it is probably easier to achieve a value of p = 0.04 by p-hacking
relative to a p value of 0.01. P-curve analysis plots the number of p values for a set
of studies and outputs results telling you if most of your significant findings have
low or high p values.
To run a p-curve analysis based on studies you include in your meta-analysis,
you don’t need to extract the p values from your primary papers. All you need is the
effect size (r, d) and the degrees of freedom. The analysis takes advantage of the
degrees of freedom (effectively telling the analysis what the sample size is) to deter-
mine the significance of the effect in the time-honoured tradition of looking up a
value at the back of your statistics textbook. So, a correlation of r = 0.89 will be
significant (at p < 0.05) with a smaller sample size than a correlation of r = 0.15,
because the first correlation is much larger than the second one. This follows on
from a simple principle that we already know; it’s easier to find a significant effect
with a larger sample.
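To see the principle at work, here is how a p value can be recovered from a correlation and its degrees of freedom in R; the numbers are made up purely for illustration:

r  <- 0.30      # hypothetical correlation from a primary study
df <- 98        # its degrees of freedom (n - 2 for a correlation)

t_stat <- r * sqrt(df) / sqrt(1 - r^2)                 # t statistic for the correlation
p_val  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # two-tailed p value
round(c(t = t_stat, p = p_val), 4)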
At www.p-curve.com there is an app that will let you calculate p-curves. Enter
the individual effect sizes from your meta, plus the degrees of freedom (in brackets)
and that’s it! The output for a set of studies with no evidence of publication bias
should be skewed so that most of the p values are near the 0.01, 0.02 end of the x
axis, whereas one where there is evidence of publication bias will be skewed so
most values are near the 0.04 end of the axis. The output also tells you how many
included studies meet p < 0.05.
Simonsohn and colleagues have published further papers on p-curve analysis,
which I recommend reading if you want to know more. I especially like Simonsohn
et al. (2014a) because it shows the limitations of the trim and fill method to identify
publication bias, using results from a meta-analysis. Compared to some of the
extensions discussed in this chapter, it really is easy to run p-curve analyses follow-
ing a meta-analysis, so I would encourage you to do so.
A Special Type of Moderator Analysis—meta-CART

Newby et al. (2021) report results of a systematic review and meta-analysis I co-
authored with a team of researchers based at Coventry University, led by Dr Katie
Newby, Professor Katherine Brown, and Dr Stefanie Williams. The primary goal of
the meta-analysis was to estimate the effect of receiving a digital intervention on
participants’ self-efficacy, that is, their confidence that they can perform a behaviour
(Bandura, 1977).
This meta-analysis was broader than meta-analyses that I first author, as we
tested the effects of digital interventions aiming to increase self-efficacy across five
health behaviours: alcohol consumption, dietary behaviours, physical activity, safe
sex, and smoking. The team was keen to see if interventions using different behav-
iour change techniques, that is, methods to promote behaviour change such as goal-
setting, social support, et cetera, were more/less effective at increasing self-efficacy.
A further question was whether clusters of these techniques, when delivered in the same intervention, would work together to amplify or attenuate the effects on
self-efficacy.
To address this interesting question, Katie invited Xinru Li and Elise Dusseldorp
based at Leiden University in the Netherlands to join the team. Xinru and Elise are
pioneers in the field of meta-CART (Classification And Regression Trees) who had
already published a neat paper where they used data from Susan Michie’s meta-
analysis on BCTs (Michie et al., 2009) to show the value of meta-CART (Dusseldorp
et al., 2014).
Essentially, meta-CART acts as a special type of moderator (sub-group) analysis.
It seeks to build models that partition studies from a meta-analysis into clusters that
cohere, for example, do all the studies that use a particular behaviour change tech-
nique, like implementation intentions, have similar effects? It is a form of machine
learning that seeks a parsimonious solution to heterogeneity in data. The idea is, can
we reduce the heterogeneity in our meta-analysis effect sizes by partitioning studies into coherent groups? meta-CART is a two-stage process: a classification and regression tree
model is built before running a mixed effects meta-analysis (see Chap. 12). The
classification aspect of CART reflects categorical factors, whereas the regression
aspect covers continuous variables.
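meta-CART itself needs a dedicated package (such as metacart in R), but the second stage should look familiar if you have run moderator analyses before: it is a mixed-effects meta-analysis with the partition identified by the tree entered as a moderator. A sketch in metafor, where bct_cluster is a hypothetical coding variable:

library(metafor)

# Mixed-effects model with the (hypothetical) cluster coding entered as a moderator;
# the QM test tells you whether effect sizes differ across the clusters
res_mod <- rma(yi, vi, mods = ~ factor(bct_cluster), data = dat)
res_mod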
We did not find any evidence for clusters of BCTs leading to more or less effec-
tive interventions to change self-efficacy in Newby et al. (2021). However, this
analysis was based on only 20 studies. A paper by Xinru (Li et al., 2017) clearly showed that meta-CART produces its best results when based on a sample of around 120 studies, such as Michie et al.'s (2009) paper, which included over 100 studies. The 20 studies
we had were not enough to make the most of this technique. As the number of
research studies within a discipline grows, and especially where you have studies
reporting results using extremely popular methods like the Behaviour Change
Taxonomy, the potential to use meta-CART will increase.
Borenstein et al. (2009) note that reporting results from a meta-analysis typically
involves focusing on the overall effect size and the confidence intervals about this
effect. The overall effect size estimates the magnitude of effect found across
included studies while the confidence intervals tell us the precision of this estimate.
However, neither statistic tells us how the ‘true’ effects for each included study are
distributed about the overall effect.
When running a fixed-effect meta, this question does not matter because we
assume all included studies have the same ‘true’ effect size which is measured with
more or less precision based on sample size. In a random-effects meta-analysis,
however, this logic does not hold; recall from Chap. 11 that a random-effects meta-
analysis assumes the ‘true’ effect size might be different for each included study due
to a host of factors. A consequence of this assumption is that we need to think about
how the true effects are distributed about the overall effect size computed in a meta-
analysis. To address this issue, we need to talk about prediction intervals.
A prediction interval is the interval within which a score will fall in a distribution
if we select a case at random from the population the distribution is based on. Like
confidence intervals, prediction intervals can be calculated with reference to the
percentage of the distribution the value should fall within. So, a prediction interval
of 95% would mean the value selected would fall in the interval 95% of the time; for a 99% interval, the value would fall within it 99% of the time.
MAJOR allows you to add prediction intervals to the forest plots in your meta-
analysis output. Figures 14.1 and 14.2 show the forest plots from Chaps. 9 and 10
with prediction intervals added as lines either side of the diamond at the bottom of
the plot. In Fig. 14.1, the prediction intervals range from 0.15 to 0.94 which means
95% of values for the ‘true’ effect (correlation) between drinking intentions and
drinking behaviour across studies will fall between a correlation of r = 0.15 and
r = 0.94. Our overall effect size of r = 0.55 is the sample-weighted average based on
observations (data) from the ten included studies, with the confidence intervals
reflecting error in measurement of this mean. The prediction intervals tell us that we
should expect the ‘true’ effect of drinking intentions on drinking behaviour to fall
between r = 0.15 and r = 0.94, 95% of the time. A few things to note. First, both limits of the prediction interval are in the same direction (positive), which suggests that most 'true' effects are likely to be positive. Second, the prediction interval does not include zero, which means the 'true' effect is unlikely to be zero in many studies.
Fig. 14.1 Forest plot for meta-analysis of correlations with prediction intervals
Fig. 14.2 Forest plot for meta-analysis of effect size differences with prediction intervals
In Fig. 14.2, the prediction intervals range from 0.21 to 0.79 which means that
95% of values for the ‘true’ effect (effect size difference) for screen time for those
who received versus did not receive an intervention will fall between d = 0.21 and
d = 0.79. Our overall effect size of d = 0.52 is the sample-weighted average based
on observations (data) from the ten included studies, with the confidence intervals
reflecting error in measurement of the mean. The prediction intervals tell us that we should expect the 'true' effect of the intervention on screen time to fall between d = 0.21 and d = 0.79, 95% of the time. As both limits of the prediction interval are in
the same direction (positive) we should expect most ‘true’ effects to find positive
effect size differences, rather than negative or null effects.
In addition, because the intervals do not include zero, the ‘true’ effect is unlikely
to be zero in many studies.
Confidence intervals reflect error in measurement of the mean. In meta-analysis,
an overall effect size’s error in measurement is strongly tied to the number of studies
you include. If you have five studies, then you are likely to have wider confidence
intervals than if you have 50, or 500 studies looking at the same effect size. So, the
more studies you include in your meta, the narrower the confidence intervals for the
overall effect size. By contrast, prediction intervals reflect both error in measure-
ment of the mean, which is affected by the number of studies, and variance in the
studies, represented by Tau2, which is not affected by the number of studies. So, three meta-analyses containing five, 50, or 500 studies of the same effect size will have similar prediction intervals because Tau2 is not sensitive to the number of
studies included in the meta-analysis, whereas their confidence intervals will inevi-
tably be narrower in the set of 500 studies than the set of 50 or 5 studies. In sum, the
prediction interval is telling you the range of possible values the ‘true’ effect sizes
could take, under the assumption that the ‘true’ effect size for each study might dif-
fer. Despite being under-reported, prediction intervals should be reported routinely: most papers report results of random-effects meta-analyses, so we need to be able to see the range of 'true' effects we should expect based on the effect sizes we enter in our meta. The fact that it is easy to add prediction inter-
vals to your forest plot also counts in their favour.
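Outside of MAJOR, metafor reports the prediction interval directly; under the hood it is essentially the pooled estimate plus or minus a critical value multiplied by the square root of the estimate's squared standard error plus Tau2, which is why Tau2 keeps it wide no matter how many studies you add. A sketch, re-using the model res from earlier; depending on your version of metafor the prediction interval columns are labelled pi.lb/pi.ub or cr.lb/cr.ub:

predict(res)                  # pooled estimate, confidence interval, and prediction interval
forest(res, addpred = TRUE)   # forest plot with the prediction interval added at the bottom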
When running the meta-analysis for Cooke et al. (2023), I discussed the issue of baseline differences between groups with my co-author Paul Norman. Paul had come across a paper by Morris
(2008) which directly addressed the issue of baseline corrections in effect sizes. In
essence, you generate a mean change score for each group by subtracting the baseline mean from the follow-up mean, subtract the two change scores from one another, and then divide the result by the pooled standard deviation of the baseline scores.
Morris argues that assuming your intervention works, you should expect greater
variance in scores at follow-up than baseline (where you are assuming groups are
more similar in their scores) and goes on to demonstrate in his paper that dividing
mean values by baseline (pre-test) standard deviations is justified to generate your
effect size difference.
I must acknowledge there are challenges involved in running a meta-analysis of
effect size differences adopting Morris’ approach. Using Morris’ formula meant I
had to go through an extra step of re-calculating effect size differences before run-
ning the meta-analysis and the method had other disadvantages, including exclusion
of papers from meta-analysis that had not reported baseline statistics for each group.
Unless you have statistics for both timepoints, you cannot run this analysis. There’s
also the question of whether the extra effort is justified. As mentioned above, if the
studies you are meta-analysing use high-quality study designs, and do not show
obvious differences between groups in baseline scores, it may not be necessary to run your meta-analysis this way. Nevertheless, I report this method here as I've used it,
and it may be valuable for others to use it too.
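To make the calculation concrete, here is a minimal sketch of the core of Morris' approach using made-up summary statistics; it leaves out the small-sample bias correction Morris also describes, and all the variable names are mine rather than his:

# Hypothetical summary statistics for intervention (T) and control (C) groups
mT_pre <- 14.0; mT_post <- 10.0; sdT_pre <- 6.0; nT <- 50
mC_pre <- 13.5; mC_post <- 13.0; sdC_pre <- 6.5; nC <- 48

# Pooled baseline (pre-test) standard deviation
sd_pre <- sqrt(((nT - 1) * sdT_pre^2 + (nC - 1) * sdC_pre^2) / (nT + nC - 2))

# Difference in mean change, divided by the pooled baseline SD
# (negative values indicate a greater reduction in the intervention group)
d_morris <- ((mT_post - mT_pre) - (mC_post - mC_pre)) / sd_pre
round(d_morris, 2)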
Martin Hagger and colleagues (Hagger et al., 2016) took several published theory of planned behaviour meta-analyses, including Cooke et al. (2016), a step further by extracting the intercorrelations between the predictors of intentions, that is, the cor-
relations between attitudes and subjective norms, between attitudes and perceived behavioural control, and between subjective norms and perceived behavioural control. Each of
these correlations was extracted from the included studies in each of the original
meta-analyses, before being pooled using random effects meta-analysis into r+. The
values they generated are reported in Table 1 of the paper. Table 2 contains both these correlations plus the ones already reported in the original meta-analyses, for example, the attitude–intention correlation of r+ = 0.62 from Cooke et al. (2016).
The next step Martin and colleagues took was to input these values into path-
analytic models in MPlus for each meta-analysis, which they report in Table 3 of
their paper. At first glance, not much seems to have changed from the original meta-
analyses; in Cooke et al. (2016) we found evidence for significant attitude–inten-
tion, subjective norm–intention, and intention–behaviour relationships, and limited
evidence that perceived behavioural control was associated with drinking behaviour.
The key difference is the statistic that’s being reported for each theory relation-
ship, a beta value, like you get in a multiple regression. These statistics are showing
the effects of attitudes on intention while simultaneously accounting for the effects
of subjective norms and perceived behavioural control. Doing this increases confi-
dence in results. In Cooke et al. (2016), we reported a sample-weighted average
correlation between attitudes and intentions of r+ = 0.62. In Hagger et al. (2016),
attitudes are shown, across studies, to significantly predict intentions (beta = 0.51).
So, even when the effects of other predictors, subjective norms, perceived behav-
ioural control, are included in a regression model, attitudes remain significant pre-
dictors of drinking intentions. This approach allows research teams to run secondary
analyses in the same way as we would if we were analysing a primary dataset,
which is an excellent development.
One of the few downsides of the approach reported in Hagger et al. (2016) is that
path analyses were run using MPlus, which, as a licensed software package, may
not be accessible. Happily, alternative methods for running path analyses are now
possible in R, including the MASEM method (Cheung & Hong, 2017), so path
analysis within meta-analysis is available to all.
As I have not yet run such analyses myself, I am reluctant to go further than to
recommend it when testing theory relationships in psychology. Martin has pub-
lished several papers using these methods since the one I discussed, including his
updated meta-analysis (Hagger et al., 2017) of the common-sense model of illness
and a recent meta-analysis of studies testing theory of planned behaviour relation-
ships using longitudinal designs (Hagger & Hamilton, 2023). I recommend reading
these papers to increase your knowledge further.
An important, but often overlooked, issue in meta-analysis is the need for effect
sizes to be independent of one another (see Chap. 7). Part of conducting a meta-
analysis is to ensure that each effect size you pool does not represent data from any
other effect size you include and to take care when including multiple measures of
an effect size within the same analysis. Where effect sizes have dependency, by
being related to one another, including both in a meta of ostensibly independent
effect sizes is like double-counting results in an election, adding greater weight to
certain effects relative to others.
In each of the meta-analyses I have authored, I had to address issues of depen-
dency between measures, with one or more of the following occurring: (1) papers
reporting multiple studies based on recruitment of independent samples; (2) papers
reporting effect sizes for sub-samples; (3) papers reporting effect sizes for multiple
timepoints. For (1), we argued that these datasets were independent of each other;
the authors reported them as Study 1, Study 2, Study 3, and it was not obvious that
participants from any sample took part in more than one study. When extracting
data, be careful, as this can happen—one of my papers (Cooke & Sheeran, 2013)
reports three studies, the first based on the total sample we recruited, and the other
two based on sub-samples recruited from within the original sample. For (2) we
treated the sub-samples (e.g. effect sizes for men and women reported separately) as
independent. One issue with this approach is that when you pool results across all
studies, your samples lack comparability; correlations based on a sample of female
and male students might be different to correlations based on just male or just
female students. Indeed, we found some evidence to support this claim (see discus-
sion in Cooke et al., 2016).
For (3), the response is more challenging. In Cooke and French (2008), we had
papers where the correlation between variables was reported on multiple occasions
using a longitudinal design. Our options were either to include the correlations for all timepoints, each with its own sample size, or to average the correlations and choose one of the sample sizes. We opted for the second option because we felt that the
first option is a clear example of dependence in results. If I have a paper that reports
effect sizes from samples that contain at least some of the same participants, it is
likely that these correlations will correlate with one another as people tend to be
consistent when answering survey measures.
Martin Hagger’s (2022) paper offers some excellent advice on how to address
dependence in meta-analysis. Martin argues against aggregation of effect sizes from
the same sample due to the likelihood that this will inflate effects. So, what I did in
Cooke and Sheeran (2004) is not recommended! I’d argue that what we did in
Cooke and French (2008), where we averaged effect sizes from multiple timepoints
into one effect size and used the smallest sample available, is less problematic.
While I believe this is better than including the effect size for each timepoint, which
are likely to be dependent, you are still reducing the power of your analysis by using
the smallest sample size. Martin discusses two alternative methods to deal with
dependency: multi-level meta-analysis and robust variance estimation.
The multi-level approach partitions variance into three components (1) sampling
error (see Chap. 11); (2) variability due to multiple effect sizes from the same study;
(3) variability due to effects from different studies. So, (2) directly addresses the
issue of dependence, modelling the effect sizes for multiple effect sizes from the
same study and correcting for variance. Robust variance estimation (Hedges et al.,
2010) approximates the average effect size from a set of studies from the same lab/
research team, by applying a ‘working model’ of the dependence structure among
effects in included studies. You can run both methods at the same time as they
address different aspects of dependence.
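Both approaches can be run with metafor. A sketch, assuming a hypothetical data frame dat in which study identifies the paper or lab and esid identifies each effect size within it:

library(metafor)

# Multi-level model: effect sizes (esid) nested within studies (study); column names are hypothetical
res_ml <- rma.mv(yi, vi, random = ~ 1 | study/esid, data = dat)

# Robust variance estimation: cluster the standard errors by study or lab
robust(res_ml, cluster = dat$study)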
While I have yet to use either of these methods, I think robust variance estimation
would better suit my needs as I prefer not to include multiple measures of the same
outcome within the same meta-analysis but I often include studies from the same
research team or laboratory. If you do want to include multiple effect sizes from the
same paper, the multi-level approach sounds like the best one to take. Further details can be found in Gucciardi et al. (2022).
Summary
The aim of this chapter was to introduce some extensions of meta-analysis. I do not
expect you to adopt all these techniques within your own meta-analysis but would
recommend you report prediction intervals with your results and consider running a
p-curve analysis of your effect sizes. Both are low-cost additions to your meta-
analysis paper that will improve its quality. Using other approaches, especially meta-CART, depends on the number of studies you have, and I appreciate you may not wish to dive headlong into running multi-level modelling or path analysis.
Ultimately, you may find that these approaches help answer your questions in
greater depth and that is why they are included. The final chapter of the book pro-
vides advice about how to write up results of your meta-analysis for submission to
a peer-reviewed journal.
References
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological
Review, 84(2), 191–215.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (Eds.). (2009). Introduction to meta-
analysis (1st ed.). Wiley.
Cheung, M. W.-L., & Hong, R. Y. (2017). Applications of meta-analytic structural equation mod-
elling in health psychology: Examples, issues, and recommendations. Health Psychology
Review, 11(3), 265–279. https://doi.org/10.1080/17437199.2017.1343678
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., & French, D. P. (2008). How well do the theory of reasoned action and theory of
planned behaviour predict intentions and attendance at screening programmes? A meta-analy-
sis. Psychology & Health, 23(7), 745–765. https://doi.org/10.1080/08870440701544437
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Writing Up Your Meta-Analysis
15
My academic training is unusual in that the first thing I did during my PhD was
conduct a meta-analysis, which became my first publication (Cooke & Sheeran,
2004). It was a steep learning curve, but I received lots of help from my supervisor,
Prof Paschal Sheeran, and was supported by other PhD students conducting their
own meta-analyses. In many ways, it was the best environment to learn how to write
up a meta-analysis.
Much of what you need to write in a meta-analysis paper is the same as what is
required for other papers that report the results of quantitative analyses. The main sections are all included in a meta-analytic paper: an introduction that sets the scene for the study, a method that details what was done, a results section outlining what was found, and a discussion that summarises the findings, compares them with other studies, and points the way for future research. Where things differ is
what appears in each of those sections. Sometimes material is a slightly modified
form of what is required in a primary paper, other times, you report completely dif-
ferent material. I’ll go through each section of the paper with some hints and tips to
help you out when you’re writing up your meta.
Sections of an Academic Paper

Title Page
Abstract
Introduction
We argued that a limitation of McEachan et al.’s review was that they synthesised
results across substance use behaviours (alcohol, drug use, smoking), meaning
their results did not give a precise estimate of the magnitude of relationships for
alcohol studies. We further argued that because they only found five studies on
alcohol, while we had found 33 predicting drinking intentions, we could sum-
marise a wider literature.
• In Cooke et al. (2023), Malaguti et al.’s (2020) meta-analysis of implementation
intentions and substance use (alcohol, smoking) studies had been published. We
had to justify what our meta-analysis would add to the literature which we did by
noting limitations of their meta-analysis, for example, that they had pooled effect
size differences across alcohol outcomes (weekly drinking and binge drinking),
which we argued meant that it was impossible to precisely determine effect sizes
for either outcome.
In both cases, we argued that our meta would provide a more precise estimate of
effect sizes. The final thing to say about the Introduction is to always report your
review questions at the end of the section, like you would with your hypotheses or
research questions in a primary study. I like to have overall review questions and
moderator questions. This idea comes from Sheeran and Orbell (1998). If it ain’t
broke don’t fix it.
Method
The method section of a meta-analysis clearly diverges from the method section of
a primary paper reporting quantitative designs. This is obvious from the sub-
sections, which have different titles to the ones used in primary papers. For instance,
we talk about studies rather than participant characteristics, and report results of a
systematic review that informed the meta-analysis which is not often done in pri-
mary papers. This is a consequence of reporting results of a secondary analysis
based on more than one dataset. I’ll talk through the sub-sections I’ve used in the
meta-analyses I’ve published to illustrate key differences.
In all my metas, I have always had a sub-section in the method that covered how I
found relevant studies and the inclusion criteria, though it should be acknowledged
that the name I have used for this section has changed over time. Another thing that
has changed from Cooke and Sheeran (2004) to Cooke et al. (2023) is that this sub-
section has moved from a meta-analysis format to embrace a systematic review
format, where I describe how the review was conducted in greater depth. The aim is
for transparent and replicable reporting. One way to do this is to pre-register your
review on PROSPERO (or Open Science Framework) and follow PRISMA report-
ing guidelines. I’ve used the latter since Cooke et al. (2016) and the former in Cooke
et al. (2023). I recommend you do both in your meta-analysis. Within this sub-
section of the method, I often have sub-sub-sections describing different parts of
what was done.
Search strategy. I have used similar search strategies in all four of my meta-analyses;
I’ve always searched bibliographic databases using search terms and I’ve also
searched reference lists of included studies. In all papers, I’ve also sent emails
requesting unpublished or in-press papers on the review topic. In most of my
metas, this involved emailing authors of included studies to request other data
they had, following the assumption that if they have already published a relevant
paper, they may have other data that I could potentially include in my meta. A
limitation with this approach is that it does not allow you to find unpublished stud-
ies by authors who are yet to publish on the review topic. So, in Cooke et al.
(2023), we adopted a different approach where we emailed mailing lists of soci-
eties I am a member of (the Division of Health Psychology, the European Health Psychology Society, and the UK Society for Behavioural Medicine) to try and reach a wider
audience. It’s worth noting that emailing anyone, either directly following publi-
cation of a relevant paper, or a mailing list, introduces bias into your search
because there is no guarantee that your email will be responded to in the same
way that someone else’s email is responded to. This contrasts with searching
databases and reference lists, which should be replicable by an independent
review team.
Inclusion criteria. Inclusion criteria for my four meta-analyses are alike. Most
include a criterion about results being published in English, and they all explic-
itly mention an effect size, or statistics needed to calculate an effect size; in a
meta, you should be specifying an inclusion criterion about statistical informa-
tion to ensure included studies can be pooled. Other criteria varied depending on
the topic: Cooke and French (2008) mentions screening; Cooke et al. (2016)
drinking behaviour; Cooke et al. (2023) weekly alcohol consumption and/or
heavy episodic drinking. All bar Cooke et al. (2023) mentioned theory relation-
ships, whereas Cooke et al. (2023) mentioned groups being asked to form, or not form, implementation intentions.
Selection of studies. Cooke and Sheeran (2004) and Cooke and French (2008) were
the result of me searching and screening papers on my own. In contrast, Cooke
et al. (2016) and Cooke et al. (2023) involved me and another reviewer (Mary
Dahdah in the former paper, Helen McEwan in the latter paper) independently
searching and screening papers. This necessitated adding a ‘Selection of Studies’
sub-section in these papers to detail this process. I’d recommend you follow best
practice in searching and screening by involving at least one other independent
person to check your selection of studies (see Chap. 4).
Assessment of methodological quality. Cooke et al. (2023) was the first time I
reported quality assessment of the studies included in a meta-analysis. We used
the Cochrane Risk of Bias tool and provided some detail about this.
Data extraction and coding. In both Cooke et al. (2016) and Cooke et al. (2023), two
authors independently extracted the data, which is best practice in systematic
reviewing. Both papers involved coding papers too. In Cooke et al. (2023), this
involved coding papers for moderator analyses. Things were a bit more complex
in Cooke et al. (2016). First, Paul Norman (the third author) and I independently
coded the items used to assess perceived behavioural control, self-efficacy, and
perceived control as there was lots of heterogeneity in how these constructs were
measured in included studies. Second, we spotted that there were 20 different
definitions of alcohol consumption in 44 papers included in one or more of the
meta-analyses! David French (the fourth author) and I coded the 20 definitions
into clusters representing similar phenomena. We ended up with five categories
representing different drinking patterns: Getting drunk; Heavy Episodic
Drinking; Light Episodic drinking; Quantity of Drinks Consumed; Not Drinking.
Having coded the studies in this way, we used this newly created coding frame
as a basis for moderation analyses for our studies. Based on my experience, most
of the time you’ll be coding papers for moderator analyses, rather than coming
up with an entirely new categorisation scheme.
In my first two meta-analyses, the analyses were run using the Meta computer program (Schwarzer, 1988). Two further details were included in Cooke and French (2008). First, I mentioned including Fail-Safe N values
(Rosenthal, 1979). Second, I explicitly mentioned Cohen’s (1992) guidelines for
interpretation of magnitude of correlations (see Chap. 3). I did the same in Cooke
et al. (2016) and reported guidelines for effect size differences in Cooke et al. (2023).
By the time we get to Cooke et al. (2016), I had moved on to Comprehensive Meta-Analysis (Borenstein et al., 2005) and explicitly mention running a random
effects meta-analysis. Other changes include referencing forest and funnel plots.
There’s an indirect reference to the fact that we had categorical moderator variables
because I mention doing paired Z tests, and a more explicit mention of publication bias using funnel plots and Duval and Tweedie's trim and fill method.
In Cooke et al. (2023), I noted that we used Morris’ (2008) recommendations to
control for baseline differences when we calculated effect size differences (see
Chap. 14). I used the metafor package to run meta-analyses in R, which is the soft-
ware MAJOR runs in jamovi. There’s also text about how we calculated effect size
differences so that negative values indicated greater reduction in alcohol consump-
tion/heavy episodic drinking by the intervention group (see Chap. 7). There’s greater
detail on publication bias statistics, more information on both homogeneity statistics, with I2 mentioned, and a more detailed explanation of how we tested moderators using meta-regression and mixed-effects meta-analysis (see Chap. 12).
Results
Like the method section, the results section for a meta-analytic paper is different to
the results section for a primary paper. These are the sub-sections I have used in
results sections of my meta-analyses.
Study Characteristics
In both Cooke et al. (2016) and Cooke et al. (2023), I included a Study Characteristics
sub-section. I use this sub-section to outline key information about the included
studies, akin to describing the sample’s characteristics in a primary paper. Like pri-
mary papers, I do my best to summarise the gender distribution, the age range (or
average age), and sample type (e.g. university student, community) of samples
reported in included studies.
However, one thing I've learned doing data extraction for meta-analyses is that
reporting of basic study details, like the numbers of men and women in the sample
or the average age of the sample, varies between studies. For example, when coding studies
during data extraction for Cooke et al. (2016), I wanted to include the average age
of the sample for each study. After coding the first five papers, I found that only one
had reported the average age of their sample. I then got to my own paper (Cooke & French,
2011) and realised I had not reported the average age of my sample either! By the
end of data extraction, I'd learned that while most of the studies reported their
sample's average age, this information was not always available.
Additional information about studies that's useful to report in a meta-analysis
includes (a) country of recruitment; (b) study sample sizes; (c) publication year; and
(d) the total number of studies included and the total number of samples. Reporting
information about country of recruitment can help with interpretation later in the
paper. For instance, in both Cooke et al. (2016) and Cooke et al. (2023), most
included studies were conducted in the UK, which means you need to be careful
about generalising results to other countries. Similarly, the range of sample sizes
reported by included studies allows you to consider the power of those studies.
Publication year might be useful to discuss if there has been a change in a definition
or policy in your literature.
The total number of studies is probably the most important information to report
because it tells you a lot about the literature you are meta-analysing. In Cooke et al.
(2016), we included 28 papers reporting 40 studies. It’s quite common in meta-
analysis to include results from a paper that reports multiple studies. For example,
we included three studies from Conner et al. (1999) in Cooke et al. (2016). As these
were independent samples, this is fine, although it can trip you up when talking
about the number of effect sizes you include, because this is the number of samples
rather than the number of papers. I've moved away from talking about papers to talking
about samples because there are times when the same paper includes one study with
multiple samples. In Cooke et al. (2023), we included multiple independent samples
from Norman et al. (2018). This study used a fully factorial design with three factors,
creating eight independent samples. Initially, we extracted data from the control
condition (which received none of the three interventions) and the group that received
only the implementation intention intervention (and neither of the other two).
Following discussion, we realised that we could include further comparisons
between groups that received the same interventions but either did or did not form
implementation intentions. This increased the number of samples we could meta-
analyse. It’s important to report this information in your results section.
Main Effects
You should always report the main or overall effect size and the confidence intervals
from your meta-analysis. You may have quite a few of these, so put them in a table
and describe them sparingly to save word count. Nevertheless, it is important to report
results across all (or most) included studies, as this provides a framework for the
remainder of the results section. In Cooke and French (2008), I was pushed for word
count so did not report the results for the five theory of planned behaviour
relationships in the main body of the text, instead putting the statistics in a table. In
hindsight, I think this was a mistake; in Cooke et al. (2016), where there were nine
relationships, I still found space to report each overall effect size. This was particularly
important as the overall effect sizes varied considerably from one another, with
three null relationships being worthy of highlighting. Refer to forest plots in this
section to help readers see the dispersion of effect sizes across included studies. In
Cooke et al. (2023), the main effect section is split into two paragraphs because we
were meta-analysing two outcomes (weekly drinking, heavy episodic drinking).
This split also occurred because there was a positive effect for weekly drinking and
a null effect for heavy episodic drinking. If both effect sizes had been of similar
direction (i.e. both null, both positive, both negative) I might have put them in the
same paragraph.
Other statistics to report in the main effects section relate to heterogeneity (see
Chap. 12) and publication bias (see Chap. 13). In my meta-analyses to date, there’s
a general pattern of me finding heterogeneity in overall effect sizes, which leads into
a discussion of moderators (see next section). I've generally found a lack of evidence
of publication bias, although in Cooke et al. (2016) we did find evidence of
publication bias for two relationships (perceived behavioural control–intention;
self-efficacy–intention). Closer inspection of the self-efficacy–intention relationship
using a sensitivity analysis suggested that publication bias for this relationship
was an artefact of including results from a study with a different sample age from all
other included studies. We were more convinced that there was publication bias for
the perceived behavioural control–intention relationship, so we reported the effect size as well as the
adjusted effect size following the trim-and-fill method (see Chap. 13). You can also
include funnel plots to comment on the presence or absence of publication bias.
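If you are working in metafor, the funnel plot and the trim-and-fill adjustment mentioned above take only a couple of lines. This is a minimal sketch, assuming `res` is a fitted random-effects model like the one in the earlier example; it is not the actual analysis code from Cooke et al. (2016).

```r
library(metafor)

# Assumes `res` is a fitted random-effects model, e.g. res <- rma(yi, vi, data = dat)
funnel(res)           # funnel plot to inspect for asymmetry

tf <- trimfill(res)   # Duval and Tweedie's trim-and-fill method
summary(tf)           # pooled effect size adjusted for imputed 'missing' studies
funnel(tf)            # funnel plot showing the imputed studies as well
```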
Moderator Analyses
I’ve included moderator analyses in all my meta-analyses. I’ve looked at factors like
publication status (published vs unpublished; Cooke & Sheeran, 2004), time
between measurement/length of follow-up (Cooke et al., 2023; Cooke & Sheeran,
2004), type of screening test (Cooke & French, 2008), location of recruitment (i.e.
were participants recruited in response to a letter from their GP, a screening service;
Cooke & French, 2008), cost of screening (Cooke & French, 2008), whether partici-
pants were sent an invitation to screen (Cooke & French, 2008), pattern of con-
sumption (Cooke et al., 2016), gender distribution (Cooke et al., 2016), age of
participants (Cooke et al., 2016), sample type (i.e. university vs community, Cooke
et al., 2023), mode of delivery (i.e. paper vs online, Cooke et al., 2023), intervention
format (i.e. type of implementation intention, Cooke et al., 2023).
There are two takeaway messages from this list. First, there’s little consistency in
the moderators we tested across the metas. This reflects the idea that moderator
analyses tend to be specific to the meta-analysis you are conducting. Experimental
factors, like intervention type, are unlikely to be relevant for meta-analyses of cor-
relations. Behaviour-specific moderators, like those focused on screening type, or
pattern of consumption, are only relevant if your meta-analysis focuses on those
behaviours. Second, that we included moderator analyses in all four meta papers
tells you that in each paper there was heterogeneity in overall effect sizes. This was
true of both the meta-analyses of correlations and the meta-analyses of effect size
differences. It's likely you will need to report moderator analyses in your results section
(see Chap. 12).
I’ve reported moderator analyses using the format of your main analyses, focus-
ing on overall effects for each category of your moderator or the overall effect if it
is a continuous moderator. In Cooke et al. (2023), we found that sample type and
time frame both moderated the overall effect of forming implementation intentions
on weekly drinking. So, for sample type, which had two categories (university;
community), we reported meta-analyses of the overall effect size for weekly drink-
ing for each category separately. In other words, we computed a meta-analysis of
effect size differences for the studies that recruited community samples and then
separately, computed a meta-analysis of effect size differences for the studies that
recruited university samples. We partitioned the effect sizes into these two catego-
ries. Results highlighted why we had a significant moderation effect: the effect size
difference for community samples was d+ = −0.38 (a small effect size difference),
while the effect size difference for university samples was d+ = −0.04 (a null effect
size difference). Therefore, the same intervention—forming implementation inten-
tions—had a significant effect on self-reported weekly drinking when received by
community samples, and no effect on self-reported weekly drinking when received
by university samples.
As well as reporting the overall effect sizes for each category, you should also
report the homogeneity of each overall effect size and a test of the difference between the
categories, either a Z test of independent effects or a chi-square test (see Chap. 12
for more on these tests). The reason for reporting the heterogeneity of the categorical
effect sizes is to see if you have met one of the original goals of meta-analysis: to
find a homogeneous overall effect size. Both effect sizes (those for community and
university samples) lacked heterogeneity. This suggests that the significant effect of
forming implementation intentions on weekly drinking in community samples is
consistent. Conversely, the fact that the null effect size difference on weekly drinking for
university students was homogeneous suggests there's unlikely to be an effect of this
intervention in university samples.
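Here is one way to run this kind of categorical moderator analysis in metafor: a mixed-effects model whose QM statistic tests the difference between categories, plus separate random-effects models for each subgroup. The data are hypothetical, not the effect sizes from Cooke et al. (2023).

```r
library(metafor)

# Hypothetical effect size differences (yi), sampling variances (vi), and a
# two-category moderator (sample type)
dat <- data.frame(
  yi     = c(-0.45, -0.35, -0.30, -0.05, 0.02, -0.08),
  vi     = c(0.040, 0.055, 0.048, 0.036, 0.042, 0.050),
  sample = c("community", "community", "community",
             "university", "university", "university")
)

# Mixed-effects model: the QM test indicates whether sample type moderates the effect
mod <- rma(yi, vi, mods = ~ sample, data = dat)
summary(mod)

# Separate random-effects meta-analyses for each category, giving the subgroup
# d+ values and heterogeneity statistics to report
rma(yi, vi, data = dat, subset = (sample == "community"))
rma(yi, vi, data = dat, subset = (sample == "university"))
```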
Narrative Synthesis
One of my favourite aspects of meta-analysis is that you need much less narrative
synthesis than in a systematic review. The meta-analysis provides the data synthesis,
meaning you can save your word count for other sections of the paper. I do not want
to leave you with the impression that meta-analysis does away with narrative syn-
thesis entirely, however. You still need to report key information.
For instance, if you were to report the result of a meta-analysis only as r+ = 0.45,
you would be leaving a lot of useful information out. You have not told the reader what
magnitude this result represents (a medium-sized correlation in Cohen's (1992) terms). Similarly,
if you report results as heterogeneous by writing I² = 78%, what does that mean (high
heterogeneity)? The same applies to simply stating that there is publication bias. Statistics
can convey a lot of information on their own, but without being put into context by the author, they
lose most of their value. After toiling for several months (years) on your meta-
analysis you will have become an expert on the literature you are synthesising. This
means you are well placed to explain to the reader, who is almost certainly less
knowledgeable than you, what the results mean.
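If it helps, even the translation from statistic to verbal label can be scripted. The helper functions below are purely illustrative and not taken from any published paper; the correlation cut-offs follow Cohen (1992) and the I² bands follow the widely used 25/50/75% convention.

```r
# Illustrative helpers for turning statistics into the verbal labels used in
# a narrative synthesis (hypothetical code)

# Cohen's (1992) guidelines for correlations: .10 small, .30 medium, .50 large
label_r <- function(r) {
  r <- abs(r)
  if (r >= 0.50) "large" else if (r >= 0.30) "medium"
  else if (r >= 0.10) "small" else "trivial"
}

# Widely used bands for I^2: roughly 25% low, 50% moderate, 75% high
label_i2 <- function(i2) {
  if (i2 >= 75) "high" else if (i2 >= 50) "moderate"
  else if (i2 >= 25) "low" else "negligible"
}

label_r(0.45)   # "medium"
label_i2(78)    # "high"
```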
As I typically meta-analyse results from fields I know well, I have already
invested considerable time in thinking about what the results would mean before I
run the meta-analysis. When I started working on Cooke et al. (2016), I knew what
I expected to find regarding theory of planned behaviour relationships for alcohol
and was using the meta-analysis to test those expectations. In contrast, when I
began working on Cooke et al. (2023), I was curious to see how big the effect of
implementation intentions on alcohol outcomes was. I suspected that results would
show small effect size differences for drinking behaviour because, based on my
reading of the alcohol literature, interventions to reduce drinking behaviour are
typically associated with small effect size differences. As recommended in Chap. 7,
it's advisable to get to know your included studies' effect sizes before running a
meta-analysis, to prime you for the output.
Think carefully about which tables and figures are most useful to include in the main body
of the paper. I tend to include visualisations of heterogeneity, such as a meta-regression
plot, or of publication bias, such as a funnel plot, as supplementary files.
Rigorous reporting of meta-analyses can lead to lots of additional tables and
figures, so include these as supplementary files. We had nine forest plots and nine funnel
plots in Cooke et al. (2016), plus three supplementary tables, one of which was the
main table of information extracted from included studies. The other tables provided
information on excluded studies and on how we coded the pattern of consumption
data for each study. Cooke et al.'s (2023) supplementary tables include how we
coded the full factorial designs and a description of control conditions. We also
included figures generated in RevMan for risk of bias using the Cochrane Risk of Bias tool,
so tables displaying quality ratings can also appear as supplementary tables.
The final thing to say in this section is that during the initial submission of Cooke
et al. (2023), which was rejected by the journal we first submitted it to, we had
to upload the raw data used to calculate effect size differences to the Open Science
Framework. This was stipulated by the journal, strikes me as good practice, and
allows you to save space in your paper. As noted repeatedly, authors of
experiments/interventions in the psychological literature rarely report
computed effect sizes, so for transparent reporting in a paper you would have to
report the raw means and standard deviations for both groups, plus both sample
sizes, in your table of included studies. That's six columns immediately! I recommend
you put the raw statistics in a spreadsheet on the Open Science Framework,
like we did, and report the computed effect size differences in the paper. Then you
only need three columns: the effect size difference and the sample sizes for each group.
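If you do extract raw means and standard deviations, the conversion to effect size differences can itself be scripted and shared. The sketch below uses metafor's escalc() with hypothetical post-test values; note that Cooke et al. (2023) additionally adjusted for baseline differences following Morris (2008) (see Chap. 14), which this simple post-test comparison does not do.

```r
library(metafor)

# Hypothetical raw statistics extracted from two studies: post-test means,
# standard deviations, and sample sizes for intervention (1) and control (2) groups
dat <- data.frame(
  study = c("Study A", "Study B"),
  m1i = c(12.4, 15.0), sd1i = c(6.1, 7.2), n1i = c(85, 120),
  m2i = c(15.8, 15.6), sd2i = c(6.5, 7.0), n2i = c(90, 118)
)

# Standardised mean difference (Hedges' g); negative values indicate lower
# drinking in the intervention group, matching the coding described earlier
dat <- escalc(measure = "SMD",
              m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i,
              data = dat)

dat[, c("study", "yi", "vi")]   # effect sizes and variances to report and meta-analyse
```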
Putting the extracted raw statistics on the Open Science Framework means anyone
can check your calculations. This might seem scary, but the alternative, of review
teams calculating effect size differences and then pooling them without sharing the
raw data, does not sound much better to me (and I speak as someone who used to do
this!). It's better to fear someone finding an error in your working than to have someone
NOT find an error in your working: if someone finds the error it can be corrected;
if it goes undiscovered, it's unlikely it ever will be.
Discussion
As in papers reporting results from a primary analysis, the opening paragraph of the
discussion in a meta-analysis paper should contain a summary of the main findings.
The second paragraph contains text comparing results from the meta-analysis to the
broader literature. In Cooke and French (2008), Cooke et al. (2016), and Cooke
et al. (2023), I did much the same thing: compared the results of the meta-analysis
I was writing about with similar meta-analyses reported in the literature.
While Cooke and French's text compares results from the meta-analysis of screening
relationships to broader theory of planned behaviour/theory of reasoned action
meta-analyses (e.g. Armitage & Conner, 2001), the publication of more specific
meta-analyses allowed me to make more focused comparisons in Cooke
et al. (2016) and Cooke et al. (2023). In the former, I compared results for alcohol
Summary
The aim of this chapter was to offer guidance on how to write up the results of a
meta-analysis for submission to a peer-reviewed publication. I focused on how to
write up results for a secondary analysis, drawing attention to key differences from
primary papers in the reporting of methods and results.
References
Armitage, C. J., & Conner, M. (2001). Efficacy of the theory of planned behaviour: A
meta-analytic review. British Journal of Social Psychology, 40(4), 471–499. https://doi.
org/10.1348/014466601164939
Bélanger-Gravel, A., Godin, G., & Amireault, S. (2013). A meta-analytic review of the effect of
implementation intentions on physical activity. Health Psychology Review, 7, 23–54. https://
doi.org/10.1080/17437199.2011.560095
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2005). Comprehensive Meta-
Analysis (Version 2) [Computer software]. Biostat.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Conner, M., Warren, R., Close, S., & Sparks, P. (1999). Alcohol consumption and the theory of
planned behavior: An examination of the cognitive mediation of past behavior. Journal of
Applied Social Psychology, 29(8), 1676–1704. https://doi.org/10.1111/j.1559-1816.1999.
tb02046.x
Cooke, R., Dahdah, M., Norman, P., & French, D. P. (2016). How well does the theory of planned
behaviour predict alcohol consumption? A systematic review and meta-analysis. Health
Psychology Review, 10, 148–167. https://doi.org/10.1080/17437199.2014.947547
Cooke, R., & French, D. P. (2008). How well do the theory of reasoned action and theory of
planned behaviour predict intentions and attendance at screening programmes? A meta-analy-
sis. Psychology & Health, 23(7), 745–765. https://doi.org/10.1080/08870440701544437
Cooke, R., & French, D. P. (2011). The role of context and timeframe in moderating relationships
within the theory of planned behaviour. Psychology & Health, 26(9), 1225–1240. https://doi.
org/10.1080/08870446.2011.572260
Cooke, R., McEwan, H., & Norman, P. (2023). The effect of forming implementation intentions
on alcohol consumption: A systematic review and meta-analysis. Drug and Alcohol Review, 42,
68–80. https://doi.org/10.1111/dar.13553
Cooke, R., & Sheeran, P. (2004). Moderation of cognition-intention and cognition-behaviour rela-
tions: A meta-analysis of properties of variables from the theory of planned behaviour. The British
Journal of Social Psychology, 43(Pt 2), 159–186. https://doi.org/10.1348/0144666041501688
Hunter, J. E., Schmidt, F., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings
across studies. SAGE.
Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., Clarke,
M., Devereaux, P. J., Kleijnen, J., & Moher, D. (2009). The PRISMA statement for report-
ing systematic reviews and meta-analyses of studies that evaluate health care interventions:
Explanation and elaboration. Journal of Clinical Epidemiology, 62, e1–e34. https://doi.
org/10.1016/j.jclinepi.2009.06.006
Malaguti, A., Ciocanel, O., Sani, F., Dillon, J. F., Eriksen, A., & Power, K. (2020). Effectiveness of
the use of implementation intentions on reduction of substance use: A meta-analysis. Drug and
Alcohol Dependence, 214, 108120. https://doi.org/10.1016/j.drugalcdep.2020.108120
McEachan, R. R. C., Conner, M., Taylor, N. J., & Lawton, R. J. (2011). Prospective prediction
of health-related behaviours with the theory of planned behaviour: A meta-analysis. Health
Psychology Review, 5, 97–144. https://doi.org/10.1080/17437199.2010.521684
Morris, S. B. (2008). Estimating effect sizes from pretest-posttest-control group designs.
Organizational Research Methods, 11(2), 364–386. https://doi.org/10.1177/1094428106291059
Norman, P., Cameron, D., Epton, T., Webb, T. L., Harris, P. R., Millings, A., & Sheeran, P. (2018).
A randomized controlled trial of a brief online intervention to reduce alcohol consumption in
new university students: Combining self-affirmation, theory of planned behaviour messages,
and implementation intentions. British Journal of Health Psychology, 23(1), 108–127. https://
doi.org/10.1111/bjhp.12277
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological
Bulletin, 86, 638–641.
Schwarzer, R. (1988). Meta: Programs for secondary data analysis [Computer software].
Sheeran, P., & Orbell, S. (1998). Do intentions predict condom use? Meta-analysis and examina-
tion of six moderator variables. British Journal of Social Psychology, 37(2), 231–250. https://
doi.org/10.1111/j.2044-8309.1998.tb01167.x
Topa, G., & Moriano, L. J. A. (2010). Theory of planned behavior and smoking: Meta-analysis and
SEM model. Substance Abuse and Rehabilitation, 1, 23–33.
Vilà, I., Carrero, I., & Redondo, R. (2017). Reducing fat intake using implementation inten-
tions: A meta-analytic review. British Journal of Health Psychology, 22, 281–294. https://doi.
org/10.1111/bjhp.12230