Statistical Reasoning

J. Paediatr. Child Health (2000) 36, 502–505

Statistics for Clinicians

3: Basic concepts of statistical reasoning: Standard errors and confidence intervals
JB CARLIN¹,⁴ and LW DOYLE²,³,⁴
¹The Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Parkville, ²The Division of Newborn Services, The Royal Women’s Hospital, Melbourne, and the Departments of ³Obstetrics and Gynaecology, and ⁴Paediatrics, The University of Melbourne, Parkville, Victoria, Australia

Correspondence: Associate Professor LW Doyle, Department of Obstetrics & Gynaecology, The University of Melbourne, Parkville 3010, Victoria, Australia. Fax: (03) 9347 1761; Email: l.doyle@obgyn-rwh.unimelb.edu.au
JB Carlin, PhD, Statistician. LW Doyle, MD, MSC, FRACP, Paediatrician.
Accepted for publication 13 July 2000.

In this article we begin to discuss the techniques of formal statistical analysis or statistical inference, to be distinguished from the descriptive statistical analysis that is involved in obtaining tables of frequencies, scatterplots of data and so on, as described in the previous article of this series.¹ This discussion requires an understanding of a number of basic statistical concepts and terms, key among them being the idea of sampling variability. To explain the fundamental role of this concept we need to introduce the notions of population and sample and probability. We will then explain how the concept is used in the form of standard errors and confidence intervals.

POPULATIONS AND SAMPLES

The essential role of formal statistical analysis is to account for the fact that research studies are performed on finite groups of subjects. The group of patients (or other individuals) in a study is regarded as a sample from a larger population. Statistical inference addresses the question of what can be said about the population based just on the sample, allowing for the crucial fact that another sample or samples would not produce identical results.

Sociologists, epidemiologists and others are familiar with the concept of a population, meaning a group of individuals with distinctive characteristics. Individuals can be humans, animals, or other objects; in this series, unless stated otherwise, we will use the term ‘individuals’ to refer to humans. What makes the individuals distinctive to sociologists or epidemiologists may be that they live in the same country (e.g. Australia), or the same region (e.g. the state of Victoria), or perhaps they are of the same gender or some other subgroup of interest. Since populations in the geographical and sociological world are continually changing, extrapolation from one region or time to another can be hazardous. This is a major problem for the researcher, since it can cause systematic differences between groups (bias – see previous article in the series¹), but will not be discussed further, except to say that statistical inference is of little value or can even be misleading unless a study has been designed to avoid major biases.

In clinical research, the population of interest may be rather hard to define explicitly, but it will usually be some fairly general notion of the ‘universe’ of all patients of a given type. For example, in the study of long-term outcomes in children of very low birthweight (VLBW) introduced in our previous article,¹ the researchers’ underlying interest is not just in the particular group of patients that was followed but in the population of all VLBW children of similar ‘sociobiological’ characteristics to those who were studied. When seeking publication in an international journal, the unspoken assumption is often that the population of interest extends beyond national boundaries, although the precise definition of the population to which researchers seek to generalize is often left unstated.

As with the word population, we all have some familiarity with the concept of a sample, meaning a small amount of a larger amount. In statistical analysis, a sample means a smaller number of individuals taken from a population of interest. For the valid application of most statistical methods, we strictly require that such samples be randomly selected from the populations of interest. In practice, this assumption can be difficult or impossible to sustain, but it is widely agreed that a reasonable substitute is to be able to argue convincingly that one’s sample is representative of the population. In other words, we need to be able to think of our study group or sample as if it were a random sample from the population of interest. (Warning: further discussion of this delicate point may descend rapidly into a philosophical quagmire!)

The mechanics of much statistical analysis concern the use of summary statistics obtained from a sample to provide estimates of population values, which are formally called parameters. (This can be confusing to the clinician who may sometimes describe a measured attribute of a patient, such as blood pressure, as a parameter.) Examples of parameters in the VLBW study are the proportion of children requiring mechanical ventilation, mean verbal intelligence quotient (IQ) score at age 5 years and the difference in mean verbal IQ score at age 5 years between children of birthweight < 1000 g and those of birthweight 1000–1499 g. In statistical texts, parameters are often symbolized by Greek letters, for example µ (the Greek ‘m’) and σ (Greek ‘s’) for a mean and standard deviation, respectively, and π (Greek ‘p’) for a proportion. It is important to have distinctive notation because the crux of statistical

analysis is the fact that parameters, the unattainable ‘true values’ in the population, are distinct from the corresponding sample values that we use as their estimates. For example, we showed that the sample proportion of VLBW children requiring mechanical ventilation was 0.75 or 75%;¹ this is obviously a good estimate of the true population value (under our crucial assumption of random/representative sampling) but it is not equal to it, unless we have been lucky.

If ever a sample were to comprise all of a population of interest there would be no need for statistical inference. This is the case with a census, where sample values and population parameters are identical. It is the nature of research, however, that it seeks to generalize from the ‘local result’ to a broader target, and this is why statistical methods play such a central role.

STATISTICAL REASONING

Statistical reasoning seems convoluted to the non-statistician, especially when it comes to the use of hypothesis tests and P values. In order to defer some of these complications, we focus in this present article entirely on the reasoning involved in the estimation of population parameters using sample values. From a statistical point of view, such estimation must involve quantification of the precision of estimation, which is captured in the calculation of a standard error and the closely related confidence interval. These calculations simply quantify the extent to which variability from sample to sample (of the same size as the one in your study) could be expected to lead to different estimates of the same parameter. Such quantification of uncertainty requires the language of probability, and some familiarity with the famous normal distribution.

PROBABILITY

Probability forms the basis for statistical inference. Most of us have some understanding of probability, even if we think we don’t. Perhaps we learnt something about probability at school. Alternatively, most of us have had the occasional wager, perhaps on events such as a horse race, and governments may be forcing us to a better understanding of probability by their increasing reliance on the gambling dollar as a source of revenue.

If we toss a coin fairly, we know there is an equal chance of a head or a tail. The probability of a head is 0.5, and of a tail is 0.5. The total probability is 1, as it must always be for the sum of probabilities of all possible (mutually exclusive) events. With two coins, things become a little more complex. The different combinations and their probabilities are shown in Table 1. With many coins, the number of possible outcomes increases rapidly. Calculating the probability of particular combinations (e.g. the probability of two heads in 10 tosses) is simplified by using the binomial probability distribution, which will be discussed further in a later article.

Table 1 Possible outcomes of tossing two coins, with their probabilities. It is easy to see from this table that the probabilities of obtaining two heads, one head and no heads are 0.25, 0.5 and 0.25, respectively

Coin 1    Coin 2    Probability
Head      Head      0.25
Head      Tail      0.25
Tail      Head      0.25
Tail      Tail      0.25
Total               1.0

All statistical inference is based on probability models, which propose that we can think of the observed value of a variable (e.g. did this child require ventilation? what was the child’s verbal IQ score?) as if it were the result of a random experiment (like a coin toss). Usually this is a fiction based on the underlying assumption, already discussed, of random sampling from a population. To understand statistical inference fully requires an understanding of a number of different probability models or distributions, such as the binomial, mentioned above, and Poisson. The appropriate model to apply in any given analysis is a technical matter beyond the scope of this series. For the purposes of statistical inference, however, a remarkable mathematical fact (the ‘Central Limit Theorem’) says that many statistical inferences can be created using tools based on the normal probability distribution. For this reason we give a brief review of the important features of the normal distribution.

The normal distribution

Many readers will have a general familiarity with the bell-shaped curve that represents the normal distribution (Figs 1,2). (It is sometimes called the Gaussian distribution to avoid confusion with other meanings of the word normal.) Certain variables are by their nature normally distributed, meaning that if we create a histogram based on a very large sample its shape will approach that of the bell curve. To have this property, a variable needs to have a continuous range of possible values. An example of a variable in our data set that is approximately normally distributed is height at age 5 years (Fig. 2).

Fig. 1 Graph of the normal curve. Values on the y-axis are probability density, meaning that the probability of a z-value falling within any specified range is the area under the curve between the upper and lower values of the range. For example, the probability of a value less than −1 (left-hand shaded region) may be calculated as 0.159; the probability of a value greater than 2 (right-hand shaded region) is 0.023.

An obvious feature of the normal distribution is that it is symmetric, with a mean value in the centre of the distribution and an even spread on both sides of the mean. The normal

distribution’s bell curve is defined by a rather complicated formula but it is conveniently characterized by two parameters, the mean and standard deviation (SD) of the distribution. The most commonly used facts about the normal distribution are that about two-thirds (really 68.3%) of values fall within a range of 1 SD of the mean, 95% fall within 1.96 SD of the mean, and 99.7% fall within 3 SD of the mean.

Fig. 2 Histogram of height at 5 years of age, with a normal curve (with mean and standard deviation set equal to the sample values) superimposed.

Other probabilities associated with the normal distribution can be found in tables at the back of most elementary statistics texts or in statistical packages and spreadsheet programs. These are always calculated in terms of the standardized normal distribution, which is a special case of the normal distribution where the mean is zero, and the SD is 1 (Fig. 1). Probabilities associated with any value of a normally distributed variable are obtained from the standardized distribution by converting the value to a standardized score (or z-score) by subtracting the mean and dividing by the SD; in other words expressing the value as difference from the mean in units of 1 SD. For example, for a psychological test with a mean of 100 and a SD of 15, an individual’s score of 106 is equivalent to 0.4 SD, and another individual’s score of 91 is equivalent to −0.6 SD.

The standardized normal distribution is especially useful when contrasting variables measured in different units relative to the same standard, or measured by different tests. For example, the intelligence quotient (IQ) can be measured by several different psychological tests, but not all tests have the same mean and SD. The mean of a psychological test increases over time, and hence psychological tests have to be restandardized at intervals. However, it can generally be assumed that the different psychological tests are measuring the same quantity. Consequently, a child with an IQ of 85 on a recently standardized test with a mean of 100 and SD of 15 has a standardized IQ score of −1 SD, as does another child who has an IQ of 90 on a different test where the mean in randomly selected contemporaneous controls has increased over time to 106 with a SD of 16.

The normal distribution is sometimes thought of as the common or usual distribution, although there are many variables that follow quite different distributions (both symmetrically and non-symmetrically shaped). The normal distribution is often used by clinicians, in the guise of ‘normal’ or ‘reference’ ranges, where patients falling outside the ‘normal range’ may be suspected to have a problem. There are several difficulties, however, with using a statistical concept of ‘normal’ to define a clinical ‘normal’. Firstly, not all medically important variables are normally distributed. Secondly, if the usual statistical cut-offs of 2 SD above or below the mean are taken as the cut points, all ‘abnormals’ (diseases?) defined this way will have a prevalence of approximately 5%, or 2.5% if only one extreme is clinically important. As clinicians well know, the rates of diseases vary widely above and below 5% (or 2.5%).

SAMPLING DISTRIBUTIONS

The fundamental role of the normal distribution in statistical inference does not derive from it adequately representing the distribution of observed variables. Rather, this distribution is important because it describes the sampling distribution of summary statistics, such as sample means, which we use as estimates of the population parameters in which we are interested. The sampling distribution is a difficult but central concept in statistical inference. It refers to the distribution of values of the summary statistic that we would get if the study we are analyzing were repeated many times. For instance, if the VLBW study could be repeated many times in the same population of infants (a highly hypothetical notion!), with the same sample size of 165 surviving infants, the proportion requiring assisted ventilation would not always reproduce the 75% observed in this study. We would in fact see a range of values spread around the true parameter value. This distribution of values, the so-called sampling distribution, turns out to be approximately normal, for a wide variety of summary statistics, even when the individual measurements themselves are not normally distributed. (Indeed in this mini-example, the individual measurements are dichotomous!) The fact just described is the so-called Central Limit Theorem.

There are some constraints on the applicability of this result, but the overriding factor in determining whether the normal distribution can be used in creating inferences based on a particular estimate is the sample size. In general, a sample of 30 or more is sufficient to ensure that the sampling distribution of the mean is normally distributed, even in the presence of substantial lack of normality in the underlying variable itself.

The standard error

Perhaps the most widely used and simplest application of the basic idea of sampling distributions is that involved in quantifying the precision of estimation in using a sample mean to estimate a population mean. It should seem obvious that the mean of a variable (x̄) from a randomly selected sample is the best estimate of the population mean (µ). But the Central Limit Theorem provides the less obvious fact that variation in sample means over repeated samples follows a normal distribution, with a SD determined by the SD of the population (σ) and the sample size (n), and given by SE(x̄) = σ/√n. This particular quantity is called the standard error of the mean, sometimes abbreviated to SEM. In reality, the SEM must itself be estimated by using the sample SD (s) in place of the unknown σ, giving the simple but very important formula:

SEM = s/√n.
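
As a quick check of the formula SEM = s/√n, the following is a minimal Python sketch, not part of the original article. The function name standard_error is a hypothetical helper chosen for illustration; the SDs and sample sizes are those quoted in the article's discussion of confidence intervals for means.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Estimated standard error of the mean: SEM = s / sqrt(n).

    Hypothetical helper for illustration only.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# (label, sample SD, sample size) as quoted in the article
groups = [
    ("all children tested at 5 years", 15.3, 138),
    ("birthweight < 1000 g", 17.0, 51),
    ("birthweight 1000-1499 g", 14.6, 89),
]

for label, sd, n in groups:
    print(f"{label}: SEM = {standard_error(sd, n):.1f}")
```

To one decimal place these reproduce the SEM values reported in the article (1.3, 2.4 and 1.5).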

The relationship between SD or s and SEM is a common source of confusion. Which should be used when? The essential point to remember is that the SD gives a measure of the variability of the individual values of a variable (across a sample or population), so it is a descriptive statistic. In contrast, the SEM is a more abstract quantity, giving an estimate of the variability of sample means (of the given size n) across a hypothetical population of values that would arise from repetitions of the same study. The SEM is used in statistical inference because it gives an idea of how confident we are in our estimate of the true mean, based on the observed sample. When we compare two (or more) groups, the SEM is likely to be more relevant than the SD because it can be used to provide a yardstick for whether the two samples are likely to have come from populations with two different means (see below and next article in the series).
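
The distinction between SD and SEM can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: the population (normal, mean 100, SD 15), the number of repetitions and the function name simulate are all hypothetical choices made for the example.

```python
import random
import statistics

random.seed(2000)  # fixed seed so the run is repeatable

# Hypothetical population, chosen only for illustration
POP_MEAN, POP_SD = 100.0, 15.0
REPEATS = 2000  # number of imagined repetitions of the "study"


def simulate(n):
    """Repeat a study of size n; return (average sample SD, SD of sample means)."""
    sample_sds = []
    sample_means = []
    for _ in range(REPEATS):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
        sample_sds.append(statistics.stdev(sample))
    return statistics.fmean(sample_sds), statistics.stdev(sample_means)


results = {n: simulate(n) for n in (25, 100)}
for n, (typical_sd, sd_of_means) in results.items():
    print(
        f"n = {n:3d}: typical sample SD = {typical_sd:.1f}, "
        f"SD of sample means = {sd_of_means:.2f}, "
        f"sigma/sqrt(n) = {POP_SD / n ** 0.5:.2f}"
    )
```

In such a run the sample SD stays close to the population SD whatever the sample size, whereas the spread of the sample means shrinks roughly in proportion to 1/√n, which is the quantity the SEM estimates.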

CONFIDENCE INTERVALS FOR MEANS

The confidence interval (CI) is the accepted statistical technique for expressing precision of an estimate. The simplest CI are constructed very directly from the standard error of the estimate. We illustrate this by describing how a CI is created for estimating a population mean. As an example, suppose we wish to present the sample mean IQ score at age 5 years in our study as an estimate of a true population value. (If we were not willing to do this, why would we claim our study, based on these 202 Melbourne children, is of any interest to a national or international readership?) In the 138 children tested at 5 years of age, the observed mean IQ score was 98.7 and the SD was 15.3, giving SEM = 1.3.

A CI for the mean is created by taking the observed mean (the estimate) and in turn adding and subtracting a multiple of the SEM, where the multiple is taken from the standard normal distribution and depends on the level of ‘confidence’ required in the interval. The conventional level chosen is 95%, which corresponds to using a multiple (normal ‘z’ value) of 1.96, so that a 95% CI for the mean is given by the range:

x̄ − (1.96 × SEM) to x̄ + (1.96 × SEM).

In our example, for mean IQ, this works out as 96.1–101.3. The interpretation is, rather loosely, that we can be 95% confident that the true population mean lies between 96.1 and 101.3. More precise interpretations depend on whether one takes the ‘frequentist’ or ‘Bayesian’ view of statistics; for an introduction to these issues see the text by Motulsky.² Although the 95% level is conventional in most scientific reports, it should be emphasized that this choice is essentially arbitrary. Some authors have argued for lower confidence levels, which produce narrower intervals (e.g. a 90% CI for the mean would use the ‘z’ multiplier 1.645 instead of 1.96), since these focus greater attention on parameter values that are supported more strongly by the data.³

Figure 3 gives an example of the graphical representation of means and 95% CI for verbal IQ at age 5 years in each of the two birthweight subgroups, displayed with the dotplot of individual data points for each subgroup that we showed in the previous article in this series.¹ The sample sizes, means, SD, and SEM for the two birthweight subgroups were as follows: < 1000 g: n = 51, mean 94.7, SD 17.0, SEM 2.4; 1000–1499 g: n = 89, mean 100.2, SD 14.6, SEM 1.5. Note that the CI for the means are narrow relative to the spread of the data points. The CI is wider in the subgroup < 1000 g birthweight for two reasons: the greater variation within this group (higher SD) and its smaller sample size. Figure 3 raises the question ‘Can we conclude that verbal IQ in the two birthweight subgroups comes from the same population distribution?’ We will discuss how this question might be answered in the next article in the series.

Fig. 3 Mean and 95% confidence interval for mean of verbal intelligence quotient (IQ) at 5 years of age in birthweight subgroups above and below 1000 g, displayed with a dotplot of individual observations in each group.

CONFIDENCE INTERVALS FOR OTHER PARAMETERS

The underlying theory behind the CI for a mean is that the sampling distribution of the estimate (in this case, the sample mean) is normal, and this fact holds true for a large number of other sample statistics that are used for estimating population parameters of interest. A proportion is in fact a particular type of mean (an average of ‘0’s and ‘1’s), and the same method works for constructing CI, with large samples, except that a different formula is required for the standard error. The principle of using standard errors to construct CI also extends to making comparisons, for example examining the difference between the mean IQ in the two subgroups considered above. Finally, it is important to remember that the methods discussed in this article are based on ‘large-sample theory’ and generally need to be modified for smaller samples. We will return to many of these issues in later articles in the series.

Our next task, however, is to relate the ideas of sampling variability and CI to the well-known statistical method of hypothesis or significance testing. We will tackle this in the next article, which considers in more detail the comparison of continuous distributions between two groups, using the t-test.

REFERENCES

1 Carlin JB, Doyle LW. Statistics for clinicians. 2: Describing and displaying data. J. Paediatr. Child Health 2000; 36: 270–4.
2 Motulsky H. Intuitive Biostatistics. Oxford University Press, New York, 1995.
3 Tukey JW. Tightening the clinical trial. Control Clin. Trials 1993; 14: 266–85.
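
Worked example. The large-sample confidence interval for a mean described under ‘Confidence intervals for means’ can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' own computation: the function name mean_ci and its level argument are hypothetical, and the z multiplier is taken from the standard library's NormalDist rather than from printed tables. The inputs (mean 98.7, SD 15.3, n = 138) are the values quoted in the article.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Large-sample CI for a mean: mean +/- z * SEM, with SEM = sd / sqrt(n).

    Hypothetical helper for illustration; suitable for large samples only.
    """
    if not 0 < level < 1:
        raise ValueError("level must be between 0 and 1")
    # Two-sided normal multiplier, about 1.96 for 95% and 1.645 for 90%
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem


low95, high95 = mean_ci(98.7, 15.3, 138)
low90, high90 = mean_ci(98.7, 15.3, 138, level=0.90)

print(f"95% CI: {low95:.1f} to {high95:.1f}")
print(f"90% CI: {low90:.1f} to {high90:.1f}")
```

The 95% interval agrees with the 96.1–101.3 reported in the text, and the 90% interval is narrower, as the article notes for lower confidence levels.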
