Communications in Statistics - Theory and Methods

ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: https://www.tandfonline.com/loi/lsta20

A spatial scan statistic

Martin Kulldorff

To cite this article: Martin Kulldorff (1997) A spatial scan statistic, Communications in Statistics -
Theory and Methods, 26:6, 1481-1496, DOI: 10.1080/03610929708831995

To link to this article: https://doi.org/10.1080/03610929708831995

Published online: 27 Jun 2007.

COMMUN. STATIST.-THEORY METH., 26(6), 1481-1496 (1997)

A SPATIAL SCAN STATISTIC

Martin Kulldorff

Biometry Branch, DCPC, National Cancer Institute
EPN 344, 6130 Executive Blvd, Bethesda MD 20892, USA
and
Department of Statistics, Uppsala University, 751 20 Uppsala, Sweden

Keywords: point patterns; inhomogeneous Poisson process; Bernoulli process;
clusters; clustering; confounders; sudden infant death syndrome.

ABSTRACT

The scan statistic is commonly used to test if a one dimensional point process
is purely random, or if any clusters can be detected. Here it is simultaneously
extended in three directions: (i) a spatial scan statistic for the detection
of clusters in a multi-dimensional point process is proposed, (ii) the area of
the scanning window is allowed to vary, and (iii) the baseline process may be
any inhomogeneous Poisson process or Bernoulli process with intensity
proportional to some known function. The main interest is in detecting clusters
not explained by the baseline process.

These methods are illustrated on an epidemiological data set, but there
are other potential areas of application as well.

1. INTRODUCTION

A scan statistic is used to detect clusters in a point process. It has been studied
in the one-dimensional setting by Naus (1965a) and by many others. For a
point process on an interval $[a, b]$, a window $[t, t + w]$ of fixed size
$w < b - a$ is moved along the interval. Over all possible values of $t$, the
maximum number of points in the window is recorded and compared to its
distribution under the null hypothesis of a purely random Poisson process.

Copyright © 1997 by Marcel Dekker, Inc.
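As a minimal sketch of the one-dimensional procedure described above, the following Python function computes the maximum number of points in any window of width $w$. The function name `scan_statistic_1d` and the example data are illustrative assumptions, not part of the original description. The sketch relies on the fact that the maximum count is attained by some window whose left end coincides with an observed point.

```python
from bisect import bisect_right

def scan_statistic_1d(points, w):
    """Maximum number of points in any closed window [t, t + w]."""
    pts = sorted(points)
    best = 0
    # It is enough to try windows whose left edge sits on an observed point.
    for i, t in enumerate(pts):
        j = bisect_right(pts, t + w)  # index just past the last point <= t + w
        best = max(best, j - i)
    return best

# Hypothetical event times on the interval [0, 1].
events = [0.05, 0.31, 0.33, 0.36, 0.40, 0.72, 0.90]
print(scan_statistic_1d(events, 0.1))  # 4 (the points 0.31, 0.33, 0.36 and 0.40)
```

Comparing the returned count with its distribution under the null hypothesis is a separate step that this sketch does not cover.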
The one-dimensional problem has been extended in various directions.
When the points are grouped into one of several sub-intervals we have
aggregated data. This has been studied by Wallenstein et al. (1989) among others,
and is of interest when we have, for instance, monthly counts of some event.
Weinstock (1981) has studied the problem where under the null hypothesis the
intensity of the underlying Poisson process has a known inhomogeneity.
Various authors, such as Saperstein (1972) and Naus (1974), have studied a related
Bernoulli model, with a sequence of binary outcomes. Loader (1991) allows for
a non-fixed window size. Glaz and Naus (1983) have looked at a scan statistic
searching for multiple clusters. For any of these extensions, and depending on
the application, the scan statistic may or may not be conditioned on the total
number of points observed.
In this paper a spatial scan statistic is proposed. An attempt is made to
treat the problem in a setting as general as possible, except that the analysis is
always conditioned on the total number of observed points. The window may
take any predefined shape and the size of the window is allowed to vary as it
scans the study region. The latter is very useful when we lack a priori knowledge
about the size of the area covered by the cluster. The method also allows for
an arbitrary, but known, underlying intensity that governs the distribution of
points under the null hypothesis. This can take many different forms depending
on the application. It is modeled as a measure $\mu$ on a geographical space $G$.
When $G$ is a line and $\mu$ is a uniform measure on $[a, b]$, we obtain the
traditional one dimensional problem as a special case. With the Lebesgue measure
on the plane we have a homogeneous spatial Poisson process. Other possible
measures include the following:
1. The spatial clustering of trees is studied in forestry. A problem of potential
interest is to see if there are clusters of trees that are of a specific kind or that
have a certain characteristic, after having compensated for the uneven spatial
distribution of all trees. That is, we want to know if the proportion of one kind
of tree is particularly high in some location. The measure of an area would in
this case be the total number of trees growing there.

2. In astronomy, there is an equivalent three-dimensional problem if we want
to detect clusters of a particular kind of star after compensating for the
irregular spatial distribution of all stars.

3. Epidemiologists are interested in geographical clusters of disease. Here it
is necessary to compensate for the uneven density of the population as a whole.
When data is aggregated into census districts the measure will be concentrated
at the central coordinates of those districts.

4. To find uranium deposits, airplanes measure Geiger counts as they fly in
parallel lines over large areas. A high number of counts in a specific area
indicates a possible deposit. The measure would be uniform along the flying
lines and zero elsewhere.

5. A zoologist may study the spatial distribution of sea gull nests. In an
archipelago, the nests will be located on islands. The appropriate measure
would be uniform over land and zero elsewhere.

6. If we are interested in space-time clusters of a disease, then the measure
would still be concentrated in the geographical dimension as in example 3, but
it would also be extended in a third dimension reflecting the population size
as it changes over time in each of the census districts.

7. It is not always sufficient to adjust for an uneven population distribution,
whether it be humans, trees, stars or something else. We may also need to
take various confounders into account. For example, in epidemiology we could
let the measure reflect the age standardized expected incidence rate.

In one dimension, the exact distribution of the test statistic is known only
in special cases. Much of the literature has been concerned with finding good
approximations. In higher dimensions the statistical theory becomes more
complex. Naus (1965b) obtained distributional bounds for a two dimensional
scan statistic on a square with uniform underlying measure, and with a
rectangular window of fixed but arbitrary size. Loader (1991) treated the same
problem but allowed for variable window size. Turnbull et al. (1990), using
the underlying measure as described in example (3) above, used a circular
window with constant measure. Since the exact distribution of the test statistic
could not be determined, Monte Carlo simulation was used to perform the
hypothesis test. All the models above are special cases of what is outlined in
this paper. Two other special cases can be found in Kulldorff and Nagarwalla
(1995) and Hjalmars et al. (1996), who apply the spatial scan statistic to data
sets of leukemia in Upstate New York and Sweden respectively.

When the size of the scanning window is fixed, the test statistic is always
taken as the maximum number of points in the window at any given time. With
a variable window size that is no longer possible, and instead the likelihood
ratio test statistic is used (Loader, 1991).

In Section 2 the Poisson and the Bernoulli models are described. The
likelihood ratio test statistic is then presented in Section 3, while Section 4
describes some of its theoretical properties. A discussion on computational
issues and a practical example are given in Sections 5 and 6, respectively.

2. POISSON AND BERNOULLI MODELS

Let $N$ denote a spatial point process where $N(A)$ is the random number of
points in the set $A \subset G$. As the window moves over the study area it defines
a collection $\mathcal{Z}$ of zones $Z \subset G$. Interchangeably, $Z$ will be used
to denote both a subset of $G$ and a set of parameters defining the zone.
For the Bernoulli model we consider only measures $\mu$ such that $\mu(A)$ is
an integer for all subsets $A \subset G$. Each unit of measure corresponds to an
'entity' or 'individual' who could be in either one of two states, for example
with or without some disease, or being of a certain species or not. Individuals
in one of these states are defined as points, and the locations of those individuals
constitute the point process. In the model, there is exactly one zone $Z \subset G$
such that each individual within that zone has probability $p$ of being a point,
while the probability for individuals outside the zone is $q$. The probability
for any one individual is independent of all the others. The null hypothesis
is $H_0: p = q$. The alternative hypothesis is $H_1: p > q$, $Z \in \mathcal{Z}$.
Under $H_0$, $N(A) \sim \mathrm{Bin}(\mu(A), p)$ for all sets $A$. Under $H_1$,
$N(A) \sim \mathrm{Bin}(\mu(A), p)$ for all sets $A \subset Z$, and
$N(A) \sim \mathrm{Bin}(\mu(A), q)$ for all sets $A \subset Z^c$.

Under the Poisson model, points are generated by an inhomogeneous Poisson
process. There is exactly one zone $Z \subset G$ such that
$N(A) \sim \mathrm{Po}(p\mu(A \cap Z) + q\mu(A \cap Z^c))$ for all $A$. The null
hypothesis is $H_0: p = q$, while the alternative hypothesis is
$H_1: p > q$, $Z \in \mathcal{Z}$. Under $H_0$, $N(A) \sim \mathrm{Po}(p\mu(A))$ for all $A$.
Note that one of the parameters, $Z$, disappears under the null hypothesis. This
is unusual but not unheard of, see for example Davies (1977).

The best choice of window, and thereby the corresponding collection
$\mathcal{Z}$ of zones, depends on the application. Some possibilities are:

1. All circular subsets.

2. All circles centered at any of several foci on a fixed grid, with a possible
upper limit on circle size (Kulldorff and Nagarwalla, 1995).

3. Same as (2) but with a fixed circle size (Turnbull et al., 1990).

4. All rectangles of a fixed size and shape (Naus, 1965b).

5. When looking for space-time clusters we could use cylinders, scanning
circular geographical areas over variable time intervals.
The underlying purpose of specifying a precise alternative is not to exclude
other possibilities. On the contrary, the purpose is not only to detect clustering
in the form of that specific alternative, but also to detect clusters in the form
of other similar alternatives. For example, if a test has good power for an
alternative with circular zones, then it will have fairly good power for squares
and many ellipses as well, but not necessarily for a narrow zone stretching from
one corner of a map to another.

How should we choose between the Bernoulli and the Poisson models?
The choice does not matter much when the total number of points is small
compared to $\mu(G)$. The Bernoulli and Poisson models then closely approximate
each other. In other cases it will depend on the application. If we have binary
counts, such as two types of stars, then we should use the Bernoulli model. If
we have counts relating to some continuous risk factor, as with a Geiger meter,
we should use the Poisson model.

3. LIKELIHOOD

It is now time to derive the likelihood ratio test. It is slightly different for
the Bernoulli and the Poisson models, and we start with the former. Let $n_Z$
denote the observed number of points in zone $Z$, and $n_G$ the total number of
observed points.

3.1 Bernoulli model

The likelihood function for the Bernoulli model is expressed as

$$L(Z, p, q) = p^{n_Z} (1 - p)^{\mu(Z) - n_Z} \, q^{n_G - n_Z} (1 - q)^{(\mu(G) - \mu(Z)) - (n_G - n_Z)}.$$

To detect the zone that is most likely to be a cluster, we find the zone $\hat{Z}$
that maximizes the likelihood function. In other words, $\hat{Z}$ is the maximum
likelihood estimator of the parameter $Z$. We do this in two steps. First we
maximize the likelihood function conditioned on $Z$:

$$L(Z) \overset{\mathrm{def}}{=} \sup_{p > q} L(Z, p, q) = \left(\frac{n_Z}{\mu(Z)}\right)^{n_Z} \left(1 - \frac{n_Z}{\mu(Z)}\right)^{\mu(Z) - n_Z} \left(\frac{n_G - n_Z}{\mu(G) - \mu(Z)}\right)^{n_G - n_Z} \left(1 - \frac{n_G - n_Z}{\mu(G) - \mu(Z)}\right)^{(\mu(G) - \mu(Z)) - (n_G - n_Z)} \tag{1}$$

when $\frac{n_Z}{\mu(Z)} > \frac{n_G - n_Z}{\mu(G) - \mu(Z)}$, and

$$L(Z) = \left(\frac{n_G}{\mu(G)}\right)^{n_G} \left(\frac{\mu(G) - n_G}{\mu(G)}\right)^{\mu(G) - n_G}$$

otherwise.

Next, we find the solution $\hat{Z} = \{Z : L(Z) \ge L(Z') \ \forall Z' \in \mathcal{Z}\}$.
In Section 5, the numerical calculation of $\hat{Z}$ is discussed. The most likely
cluster is of interest in itself, but we are also interested in making statistical
inference. Let

$$L_0 \overset{\mathrm{def}}{=} \sup_{p = q} L(Z, p, q) = \frac{n_G^{\,n_G} (\mu(G) - n_G)^{\mu(G) - n_G}}{\mu(G)^{\mu(G)}}.$$

The likelihood ratio, $\lambda$, can be written as

$$\lambda = \frac{\sup_{Z \in \mathcal{Z},\, p > q} L(Z, p, q)}{\sup_{p = q} L(Z, p, q)} = \frac{L(\hat{Z})}{L_0}. \tag{2}$$

Note that the denominator depends only on the total number of points
$n_G$ and not on the spatial distribution of the points. The ratio $\lambda$ is used
as the test statistic, and how to obtain its distribution through Monte Carlo
replicas is described in Section 5.

So far we have discussed clusters in terms of an abnormally high number
of cases in some area. It could be that we are instead interested in detecting
areas with an unusually low number of points. This can be accomplished by
simply changing the direction of the two inequality signs in equation 1. The
same is true for the Poisson model.
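The Bernoulli likelihood ratio for a single zone is easiest to evaluate on the log scale. The following is a minimal sketch, assuming integer counts with $0 < \mu(Z) < \mu(G)$; the function names are hypothetical and chosen only for illustration.

```python
from math import log

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)

def bernoulli_log_lik(n, m):
    """Maximised Bernoulli log-likelihood of n points among m individuals."""
    return xlogy(n, n / m) + xlogy(m - n, (m - n) / m)

def bernoulli_llr(n_z, mu_z, n_g, mu_g):
    """log(L(Z) / L_0) for one zone; 0 when the zone shows no excess."""
    n_out, mu_out = n_g - n_z, mu_g - mu_z
    if n_z / mu_z <= n_out / mu_out:
        return 0.0  # L(Z) equals L_0 in the 'otherwise' branch
    return (bernoulli_log_lik(n_z, mu_z)
            + bernoulli_log_lik(n_out, mu_out)
            - bernoulli_log_lik(n_g, mu_g))

# Hypothetical zone: 10 of 100 individuals are points, against 20 of 1000 overall.
print(bernoulli_llr(10, 100, 20, 1000))
```

The test statistic is then the largest such value over all zones in the collection, which corresponds to $\log \lambda$.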

3.2 Poisson model

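The likelihood function for the Poisson model is a little more complex. The
probability of $n_G$ number of points in the study area is

$$\frac{\exp[-p\mu(Z) - q(\mu(G) - \mu(Z))] \, [p\mu(Z) + q(\mu(G) - \mu(Z))]^{n_G}}{n_G!}.$$

The density function $f(x)$ of a specific point being observed at location $x$ is

$$f(x) = \begin{cases} \dfrac{p\mu(x)}{p\mu(Z) + q(\mu(G) - \mu(Z))} & \text{if } x \in Z, \\[2ex] \dfrac{q\mu(x)}{p\mu(Z) + q(\mu(G) - \mu(Z))} & \text{if } x \in Z^c. \end{cases}$$

The likelihood function can therefore be written as

$$L(Z, p, q) = \frac{\exp[-p\mu(Z) - q(\mu(G) - \mu(Z))]}{n_G!} \, p^{n_Z} q^{n_G - n_Z} \prod_{x_i} \mu(x_i). \tag{3}$$

As before, the likelihood ratio is defined as in equation 2. We have

$$L_0 = \sup_{p = q} L(Z, p, q) = \frac{\exp[-n_G]}{n_G!} \left(\frac{n_G}{\mu(G)}\right)^{n_G} \prod_{x_i} \mu(x_i).$$

For the numerator we first take the supremum over all $p$ and $q$ for a fixed $Z$.
Equation 3 takes its maximum when $p = n_Z/\mu(Z)$ and
$q = (n_G - n_Z)/(\mu(G) - \mu(Z))$, so

$$L(Z) = \begin{cases} \dfrac{\exp[-n_G]}{n_G!} \left(\dfrac{n_Z}{\mu(Z)}\right)^{n_Z} \left(\dfrac{n_G - n_Z}{\mu(G) - \mu(Z)}\right)^{n_G - n_Z} \displaystyle\prod_{x_i} \mu(x_i) & \text{if } \dfrac{n_Z}{\mu(Z)} > \dfrac{n_G - n_Z}{\mu(G) - \mu(Z)}, \\[2ex] \dfrac{\exp[-n_G]}{n_G!} \left(\dfrac{n_G}{\mu(G)}\right)^{n_G} \displaystyle\prod_{x_i} \mu(x_i) & \text{otherwise.} \end{cases}$$

The test statistic $\lambda$ of the likelihood ratio test can now be written as

$$\lambda = \sup_{Z \in \mathcal{Z}} \frac{\left(\frac{n_Z}{\mu(Z)}\right)^{n_Z} \left(\frac{n_G - n_Z}{\mu(G) - \mu(Z)}\right)^{n_G - n_Z}}{\left(\frac{n_G}{\mu(G)}\right)^{n_G}} \; I\!\left(\frac{n_Z}{\mu(Z)} > \frac{n_G - n_Z}{\mu(G) - \mu(Z)}\right) \tag{4}$$

if there is at least one zone $Z$ such that
$\frac{n_Z}{\mu(Z)} > \frac{n_G - n_Z}{\mu(G) - \mu(Z)}$, and $\lambda = 1$ otherwise.
$I()$ is the indicator function.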
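For a single zone, the Poisson ratio in equation 4 depends only on the counts and measures inside and outside the zone. A minimal sketch on the log scale follows, assuming $0 < \mu(Z) < \mu(G)$; the function name `poisson_llr` is a hypothetical choice for this illustration.

```python
from math import log

def poisson_llr(n_z, mu_z, n_g, mu_g):
    """Log of the single-zone ratio in equation 4; 0 when the indicator is 0."""
    n_out, mu_out = n_g - n_z, mu_g - mu_z
    if n_z / mu_z <= n_out / mu_out:
        return 0.0  # no excess inside the zone, so the ratio is taken as 1
    inside = n_z * log(n_z / mu_z)
    outside = n_out * log(n_out / mu_out) if n_out > 0 else 0.0
    return inside + outside - n_g * log(n_g / mu_g)

# Hypothetical zone: 10 points on measure 100, against 20 points on measure 1000.
print(poisson_llr(10, 100, 20, 1000))
```

Taking the maximum of this quantity over all zones gives $\log \lambda$. The factors $\exp[-n_G]/n_G!$ and $\prod \mu(x_i)$ cancel in the ratio, so they never need to be computed.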

4. PROPERTIES OF THE TEST STATISTIC

4.1 Detection versus inference

Most statistical methods for cluster analysis of a spatial point process are
either descriptive in the sense that they can detect the location of clusters but
without any inference involved, or they do inference but without the ability to
detect the location of clusters. An important characteristic of the spatial scan
test is that it does both, so that when the null hypothesis is rejected we can
locate the specific area of the map that causes the rejection. To be precise,
let $x = \{x_i, i = 1, \ldots, n_G\}$ denote the set of coordinates of the $n_G$ points in
a data set where $\hat{Z}$ is the most likely cluster, and let
$x' = \{x'_i, i = 1, \ldots, n_G\}$ be an alternative configuration with exactly the
same number of points. The following theorem holds for the Bernoulli and
Poisson models.

Theorem 1 If the null hypothesis is rejected under $x$ then it is also rejected
under $x'$, if $x'_i = x_i$ for all $x_i \in \hat{Z}$.

In words, the theorem states that as long as the points within the zone
constituting the most likely cluster are located where they are, we would still
reject the null hypothesis no matter how the rest of the points were shuffled
around. For example, if the null hypothesis is rejected due to a disease cluster
in Seattle, it does not matter how we move around the cases on the U.S. east
coast, the null hypothesis will still be rejected. This might sound like a self
evident property, but it does not hold for most other tests for spatial clustering
such as Knox (1964), Whittemore et al. (1987), Cuzick and Edwards (1990),
or Diggle and Chetwynd (1991). Those tests are hence not suitable if we want
to know the location of clusters. Rather, they are geared towards answering
the question of whether the phenomenon of clustering occurs over the study
region as a whole, such as if a disease is infectious or not, a question for which
the spatial scan statistic is not suitable.

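Proof: Let $\lambda(x)$ and $\lambda(x')$ denote the values of the test statistic for the two
different data sets. Since the two data sets have the same number of points,
the distribution of $\lambda$ under the null hypothesis will be the same, and it is hence
enough to show that $\lambda(x') \ge \lambda(x)$. In the Bernoulli case we have

$$\lambda(x) = \frac{L(\hat{Z} \mid x)}{L_0} \le \frac{L(\hat{Z} \mid x')}{L_0} \le \frac{\sup_Z L(Z \mid x')}{L_0} = \lambda(x').$$

The first inequality holds since $x'$ has at least as many points within zone
$\hat{Z}$ as $x$. For the Poisson model it is trivially true if $\lambda(x) = 1$. When
$\lambda(x) > 1$ we have from equation 4 that

$$\lambda(x) = \sup_Z \frac{1}{K} \left(\frac{n_Z}{\mu(Z)}\right)^{n_Z} \left(\frac{n_G - n_Z}{\mu(G) - \mu(Z)}\right)^{n_G - n_Z} \le \frac{1}{K} \left(\frac{n'_{\hat{Z}}}{\mu(\hat{Z})}\right)^{n'_{\hat{Z}}} \left(\frac{n_G - n'_{\hat{Z}}}{\mu(G) - \mu(\hat{Z})}\right)^{n_G - n'_{\hat{Z}}} \le \lambda(x')$$

where $K = (n_G/\mu(G))^{n_G}$, and $n_Z$ and $n'_Z$ denote the number of points in
zone $Z$ under $x$ and $x'$ respectively. The first inequality holds since for any
constants $\alpha$, $\beta$ and $N$, $(\alpha n)^n (\beta(N - n))^{N - n}$ is an increasing
function of $n$ when $\alpha n > \beta(N - n)$.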
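The monotonicity used for the first inequality can be checked directly. In the sketch below, $n$ is treated as a continuous variable (an assumption made only for this check) and $g(n)$ denotes the logarithm of the expression in question.

```latex
\begin{aligned}
g(n)  &= n \log(\alpha n) + (N - n) \log\bigl(\beta (N - n)\bigr), \\
g'(n) &= \log(\alpha n) + 1 - \log\bigl(\beta (N - n)\bigr) - 1
       = \log \frac{\alpha n}{\beta (N - n)} .
\end{aligned}
```

Hence $g'(n) > 0$ exactly when $\alpha n > \beta (N - n)$. With $\alpha = 1/\mu(Z)$, $\beta = 1/(\mu(G) - \mu(Z))$ and $N = n_G$, this is the condition that the rate inside the zone exceeds the rate outside it, so adding points to such a zone can only increase the likelihood ratio.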

4.2 Power

The power of the one dimensional scan statistic has been studied by
Wallenstein et al. (1993, 1994) and Sahu et al. (1993) among others. For the spatial
scan statistic we cannot expect to find a uniformly most powerful test, except
in the special case when there is only one zone in the alternative hypothesis.
Instead, we show that it fulfills a criterion making it what we call an
individually most powerful test.

To define an individually most powerful test we divide the composite
alternative hypothesis into distinct subsets. The parameter space is partitioned into
a countable number of subsets $\{A_j\}$ such that $A_j \cap A_{j'} = \emptyset$ for all
$j \ne j'$, and such that $\cup A_j$ constitutes the whole of the alternative hypothesis.
Likewise, and using the same index, the critical region $C$, where the null
hypothesis is rejected, is partitioned into disjoint subsets $\{C_j\}$ where
$\cup C_j = C$. Let $C' = \cup C'_j$ denote an alternative critical region with
corresponding disjoint subsets.
Definition 1 For a particular significance level $\alpha$, a test is individually most
powerful with respect to a partition $\{A_j\}$ of the parameter space, and a partition
$\{C_j\}$ of the critical region, if for each $A_k$ there are no sets $C'$ and $\{C'_j\}$
such that

This means that if we fix the critical region except for its subset $C_k$ as
indicated by statement 1, then the test is uniformly most powerful compared
to all remaining choices of the critical region and with respect to all parameters
$(Z, p, q) \in A_k$. This property is very important in any multiple testing type of
situation, where there is a composite alternative hypothesis and where we
wish to know which part of it causes the rejection. As mentioned before, the
scan statistic has the ability to identify the zone responsible for rejecting the
null hypothesis, and if we fail to detect a real cluster, it is of little comfort
if the null hypothesis is rejected based on an untrue cluster in another part
of the study area. In fact, that is usually less desirable than just failing to
reject the null hypothesis. The problem resembles other multiple comparison
situations, where instead of testing multiple cluster locations simultaneously,
we might test several new agricultural crop varieties to see if any of them are
better than the one presently in use, or we might simultaneously test several
potential risk factors for cancer.

If we are only concerned about rejection versus no rejection, without an
interest in the location of clusters, then the property of being an individually
most powerful test is of little value. For such a problem the likelihood ratio
based spatial scan statistic would be a suboptimal choice.

Now, let $A_Z = \{(Z, p, q) : p > q\}$ and $A_0 = \{(Z, p, q) : p = q\}$. Let $C_Z$
denote the intersection of the critical region $C$ and the subset of the sample
space in which $Z$ is the most likely cluster.

Theorem 2 The test based on $\lambda$ forms an individually most powerful test with
respect to the partitions $\{A_Z\}$ and $\{C_Z\}$. This holds for the Bernoulli as well
as the Poisson model.

Proof: We show that if statements (1) and (2) in the definition are true, then
(3) cannot hold. For an arbitrary $Z$, let $D^- = \{\omega : \omega \in C_Z, \omega \notin C'_Z\}$ and
$D^+ = \{\omega : \omega \in C'_Z, \omega \notin C_Z\}$. Let

$$M = \sup_{\omega \in D^+} \frac{L(Z, p, q \mid \omega)}{L(H_0 \mid \omega)}.$$

For the Poisson model, it follows from equation 4 that

A similar argument holds for the Bernoulli case, based on equation 1. Now,
for any $(Z, p, q) \in A_Z$,

$$\begin{aligned} \cdots &= M \bigl(P(\omega \in D^+ \mid H_0) - P(\omega \in D^- \mid H_0)\bigr) \\ &= M \bigl(P(\omega \in C'_Z \mid H_0) - P(\omega \in C_Z \mid H_0)\bigr) \\ &= M \bigl(P(\omega \in C' \mid H_0) - P(\omega \in C \mid H_0)\bigr) = 0. \end{aligned}$$

The second to last equality holds since $C_{Z'} = C'_{Z'}$ for all $Z' \ne Z$ according to
statement 1 in the definition.

5. COMPUTATIONS AND MONTE CARLO

In order to find the value of the test statistic, we need a way to calculate the
likelihood ratio as it is maximized over the collection of zones in the alternative
hypothesis. This might seem like a daunting task since the number of zones
could easily be infinite. Two properties allow us to reduce it to a finite problem.
The number of observed points is always finite, and for a fixed number of
points the likelihood decreases as the measure of the moving window increases.
Consider the scanning window of example (2) in Section 2. If we let the circle
size increase for a fixed focus, we only need to recalculate the likelihood
whenever a new point enters the circle. Since there is a finite number of points,
the number of times we need to compute the likelihood for each focus is finite,
and since the number of foci is also finite, the total number of calculations is
finite. Assuming a Bernoulli model or a homogeneous Poisson model, similar
arguments hold for the other four examples given in Section 2.
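The finite search described above can be sketched as follows for aggregated data under the Poisson model. This is an illustrative sketch, not the author's implementation: the function names, the tuple layout `(x, y, cases, measure)`, the use of every site as a focus, and the cap `max_frac` on zone size are assumptions. Sites at exactly equal distance from a focus are added one at a time, which is a simplification.

```python
from math import hypot, log

def llr(n_z, mu_z, n_g, mu_g):
    """Poisson log likelihood ratio of one zone (0 if there is no excess)."""
    n_out, mu_out = n_g - n_z, mu_g - mu_z
    if mu_out <= 0 or n_z / mu_z <= n_out / mu_out:
        return 0.0
    out = n_out * log(n_out / mu_out) if n_out else 0.0
    return n_z * log(n_z / mu_z) + out - n_g * log(n_g / mu_g)

def most_likely_cluster(sites, max_frac=0.5):
    """sites: list of (x, y, cases, measure) with positive measure.

    Returns (log likelihood ratio, index of focus, indices of sites in the zone).
    """
    n_g = sum(s[2] for s in sites)
    mu_g = sum(s[3] for s in sites)
    best = (0.0, None, [])
    for c, (cx, cy, _, _) in enumerate(sites):
        # Growing the circle around focus c adds sites in order of distance.
        order = sorted(range(len(sites)),
                       key=lambda i: hypot(sites[i][0] - cx, sites[i][1] - cy))
        n_z = mu_z = 0
        for k, i in enumerate(order):
            n_z += sites[i][2]
            mu_z += sites[i][3]
            if mu_z > max_frac * mu_g:
                break  # zone would exceed the allowed share of the measure
            value = llr(n_z, mu_z, n_g, mu_g)
            if value > best[0]:
                best = (value, c, order[:k + 1])
    return best

# Hypothetical data: two neighbouring sites with many cases, four with few.
sites = [(0, 0, 8, 100), (1, 0, 7, 100), (5, 5, 1, 100),
         (6, 5, 2, 100), (5, 6, 1, 100), (9, 0, 1, 100)]
value, focus, members = most_likely_cluster(sites)
print(sorted(members))  # [0, 1]
```

The likelihood is recomputed once per site entering the circle, so the total work is bounded by the number of foci times the number of sites, plus the cost of sorting.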
Once the value of the test statistic has been calculated, it is easy to do
the inference. We cannot expect to find the distribution of the test statistic in
closed analytical form. Instead we rely on Monte Carlo simulation. Originally
proposed by Dwass (1957), this technique was first used in the context of
a scan statistic by Turnbull et al. (1990). Because we know the underlying
measure $\mu$, we can obtain replications of the data set generated under the null
hypothesis when we condition on the total number of points $n_G$. With 9999
such replications, the test is significant at the 5 percent level if the value of
the test statistic for the real data set is among the 500 highest values of the
test statistic coming from the replications.
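A minimal sketch of this Monte Carlo procedure is given below for aggregated data. Conditioning on $n_G$ means each replica distributes $n_G$ points over the sites with probabilities proportional to the measure. To keep the example short, the statistic is maximized over single-site zones only; that restriction, the function names and the fixed seed are assumptions made for illustration.

```python
import random
from math import log

def max_single_site_llr(counts, measure):
    """Largest Poisson log likelihood ratio over zones made of one site each."""
    n_g, mu_g = sum(counts), sum(measure)
    best = 0.0
    for n_z, mu_z in zip(counts, measure):
        n_out, mu_out = n_g - n_z, mu_g - mu_z
        if n_z / mu_z > n_out / mu_out:
            out = n_out * log(n_out / mu_out) if n_out else 0.0
            best = max(best, n_z * log(n_z / mu_z) + out - n_g * log(n_g / mu_g))
    return best

def null_counts(measure, n_g, rng):
    """Place n_g points with probabilities proportional to the measure."""
    counts = [0] * len(measure)
    for i in rng.choices(range(len(measure)), weights=measure, k=n_g):
        counts[i] += 1
    return counts

def monte_carlo_p_value(counts, measure, n_rep=9999, seed=0):
    """Rank of the observed statistic among itself and n_rep null replicas."""
    rng = random.Random(seed)
    observed = max_single_site_llr(counts, measure)
    n_g = sum(counts)
    # Replicas that tie with the observed value count against it.
    rank = 1 + sum(
        max_single_site_llr(null_counts(measure, n_g, rng), measure) >= observed
        for _ in range(n_rep)
    )
    return rank / (n_rep + 1)

# Hypothetical data: five sites of equal measure, one with many more points.
print(monte_carlo_p_value([30, 1, 1, 1, 1], [10, 10, 10, 10, 10], n_rep=999))
```

With 9999 replicas, a rank of at most 500 gives a p-value of at most 0.05, which matches the rule stated above.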
In addition to the most likely cluster we might also want to look at
secondary clusters with high likelihood values. Some of these will be related to the
most likely cluster in the sense that they contain about the same set of points
with their respective zones overlapping each other. Such secondary clusters
are usually of little interest, although they serve to remind us of the fact that
the obtained location and size of detected clusters are only estimates.

More interesting types of secondary clusters are those located in another
part of the study region. We define these to be clusters that do not overlap
with a more likely one. It is often of interest to report these clusters along
with the most likely one.

For inference, we may take a secondary cluster and compare and rank its
likelihood value with the maximum likelihood ratio from the Monte Carlo
replications. Any secondary cluster that ranks below the significance level would in
itself have caused the rejection of the null hypothesis even if there had been no
other more likely cluster in the data set. This gives us an inferential procedure
for secondary clusters as well, but since we are comparing a secondary cluster
from the data set with the most likely clusters from the replications such a
test is somewhat conservative.
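The selection and ranking of secondary clusters can be sketched as below. The text leaves open whether "a more likely one" refers to any zone or only to clusters already reported; this sketch assumes the latter and keeps a candidate only if it is disjoint from every more likely cluster kept so far. The function name and the input layout are hypothetical.

```python
def secondary_clusters(candidates, null_maxima):
    """Report non-overlapping clusters with conservative p-values.

    candidates:  list of (log likelihood ratio, set of site ids) per zone
    null_maxima: maximum log likelihood ratio from each Monte Carlo replica
    """
    reported = []
    # Visit candidate zones from most to least likely.
    for value, zone in sorted(candidates, key=lambda c: -c[0]):
        if all(zone.isdisjoint(kept) for _, kept, _ in reported):
            # Each cluster is ranked against the replicated maxima.
            rank = 1 + sum(m >= value for m in null_maxima)
            reported.append((value, zone, rank / (len(null_maxima) + 1)))
    return reported

# Hypothetical candidate zones and replicated maxima.
candidates = [(9.0, {1, 2}), (8.5, {2, 3}), (6.0, {7, 8}), (1.0, {9})]
null_maxima = [1, 2, 3, 4, 5, 6.5, 7, 2.5, 3.5]
for value, zone, p in secondary_clusters(candidates, null_maxima):
    print(sorted(zone), p)
```

In this example the zone {2, 3} is dropped because it shares a site with the more likely zone {1, 2}, while {7, 8} and {9} are reported as secondary clusters.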

We illustrate the models using data on sudden infant death syndrome (SIDS)
in North Carolina. The data were compiled by M. Symmons, D. Atkinson,
and the State Center for Health Statistics of the North Carolina Department
of Human Resources. They have previously been analysed by Cressie and
Chan (1989) among others.

TABLE I
The spatial scan statistic applied to sudden infant death syndrome in North
Carolina, adjusted for the uneven geographical distribution of births. Zones
refer to Figure 1 and incidence is the number of deaths per 1000 live births.

Model      Zone  # SIDs  # Births  Incidence  p-value
           Z     n_Z     μ(Z)
Bernoulli  A     139     36376     3.8        0.0001
Bernoulli  B      59     14388     4.1        0.0005
Poisson    A     139     36376     3.8        0.0001
Poisson    B      59     14388     4.1        0.0003

FIG 1: Two significant clusters of sudden infant death syndrome in North
Carolina, adjusted for the uneven geographical distribution of live births.
For each of the 100 counties in North Carolina, the data comprise the
total number of live births as well as the number of sudden infant deaths
(SIDs) for the years 1974-1984. The number of live births in the counties
ranges from 567 to 52345. The locations of county seats were used as the
geographical coordinates. The total number of SIDs is 1503 out of 753354
live births. This gives a statewide incidence rate of 2.0 per 1000. The total
number of births in each county, as well as the statewide number of SIDs, are
also stratified into whites and non-whites. The complete data are presented
by Cressie and Chan (1989).
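The incidence figures quoted here and in Table I follow directly from the counts. The short check below uses only numbers stated in the text and table; the helper name `incidence_per_1000` is a hypothetical choice.

```python
def incidence_per_1000(cases, births):
    """Number of cases per 1000 live births."""
    return 1000 * cases / births

print(round(incidence_per_1000(1503, 753354), 1))  # 2.0, the statewide rate
print(round(incidence_per_1000(139, 36376), 1))    # 3.8, zone A in Table I
print(round(incidence_per_1000(59, 14388), 1))     # 4.1, zone B in Table I
```

Both zones therefore show roughly twice the statewide incidence.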
The measure at the coordinate point of each county is taken as the number
of live births in that county. The measure is zero elsewhere. This is as in
example (3) of Section 1. As zones for the window we use all circles that are
centered at one of the county coordinate points and that include at most half
of the total population. This follows example (2) of Section 2.

Note that the zones are circular only with respect to the aggregated data.
As we draw the circles around one county seat, other counties will either be
completely part of a zone or else not at all, depending on whether their county
seat falls within the circle or not. Hence, we get a compact but irregularly shaped
zone following the county boundaries. This can be seen in Figures 1 and 2.

6.1 Bernoulli Model

The Bernoulli model is the most natural one to use for this data set. We
have birth counts, and each birth can correspond to at most one sudden infant
death. Table 1 summarizes the results of the analysis.

The most likely cluster, A, consists of the counties of Bladen, Columbus,
Hoke, Robeson, and Scotland, in the southern part of the state. The rank is
1/10000, i.e. a p value of 0.0001.

There is one other significant cluster, B, composed of Halifax, Hertford,
and Northampton counties in the northeast. With a rank of 5/10000 it has a
p value of 0.0005. This latter test is conservative, because we are comparing a
secondary cluster in the data set with the most likely clusters from the replicas.

6.2 Poisson Model

Since we are dealing with a rare disease, the Poisson model should give a close
approximation to the Bernoulli model. That the results are indeed similar for
this data set can be seen in Table 1.
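One way to see why the two models agree for a rare disease is to compare their log likelihood ratios for the same zone $Z$. The sketch below uses the expansion $\log(1 - u) = -u - u^2/2 - \cdots$ and is an approximation added for illustration, not a result stated in the paper. Here $\lambda_B$ and $\lambda_P$ denote the Bernoulli and Poisson likelihood ratios evaluated at a zone with an excess of points.

```latex
\begin{aligned}
\log \lambda_B - \log \lambda_P
  &= h\bigl(n_Z, \mu(Z)\bigr) + h\bigl(n_G - n_Z, \mu(G) - \mu(Z)\bigr)
     - h\bigl(n_G, \mu(G)\bigr), \\
h(n, m) &= (m - n) \log\!\left(1 - \frac{n}{m}\right)
         = -n + \frac{n^2}{2m} + O\!\left(\frac{n^3}{m^2}\right).
\end{aligned}
```

The leading terms cancel, since $-n_Z - (n_G - n_Z) + n_G = 0$, which leaves a difference of approximately $\frac{n_Z^2}{2\mu(Z)} + \frac{(n_G - n_Z)^2}{2(\mu(G) - \mu(Z))} - \frac{n_G^2}{2\mu(G)}$. For zone A, with $n_Z = 139$, $\mu(Z) = 36376$, $n_G = 1503$ and $\mu(G) = 753354$, this is roughly 0.06 on the log scale.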
The Poisson approximation is especially useful when we have covariates
that we wish to include in the analysis. For SIDS, one possible covariate is
race (Cressie and Chan, 1989), which may be related to SIDS through
unobserved variables such as quality of housing or access to health care. The racial
distribution differs widely among the counties in North Carolina, and could
possibly explain the previously detected clusters. We may want to see if there
are still geographic clusters after adjusting for race. This could lead us to other
spatially related risk factors that are otherwise hidden.
The overall incidence of SIDS is 1.512 per 1000 live births for white children,
and 2.970 for non-white children (Cressie and Chan, 1989). The underlying
measure at each county coordinate $x$ can now be defined as

$$\mu(x) = \text{white births} \times 1.512 + \text{nonwhite births} \times 2.970$$

which is proportional to the expected number of SIDs under the null hypothesis.
Note that we do not need to know the number of SIDs in each county
subdivided by race. The result of the likelihood ratio test is given in Table 2.

TABLE II
The spatial scan statistic applied to sudden infant death syndrome in North
Carolina, adjusted for race and the uneven geographical distribution of live
births. Zones refer to Figure 2.

Model    Zone  # SIDs  E[# SIDs]  # Births  p-value
         Z     n_Z     μ(Z)
Poisson  A     139      94.5      36376     0.0036
Poisson  C     191     140.8      86780     0.0060

FIG 2: Two significant clusters of sudden infant death syndrome in North
Carolina, adjusted for race and the uneven geographical distribution of live
births.
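The race-adjusted measure can be computed per county as in the sketch below. The two rates are those quoted in the text; the county birth counts, the total of 40 cases and the function names are hypothetical and serve only to show how the measure is scaled to expected counts that sum to $n_G$.

```python
WHITE_RATE = 1.512     # SIDS per 1000 live births, white children (from the text)
NONWHITE_RATE = 2.970  # SIDS per 1000 live births, non-white children (from the text)

def adjusted_measure(white_births, nonwhite_births):
    """Measure proportional to the expected number of SIDS cases."""
    return white_births * WHITE_RATE + nonwhite_births * NONWHITE_RATE

def expected_cases(measures, n_g):
    """Scale the measure so that the expected counts sum to n_g."""
    total = sum(measures)
    return [n_g * m / total for m in measures]

# Hypothetical counties with equal births but different racial composition.
births = [(8000, 2000), (5000, 5000)]
measures = [adjusted_measure(w, nw) for w, nw in births]
print(measures)
print(expected_cases(measures, n_g=40))
```

Only the ratio between the two rates matters for the test, because the analysis conditions on the total number of cases.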
Comparing this analysis to the one where race was not incorporated, we
observe three things.

1. With a rank of 36/10000 (p = 0.0036), the southern cluster A remains
significant, and cannot be explained solely by the high proportion of non-white
births in that area.

2. Cluster B in the northeast is no longer significant, with a rank of 3336/10000
(p = 0.3336).

3. A previously 'hidden' cluster C emerges in the west, with a rank of 60/10000
(p = 0.006). It consists of the following counties: Avery, Buncombe, Burke,
Caldwell, Cleveland, Haywood, Henderson, Jackson, Lincoln, Macon, Madison,
Mitchell, Polk, Rutherford, Swain, Transylvania, and Yancey.

ACKNOWLEDGEMENTS

Valuable discussions with Laurence Freedman and Lisa McShane are gratefully
acknowledged. This research was partly funded by the Swedish Research
Council in the Humanities and Social Sciences.

BIBLIOGRAPHY

Cressie N and Chan NN, (1989). Spatial modeling of regional variables.
Journal of the American Statistical Association 84, 393-401.

Cuzick J and Edwards R, (1990). Spatial clustering for inhomogeneous
populations. Journal of the Royal Statistical Society Ser. B, 52, 73-104.

Davies RB, (1977). Hypothesis testing when a nuisance parameter is present
only under the alternative. Biometrika 64, 249-254.

Diggle PJ and Chetwynd AG, (1991). Second-order analysis of spatial
clustering for inhomogeneous populations. Biometrics 47, 1155-1163.

Dwass M, (1957). Modified randomization tests for nonparametric hypotheses.
Annals of Mathematical Statistics 28, 181-187.

Glaz J and Naus J, (1983). Multiple clusters on the line. Communications in
Statistics: Theory and Methods 12, 1961-1986.

Hjalmars U, Kulldorff M, Gustafsson G and Nagarwalla N, (1996). Childhood
leukemia in Sweden: Using GIS and a spatial scan statistic for cluster
detection. Statistics in Medicine 15, 707-715.

Knox G, (1964). The detection of space-time interactions. Applied Statistics
13, 25-29.

Kulldorff M and Nagarwalla N, (1995). Spatial disease clusters: Detection and
inference. Statistics in Medicine 14, 799-810.

Loader CR, (1991). Large-deviation approximations to the distribution of scan
statistics. Advances in Applied Probability 23, 751-771.

Naus JI, (1965a). The distribution of the size of the maximum cluster of points
on the line. Journal of the American Statistical Association 60, 532-538.

Naus JI, (1965b). Clustering of random points in two dimensions. Biometrika
52, 263-267.

Naus J, (1974). Probabilities for a generalized birthday problem. Journal of
the American Statistical Association 69, 810-815.

Sahu SK, Bendel RB and Sison CP, (1993). Effect of relative risk and cluster
configuration on the power of the one-dimensional scan statistic. Statistics in
Medicine 12, 1853-1865.

Saperstein B, (1972). The generalized birthday problem. Journal of the
American Statistical Association 67, 425-428.

Turnbull BW, Iwano EJ, Burnett WS, Howe HL and Clark LC, (1990).
Monitoring for clusters of disease: Application to leukemia incidence in upstate
New York. American Journal of Epidemiology 132, S136-S143.

Wallenstein S, Weinberg CR and Gould M, (1989). Testing for a pulse in
seasonal event data. Biometrics 45, 817-830.

Wallenstein S, Naus J and Glaz J, (1993). Power of the scan statistic for
detection of clustering. Statistics in Medicine 12, 1829-1843.

Wallenstein S, Naus J and Glaz J, (1994). Power of the scan statistic in
detecting a changed segment in a Bernoulli sequence. Biometrika 81.

Weinstock MA, (1981). A generalized scan statistic test for the detection of
clusters. International Journal of Epidemiology 10, 289-293.

Whittemore AS, Friend N, Brown BW and Holly EA, (1987). A test to detect
clusters of disease. Biometrika 74, 631-635, and 75, 396.

Received September, 1996; Revised December, 1996
