Bivand 2006 Geographical - Analysis
This article reports on work in progress on the implementation of functions for spatial
statistical analysis, in particular of lattice/area data, in the R language environment. The
underlying spatial weights matrix classes are presented, together with methods for
deriving them from data held in commonly used geographical information systems;
such data are read using other contributed R packages. Since the initial release of some
functions in 2001, and the release of the spdep package in 2002, experience has been
gained in the use of various functions. The topics covered are the ingestion of positional
data, exploratory data analysis of positional, attribute, and neighborhood data, and
hypothesis testing of autocorrelation for univariate data. The article also provides
information about community building in using R for analyzing spatial data.
Introduction
Changes in scientific computing applied to data analysis are less rapid than in many
consumer fields. Underlying technologies are well known, and have been available
for decades. The visible changes are much more in the burgeoning of online and
virtual scientific communities and in the availability of cross-platform applications,
giving many more researchers, scientists, and students access to implementations of
methods that, until recently, only appeared in journal articles. In this context,
Anselin, Florax, and Rey (2004), Pebesma (2004), and Grunsky (2002) have
recently drawn attention to the R data analysis and statistical programming
environment as an emerging area of promise for the social and environmental
sciences, and Waller and Gotway (2004) provide selected R code to support their
book on applied spatial statistics for public health data.
This article was first prepared for the CSISS specialist meeting on spatial data analysis
software tools, Santa Barbara, CA, May 10–11, 2002.
Correspondence: Roger Bivand, Economic Geography Section, Department of Economics,
Norwegian School of Economics and Business Administration, Helleveien 30, N-5045
Bergen, Norway
e-mail: Roger.Bivand@nhh.no
Roger Bivand Implementing Spatial Data Analysis Software Tools in R
Table 1 Packages for Spatial Data Analysis Linked from the Web Site. Packages are on
CRAN, Other Packages are on Sourceforge

Functionality               Packages
Read shapefiles             maptools, shapefiles
Read other vector formats   RArcInfo, Rmap
Read raster formats         rgdal
GIS integration             GRASS
Draw maps                   maptools, RArcInfo, Rmap, maps, sp
Project maps                mapproj, Rmap, spproj
Spatial data classes        sp
Point patterns              spatial, spatstat, splancs
Geostatistics               spatial, gstat, sgeostat, geoR, geoRglm, fields
Areal/lattice               spdep, DCluster, spgwr

dated survey was made by Bivand and Gebhardt (2000), reflecting the situation at
that time. R as a whole is experiencing rapid growth in the number of contributed
packages, and because it can be difficult to obtain an overview of relevant software,
authors of spatial statistics software agreed to set up a Web site. This has been in
operation since mid-2003, has an associated mailing list, and currently can be
reached by searching Google with the string, ‘‘R spatial,’’ or from an entry on the
navigation bar on the left of the main R Web site (http://www.r-project.org/Rgeo).
Rather than duplicate this information, summarized in Table 1, the next section will
be concerned with highlighting features of the R implementation of S that are of
potential value for implementing data analysis functions.
Geographical Analysis
of course be read within the context of the user interface, and functions may be
edited, saved to user files, and sourced back into the program.
In some circumstances, it is desirable to move the internal computations of a
function to a compiled language, although, as we will see, this is not an absolute
requirement because the built-in internal functions themselves interface to highly
optimized and heavily debugged compiled code. In this case, a source directory
will also be present, with C, C++, or Fortran 77 source files. Here, it is sufficient to
mention the possibility of dynamically loading user-compiled shared object code,
and the usefulness of R header files in C, in particular providing direct access to R
functions. R data objects may be readily passed to and from user functions, and R
memory allocation mechanisms may be called from such user-compiled functions.
If required, the R code can even be executed in such user-compiled functions.
For instance, R provides a factor object definition for categorical variables,
with a character vector of level labels and an integer vector of observation values
pointing to the vector of levels. In the GRASS/R compiled interface using GRASS
library functions (Bivand 2000), moving categorical raster data between R and
GRASS is fast and fully preserves labels, because the C functions operate directly on
R factor objects. Within R, functions are written to use object classes, for
example, the factor class, to test for object suitability, or in many modeling
situations to convert factors into appropriate dummy variables. The use of classes
in R for spatial data analysis is discussed in more depth by Bivand (2002).
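
The structure of a factor described above can be inspected directly. The following is a minimal sketch in base R; the land-cover values are made up for illustration and do not come from the GRASS interface.

```r
# Hypothetical land-cover categories for five raster cells (illustration only)
cover <- factor(c("forest", "water", "forest", "urban", "water"))

level_labels <- levels(cover)   # character vector of level labels
codes <- as.integer(cover)      # integer codes pointing into the labels

# Rebuilding the values from codes and labels loses no information
rebuilt <- level_labels[codes]
print(data.frame(code = codes, label = rebuilt))

# Class-based test for object suitability, as a function might perform it
print(is.factor(cover))
```

Because only the integer codes and the short vector of labels need to be moved, compiled code can pass categorical data back and forth without touching the label strings for every cell.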
Another example is the widespread use of the formula object abstraction.
Formulae are written in the same way for models of many kinds. The left- and right-
hand sides are separated by the tilde (~), and the right-hand side may express
interactions and conditioning. If a factor is included on the right-hand side, the
model matrix will contain the necessary dummy variables. Formulae are used from
multiple box plots, through cross-tabulations, lattice (Trellis) graphics using con-
ditioning to explore higher-dimensional relationships, to all forms of model-fitting
functions. The formula abstraction was introduced to the S language in Chambers
and Hastie (1992), and is used in the spatial statistics packages as well; see
Pebesma (2004) for examples.
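
As a small illustration of the formula abstraction, the sketch below uses base R and a made-up data frame (the names plots, yield, rain, and soil are hypothetical). It shows one formula object being reused for a model matrix and a linear model, and a one-sided formula being used for a cross-tabulation.

```r
# Made-up data: a numeric response, a numeric covariate, and a factor
plots <- data.frame(
  yield = c(4.1, 5.0, 6.2, 6.8, 8.1, 8.9),
  rain  = c(10, 12, 15, 16, 20, 21),
  soil  = factor(c("clay", "clay", "loam", "loam", "sand", "sand"))
)

# Left- and right-hand sides are separated by the tilde
model_formula <- yield ~ rain + soil

# The factor on the right-hand side is expanded into dummy variables
X <- model.matrix(model_formula, data = plots)
print(colnames(X))

# The same formula object drives model fitting
fit <- lm(model_formula, data = plots)
print(coef(fit))

# A one-sided formula gives a cross-tabulation
soil_counts <- xtabs(~ soil, data = plots)
print(soil_counts)
```

With the default treatment contrasts, the first level of the factor is absorbed into the intercept and the remaining levels each receive a dummy column.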
Users and developers can create new classes for which the class method
dispatch mechanism can be invoked. The summary() function appears to be a
single function, but in fact calls appropriate summary functions based on the class
of the first argument; the same applies to the plot() and print() functions. The
extent to which spatial data analysis packages use class- and method-based
functions varies, mostly depending on the age of the code and on the potential
value of such revisions; status as of early 2003 is reviewed in Bivand (2003). At
present, some do not use classes—assuming specific list structures for spatial data,
others use old-style classes specific to the package (also known as S3 classes,
described in Chambers and Hastie 1992, appendix A, pp. 455–80), and a few have
begun to use new-style classes (S4 classes, described in Chambers 1998, chapters
7–8; Venables and Ripley 2002, chapter 5). New-style classes mandate the
structure of a class by design, so that content creep is not permitted, and functions
using the classes can be written cleanly, because the structure of the object is
known from its published definition.
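
The contrast between old-style and new-style classes can be sketched as follows. The class names pts_sketch and PtsSketch are hypothetical and are not taken from any of the packages discussed; the example only assumes base R and the methods package.

```r
library(methods)

# Old-style (S3): the class is just an attribute on a list, so nothing
# prevents extra components from being added later
pts3 <- structure(list(x = c(1, 2), y = c(3, 4)), class = "pts_sketch")

summary.pts_sketch <- function(object, ...) {
  cat("pts_sketch with", length(object$x), "points\n")
  invisible(length(object$x))
}

# summary() dispatches on the class of its first argument
n3 <- summary(pts3)

# New-style (S4): slots and their types are fixed by the class definition
setClass("PtsSketch",
         representation(coords = "matrix", id = "character"),
         validity = function(object) {
           if (nrow(object@coords) != length(object@id))
             "one id is needed for each coordinate row"
           else TRUE
         })

ok <- new("PtsSketch", coords = cbind(c(1, 2), c(3, 4)), id = c("a", "b"))

# An object that violates the definition is rejected at creation time
bad <- try(new("PtsSketch", coords = cbind(1, 3), id = c("a", "b")),
           silent = TRUE)
print(inherits(bad, "try-error"))
```

The validity function is what lets downstream functions rely on the published structure without re-checking it themselves.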
Pebesma (2004) presents work in progress on designing S4 classes for spatial
data, which is hosted on SourceForge and can be accessed from the R spatial
projects Web site; publication of the foundation classes as package sp on CRAN has
now occurred. These classes will provide a set of containers for data import and
export on the one hand, and a standardized set of objects for analysis functions on
the other. It is not clear how much GIS functionality should be available within R
natively, so the availability of data objects that can be moved readily to external
GIS for processing and modification is necessary. On the other hand, GIS (perhaps
with the exception of GRASS) do not let the analyst get as close to the data as is
possible in R.
As has already been indicated indirectly, much of the added value of the R
project extends beyond the standard functionality of the language and program-
ming environment. The archive network is such an extension, as are the package-
checking mechanisms (in the tools package). Together with the test suites, they
have been developed to facilitate quality control of the core system rather than
user-contributed packages, but because the same standards are used, the con-
tributed packages also benefit in terms of organization and coherence. Another
useful spillover from package contributors to the core team is that all contributed
packages on CRAN are verified nightly against the code bases of the released
version of R, the patched version, and the development version—which will
become the next full release. This means that the core team can track the effects
of changes in the compute engine on a large body of code that has run without
evident error at some previous point in time. Of course, only use shows how close
the implementations are to the underlying formal and/or numerical methods, as
with other software, so these checks do not try to validate the procedures involved.
The underlying numerical functions and random number generators are, however,
well proven.
The tools package also contains functions to support literate statistical analysis,
written to contain both documentation and the specification of procedures actually
run to generate results (text, tabular, and graphical). The Sweave() function runs
on a file containing, in the LaTeX and R case, a marked-up mixture of LaTeX and
R code, producing a file in LaTeX including verbatim code chunks, the output of
the commands issued, and files with tables or graphics in suitable formats for
inclusion. Leisch and Rossini (2003) argue that, with increasing dependence on the
settings needed to replicate results, for instance, the specific random number
generator and seed used, such literate analysis techniques are needed for repro-
ducible research. This infrastructure has been extended and implemented as a less
terse way of documenting packages contributed to R, in documents known as
vignettes, and used extensively in the Bioconductor project (Gentleman et al.
2005). Vignettes also contain code that is verified on the same basis as help page
examples. This article has been written using the literate statistical analysis support
available in R.
On the graphics side, R does not provide dynamic linked visualization, as the
graphics model is based on drawing on one of a number of graphics devices. R
does provide the more important tools for graphical data analysis, although no
mapping is present as yet in general terms. The R spatial analysis Web site referred
to above covers packages on CRAN for mapping using the legacy S format,
shapefiles, and ArcInfo binary files, but each of these packages uses separate
mechanisms and object representations at present. Panelled (Trellis) graphics are
provided in the lattice package, which is now mature and makes available many of
the tools needed for analyzing high-dimensional data in a reproducible fashion.
This can be contrasted with dynamically linked visualization, which can be difficult
to render as hard copy. Graphics are extensible at the user level in many ways, but
are more for viewing and command-line interaction than for pointer-driven
interaction.
Implementation examples
In this section, we will use some of the implementation details of spdep to
exemplify the internal workings of an R package; this package is a workbench
for exploring alternative implementation issues, and bug reports, contributions, and
criticisms are very welcome. The illustrations will draw on canonical data sets for
areal or lattice data, many of which are included in the package to provide clear
comparisons with the underlying literature. Using the reproducible research format,
R commands are preceded by the main prompt >, or the continuation prompt +,
and are expressed in italic; output is not preceded by a prompt and is set in normal
style.
Most of the world as seen by data analysis software still resembles a flat table,
and the most characteristic object class in R at first seems to be the data frame. But
a data frame, a rectangular flat table with row and column names, and made up of
columns of types including numeric, integer, factor, character, and logical, and
with other attributes, is in fact a list. Lists are very flexible containers for data of all
kinds, including the output of functions. They form the workhorse for many spatial
data objects, such as, for example, polygons read into R from shapefiles using the
maptools package.
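
That a data frame is a list with extra attributes can be checked in a few lines of base R. The values below are made up and the object name counties is hypothetical.

```r
# A small data frame with columns of different types (made-up values)
counties <- data.frame(name = c("Dare", "Hyde", "Wake"),
                       births = c(10L, 20L, 30L),
                       coastal = c(TRUE, TRUE, FALSE),
                       stringsAsFactors = FALSE)

print(is.list(counties))             # a data frame is a list
print(length(counties))              # one list element per column
print(names(attributes(counties)))   # names, class, and row names
print(sapply(counties, class))       # each column keeps its own type

# List-style and data-frame-style access reach the same column
same_column <- identical(counties[["births"]], counties$births)
print(same_column)
```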
We will be using the North Carolina Sudden Infant Death Syndrome (SIDS)
data set, which was presented first in Symons, Grimson, and Yuan (1983), analyzed
with reference to the spatial nature of the data in Cressie and Read (1985),
expanded in Cressie and Read (1989) and Cressie and Chan (1989), and used in
detail in Cressie (1993). It is for the 100 counties of North Carolina, and includes
counts of numbers of live births (also non-White live births) and numbers of sudden
infant deaths, for the 1974–1978 and 1979–1984 periods. In Cressie and Read
(1985), a listing of county neighbors based on shared boundaries (contiguity) is
given, and in Cressie and Chan (1989), and in Cressie (1993, pp. 386–89), a
different listing is given based on the criterion of distance between county seats,
with a cutoff at 30 miles. The county seat location coordinates are given in miles in
a local (unknown) coordinate reference system. The data are also used to exemplify
a range of functions in the S-PLUS spatial statistics module user’s manual (Kaluzny
et al. 1996).
> library(spdep)
The imported object in R has class Map, and is a list with two components,
Shapes, which is a list of shapes, and att.data, which is a data frame with
tabular data, one row for each shape in Shapes. We convert the geometry format
of the Map object to that of a polylist object, which will be easier to handle. The
object is also put through sanity checks if the raw argument is set to FALSE; many
shapefiles in the wild do not conform to the specification that hole boundary
coordinates be listed anti-clockwise, information that is needed for plotting.
The polylist is an S3 class with additional attributes, and is a list of
polygons (possibly multiple polygons), each with further attributes. An S4 class
definition built on polylist is included in sp, and is used in the forthcoming
spgwr package for geographically weighted regression. Using the plot() function
for polylist objects from maptools, we can display the polygon boundaries, shown
as the background in Fig. 1.
Figure 1. Overplotting shapefile boundaries with 30 mile neighbor relations as a graph; two
extra links are shown from replicating the neighbor scheme with the original input data.
While point pattern data can exist happily within flat tables, as indeed can
point locations with attributes as used in the analysis of continuous surfaces, as well
as time series, the specific structuring data object of lattice data analysis describing
the neighborhood relations between observations cannot. When the weights matrix
is represented as a matrix, it is very sparse, and the analysis of moderate-to-larger
data sets may be impeded. This provides one reason for supplementing the existing
S-PLUS spatial statistics module for lattice data. In the S-PLUS case, weights are
represented sparsely in a data frame, with four columns: weights matrix element
row index, column index, value, and (optionally) the order of the weights matrix
when multiple matrices are stored in the same data frame. While this provides a
direct route to sparse matrix functions for finding the Jacobian, and for matrix
multiplication, it makes the retrieval of neighborhood relations awkward for other
needs. Here, it was found to be simpler to create a hierarchy of class objects leading
to the same target, but also open to many operations at earlier stages.
The basic building block is, as with the polylist class, the list, with each list
element being an integer vector containing the indices of its neighbors in the
present definition. The list is of class nb, and has a character region ID attribute, to
provide a mapping between the region names and indices. These lists may be read
in as legacy GAL-format files, enhanced GAL and GWT format files as written by
GeoDa, and Matlab sparse matrices saved as text files, or produced by collapsing
input weights matrices to lists. They may be generated from lists of polygon
perimeter coordinates or from matrices of point coordinates representing the
regions under analysis. Nicholas Lewin-Koh contributed a number of useful
graph-derived functions, so that there is now quite a choice with regard to creating
lists of neighbors; further details may be found in Bivand and Portnov (2004). Class
nb has print(), summary(), and plot() functions.
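
The three representations discussed above (neighbor list, dense matrix, and sparse triplets) can be compared on a toy example. This is a base R sketch and does not use the spdep classes; marking a region without neighbors by an empty integer vector is a simplification made here, and the names nb_toy and card_toy are hypothetical.

```r
# Toy neighbour list for four regions; region "d" has no neighbours
nb_toy <- list(c(2L, 3L), 1L, 1L, integer(0))
attr(nb_toy, "region.id") <- c("a", "b", "c", "d")

ids <- attr(nb_toy, "region.id")
n <- length(nb_toy)
card_toy <- vapply(nb_toy, length, integer(1))   # size of each neighbour set

# Dense binary weights matrix: n * n cells, most of them zero
W <- matrix(0, n, n, dimnames = list(ids, ids))
for (i in seq_len(n)) W[i, nb_toy[[i]]] <- 1

# Sparse triplet form: row index, column index, value
triplets <- data.frame(row = rep(seq_len(n), card_toy),
                       col = unlist(nb_toy),
                       value = 1)

pct_nonzero <- 100 * sum(card_toy) / n^2   # share of filled cells
isolated <- ids[card_toy == 0]             # regions with no links

print(W)
print(triplets)
print(pct_nonzero)
print(isolated)
```

The list keeps the neighbor sets directly retrievable by region, while the matrix and triplet forms can both be derived from it when a computation needs them.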
We will now access the data set reproduced from Cressie and collaborators,
included in spdep, and add the neighbor relationships used in Cressie and Chan
(1989), stored as nb object ncCC89.nb, to the background map as a graph, shown
in Fig. 1. The criterion used for two counties to be considered neighbors was that
their county seats should be within 30 miles of each other, which is not fulfilled for
two Atlantic coast counties. Columns c("lon", "lat") in data frame nc.sids
contain the geographical coordinates of the county seats (vector and matrix indices
are given in single square brackets, and list element indices in double square
brackets). Note that the plot() function is being used for class-based methods
dispatch, with one function being used to display objects of the polylist class
like ncpolys and a different function to display the neighbors list ncCC89.nb
object of class nb. One of the reasons why it is sometimes difficult to replicate
classic results is that it can be difficult to regenerate neighbor relations even given
area boundaries or point coordinates. This is illustrated here by running
dnearneigh() on the local projected coordinates of the county seats in columns
c("east", "north") (Cressie 1993, pp. 386–89); two links are added to the
published list of neighbors, illustrated by plotting the neighbor list resulting from
computing the difference between the two lists.
> data(nc.sids)
> new30 <- dnearneigh(as.matrix(nc.sids[, c("east", "north")]),
+     d1 = 0, d2 = 30, row.names = nc.sids$CNTY.ID)
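
The idea behind a distance-band neighbor search, and behind comparing a regenerated list with a published one, can be sketched in base R. This is not the spdep implementation: the function band_neighbours(), the coordinates, and the "published" list are all made up for illustration, and the treatment of points lying exactly on the distance bounds is an assumption of the sketch.

```r
# Made-up county-seat coordinates in miles
seats <- cbind(east = c(0, 10, 25, 60, 100), north = c(0, 0, 0, 0, 0))

# Neighbours of i are all other points with d1 <= distance <= d2
band_neighbours <- function(coords, d1, d2) {
  d <- as.matrix(dist(coords))
  lapply(seq_len(nrow(d)), function(i) {
    j <- which(d[i, ] >= d1 & d[i, ] <= d2)
    unname(j[j != i])
  })
}

new_list <- band_neighbours(seats, d1 = 0, d2 = 30)

# A hypothetical published list that lacks the link between points 1 and 3
published <- list(2L, c(1L, 3L), 2L, integer(0), integer(0))

# Links present in the regenerated list but not in the published one
extra_links <- Map(setdiff, new_list, published)

# Points whose nearest neighbour lies beyond the upper bound
no_neighbours <- which(vapply(new_list, length, integer(1)) == 0)

print(extra_links)
print(no_neighbours)
```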
A print of the neighbor object shows that it is a neighbor list object, with a very
sparse structure—if displayed as a matrix, only 3.94% of cells would be filled
(simply typing an object name at the interactive command prompt prints it). We
should also note that this neighbor criterion generates two counties with no
neighbors, Dare and Hyde, whose county seats were more than 30 miles from
their nearest neighbors. The card() function returns the cardinality of the
neighbor sets.
> ncCC89.nb
Neighbour list object:
Number of regions: 100
Number of nonzero links: 394
Percentage nonzero weights: 3.94
Average number of links: 3.94
2 regions with no links:
2000 2099
> as.character(nc.sids.shp$att.data$NAME)[card(ncCC89.nb) == 0]
[1] "Dare" "Hyde"
Probability mapping
Rather than reviewing functions for measuring and modeling spatial dependence in
the spdep package, we will focus on support for the analysis of rates data, starting
with problems in probability mapping. Typically, we have counts of the incidence
by spatial unit, associated with counts of populations at risk. The task is then to try
to establish whether any spatial units seem to be characterized by higher or lower
counts of cases than might have been expected in general terms (Bailey and Gatrell
1995; Waller and Gotway 2004).
An early approach by Choynowski (1959), described by Cressie and Read
(1985) and Bailey and Gatrell (1995), assumes, given that the true rate for the
spatial units is small, that as the population at risk increases to infinity, the
spatial unit case counts are Poisson, with the mean value equal to the population
at risk times the rate for the study area as a whole. The probmap() function
returns raw (crude) rates, expected counts (assuming a constant rate across
the study area), relative risks, and Poisson probability map values calculated
using the standard cumulative distribution function ppois(). Counties with
observed counts lower than expected, based on population size, have values in
the lower tail, and those with observed counts higher than expected have
values in the upper tail, as Fig. 2 shows. Here, we will use the gray sequential
palette, available in R in the RColorBrewer package (http://colorbrewer.org), and
the base findInterval() function to assign colors to probability map values.
Figure 2. Probability map of North Carolina counties, SIDS cases 1974–78, reproducing
Kaluzny et al. (1996, p. 67, Figure 3.28).
[Figure 3: image and caption not recovered; the surviving axis ticks run from 0.0 to 1.0
and from 0 to 20.]
> library(RColorBrewer)
> pmap <- probmap(nc.sids$SID74, nc.sids$BIR74)
> brks <- c(0, 0.001, 0.01, 0.025, 0.05, 0.95, 0.975, 0.99, 0.999,
+     1)
> cols <- brewer.pal(length(brks) - 1, "Greys")
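
The quantities involved in a Poisson probability map can be reproduced by hand in base R. The sketch below is not the probmap() implementation: the counts and populations are made up, the scaling of the relative risk to a base of 100 is a choice made here, and a base R gray ramp stands in for the RColorBrewer palette.

```r
# Made-up case counts and populations at risk for five areas
cases <- c(0, 2, 5, 1, 12)
pop   <- c(1000, 2000, 2500, 1500, 3000)

raw_rate <- cases / pop
overall  <- sum(cases) / sum(pop)       # constant rate for the study area
expected <- pop * overall               # expected counts under that rate
rel_risk <- 100 * cases / expected      # 100 means "as expected"

# Poisson probability of a count no larger than the one observed
pmap_sketch <- ppois(cases, expected)

# Assign each value to a class interval and a gray shade
brks <- c(0, 0.001, 0.01, 0.025, 0.05, 0.95, 0.975, 0.99, 0.999, 1)
shades <- grey(seq(0.9, 0.1, length.out = length(brks) - 1))
class_index <- findInterval(pmap_sketch, brks, all.inside = TRUE)

print(data.frame(cases, expected, rel_risk, pmap_sketch,
                 shade = shades[class_index]))
```

Values near 0 mark areas with fewer cases than expected and values near 1 mark areas with more, which is why the class breaks are concentrated in the two tails.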
Marilia Carvalho (personal communication) and Virgilio Gómez-Rubio and
coauthors (Gómez-Rubio, Ferrándiz, and López 2005) have pointed out the unusual
shape of the distribution of the Poisson probability values (Fig. 3), echoing the
doubts about probability mapping voiced by Cressie (1993, p. 392): ‘‘an extreme
value . . . may be more due to its lack of fit to the Poisson model than to its deviation
from the constant rate assumption’’; see also Waller and Gotway (2004, p. 96): ‘‘. . .
it may not be possible to distinguish deviations from the constant risk assumption
from lack of fit of the Poisson distribution.’’ There are many more high values than
one would have expected, suggesting perhaps overdispersion, that is, that the ratio
of the variance to the mean is larger than unity.
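
One way to put a number on overdispersion is to fit an intercept-only quasi-Poisson model with the logarithm of the expected counts as an offset and read off the estimated dispersion. The sketch below uses base R with made-up counts and populations, so the value it prints has no connection to the North Carolina data.

```r
# Made-up counts and populations at risk for ten areas
cases <- c(0, 1, 1, 2, 9, 0, 4, 15, 2, 7)
pop   <- c(1000, 1200, 900, 1100, 1000, 800, 1000, 1300, 900, 800)

# Expected counts under a constant rate across the study area
expected <- pop * sum(cases) / sum(pop)

# Null model: intercept only, log link, log expected counts as offset
fit <- glm(cases ~ 1 + offset(log(expected)), family = quasipoisson)
dispersion <- summary(fit)$dispersion

# The same quantity by hand: Pearson chi-square over residual df
mu <- fitted(fit)
pearson <- sum((cases - mu)^2 / mu) / df.residual(fit)

print(dispersion)
print(pearson)
```

A value close to one is what the Poisson assumption implies; values well above one indicate that the counts vary more than the model allows.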
One ad hoc way to assess the impact of the possible failure of our assumption
that the counts follow the Poisson distribution is to estimate the dispersion by fitting
a generalized linear model using the quasi-Poisson family and log link to the observed
counts, including only the intercept (null model) and using the logarithm of the expected
counts as an offset (suggested by Marilia Carvalho and associates). The dispersion is equal
to 2.2784, much greater than unity; this can be checked because R provides
access to a wide range of model-fitting functions. Likelihood ratio and Dean tests
also used by Gómez-Rubio, Ferrándiz, and López (2005) confirm the presence of
[Figure 4: map not recovered; legend classes are under 2, 2–2.5, 2.5–3, 3–3.5, and
over 3.5.]
Figure 4. Local Empirical Bayes estimates for SIDS rates per 1000 using the 30-mile county
seat neighbors list; no local estimate is available for the two counties with no neighbors,
marked by stars.
Figure 5. Geographical Analysis Machine results for circle radius 30 km, and grid step
10 km, run on UTM county seat coordinates, reprojected onto geographical coordinates
for display; gray symbols, α between 0.001 and 0.002; black symbols, α < 0.001.
> library(DCluster)
Loading required package: boot
As its first argument, the opgam() function takes a data frame of observed
and expected counts, and requires projected coordinates of points representing the
polygons within which data were collected, here Universal Transverse Mercator
(UTM) zone 18 coordinates of the county seats. Fig. 5 shows the results of running
the function using the North Carolina SIDS data on a 10 km grid and a cut-off radius
of 30 km, for test results of α < 0.002. The similarities between Figs. 4 and 5 suggest
that in 1974–78, more SIDS cases were being observed in the highlighted counties
than should have been expected for their populations at risk.
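
The logic of a Geographical Analysis Machine scan can be sketched in base R: lay a grid over the study area, sum the observed and expected counts of the points inside a circle around each grid node, and keep the nodes where a Poisson test finds an excess. This is not the opgam() implementation; the function gam_sketch(), its arguments, and the data are hypothetical.

```r
gam_sketch <- function(xy, observed, expected, step, radius, alpha) {
  # Regular grid of circle centres covering the points
  grid <- expand.grid(x = seq(min(xy[, 1]), max(xy[, 1]), by = step),
                      y = seq(min(xy[, 2]), max(xy[, 2]), by = step))
  grid$observed <- 0
  grid$expected <- 0
  for (k in seq_len(nrow(grid))) {
    d <- sqrt((xy[, 1] - grid$x[k])^2 + (xy[, 2] - grid$y[k])^2)
    inside <- d <= radius
    grid$observed[k] <- sum(observed[inside])
    grid$expected[k] <- sum(expected[inside])
  }
  # Upper-tail Poisson probability of at least the observed count
  grid$pvalue <- ppois(grid$observed - 1, grid$expected, lower.tail = FALSE)
  grid[grid$expected > 0 & grid$pvalue < alpha, ]
}

# Made-up projected coordinates (km) with an excess of cases in the east
xy <- cbind(c(0, 10, 100, 110), c(0, 0, 0, 0))
observed <- c(0, 1, 20, 19)
pop <- rep(1000, 4)
expected <- pop * sum(observed) / sum(pop)

hits <- gam_sketch(xy, observed, expected, step = 10, radius = 30,
                   alpha = 0.002)
print(hits)
```

Because neighboring circles overlap heavily, the flagged nodes are strongly dependent, so the output is better read as an exploratory map of where excesses occur than as a set of independent tests.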
Figure 6. Moran Poisson-Gamma parametric bootstrap and the Empirical Bayes Moran
Monte Carlo simulation results; observed statistics are marked by dashed vertical lines.
Adjustments to Moran’s I
The DCluster package also contains a number of interesting innovations to standard
measures of spatial autocorrelation, such as Moran’s I. The statistic can be tested
not only under assumptions of normality or randomization, or using Monte Carlo
tests (which are equivalent to a permutation bootstrap), but also using parametric
bootstrapping for three models of the data: multinomial, Poisson, and
Poisson-Gamma. Finally, we can calculate the adjusted Moran’s I statistic (Assunção
and Reis 1999) for the counts and populations at risk, assessing its significance with
the Monte Carlo simulation implemented as function EBImoran.mc() in spdep. For the
purposes of reproducible research, the random number generator and its seed are fixed. Fig.
6 shows histograms of the outcomes of the Moran Poisson-Gamma parametric
bootstrap and the Empirical Bayes Moran Monte Carlo simulation, both for 999
replications.
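
The permutation (Monte Carlo) test with a fixed seed can be sketched in base R. The example below computes the ordinary Moran's I for a made-up variable on a chain of ten regions; it does not apply the Empirical Bayes adjustment of Assunção and Reis (1999) and is not the EBImoran.mc() code. The function name moran_sketch() is hypothetical.

```r
# Moran's I for values x and a weights matrix W
moran_sketch <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Ten regions in a row, each linked to its immediate neighbours
n <- 10
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) W[i, i + 1] <- W[i + 1, i] <- 1

# A made-up variable with a strong trend along the chain
x <- as.numeric(1:n)

# Fixing the seed makes the simulated reference distribution repeatable
set.seed(1)
nsim <- 999
obs <- moran_sketch(x, W)
sims <- replicate(nsim, moran_sketch(sample(x), W))

# Rank of the observed value among the permuted values
p_value <- (sum(sims >= obs) + 1) / (nsim + 1)

print(obs)
print(p_value)
```

Permuting the values over the regions keeps the set of values and the neighbor structure fixed, so the reference distribution reflects only the arrangement of values in space.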
The test results leave no doubt that there is considerable spatial dependence in
the pattern of rates across the state. This result does not, however, tell us anything
about clustering, more that neighbors have similar values, perhaps reflecting the
nonstationarity reported by Cressie and Read (1985). At present, methods such as
those proposed by Lawson, Browne, and Vidal Rodeiro (2003), Haining (2003), and
Waller and Gotway (2004) involving Markov Chain Monte Carlo (MCMC) and
multilevel modeling may be preferred, although here R may be used for analysis
and/or to pre- and postprocess data.
Acknowledgements
I thank the anonymous referees for their comments. I also thank the contributors to
spdep, including Luc Anselin, Andrew Bernat, Marilia Carvalho, Stéphane Dray,
Rein Halbersma, Nicholas Lewin-Koh, Hisaji Ono, Michael Tiefelsdorf, and Danlin
Yu, for their patience and encouragement, and also many e-mail correspondents for
their fruitful questions and suggestions. This work was funded in part by EU
Contract Number Q5RS 2000 30183.
References
Anselin, L., R. J. G. M. Florax, and S. J. Rey. (2004). ‘‘Econometrics for Spatial Models: Recent
Advances.’’ In Advances in Spatial Econometrics: Methodology, Tools, Applications,
1–25, edited by L. Anselin, R. J. G. M. Florax, and S. J. Rey. Berlin: Springer.
Anselin, L., I. Syabri, and Y. Kho. (2006). ‘‘GeoDa: An Introduction to Spatial Data Analysis.’’
Geographical Analysis 38, 5–22.
Assunção, R. M., and E. A. Reis. (1999). ‘‘A New Proposal to Adjust Moran’s I for Population
Density.’’ Statistics in Medicine 18, 2147–62.
Bailey, T. C., and A. C. Gatrell. (1995). Interactive Spatial Data Analysis. Harlow, UK:
Longman.
Becker, R. A., J. M. Chambers, and A. A. Wilks. (1988). The New S Language. London:
Chapman & Hall.
Bivand, R. S. (2000). ‘‘Using the R Statistical Data Analysis Language on GRASS 5.0 GIS Data
Base Files.’’ Computers & Geosciences 26, 1043–52.
Bivand, R. S. (2001). ‘‘More on Spatial Data.’’ R News 1(3), 13–17.
Bivand, R. S. (2002). ‘‘Spatial Econometrics Functions in R: Classes and Methods.’’ Journal of
Geographical Systems 4, 405–21.
Bivand, R. S. (2003). ‘‘Approaches to Classes for Spatial Data in R.’’ In Proceedings of the 3rd
International Workshop on Distributed Statistical Computing, Vienna, Austria, edited by
K. Hornik, F. Leisch, and A. Zeileis (http://www.ci.tuwien.ac.at/Conferences/DSC-2003/
Proceedings/Bivand.pdf).
Bivand, R. S., and A. Gebhardt. (2000). ‘‘Implementing Functions for Spatial Statistical
Analysis Using the R Language.’’ Journal of Geographical Systems 2, 307–17.
Bivand, R. S., and B. A. Portnov. (2004). ‘‘Exploring Spatial Data Analysis Techniques Using
R: The Case of Observations with No Neighbours.’’ In Advances in Spatial
Econometrics: Methodology, Tools, Applications, 121–42, edited by L. Anselin,
R. J. G. M. Florax, and S. J. Rey. Berlin: Springer.
Chambers, J. M. (1998). Programming with Data. New York: Springer.
Chambers, J. M., and T. J. Hastie. (1992). Statistical Models in S. London: Chapman & Hall.
Choynowski, M. (1959). ‘‘Maps Based on Probabilities.’’ Journal of the American Statistical
Association 54, 385–88.
Clayton, D., and J. Kaldor. (1987). ‘‘Empirical Bayes Estimates of Age-Standardized Relative
Risks for Use in Disease Mapping.’’ Biometrics 43, 671–81.
Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
Cressie, N., and N. H. Chan. (1989). ‘‘Spatial Modelling of Regional Variables.’’ Journal of
the American Statistical Association 84, 393–401.
Cressie, N., and T. R. C. Read. (1985). ‘‘Do Sudden Infant Deaths Come in Clusters?’’
Statistics and Decisions 3(Suppl. 2), 333–49.
Cressie, N., and T. R. C. Read. (1989). ‘‘Spatial Data-Analysis of Regional Counts.’’
Biometrical Journal 31, 699–719.
Dalgaard, P. (2002). Introductory Statistics with R. New York: Springer.
Gentleman, R., V. Carey, W. Huber, R. Irizarry, and S. Dudoit, eds. (2005). Bioinformatics and
Computational Biology Solutions Using R and Bioconductor. New York: Springer.
Gómez-Rubio, V., J. Ferrándiz-Ferragud, and A. López-Quílez. (2005). ‘‘Detecting Clusters
of Disease with R.’’ Journal of Geographical Systems 7, 189–206.
Grunsky, E. C. (2002). ‘‘R: A Data Analysis and Statistical Programming Environment—an
Emerging Tool for the Geosciences.’’ Computers & Geosciences 28, 1219–22.
Haining, R. P. (2003). Spatial Data Analysis: Theory and Practice. Cambridge, UK:
Cambridge University Press.
Ihaka, R., and R. Gentleman. (1996). ‘‘R: A Language for Data Analysis and Graphics.’’
Journal of Computational and Graphical Statistics 5, 299–314.
Kaluzny, S. P., S. C. Vega, T. P. Cardoso, and A. A. Shelly. (1996). S-PLUS SPATIALSTATS
User’s Manual Version 1.0. Seattle, WA: MathSoft Inc.
Lawson, A. B., W. J. Browne, and C. L. Vidal Rodeiro. (2003). Disease Mapping with
WinBUGS and MLwiN. Chichester, UK: Wiley.
Leisch, F., and A. Rossini. (2003). ‘‘Reproducible Statistical Research.’’ Chance 16(2), 46–50.
Marshall, R. J. (1991). ‘‘Mapping Disease and Mortality Rates Using Empirical Bayes
Estimators.’’ Applied Statistics 40, 283–94.
Openshaw, S., M. Charlton, C. Wymer, and A. W. Craft. (1987). ‘‘A Mark I Geographical
Analysis Machine for the Automated Analysis of Point Data Sets.’’ International Journal
of Geographical Information Systems 1, 335–58.
Pebesma, E. J. (2004). ‘‘Multivariable Geostatistics in S: The gstat Package.’’ Computers &
Geosciences 30, 683–91.
R Development Core Team. (2005). R: A Language and Environment for Statistical
Computing. Vienna, Austria: R Foundation for Statistical Computing (http://www.
R-project.org).
Ripley, B. D. (2001). ‘‘Spatial Statistics in R.’’ R News 1(2), 14–15.
Symons, M. J., R. C. Grimson, and Y. C. Yuan. (1983). ‘‘Clustering of Rare Events.’’
Biometrics 39, 193–205.
Venables, W. N., and B. D. Ripley. (2000). S Programming. New York: Springer.
Venables, W. N., and B. D. Ripley. (2002). Modern Applied Statistics with S, 4th ed. New
York: Springer.
Waller, L. A., and C. A. Gotway. (2004). Applied Spatial Statistics for Public Health Data.
Hoboken, NJ: Wiley.