Applied Spatial Data Analysis with R
Roger Bivand
Department of Economics, Norwegian School of Economics, Helleveien 30,
N-5045 Bergen, Norway; Roger.Bivand@nhh.no
12 June 2014
Overview
1 Spatial data in R
Study area
Representing spatial data
Coordinate reference systems
Raster data
Exporting data
2 Handling spatial data in R: GRASS interface
3 Handling spatial data in R: methods
Topology predicates and operations
Overlay
4 Worked examples in spatial statistics
Disease mapping
Spatial autocorrelation
Regression
Bayesian spatial regression with INLA
R refresher
When we talk of arguments to a function, we mean the
(possibly named) arguments passed to the function for
evaluation
We can access the arguments using the function help pages,
or briefly using args():
> args(sum)
function (..., na.rm = FALSE)
NULL
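As a minimal sketch of how named arguments and defaults behave, the following uses only base R. The vector x is invented for illustration, and sd is used with formals() because sum is a primitive function without a formal argument list.

```r
# Inspect the arguments of a base R function, as on the slide
args(sum)

# Hypothetical data with one missing value
x <- c(1, 2, NA, 4)

total_with_na <- sum(x)                # NA propagates, because na.rm defaults to FALSE
total_no_na   <- sum(x, na.rm = TRUE)  # the named argument overrides the default

# formals() returns the argument list of a closure such as sd
arg_names <- names(formals(sd))
arg_names
```

Arguments placed after ... in a function definition, such as na.rm in sum, have to be given by name.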
The readOGR function, as we will see later, takes at least two arguments:
a data source name, here the current directory, and a layer name, here the
name of the shapefile without its extension. Once we have checked the names
in the imported object, we can use the spplot method in sp to make a
map:
> olinda <- readOGR(".", "setor1")
> names(olinda)
Data frames
Object framework
Spatial objects
The foundation object is the Spatial class, with just two slots
(new-style class objects have pre-defined components called
slots)
The first is a bounding box, and is mostly used for setting up
plots
The second is a CRS class object defining the coordinate
reference system, and may be set to CRS(as.character(NA)),
its default value.
Operations on Spatial* objects should update or copy these
values to the new Spatial* objects being created
Spatial points
The most basic spatial data object is a point, which may have
2 or 3 dimensions
A single coordinate, or a set of such coordinates, may be used
to define a SpatialPoints object; coordinates should be of
mode double and will be promoted if not
The points in a SpatialPoints object may be associated with
a row of attributes to create a SpatialPointsDataFrame object
The coordinates and attributes may, but do not have to be
keyed to each other using ID values
[Class diagram: slots of each class, with the class it extends]
Spatial: bbox, proj4string
SpatialPoints (extends Spatial): coords
SpatialPointsDataFrame (extends SpatialPoints): coords.nrs, data (a data.frame)
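The point classes described above can be tried out with a small sketch, assuming the sp package is installed. The coordinates and attribute values are invented for illustration, and the rows are matched to the points by order rather than by ID.

```r
library(sp)  # assumes the sp package is installed

# Hypothetical coordinates and attributes, invented for illustration
xy <- cbind(x = c(1, 2, 3), y = c(10, 20, 30))
attrs <- data.frame(id = c("a", "b", "c"), value = c(5, 7, 9))

# Points with an unset coordinate reference system
pts <- SpatialPoints(xy, proj4string = CRS(as.character(NA)))

# Attach one attribute row per point, matched by order here
pts_df <- SpatialPointsDataFrame(pts, data = attrs)

bbox(pts_df)         # bounding box carried over from the points
proj4string(pts_df)  # NA, since no CRS was set
slotNames(pts_df)    # the slots of the new-style class
```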
Spatial*DataFrames
Back to Olinda
The object also works within model fitting functions, like glm; note the
number of rows.
> str(model.frame(CASES ~ DEPRIV + offset(log(POP)), olinda), give.attr = FALSE)
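[Class diagram: slots of each class, with the class it extends]
Spatial: bbox, proj4string
GridTopology: cellcentre.offset, cellsize, cells.dim
SpatialPoints (extends Spatial): coords
SpatialGrid (extends Spatial): grid (a GridTopology)
SpatialGridDataFrame (extends SpatialGrid): data (a data.frame)
SpatialPixels (extends SpatialPoints): grid, grid.index
SpatialPixelsDataFrame (extends SpatialPixels): data (a data.frame)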
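To make the grid classes concrete, here is a small sketch, assuming the sp package is installed. The 3 by 2 grid and its values are invented for illustration; the three GridTopology arguments are the slots named in the diagram.

```r
library(sp)  # assumes the sp package is installed

# A hypothetical 3 x 2 grid of unit cells; the offset is the centre of the lower-left cell
gt <- GridTopology(cellcentre.offset = c(0.5, 0.5),
                   cellsize = c(1, 1),
                   cells.dim = c(3, 2))

# One attribute value per cell (values invented for illustration)
sgdf <- SpatialGridDataFrame(gt, data = data.frame(z = 1:6))

gridparameters(sgdf)  # cellcentre.offset, cellsize and cells.dim for each dimension
bbox(sgdf)            # cell edges rather than cell centres
```

The bounding box is derived from the grid topology, which is why only the three grid parameters and the data need to be stored.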
A SpatialGridDataFrame object
The space shuttle flew a radar topography mission in 2000, giving 90 m
resolution elevation data for most of the world. The data here have been
warped to a UTM projection, but using the WGS84 datum; we’ll see below
how to project and, if need be, datum-transform Spatial objects:
> DEM <- readGDAL("UTM_dem.tif")
> summary(DEM)
[1] NA
      code note
154   4225 # Corrego Alegre 1970-72
460   5524 # Corrego Alegre 1961
2741  5536 # Corrego Alegre 1961 / UTM zone 21S
2742  5537 # Corrego Alegre 1961 / UTM zone 22S
2743  5538 # Corrego Alegre 1961 / UTM zone 23S
2744  5539 # Corrego Alegre 1961 / UTM zone 24S
3140 22521 # Corrego Alegre 1970-72 / UTM zone 21S
3141 22522 # Corrego Alegre 1970-72 / UTM zone 22S
3142 22523 # Corrego Alegre 1970-72 / UTM zone 23S
3143 22524 # Corrego Alegre 1970-72 / UTM zone 24S
3144 22525 # Corrego Alegre 1970-72 / UTM zone 25S
The original data set CRS
[1] FALSE
> proj4string(DEM)
> set_ReplCRS_warn(TRUE)
[1] TRUE
Getting olinda to WGS84
As we see, although both olinda and DEM are UTM zone 25 south, they
differ in their ellipsoid and datum. Using spTransform methods in rgdal
we can undertake a datum shift for olinda, making it possible to
overplot in register:
> olinda1 <- spTransform(olinda, CRS(proj4string(DEM)))
Reading rasters
There are very many raster and image formats; some allow
only one band of data, others think data bands are RGB,
while yet others are flexible
There is a simple readAsciiGrid function in maptools that
reads ESRI Arc ASCII grids into SpatialGridDataFrame
objects; it does not handle CRS and has a single band
Much more support is available in rgdal in the readGDAL
function, which — like readOGR — finds a usable driver if
available and proceeds from there
Using arguments to readGDAL, subregions or bands may be
selected, which helps handle large rasters
This table summarises the classes provided by sp, and shows how
they build up to the objects of most practical use, the
Spatial*DataFrame family objects:
Writing objects
In rgdal, writeGDAL can write, for example, multi-band
GeoTiffs, but there are fewer write drivers than read drivers; in
general CRS and georeferencing are supported (see
gdalDrivers)
The rgdal function writeOGR can be used to write vector files
in the formats supported by its drivers, now including
KML (see ogrDrivers)
External software (including different versions of it) tolerates output
objects to varying degrees, quite often needing tricks; see the
mailing list archives
In maptools, there are functions for writing sp objects to
shapefiles (writeSpatialShape, etc.), as Arc ASCII grids
(writeAsciiGrid), and for using the R PNG graphics device to
output image overlays for Google Earth
One of the nice things about linking to OSGeo software is that when
someone contributes a driver, it is there for other software too. With the
increasing availability of software like Google Earth (or Google Maps), the
ability to display data or results in a context that is familiar to the user
can be helpful, but we need geographical coordinates:
> olinda_ll <- spTransform(olinda1, CRS("+proj=longlat +datum=WGS84"))
> writeOGR(olinda_ll, dsn = "olinda_ll.kml", layer = "olinda_ll", driver = "KML",
+ overwrite_layer = TRUE)
GIS interfaces
R — GIS interfaces
Layering of shells
The GRASS environment and location is layered on the system shell and
environment (csh, ksh, bash): it carries extra settings, for example saying
where the data is stored, but is otherwise a regular shell
Installing GRASS
Returning results
First we vectorise the raster stream network, and read it into the R session
for display:
> execGRASS("r.to.vect", input = "stream1",
+ output = "stream", feature = "line")
> stream1 <- readVECT6("stream")
Operating on geometries
[1] 243
[1] TRUE
[1] TRUE
Overlay operations
[1] TRUE
If the underlying distribution of the data does not agree with our
assumption, we may get several possible processes mixed up,
overdispersion with spatial dependence:
> table(findInterval(pm$pmap,
+ seq(0, 1, 1/10)))
 1  2  3  4  5  6  7  8  9 10
77 17 26  9 12 16  8 11  8 57
[Figure: probability map with legend classes low, N/S and high; histogram of pm$pmap, y-axis Frequency]
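The binning step can be reproduced on simulated data with base R alone. This is a sketch under stated assumptions: the counts are invented, they are drawn from a negative binomial to mimic overdispersion, and the probability map values are taken to be Poisson probabilities from ppois, which may differ in detail from how pm$pmap was computed. Setting rightmost.closed = TRUE is an addition that keeps a value of exactly 1 in the tenth class.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n_areas  <- 200
expected <- runif(n_areas, min = 2, max = 20)  # hypothetical expected counts

# Overdispersed counts: negative binomial with the same mean as the Poisson model
observed <- rnbinom(n_areas, mu = expected, size = 2)

# Poisson probability of a count no larger than the one observed
pmap <- ppois(observed, lambda = expected)

# Ten equal-width classes on [0, 1], as on the slide
bins <- findInterval(pmap, seq(0, 1, 1/10), rightmost.closed = TRUE)
table(factor(bins, levels = 1:10))
```

If the Poisson assumption held, the classes would be filled fairly evenly; counts piling up in the first and last classes, as in the table above, point to a poor fit.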
> library(DCluster)
> olinda2$RR <- olinda2$CASES/olinda2$Expected
> olinda2$EB_ml <- empbaysmooth(olinda2$CASES,
+ olinda2$Expected)$smthrr
> spplot(olinda2, c("RR", "EB_ml"),
+ col.regions = brewer.pal(10,
+ "RdBu"), at = c(seq(0,
+ 1, 0.25), seq(1, 6,
+ 1)))
[Figure: maps of RR and EB_ml, legend from 0 to 6]
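To show what shrinking raw relative risks towards a global rate means, here is a base R sketch of a method-of-moments empirical Bayes estimator. It is a swapped-in illustration, not necessarily what empbaysmooth computes, and the observed and expected counts are invented.

```r
# Hypothetical observed and expected counts (invented for illustration)
observed <- c(0, 2, 5, 1, 12, 3, 0, 7)
expected <- c(1.2, 2.5, 3.0, 0.8, 6.0, 4.1, 0.5, 5.5)

raw_rr <- observed / expected            # raw relative risks, as RR on the slide
global <- sum(observed) / sum(expected)  # pooled relative risk

# Method-of-moments estimate of the between-area variance, floored at zero
between <- sum(expected * (raw_rr - global)^2) / sum(expected) -
  global / mean(expected)
between <- max(between, 0)

# Shrinkage weight: nearer 1 for large expected counts, nearer 0 for small ones
weight <- between / (between + global / expected)

eb_rr <- global + weight * (raw_rr - global)  # smoothed relative risks
round(cbind(raw_rr, weight, eb_rr), 3)
```

Areas with small expected counts are pulled most strongly towards the pooled rate, which is why extreme raw ratios in sparsely populated tracts are damped on the smoothed map.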
Neighbours
The neighbour list is built from the contiguities of the tract polygons:
> nb <- poly2nb(olinda2)
> nb
Neighbour list object:
Number of regions: 241
Number of nonzero links: 1324
Percentage nonzero weights: 2.279575
Average number of links: 5.493776
[Figure: point plot accompanying the neighbour list]
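The two derived figures in the printed summary follow directly from the number of regions and the number of links. A short base R check, using only the values printed above:

```r
n_regions <- 241   # "Number of regions" in the printed summary
n_links   <- 1324  # "Number of nonzero links"

avg_links   <- n_links / n_regions          # average number of links per region
pct_nonzero <- 100 * n_links / n_regions^2  # share of nonzero cells in the n x n weights matrix

round(c(avg_links, pct_nonzero), 6)
# 5.493776 2.279575
```

The low percentage shows how sparse the implied weights matrix is, which is why neighbour lists rather than full matrices are the usual storage format.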
If instead of shrinking to a global rate, we shrink to a local rate, we
may be able to take unobserved heterogeneity into account; here we
use the list of neighbours:
> olinda2$Observed <- olinda2$CASES
> eb2 <- EBlocal(olinda2$Observed,
+ olinda2$Expected, nb)
> olinda2$EB_mm_local <- eb2$est
[Figure: maps of RR, EB_ml and EB_mm_local, legend from 1 to 6]
Moran’s I
counts:
> lw <- nb2listw(nb)
> set.seed(130709)
> moran.boot <- boot(as(olinda2,
+ "data.frame"), statistic = moranI.boot,
+ R = 999, listw = lw, n = length(nb),
+ S0 = Szero(lw))
[Figure: bootstrap output in two panels, "Histogram of t" (Density against t*) and t* plotted against quantiles from -3 to 3]
Negative Binomial:
> moran.pgboot <- boot(as(olinda2,
+ "data.frame"), statistic = moranI.pboot,
+ sim = "parametric", ran.gen = negbin.sim,
+ R = 999, listw = lw, n = length(nb),
+ S0 = Szero(lw))
[Figure: bootstrap output in two panels, "Histogram of t" (Density against t*) and t* plotted against quantiles from -3 to 3]
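The statistic being resampled can be written out in a few lines of base R. This sketch uses a plain permutation of values over areas rather than the boot-based schemes on the slides, and the helper moran_i, the 20-area line layout and the simulated values are all hypothetical; the functions moranI.boot, moranI.pboot and negbin.sim used above are not shown in the source and are not reproduced here.

```r
# Moran's I for values x and a spatial weights matrix w with a zero diagonal
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Hypothetical layout: 20 areas on a line, neighbours are the adjacent areas
n <- 20
w <- matrix(0, n, n)
w[abs(row(w) - col(w)) == 1] <- 1
w <- w / rowSums(w)      # row-standardised weights, the nb2listw default style

set.seed(130709)         # seed reused from the slide
x <- cumsum(rnorm(n))    # a random walk, so neighbouring values are similar

obs  <- moran_i(x, w)
perm <- replicate(999, moran_i(sample(x), w))  # reassign values to areas at random

# One-sided pseudo p-value for positive spatial autocorrelation
p_value <- (sum(perm >= obs) + 1) / (length(perm) + 1)
c(observed = obs, p_value = p_value)
```

Permutation treats the values as exchangeable across areas; the parametric bootstrap on the slide instead draws new counts from a fitted distribution, which matters when expected counts differ between tracts.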
data: m0p
P_B = 50.2478, p-value < 2.2e-16
alternative hypothesis: greater
Residual maps
So we can also plot residual maps for the three nested models, but getting
the palette divergence in the right place takes a little more care:
[Figure: residual maps with panel labels including f2, f3 and r3; legend from -30 to 40]
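One way to put the divergence at zero when the residual range is asymmetric is to build the two halves of the palette separately. This is a base R sketch, not the code behind the slide: the breaks are hypothetical values spanning the legend range, and the colours are arbitrary.

```r
# Hypothetical class breaks spanning the legend range; zero must be one of them
breaks <- seq(-30, 40, by = 10)

n_neg <- sum(breaks < 0)  # number of classes below zero
n_pos <- sum(breaks > 0)  # number of classes above zero

# Build each side separately so that the two ramps meet exactly at zero
neg_cols <- rev(colorRampPalette(c("white", "blue"))(n_neg + 1)[-1])
pos_cols <- colorRampPalette(c("white", "red"))(n_pos + 1)[-1]
cols <- c(neg_cols, pos_cols)

# One colour per class
length(cols) == length(breaks) - 1
```

The resulting vectors could be passed to spplot as col.regions = cols and at = breaks, so that negative and positive residuals never share a hue.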
> summary(INLA_BYM)
Call:
c("inla(formula = CASES ~ DEPRIV + ndvi + DEM_resamp + f(INLA_ID, ", " model = \"bym\", graph = nb2mat
Time used:
Pre-processing Running inla Post-processing Total
0.1771 0.7633 0.0490 0.9894
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) -0.2314 0.1860 -0.5995 -0.2306 0.1318 -0.2289 0
DEPRIV 0.4183 0.3438 -0.2577 0.4184 1.0934 0.4185 0
ndvi 0.6982 0.5930 -0.4660 0.6977 1.8642 0.6967 0
DEM_resamp -0.0126 0.0063 -0.0249 -0.0126 -0.0002 -0.0127 0
Random effects:
Name Model
INLA_ID BYM model
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for INLA_ID (iid component) 5.4605 1.9042 2.7427 5.1196 10.0987 4.5074
Precision for INLA_ID (spatial component) 1.5733 0.5878 0.6928 1.4862 2.9701 1.3210
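Precisions and log-scale coefficients are easier to read after a simple transformation. The sketch below uses base R and the posterior means printed above; the results are plug-in values rather than posterior summaries of the transformed quantities, and the relative-risk reading assumes a log link, which the truncated call does not show.

```r
# Posterior means copied from the summary above
prec_iid     <- 5.4605
prec_spatial <- 1.5733
beta_dem     <- -0.0126

# Standard deviation implied by a precision: sd = 1 / sqrt(precision)
sd_iid     <- 1 / sqrt(prec_iid)
sd_spatial <- 1 / sqrt(prec_spatial)

# Multiplicative change in risk per unit of DEM_resamp, assuming a log link
rr_dem <- exp(beta_dem)

round(c(sd_iid = sd_iid, sd_spatial = sd_spatial, rr_dem = rr_dem), 3)
```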
Summary