Overview of Statistical Analysis of Spatial Data
Geog 210C
Introduction to Spatial Data Analysis
Phaedon C. Kyriakidis
www.geog.ucsb.edu/~phaedon
Department of Geography
University of California Santa Barbara
Santa Barbara, CA 93106-4060
phaedon@geog.ucsb.edu
Outline
Preliminaries
Points to Remember
Preliminaries
Notes
- any processing of spatial data, e.g., filtering or interpolation, affects any
  inference made from them
- boundaries between the above stages are not always clear-cut
[Figure: map captioned "1981-82 NDJ average"]
Distinction between spatially continuous and area (lattice) data not always clear-cut,
particularly when the latter are derived via aggregation from the former
[Figure: panels labeled "maple", "hickory", and "1962-1981"]
Analysis objectives
- modeling of flow patterns, i.e., finding relationships between observed flows and
  explanatory variables, e.g., number of trips from origins to destinations as a
  function of income
- classical analysis methods focus on patterns of aggregate interaction rather
  than on individuals themselves; more recent work focuses on understanding
  individual preferences and choice modeling
- spatial location/allocation problems, and more generally spatial optimization
  problems, typically involve network data
Methods for analyzing spatial interaction data are not covered in this course
[Figure: two panels plotting value against x, for x from 0 to 100]
Point patterns
- detect clustering or regularity, as opposed to complete randomness, of event
  locations in space and/or time
- if clustering is detected, investigate possible relations between clusters
  and nearby "sources" or pertinent covariates
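One simple way to screen a point pattern for clustering or regularity is a nearest-neighbour index. The sketch below is an illustration, not a method taken from these slides: it compares the observed mean nearest-neighbour distance with the value expected under complete spatial randomness, 1 / (2 sqrt(n / area)). The three test patterns, the function name and the absence of edge correction are all assumptions made for the example.

```python
import numpy as np

def clark_evans_index(points, area):
    """Ratio of observed to expected mean nearest-neighbour distance.

    points: (n, 2) array of event coordinates; area: area of the study region.
    Values below 1 suggest clustering, values above 1 suggest regularity.
    No edge correction is applied (a simplifying assumption).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances; the diagonal is set to infinity so that
    # a point is never its own nearest neighbour.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    observed = dist.min(axis=1).mean()
    # Expected mean nearest-neighbour distance under complete spatial
    # randomness with intensity n / area.
    expected = 0.5 / np.sqrt(n / area)
    return observed / expected

rng = np.random.default_rng(0)

# Hypothetical random pattern: 200 uniform events in the unit square.
random_pts = rng.uniform(0, 1, size=(200, 2))

# Hypothetical regular pattern: a 10 x 10 lattice in the unit square.
gx, gy = np.meshgrid(np.arange(10) + 0.5, np.arange(10) + 0.5)
regular_pts = np.column_stack([gx.ravel(), gy.ravel()]) / 10

# Hypothetical clustered pattern: events scattered tightly around 5 parents.
parents = rng.uniform(0.2, 0.8, size=(5, 2))
offsets = rng.normal(0, 0.02, size=(200, 2))
clustered_pts = parents[rng.integers(0, 5, size=200)] + offsets

for name, pts in [("random", random_pts),
                  ("regular", regular_pts),
                  ("clustered", clustered_pts)]:
    print(name, round(clark_evans_index(pts, area=1.0), 2))
```

An index well below 1 would prompt the second step in the list above: relating the clusters to nearby sources or covariates.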
Ph. Kyriakidis (UCSB) Geog 210C Spring 2009 10 / 20
Why Spatial Statistics?
Spatial statistics
- multivariate statistics in a spatial/temporal context: each observation is
  viewed as a realization of a different random variable, and these random
  variables are auto-correlated in space and/or time
- each sample is not an independent piece of information, precisely because it
  is redundant with other samples (the corresponding random variables
  being auto-correlated)
- auto- and cross-correlation (in space and/or time) is explicitly accounted for
  in order to establish confidence intervals for hypothesis testing
One can always choose to analyze spatial data with non-spatial statistics;
problems arise when confidence intervals need to be reported...
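To see why confidence intervals are the sticking point, consider the variance of a sample mean when the samples are auto-correlated. The sketch below assumes n equally spaced observations with common variance sigma2 and a correlation that decays as rho**|i - j|; this correlation model, the function name and the normal-approximation factor 1.96 are illustrative assumptions, not part of the course material.

```python
import numpy as np

def variance_of_mean(n, sigma2, rho):
    """Variance of the mean of n observations with correlation rho**|i - j|.

    The exponential-type correlation model is an assumption made for
    illustration only.
    """
    idx = np.arange(n)
    cov = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])
    # Var(mean) = (1 / n^2) * sum of all entries of the covariance matrix.
    return cov.sum() / n**2

n, sigma2 = 100, 1.0
for rho in (0.0, 0.5, 0.9):
    var_mean = variance_of_mean(n, sigma2, rho)
    # Number of independent samples that would give the same variance.
    n_eff = sigma2 / var_mean
    # Half-width of an approximate 95% interval (normal approximation).
    half_width = 1.96 * np.sqrt(var_mean)
    print(f"rho={rho:.1f}  n_eff={n_eff:6.1f}  half-width={half_width:.3f}")
```

With rho = 0 the usual sigma2 / n result is recovered. For positive rho the effective number of samples drops below n, so an interval computed as if the samples were independent would be too narrow.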
Statistical packages
- extremely versatile in modeling; recent improvements in visualization
- R and SpaceStat/GeoDa are the most popular in Geography
[Figure: two grids of cell values and a scatterplot of variable #2 against variable #1 (both axes 0 to 100).
 Grid rows as extracted (partial), left grid | right grid:
   40 55 55 38 88 34 | 50 60 49 46 84 23
   41 30 26 35 38 24 | 21 46 22 42 45 14
   14 56 37 34 08 18 | 19 36 48 23 08 29
   49 44 51 67 17 37 | 38 47 52 52 22 48
   55 25 33 32 59 54 | 58 40 46 38 35 55
 ρ12 = 0.83; left grid: m = 43.14, s = 20.17; right grid: m = 42.92, s = 18.32]
Aggregation Scheme #1
[Figure: two grids of aggregated values and a scatterplot of variable #2 against variable #1.
 Grid rows as extracted (partial), left grid | right grid:
   91.0 54.5 34.0 | 73.5 57.0 44.0
   47.5 46.5 61.0 | 55.0 47.5 53.5
   35.5 30.5 31.0 | 33.5 32.0 29.5]
Aggregation Scheme #2
[Figure: two grids of aggregated values and a scatterplot of variable #2 against variable #1 (both axes 0 to 100).
 Grid rows, left grid | right grid:
   63.5 75.0 63.5 37.5 66.0 29.0 | 61.0 67.5 67.0 37.5 71.0 26.5
   27.5 43.0 31.5 34.5 23.0 21.0 | 20.0 41.0 35.0 32.5 26.5 21.5
   52.0 34.5 42.0 49.5 38.0 45.5 | 48.0 43.5 49.0 45.0 28.5 51.5
 ρ12 = 0.94; left grid: m = 43.14, s = 15.23; right grid: m = 42.92, s = 15.59]
For a given aggregation extent, statistics and relationships between spatial attributes
depend on which individual values are aggregated and how
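The dependence on the aggregation scheme can be reproduced with a few lines of array arithmetic. The sketch below uses four of the fine-resolution grid rows shown on the slides. Because it is only a subset, the printed statistics are not expected to match the slide values. Treating the left grid as variable #1, averaging side-by-side pairs for Scheme #1 and stacked pairs for Scheme #2, and using the sample standard deviation are assumptions, although the pairings are consistent with the aggregate values shown.

```python
import numpy as np

# Four fine-resolution rows taken from the slides (a subset of the grid).
v1 = np.array([[41, 30, 26, 35, 38, 24],
               [14, 56, 37, 34,  8, 18],
               [49, 44, 51, 67, 17, 37],
               [55, 25, 33, 32, 59, 54]], dtype=float)
v2 = np.array([[21, 46, 22, 42, 45, 14],
               [19, 36, 48, 23,  8, 29],
               [38, 47, 52, 52, 22, 48],
               [58, 40, 46, 38, 35, 55]], dtype=float)

def aggregate_horizontal(grid):
    """Average each pair of horizontally adjacent cells (1 x 2 blocks)."""
    return (grid[:, 0::2] + grid[:, 1::2]) / 2

def aggregate_vertical(grid):
    """Average each pair of vertically adjacent cells (2 x 1 blocks)."""
    return (grid[0::2, :] + grid[1::2, :]) / 2

def summarize(a, b):
    """Means, sample standard deviations and correlation of two grids."""
    rho = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return a.mean(), a.std(ddof=1), b.mean(), b.std(ddof=1), rho

schemes = [("fine", lambda g: g),
           ("scheme 1", aggregate_horizontal),
           ("scheme 2", aggregate_vertical)]
for label, f in schemes:
    m1, s1, m2, s2, rho = summarize(f(v1), f(v2))
    print(f"{label:9s} m1={m1:.2f} s1={s1:.2f} "
          f"m2={m2:.2f} s2={s2:.2f} rho12={rho:.2f}")
```

Both schemes use blocks of the same size and preserve the means, yet they combine different neighbours, so the standard deviations and the correlation need not agree.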
[Figure: coarse-resolution aggregate values shown beside fine-resolution grids and a scatterplot of variable #2 against variable #1; fine resolution: ρ12 = 0.83; left grid: m = 43.14, s = 20.17; right grid: m = 42.92, s = 18.32]
Statistics and relationships between spatial variables at a finer spatial resolution
differ from those derived at the original coarse resolution
Problems in Spatial Data Analysis
[Figure: coarse-resolution aggregate values shown beside an alternative pair of fine-resolution grids and a scatterplot of variable #2 against variable #1 (both axes 0 to 100).
 Fine-resolution grid rows as extracted (partial), left grid | right grid:
   55 40 38 55 34 88 | 50 60 49 46 84 23
   30 41 35 26 24 38 | 21 46 22 42 45 14
   56 14 34 37 18 08 | 19 36 48 23 08 29
   44 49 67 51 37 17 | 38 47 52 52 22 48
   25 55 32 33 54 59 | 58 40 46 38 35 55
 ρ12 = 0.21; left grid: m = 43.14, s = 20.17; right grid: m = 42.92, s = 18.32]
Multiple combinations of fine spatial resolution attribute values can lead to the same
aggregate values at a coarser resolution (equi-finality)
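Equi-finality can be checked directly: swapping the two fine-resolution values inside every aggregation pair leaves each pair average unchanged. The sketch below applies such a swap to five grid rows shown on the slides. Treating the left grid as variable #1 and using side-by-side pairs are assumptions, and since only a subset of rows is used, the printed correlations are not expected to reproduce the slide values.

```python
import numpy as np

# Five fine-resolution rows taken from the slides (a subset of the grid).
v1 = np.array([[40, 55, 55, 38, 88, 34],
               [41, 30, 26, 35, 38, 24],
               [14, 56, 37, 34,  8, 18],
               [49, 44, 51, 67, 17, 37],
               [55, 25, 33, 32, 59, 54]], dtype=float)
v2 = np.array([[50, 60, 49, 46, 84, 23],
               [21, 46, 22, 42, 45, 14],
               [19, 36, 48, 23,  8, 29],
               [38, 47, 52, 52, 22, 48],
               [58, 40, 46, 38, 35, 55]], dtype=float)

def pair_means(grid):
    """Average each pair of horizontally adjacent cells."""
    return grid.reshape(grid.shape[0], -1, 2).mean(axis=2)

def swap_within_pairs(grid):
    """Exchange the two values inside every horizontal pair."""
    rows = grid.shape[0]
    return grid.reshape(rows, -1, 2)[:, :, ::-1].reshape(rows, -1)

v1_alt = swap_within_pairs(v1)

# The coarse-resolution values are identical for both fine-resolution grids.
same_aggregates = np.array_equal(pair_means(v1), pair_means(v1_alt))

# The fine-resolution relationship with the second variable is free to change.
rho_fine = np.corrcoef(v1.ravel(), v2.ravel())[0, 1]
rho_alt = np.corrcoef(v1_alt.ravel(), v2.ravel())[0, 1]

print("same aggregates:", same_aggregates)
print(f"rho12 before swap: {rho_fine:.2f}, after swap: {rho_alt:.2f}")
```

The swapped rows coincide with the alternative fine-resolution rows shown on the slide. The swap is a permutation of the same values, so the mean and standard deviation of the fine-resolution variable are unchanged as well.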
[Figure: value plotted against x, for x from 0 to 100]
First-order effects
Spatial pattern explained by environmental (or extrinsic) factors, e.g., attribute
value y(x) is high at location x due to another attribute value y'(x) at the same
location x, or another attribute value y'(x') at a nearby location x'
Second-order effects
Spatial pattern explained by interaction (or intrinsic) factors, e.g., attribute value
y(x) is low at location x due to another (same-attribute) value y(x') at a nearby
location x', provided both locations x and x' lie in the same "environment"
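In a data-generating model the two kinds of effects can be written as separate terms. The sketch below simulates a one-dimensional profile as a deterministic trend (first-order effect) plus auto-correlated residuals (second-order effect). The linear trend, the AR(1) residual model and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.arange(101)  # locations 0, 1, ..., 100

# First-order effect: a trend driven by a hypothetical environmental gradient.
trend = -2.0 + 0.03 * x

# Second-order effect: zero-mean AR(1) residuals, so that nearby locations
# have similar departures from the trend.
phi, sigma = 0.9, 0.3
resid = np.zeros(x.size)
resid[0] = rng.normal(0, sigma / np.sqrt(1 - phi**2))  # stationary start
for i in range(1, x.size):
    resid[i] = phi * resid[i - 1] + rng.normal(0, sigma)

y = trend + resid

def lag1_autocorr(z):
    """Sample autocorrelation between values at neighbouring locations."""
    z = z - z.mean()
    return (z[:-1] * z[1:]).sum() / (z ** 2).sum()

print("lag-1 autocorrelation of y:        ", round(lag1_autocorr(y), 2))
print("lag-1 autocorrelation of residuals:", round(lag1_autocorr(resid), 2))
```

In the simulation the two terms are known separately. Given only y, the same smooth pattern could be attributed to either term, which is why the distinction belongs to the model rather than to the data.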
Points to Remember
Recap I
Spatial data
- set of geo-referenced measurements with attribute values and coordinates
  (topology & context also important)
- data types:
  1. spatial point patterns – events
  2. data continuously varying in space – fields
  3. area or lattice data – objects
  4. spatial interaction data – flows
Recap II
Spatial statistics
- statistical framework for analysis and modeling of spatial data: accounts for
  spatial auto-correlation and scale effects; allows assessing uncertainty in
  spatial analysis results
- multivariate statistics tailored to the analysis of spatial data
Issues to be aware of
- any spatial analysis result is tied to a particular observation scale,
  i.e., to the particular sample support(s); the Modifiable Areal Unit Problem
  (MAUP) and the Ecological Inference Problem (EIP) are consequences of this
- spatial process models typically distinguish between:
  - first-order effects or environmental controls
  - second-order effects or interactions (spatial auto-correlation)
this dichotomy does not apply to actual data, only to data-generating models...