Overview of Statistical Analysis of Spatial Data

Geog 210C
Introduction to Spatial Data Analysis

Phaedon C. Kyriakidis
www.geog.ucsb.edu/∼phaedon

Department of Geography
University of California Santa Barbara
Santa Barbara, CA 93106-4060
phaedon@geog.ucsb.edu

Spring Quarter 2009

Outline

Preliminaries

Types of Spatial Data

Why Spatial Statistics?

Problems in Spatial Data Analysis

Points to Remember

Ph. Kyriakidis (UCSB) Geog 210C Spring 2009 2 / 20


Preliminaries

Introduction & Objectives


Spatial data
Geo-referenced attribute measurements; each measurement is associated with a
location (point) or an entity (region or object) in geographical (or other) space
- attribute measurement scale can be continuous or discrete,
  e.g., chemical concentration, soil types, disease occurrences
- sample locations can have a regular or irregular spatial arrangement,
  i.e., data locations on a raster (regular lattice) or scattered in space;
  domain informed by a measurement is called the sample unit or support,
  e.g., points, pixels, polygons
- spatial data often have an additional temporal component;
  dynamic attribute evolution in space and time, spatiotemporal support

Objectives of this handout
- to provide a brief overview of types of spatial data
- to highlight the role of spatial statistics in analyzing data of each type


Stages in Spatial Data Analysis


Exploratory analysis
- explore spatial data using cartographic (or other visual) representations
- statistical analysis for detecting possible sub-populations, outliers, trends,
  relationships with neighboring values or other spatial variables

Modeling or confirmatory analysis
- establish parametric or non-parametric model(s) characterizing attribute
  spatial distribution
- estimate model parameters from data; evaluate their statistical significance;
  predict attribute values at other locations and/or future time instants

Notes
- any processing of spatial data, e.g., filtering or interpolation, affects any
  inference made from them
- boundaries between above stages not always clear-cut



Types of Spatial Data

Attributes Varying Continuously in Space


Characteristics
- also known (unfortunately) as geostatistical data, e.g., temperature, rainfall,
  elevation, population density
- measurements of nominal scale, e.g., land cover types, or interval/ratio scale,
  e.g., sea floor depth
- often, sparse samples are available only at a fixed set of locations
[Figure: Bay Area rain gauge precipitation (mm/day), 1981−82 NDJ average]


Area or Lattice Data


Characteristics
- attributes take values only at a fixed set of areas or zones, e.g., administrative
  districts, pixels of satellite images
- typically, all possible locations have been sampled; no attribute values
  between sampling units (unless there are missing values)
[Figure: SIDS cases in North Carolina, from 1979 to 1984]

Distinction between spatially continuous and area (lattice) data not always clear-cut,
particularly when the latter are derived via aggregation from the former


Point Pattern Data


Characteristics
- series of point locations with recorded “events”, e.g., locations of trees,
  disease or crime incidents
- point locations correspond to all possible events (mapped point pattern),
  or to a subset (sampled point pattern)
- attribute values also possible at same locations, e.g., tree diameter,
  magnitude of earthquakes (marked point pattern)
[Figure: Lansing Woods tree locations (maple, hickory)]
[Figure: Bay Area earthquake magnitudes, 1962−1981]


Spatial Interaction or Network Data


Characteristics
- attributes relate to pairs of points or areas: flows from origins to destinations,
  e.g., patients “flow” from residences to hospitals
- less tangible flows, e.g., information, could be defined

Analysis objectives
- modeling of flow patterns = finding relationships between observed flows and
  explanatory variables, e.g., number of trips from origins to destinations as
  a function of income
- classical analysis methods focus on patterns of aggregate interaction, rather
  than on individuals themselves; more recent focus is placed on understanding
  individual preferences and choice modeling
- spatial location/allocation problems, and more generally spatial optimization
  problems, typically involve network data

Methods for analyzing spatial interaction data are not covered in this course



Why Spatial Statistics?

Univariate Statistics and Spatial Pattern?


Two 1D attribute profiles with the same histogram:
[Figure: two panels, each titled "1D population", plotting value against x]

Shortcomings of univariate statistics


Univariate statistics, e.g., average, variance, histogram, do not suffice to describe
spatial pattern; the spatial arrangement of attribute values matters, too

Spatial auto-correlation: an aspect of spatial pattern
Attribute values measured at “nearby” supports tend to be more “similar” than
those measured at “distant” supports; Tobler’s 1st law(?) of Geography
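
The point about identical histograms can be checked numerically. The sketch below is only an illustration, not part of the handout: it uses made-up values (a seeded random sample), arranges them in two different orders along x, and compares the lag-1 auto-correlation, i.e., the correlation between each value and its right-hand neighbour. The function name `lag1_autocorrelation` is hypothetical.

```python
import numpy as np

def lag1_autocorrelation(values):
    """Correlation between each value and its right-hand neighbour along x."""
    v = np.asarray(values, dtype=float)
    return float(np.corrcoef(v[:-1], v[1:])[0, 1])

# Hypothetical data: 100 values, arranged in two different orders along x.
rng = np.random.default_rng(0)
values = rng.normal(size=100)
smooth_profile = np.sort(values)             # neighbours are similar
shuffled_profile = rng.permutation(values)   # same values, no spatial structure

# Both profiles contain exactly the same values, hence the same histogram,
# mean, and variance.
same_histogram = np.array_equal(np.sort(smooth_profile), np.sort(shuffled_profile))

r_smooth = lag1_autocorrelation(smooth_profile)
r_shuffled = lag1_autocorrelation(shuffled_profile)
print(same_histogram, round(r_smooth, 2), round(r_shuffled, 2))
```

The ordered profile should show a lag-1 correlation close to 1, while the shuffled one should be close to 0, even though no univariate statistic can tell the two apart.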


Role of Spatial Statistics in Spatial Data Analysis


Spatially continuous data
- model attribute spatial variation over study area from sampled point values
- predict attribute values at non-sampled locations (accounting for covariates)
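
As a rough illustration of prediction at a non-sampled location, the sketch below uses inverse distance weighting (IDW). This is a deliberately simple stand-in chosen here for brevity: it is a deterministic interpolator, not a geostatistical model, and it provides no uncertainty estimate. The function name, the gauge coordinates, and the rainfall values are all hypothetical.

```python
import numpy as np

def idw_predict(sample_xy, sample_values, target_xy, power=2.0):
    """Inverse-distance-weighted average of sample values at target_xy."""
    xy = np.asarray(sample_xy, dtype=float)
    z = np.asarray(sample_values, dtype=float)
    d = np.linalg.norm(xy - np.asarray(target_xy, dtype=float), axis=1)
    if np.any(d == 0):
        # Target coincides with a sample location: return that sample value.
        return float(z[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * z) / np.sum(w))

# Hypothetical rain gauges (x, y) and readings in mm/day.
gauges = [(0.0, 0.0), (2.0, 0.0)]
rain = [4.0, 8.0]

at_midpoint = idw_predict(gauges, rain, (1.0, 0.0))   # equal weights
at_gauge = idw_predict(gauges, rain, (0.0, 0.0))      # exact at a sample
print(at_midpoint, at_gauge)
```

At the midpoint the two gauges receive equal weight, so the prediction is their average; closer to one gauge, that gauge dominates.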

Area (lattice) data
- detect and model spatial patterns or trends in area values; no prediction at
  non-sampled locations, unless smoothing of existing values or imputation of
  missing values is required
- use covariates or relationships with adjacent attribute values for inference,
  e.g., disease rates in light of socioeconomic variables

Point patterns
- detect clustering or regularity, as opposed to complete randomness, of event
  locations in space and/or time
- if clustering is detected, investigate possible relations between clusters
  and nearby “sources” or pertinent covariates
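
One simple way to contrast clustering and regularity with complete randomness is a nearest-neighbour ratio of the Clark-Evans type: the observed mean nearest-neighbour distance divided by the value expected under complete spatial randomness, 0.5 / sqrt(n / area). The sketch below is an assumption-laden illustration rather than the method used in the course: it applies no edge correction, and the two point patterns and the function name are made up.

```python
import numpy as np

def nearest_neighbour_ratio(points, area):
    """Observed mean nearest-neighbour distance over its expectation under
    complete spatial randomness (no edge correction applied)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)          # ignore distance of a point to itself
    observed = dist.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)
    return float(observed / expected)

# Hypothetical 10 x 10 study window (area = 100).
centres = np.arange(10) + 0.5
regular = np.array([(x, y) for x in centres for y in centres])  # unit-spaced grid
clustered = regular * 0.1       # same 100 points squeezed into one corner

ratio_regular = nearest_neighbour_ratio(regular, area=100.0)
ratio_clustered = nearest_neighbour_ratio(clustered, area=100.0)
print(round(ratio_regular, 2), round(ratio_clustered, 2))
```

Ratios above 1 suggest regularity and ratios below 1 suggest clustering; a formal test would also need the sampling distribution of the ratio and a treatment of edge effects.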

Spatial Versus Non-Spatial Statistics


Classical statistics
- samples assumed realizations of independent and identically distributed
  random variables (iid)
- most hypothesis testing procedures call for samples from iid random variables
- problems with inference and hypothesis testing in a spatial setting

Spatial statistics
- multivariate statistics in a spatial/temporal context: each observation is
  viewed as a realization from a different random variable, but such random
  variables are auto-correlated in space and/or time
- each sample is not an independent piece of information, precisely because it
  is partially redundant with other samples (due to the corresponding random
  variables being auto-correlated)
- auto- and cross-correlation (in space and/or time) is explicitly accounted for
  when establishing confidence intervals and testing hypotheses

One can always choose to analyze spatial data with non-spatial statistics;
problems arise when confidence intervals need to be reported...
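
The effect of redundancy on confidence intervals can be made concrete with the variance of the sample mean. For n samples with common variance σ² and covariance matrix C, Var(mean) = (sum of all entries of C) / n². The sketch below assumes, purely for illustration, equally spaced samples along a line with correlation ρ^|i−j| between samples i and j; this correlation model and the function name are assumptions, not taken from the handout.

```python
import numpy as np

def variance_of_mean(n, rho, sigma2=1.0):
    """Variance of the sample mean of n equally spaced samples whose
    correlation decays as rho**|i - j| (assumed model)."""
    idx = np.arange(n)
    cov = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])
    return float(cov.sum() / n**2)

n = 50
var_iid = variance_of_mean(n, rho=0.0)    # classical sigma^2 / n
var_corr = variance_of_mean(n, rho=0.8)   # auto-correlated samples

# Number of independent samples that would give the same variance of the mean.
n_effective = 1.0 / var_corr
print(var_iid, round(var_corr, 3), round(n_effective, 1))
```

Under these assumptions the 50 correlated samples carry roughly as much information about the mean as about 6 independent ones, so a confidence interval computed with the iid formula would be far too narrow.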


Software for Statistical Analysis of Spatial Data


GIS-based
- ESRI’s Spatial Analyst, Geostatistical Analyst...
- opt for “close” or “loose” coupling with specialized external packages when
  specific functionalities are missing from a GIS

Statistical packages
- extremely versatile in modeling; recent improvements in visualization
- R and SpaceStat/GeoDa most popular in Geography

Image processing packages
- mature technology, lots of new developments
- IDL and Matlab most popular in Remote Sensing and Electrical Engineering

Access to source code written in a straightforward programming language
is critical for research development in an academic environment...
Problems in Spatial Data Analysis

Some Issues Specific to Spatial Data Analysis


A first look
- differences from time series analysis:
  1. irregular sampling
  2. lack of clear indexing; no notion of past-present-future
  3. auto- and cross-correlation in multiple directions
- multi-source data associated with different spatial/temporal resolutions
- data often reported as aggregates over arbitrarily defined zones/areas;
  statistics of aggregates are not the same as those of individuals:
  1. Modifiable Area Unit Problem (MAUP)
  2. Ecological Fallacy or Inference Problem (EIP)
- edge/boundary effects: samples near the edges of a study region have fewer
  neighbors than samples in the interior; near-edge samples might bear the
  effects of different spatial processes
- spatial process models typically distinguish between first- and second-order
  effects, i.e., between environmental controls and interactions (distinction
  between the two not always clear-cut)


Modifiable Area-Unit Problem: Aggregation Effect


Two spatial variables and their univariate/bivariate statistics
Spatial Variable #1      Spatial Variable #2
87 95 72 37 44 24        72 75 85 29 58 30
40 55 55 38 88 34        50 60 49 46 84 23
41 30 26 35 38 24        21 46 22 42 45 14
14 56 37 34 08 18        19 36 48 23 08 29
49 44 51 67 17 37        38 47 52 52 22 48
55 25 33 32 59 54        58 40 46 38 35 55
m = 43.14 s = 20.17      m = 42.92 s = 18.32
Scatterplot (variable #2 versus variable #1): ρ12 = 0.83

Aggregation Scheme #1
Variable #1              Variable #2
91.0 54.5 34.0           73.5 57.0 44.0
47.5 46.5 61.0           55.0 47.5 53.5
35.5 30.5 31.0           33.5 32.0 29.5
35.0 35.5 13.0           27.5 35.5 18.5
46.5 59.0 27.0           42.5 52.0 35.0
40.0 32.5 56.5           49.0 42.0 45.0
m = 43.14 s = 16.79      m = 42.92 s = 12.65
Scatterplot (variable #2 versus variable #1): ρ12 = 0.90

Statistics and relationships between spatial attributes depend on aggregation extent



Modifiable Area-Unit Problem: Zonation Effect


Upscaling spatial variables using two different aggregation schemes
Aggregation Scheme #1
Variable #1              Variable #2
91.0 54.5 34.0           73.5 57.0 44.0
47.5 46.5 61.0           55.0 47.5 53.5
35.5 30.5 31.0           33.5 32.0 29.5
35.0 35.5 13.0           27.5 35.5 18.5
46.5 59.0 27.0           42.5 52.0 35.0
40.0 32.5 56.5           49.0 42.0 45.0
m = 43.14 s = 16.79      m = 42.92 s = 12.65
Scatterplot (variable #2 versus variable #1): ρ12 = 0.90

Aggregation Scheme #2
Variable #1                      Variable #2
63.5 75.0 63.5 37.5 66.0 29.0    61.0 67.5 67.0 37.5 71.0 26.5
27.5 43.0 31.5 34.5 23.0 21.0    20.0 41.0 35.0 32.5 26.5 21.5
52.0 34.5 42.0 49.5 38.0 45.5    48.0 43.5 49.0 45.0 28.5 51.5
m = 43.14 s = 15.23              m = 42.92 s = 15.59
Scatterplot (variable #2 versus variable #1): ρ12 = 0.94

For a given aggregation extent, statistics and relationships between spatial attributes
depend on which individual values are aggregated and how
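
The aggregation and zonation effects shown on these two slides can be reproduced from the printed grids. The sketch below assumes that Scheme #1 averages horizontally adjacent pairs of cells and Scheme #2 averages vertically adjacent pairs, which is what the printed aggregate values imply, and that s is the population standard deviation (divisor n). The function names are hypothetical.

```python
import numpy as np

# The two 6 x 6 spatial variables as printed on the slides.
V1 = np.array([[87, 95, 72, 37, 44, 24],
               [40, 55, 55, 38, 88, 34],
               [41, 30, 26, 35, 38, 24],
               [14, 56, 37, 34,  8, 18],
               [49, 44, 51, 67, 17, 37],
               [55, 25, 33, 32, 59, 54]], dtype=float)
V2 = np.array([[72, 75, 85, 29, 58, 30],
               [50, 60, 49, 46, 84, 23],
               [21, 46, 22, 42, 45, 14],
               [19, 36, 48, 23,  8, 29],
               [38, 47, 52, 52, 22, 48],
               [58, 40, 46, 38, 35, 55]], dtype=float)

def aggregate_pairs(grid, axis):
    """Average adjacent pairs of cells along rows (axis=1) or columns (axis=0)."""
    rows, cols = grid.shape
    if axis == 1:
        return grid.reshape(rows, cols // 2, 2).mean(axis=2)
    return grid.reshape(rows // 2, 2, cols).mean(axis=1)

def summarize(a, b):
    """Mean and population standard deviation of each variable, plus correlation."""
    rho = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return {"m1": a.mean(), "s1": a.std(), "m2": b.mean(), "s2": b.std(), "rho": rho}

original = summarize(V1, V2)
scheme1 = summarize(aggregate_pairs(V1, axis=1), aggregate_pairs(V2, axis=1))
scheme2 = summarize(aggregate_pairs(V1, axis=0), aggregate_pairs(V2, axis=0))

for name, stats in [("original", original), ("scheme 1", scheme1), ("scheme 2", scheme2)]:
    print(name, {k: round(float(v), 2) for k, v in stats.items()})
```

The means are unchanged by aggregation, while the standard deviations shrink and the correlation rises from 0.83 to 0.90 (Scheme #1) or 0.94 (Scheme #2), depending only on how the same 36 cells are grouped.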

Ecological Inference Problem I


Downscaling spatial variables
Observed variables
Variable #1              Variable #2
91.0 54.5 34.0           73.5 57.0 44.0
47.5 46.5 61.0           55.0 47.5 53.5
35.5 30.5 31.0           33.5 32.0 29.5
35.0 35.5 13.0           27.5 35.5 18.5
46.5 59.0 27.0           42.5 52.0 35.0
40.0 32.5 56.5           49.0 42.0 45.0
m = 43.14 s = 16.79      m = 42.92 s = 12.65
Scatterplot (variable #2 versus variable #1): ρ12 = 0.90

Spatial Variable #1      Spatial Variable #2
87 95 72 37 44 24        72 75 85 29 58 30
40 55 55 38 88 34        50 60 49 46 84 23
41 30 26 35 38 24        21 46 22 42 45 14
14 56 37 34 08 18        19 36 48 23 08 29
49 44 51 67 17 37        38 47 52 52 22 48
55 25 33 32 59 54        58 40 46 38 35 55
m = 43.14 s = 20.17      m = 42.92 s = 18.32
Scatterplot (variable #2 versus variable #1): ρ12 = 0.83

Statistics and relationships between spatial variables at a finer spatial resolution are
different than those derived at the original coarse resolution

Ecological Inference Problem II


Under-determined inverse problem
Observed variables
Variable #1              Variable #2
91.0 54.5 34.0           73.5 57.0 44.0
47.5 46.5 61.0           55.0 47.5 53.5
35.5 30.5 31.0           33.5 32.0 29.5
35.0 35.5 13.0           27.5 35.5 18.5
46.5 59.0 27.0           42.5 52.0 35.0
40.0 32.5 56.5           49.0 42.0 45.0
m = 43.14 s = 16.79      m = 42.92 s = 12.65
Scatterplot (variable #2 versus variable #1): ρ12 = 0.90

Spatial Variable #1      Spatial Variable #2
95 87 37 72 24 44        72 75 85 29 58 30
55 40 38 55 34 88        50 60 49 46 84 23
30 41 35 26 24 38        21 46 22 42 45 14
56 14 34 37 18 08        19 36 48 23 08 29
44 49 67 51 37 17        38 47 52 52 22 48
25 55 32 33 54 59        58 40 46 38 35 55
m = 43.14 s = 20.17      m = 42.92 s = 18.32
Scatterplot (variable #2 versus variable #1): ρ12 = 0.21

Multiple combinations of fine spatial resolution attribute values can lead to the same
aggregate values at a coarser resolution (equi-finality)
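
Equi-finality can be verified directly from the printed grids. In the sketch below, the alternative fine-resolution version of variable #1 is obtained by swapping the two cells within every horizontally adjacent pair, which is how the grid on this slide differs from the one shown earlier; the pair averages (assumed to be the observed coarse values) are therefore identical. Function and variable names are hypothetical.

```python
import numpy as np

# Fine-resolution variables as printed on the earlier slides.
V1 = np.array([[87, 95, 72, 37, 44, 24],
               [40, 55, 55, 38, 88, 34],
               [41, 30, 26, 35, 38, 24],
               [14, 56, 37, 34,  8, 18],
               [49, 44, 51, 67, 17, 37],
               [55, 25, 33, 32, 59, 54]], dtype=float)
V2 = np.array([[72, 75, 85, 29, 58, 30],
               [50, 60, 49, 46, 84, 23],
               [21, 46, 22, 42, 45, 14],
               [19, 36, 48, 23,  8, 29],
               [38, 47, 52, 52, 22, 48],
               [58, 40, 46, 38, 35, 55]], dtype=float)

def horizontal_pair_means(grid):
    """Average horizontally adjacent pairs of cells (6 x 6 -> 6 x 3)."""
    rows, cols = grid.shape
    return grid.reshape(rows, cols // 2, 2).mean(axis=2)

def correlation(a, b):
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Alternative fine-resolution field: swap the two cells inside every pair.
V1_ALT = V1.reshape(6, 3, 2)[:, :, ::-1].reshape(6, 6)

same_aggregates = np.array_equal(horizontal_pair_means(V1), horizontal_pair_means(V1_ALT))
rho_original = correlation(V1, V2)
rho_alternative = correlation(V1_ALT, V2)
print(same_aggregates, round(rho_original, 2), round(rho_alternative, 2))
```

Both fine-resolution fields reproduce the observed aggregates exactly, yet their correlation with variable #2 is 0.83 in one case and 0.21 in the other, so the coarse data alone cannot identify the fine-scale relationship.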

First- Versus Second-Order Effects


[Figure: 1D population profile, value versus x]

First-order effects
Spatial pattern explained by environmental (or extrinsic) factors, e.g., attribute
value y(x) is high at location x due to another attribute value y′(x) at the same
location x, or another attribute value y′(x′) at a nearby location x′

Second-order effects
Spatial pattern explained by interaction (or intrinsic) factors, e.g., attribute value
y(x) is low at location x due to another (same-attribute) value y(x′) at a nearby
location x′, provided both locations x and x′ lie in the same “environment”
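
One common way to write this distinction down, offered here as an assumed formalisation rather than the handout's own notation, is to split the random variable Y(x) behind each observation y(x) into a deterministic mean component m(x) and a zero-mean residual R(x) with covariance function C:

```latex
Y(x) = m(x) + R(x), \qquad
\mathrm{E}[R(x)] = 0, \qquad
\mathrm{Cov}\big[R(x), R(x')\big] = C(x, x')
```

Under this reading, first-order effects enter through m(x), for instance as a function of the covariates y′(x) or y′(x′), while second-order effects enter through C(x, x′), i.e., through the dependence between same-attribute values at nearby locations.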



Points to Remember

Recap I
Spatial data
- set of geo-referenced measurements with attribute values and coordinates
  (topology & context also important)
- data types:
  1. spatial point patterns – events
  2. data continuously varying in space – fields
  3. area or lattice data – objects
  4. spatial interaction data – flows

Spatial data analysis objectives
- exploratory analysis: looking for patterns/relationships
- confirmatory analysis: establishing spatial process models from spatial
  patterns + model parameter estimation


Recap II
Spatial statistics
- statistical framework for analysis and modeling of spatial data: accounts for
  spatial auto-correlation and scale effects; allows assessing uncertainty in
  spatial analysis results
- multivariate statistics tailored to the analysis of spatial data

Issues to be aware of
- any spatial analysis result is tied to a particular observation scale,
  i.e., to the particular sample support(s); the Modifiable Area Unit Problem
  (MAUP) and the Ecological Inference Problem (EIP) are consequences of this
- spatial process models typically distinguish between:
  - first-order effects or environmental controls
  - second-order effects or interactions (spatial auto-correlation)
  this dichotomy does not apply to actual data, only to data generating models...
