GIS Data Collection: Geographic Information Systems and Science, 2nd Edition
Geographic Information Systems and Science, 2nd edition Paul Longley, Michael Goodchild, David Maguire, and David Rhind.
2005 John Wiley & Sons, Ltd. ISBNs: 0-470-87000-1 (HB); 0-470-87001-X (PB)
200 PART III TECHNIQUES
Learning Objectives

After reading this chapter you will be able to:
■ Describe data collection workflows;
■ Understand the primary data capture techniques in remote sensing and surveying;
■ Be familiar with the secondary data capture techniques of scanning, manual digitizing, vectorization, photogrammetry, and COGO feature construction;
■ Understand the principles of data transfer, sources of digital geographic data, and geographic data formats;
■ Analyze practical issues associated with managing data capture projects.

9.1 Introduction

GIS can contain a wide variety of geographic data types originating from many diverse sources. Data collection activities for the purposes of organizing the material in this chapter are split into data capture (direct data input) and data transfer (input of data from other systems). From the perspective of creating geographic databases, it is convenient to classify raster and vector geographic data as primary and secondary (Table 9.1). Primary data sources are those collected in digital format specifically for use in a GIS project. Typical examples of primary GIS sources include raster SPOT and IKONOS Earth satellite images, and vector building-survey measurements captured using a total survey station. Secondary sources are digital and analog datasets that were originally captured for another purpose and need to be converted into a suitable digital format for use in a GIS project. Typical secondary sources include raster scanned color aerial photographs of urban areas and United States Geological Survey (USGS) or Institut Géographique National, France (IGN) paper maps that can be scanned and vectorized. This classification scheme is a useful organizing framework for this chapter and, more importantly, it highlights the number of processing-stage transformations that a dataset goes through, and therefore the opportunities for errors to be introduced. However, the distinctions between primary and secondary, and raster and vector, are not always easy to determine. For example, is digital satellite remote sensing data obtained on a DVD primary or secondary? Clearly the commercial satellite sensor feeds do not run straight into GIS databases, but to ground stations where the data are pre-processed onto digital media. Here it is considered primary because usually the data has undergone only minimal transformation since being collected by the satellite sensors and because the characteristics of the data make them suitable for virtually direct use in GIS projects.

Table 9.1 Classification of geographic data for data collection purposes with examples of each type

            Raster                                    Vector
Primary     Digital satellite remote-sensing images   GPS measurements
            Digital aerial photographs                Survey measurements
Secondary   Scanned maps or photographs               Topographic maps
            Digital elevation models from             Toponymy (placename) databases
            topographic map contours

Primary geographic data sources are captured specifically for use in GIS by direct measurement. Secondary sources are those reused from earlier studies or obtained from other systems.

Both primary and secondary geographic data may be obtained in either digital or analog format (see Section 3.7 for a definition of analog). Analog data must always be digitized before being added to a geographic database. Analog to digital transformation may involve the scanning of paper maps or photographs, optical character recognition (OCR) of text describing geographic object properties, or the vectorization of selected features from an image. Depending on the format and characteristics of the digital data, considerable reformatting and restructuring may be required prior to importing into a GIS. Each of these transformations alters the original data and will introduce further uncertainty into the data (see Chapter 6 for discussion of uncertainty).
This chapter describes the data sources, techniques, and workflows involved in GIS data collection. The processes of data collection are also variously referred to as data capture, data automation, data conversion, data transfer, data translation, and digitizing. Although there are subtle differences between these terms, they essentially describe the same thing, namely, adding geographic data to a database. Data capture refers to direct entry. Data transfer is the importing of existing digital data across a network connection (Internet, wide area network (WAN), or local area network (LAN)) or from physical media such as CD ROMs, zip disks, or diskettes. This chapter focuses on the techniques of data collection; of equal, perhaps more, importance to a real-world GIS implementation are project management, cost, legal, and organization issues. These are covered briefly in Section 9.6 of this chapter as a prelude to more detailed treatment in Chapters 17 through 20.
CHAPTER 9 GIS DATA COLLECTION 201
Table 9.2 Breakdown of costs (in $1000s) for two typical client-server GIS as estimated by the authors

            10 seats          100 seats
            $       %         $       %
Hardware    30      3.4       250     8.6
Software    25      2.8       200     6.9
Data        400     44.7      450     15.5
Staff       440     49.1      2000    69.0
Total       895     100       2900    100

Table 9.2 shows a breakdown of costs (in $1000s) for two typical client-server GIS implementations: one with 10 seats (systems) and the other with 100. The hardware costs include desktop clients and servers only (i.e., not network infrastructure). The data costs assume the purchase of a landbase (e.g., streets, parcels, and landmarks) and digitizing assets such as pipes and fittings (water utility), conductors and devices (electrical utility), or land and property parcels (local government). Staff costs assume that all core GIS staff will be full-time, but that users will be part-time.
In the early days of GIS, when geographic data were very scarce, data collection was the main project task and typically it consumed the majority of the available resources. Even today data collection still remains a time-consuming, tedious, and expensive process. Typically it accounts for 15–50% of the total cost of a GIS project (Table 9.2). Data capture costs can in fact be much more significant because in many organizations (especially those that are government funded) staff costs are often assumed to be fixed and are not used in budget accounting. Furthermore, as the majority of data capture effort and expense tends to fall at the start of projects, data capture costs often receive greater scrutiny from senior managers. If staff costs are excluded from a GIS budget then in cash expenditure terms data collection can be as much as 60–85% of costs.

Data capture costs can account for up to 85% of the cost of a GIS.

After an organization has completed basic data collection tasks, the focus of a GIS project moves on to data maintenance. Over the multi-year lifetime of a GIS project, data maintenance can turn out to be a far more complex and expensive activity than initial data collection. This is because of the high volume of update transactions in many systems (for example, changes in land parcel ownership, maintenance work orders on a highway transport network, or logging military operational activities) and the need to manage multi-user access to operational databases. For more information about data maintenance, see Chapter 10.

9.1.1 Data collection workflow

In all but the simplest of projects, data collection involves a series of sequential stages (Figure 9.1). The workflow commences with planning, followed by preparation, digitizing/transfer (here taken to mean a range of primary and secondary techniques such as table digitizing, survey entry, scanning, and photogrammetry), editing and improvement and, finally, evaluation.

Figure 9.1 Stages in data collection projects [cycle diagram with the stages Planning, Preparation, Digitizing/Transfer, Editing/Improvement, and Evaluation]

Planning is obviously important to any project and data collection is no exception. It includes establishing user requirements, garnering resources (staff, hardware, and software), and developing a project plan. Preparation is especially important in data collection projects. It involves many tasks such as obtaining data, redrafting poor-quality map sources, editing scanned map images, and removing noise (unwanted data such as speckles on a scanned map image). It may also involve setting up appropriate GIS hardware and software systems to accept data. Digitizing and transfer are the stages where the majority of the effort will be expended. It is naïve to think that data capture is really just digitizing, when in fact it involves very much more as discussed below. Editing and improvement follows digitizing/transfer. This covers many techniques designed to validate data, as well as correct errors and improve quality. Evaluation, as the name suggests, is the process of identifying project successes and failures. These may be qualitative or quantitative. Since all large data projects involve multiple stages, this workflow is iterative with earlier phases (especially a first, pilot, phase) helping to improve subsequent parts of the overall project.

9.2 Primary geographic data capture

Primary geographic capture involves the direct measurement of objects. Digital data measurements may be input directly into the GIS database, or can reside in a temporary file prior to input. Although the former is preferable as it minimizes the amount of time and the possibility of errors, close coupling of data collection devices and GIS databases is not always possible. Both raster and vector GIS primary data capture methods are available.
9.2.1 Raster data capture

Much the most popular form of primary raster data capture is remote sensing. Broadly speaking, remote sensing is a technique used to derive information about the physical, chemical, and biological properties of objects without direct physical contact (Section 3.6). Information is derived from measurements of the amount of electromagnetic radiation reflected, emitted, or scattered from objects. A variety of sensors, operating throughout the electromagnetic spectrum from visible to microwave wavelengths, are commonly employed to obtain measurements (see Section 3.6.1). Passive sensors are reliant on reflected solar radiation or emitted terrestrial radiation; active sensors (such as synthetic aperture radar) generate their own source of electromagnetic radiation. The platforms on which these instruments are mounted are similarly diverse. Although Earth-orbiting satellites and fixed-wing aircraft are by far the most common, helicopters, balloons, masts, and booms are also employed (Figure 9.2). As used here, the term remote sensing subsumes the fields of satellite remote sensing and aerial photography.

Remote sensing is the measurement of physical, chemical, and biological properties of objects without direct contact.

From the GIS perspective, resolution is a key physical characteristic of remote sensing systems. There are three aspects to resolution: spatial, spectral, and temporal. All sensors need to trade off spatial, spectral, and temporal properties because of storage, processing, and bandwidth considerations. For further discussion of the important topic of resolution see also Sections 3.4, 3.6.1, 4.1, 6.4.2, 7.1, and 16.1.

Three key aspects of resolution are: spatial, spectral, and temporal.

Spatial resolution refers to the size of object that can be resolved and the most usual measure is the pixel size. Satellite remote sensing systems typically provide data with pixel sizes in the range 0.5 m–1 km. The resolution of cameras used for capturing aerial photographs usually ranges from 0.1 m–5 m. Image (scene) sizes vary quite widely between sensors – typical ranges include 900 by 900 to 3000 by 3000 pixels. The total coverage of remote sensing images is usually in the range 9 by 9 to 200 by 200 km.
Spectral resolution refers to the parts of the electromagnetic spectrum that are measured. Since different objects emit and reflect different types and amounts of radiation, selecting which part of the electromagnetic spectrum to measure is critical for each application area. Figure 9.3 shows the spectral signatures of water, green vegetation, and dry soil. Remote sensing systems may capture data in one part of the spectrum (referred to as a single band) or simultaneously from several parts (multi-band or multi-spectral). The radiation values are usually normalized and resampled to give a range of integers from 0–255 for each band (part of the electromagnetic spectrum measured), for each pixel, in each image. Until recently, remote sensing satellites typically measured a small number of bands, in the visible part of the spectrum. More recently a number of hyperspectral systems have come into operation that measure very large numbers of bands across a much wider part of the spectrum.
Temporal resolution, or repeat cycle, describes the frequency with which images are collected for the same area. There are essentially two types of commercial remote sensing satellite: Earth-orbiting and geostationary. Earth-orbiting satellites collect information about different parts of the Earth surface at regular intervals. To maximize utility, typically orbits are polar, at a fixed altitude and speed, and are Sun synchronous.
The French SPOT (Système Probatoire d’Observation de la Terre) 5 satellite launched in 2002, for example, passes virtually over the poles at an altitude of 822 km sensing the same location on the Earth surface during daylight every 26 days. The SPOT platform carries multiple sensors: a panchromatic sensor measuring radiation in the visible part of the electromagnetic spectrum at a spatial resolution of 2.5 by 2.5 m; a multi-spectral sensor measuring green, red, and reflected infrared radiation at a spatial resolution of 10 by 10 m; a shortwave near-infrared sensor with a resolution of 20 by 20 m; and a vegetation sensor measuring four bands at a spatial resolution of 1000 m. The SPOT system is also able to provide stereo images from which digital terrain models and 3-D measurements can be obtained. Each SPOT scene covers an area of about 60 by 60 km.
Much of the discussion so far has focused on commercial satellite remote sensing systems. Of equal importance, especially in medium- to large (coarse)-scale GIS projects, is aerial photography. Although the data products resulting from remote sensing satellites and aerial photography systems are technically very similar (i.e., they are both images) there are some significant differences in the way data are captured and can, therefore, be interpreted. The most notable difference is that aerial photographs are normally collected using analog optical cameras (although digital cameras are becoming more widely used) and then later rasterized, usually by scanning a film negative. The quality of the optics of the camera and the mechanics of the scanning process both affect the spatial and spectral characteristics of the resulting images. Most aerial photographs are collected on an ad hoc basis using cameras mounted in airplanes flying at low altitudes (3000–9000 m) and are either panchromatic (black and white) or color, although multi-spectral cameras/sensors operating in the non-visible parts of the electromagnetic spectrum are also used. Aerial photographs are very suitable for detailed surveying and mapping projects.
An important feature of satellite and aerial photography systems is that they can provide stereo imagery from overlapping pairs of images. These images are used to create a 3-D analog or digital model from which 3-D coordinates, contours, and digital elevation models can be created (see Section 9.3.2.4).
Satellite and aerial photograph data offer a number of advantages for GIS projects. The consistency of the data and the availability of systematic global coverage
[Figure 9.2: chart plotting temporal resolution in minutes (vertical axis) against spatial resolution from 1 m to 10 km (horizontal axis) for systems including SPOT, IRS, JERS-1, LANDSAT, SPIN-2, Quickbird, ASTER, MODIS, RADARSAT, IKONOS, OrbView, AVHRR, METEOSAT, GOES, aerial photography, and NWS WSR-88D Doppler radar]
Figure 9.2 Spatial and temporal characteristics of commonly used remote sensing systems and their sensors (Source: after Jensen
J.R. and Cowen D.C. 1999 ‘Remote sensing of urban/suburban infrastructure and socioeconomic attributes’, Photogrammetric
Engineering and Remote Sensing 65, 611–622)
[Figure 9.3: line graph of reflectance (%) against wavelength (µm, 0.4 to 2.6) for water, green vegetation, and dry bare soil, with the blue, green, and red bands marked]
Figure 9.3 Typical reflectance signatures for water, green vegetation, and dry soil (Source: after Jones C. 1997 Geographic
Information Systems and Computer Cartography. Reading, MA: Addison-Wesley Longman)
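As a rough illustration of the normalization mentioned in the discussion of spectral resolution, the sketch below linearly rescales the raw values of one band to integers in the range 0–255. The function name `scale_band_to_bytes` and the sample values are hypothetical, and a simple min-max stretch is only one of several ways a band might be normalized in practice.

```python
def scale_band_to_bytes(values):
    """Linearly rescale raw band values to integers in 0..255 (min-max stretch)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant band carries no contrast; map every pixel to 0.
        return [0 for _ in values]
    return [int(round((v - lo) * 255.0 / (hi - lo))) for v in values]


# Hypothetical raw sensor readings for four pixels in a single band.
raw_band = [10.0, 20.0, 40.0, 60.0]
print(scale_band_to_bytes(raw_band))
```

Each band of a multi-spectral image would be scaled separately, so every pixel ends up with one 8-bit integer per band.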
make satellite data especially useful for large-area, small-scale projects (for example, mapping landforms and geology at the river catchment-area level) and for mapping inaccessible areas. The regular repeat cycles of commercial systems and the fact that they record radiation in many parts of the spectrum make such data especially suitable for assessing the condition of vegetation (for example, the moisture stress of wheat crops). Aerial photographs in particular are very useful for detailed surveying and mapping of, for example, urban areas and archaeological sites, especially those applications requiring 3-D data (see Chapter 12).
On the other hand, the spatial resolution of commercial satellites is too coarse for many large-scale projects and the data collection capability of many sensors is restricted by cloud cover. Some of this is changing, however, as the new generation of satellite sensors now provide data at 0.6 m spatial resolution and better, and radar data can be obtained that are not affected by cloud cover. The data volumes from both satellites and aerial cameras can be very large and create storage and processing problems for all but the most modern systems. The cost of data can also be prohibitive for a single project or organization.

9.2.2 Vector data capture

Primary vector data capture is a major source of geographic data. The two main branches of vector data capture are ground surveying and GPS – which is covered in Section 5.8 – although as more surveyors use GPS routinely the distinction between the two is becoming increasingly blurred.
Ground surveying is based on the principle that the 3-D location of any point can be determined by measuring angles and distances from other known points. Surveys begin from a benchmark point. If the coordinate system of this point is known, all subsequent points can be collected in this coordinate system. If it is unknown then the survey will use a local or relative coordinate system (see Section 5.7).
Since all survey points are obtained from survey measurements, their known locations are always relative to other points. Any measurement errors need to be apportioned between multiple points in a survey. For example, when surveying a field boundary, if the last and first points are not identical in survey terms (within the tolerance employed in the survey) then errors need to be apportioned between all points that define the boundary (see Section 6.3.4). As new measurements are obtained these may change the locations of points.
Traditionally, surveyors used equipment like transits and theodolites to measure angles, and tapes and chains to measure distances. Today these have been replaced by electro-optical devices called total stations that can measure both angles and distances to an accuracy of 1 mm (Figure 9.4). Total stations automatically log data and the most sophisticated can create vector point, line, and area objects in the field, thus providing direct validation.
The basic principles of surveying have changed very little in the past 100 years, although new technology has considerably improved accuracy and productivity. Two people are usually required to perform a survey, one to operate the total station and the other to hold a reflective prism that is placed at the object being measured. On some remote-controlled systems a single person can control both the total station and the prism.
Ground survey is a very time-consuming and expensive activity, but it is still the best way to obtain highly accurate point locations. Surveying is typically used for capturing buildings, land and property boundaries, manholes, and other objects that need to be located accurately. It is also employed to obtain reference marks for use in other data capture projects. For example, large-scale aerial photographs and satellite images are frequently georeferenced using points obtained from ground survey.
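The apportioning of closure error in a boundary survey can be sketched in a few lines. The example below computes a closed traverse from bearings and distances and then distributes the misclosure in proportion to the cumulative distance travelled (the compass, or Bowditch, rule). The text does not say which adjustment method is used, so the choice of rule, the function names, and the sample observations are all assumptions made for illustration.

```python
import math


def traverse(start, legs):
    """Compute unadjusted points from (bearing_degrees, distance) legs.

    Bearings are measured clockwise from north.
    """
    x, y = start
    points = [(x, y)]
    for bearing_deg, dist in legs:
        b = math.radians(bearing_deg)
        x += dist * math.sin(b)  # change in easting
        y += dist * math.cos(b)  # change in northing
        points.append((x, y))
    return points


def adjust_closed_traverse(start, legs):
    """Apportion the misclosure of a closed traverse (compass rule)."""
    raw = traverse(start, legs)
    # Misclosure: offset of the computed last point from the start point.
    err_x = raw[-1][0] - start[0]
    err_y = raw[-1][1] - start[1]
    total = sum(dist for _, dist in legs)
    adjusted = [raw[0]]
    run = 0.0
    for (px, py), (_, dist) in zip(raw[1:], legs):
        run += dist
        frac = run / total  # share of the error assigned up to this point
        adjusted.append((px - err_x * frac, py - err_y * frac))
    return adjusted, (err_x, err_y)


# Hypothetical field boundary: a 100 m square with small measurement errors.
legs = [(0.0, 100.02), (90.0, 99.97), (180.0, 100.00), (270.0, 100.01)]
points, misclosure = adjust_closed_traverse((1000.0, 5000.0), legs)
print("misclosure (dx, dy):", misclosure)
print("adjusted last point:", points[-1])
```

After adjustment the last point coincides with the first, and every intermediate point has moved slightly, which is why new measurements can change the locations of existing points.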
Figure 9.4 A tripod-mounted Leica TPS1100 Total Station (Courtesy: Leica Geosystems)
Figure 9.5 A large-format roll-feed image scanner (Reproduced by permission of GTCO Calcomp, Inc.)
9.3.2.1 Manual digitizing

Manually operated digitizers are much the simplest, cheapest, and most commonly used means of capturing vector objects from hardcopy maps. Digitizers come in several designs, sizes, and shapes. They operate on the principle that it is possible to detect the location of a cursor or puck passed over a table inlaid with a fine mesh of wires. Digitizing table accuracies typically range from 0.0004 inch (0.01 mm) to 0.01 inch (0.25 mm). Small digitizing tablets up to 12 by 24 inches (30 by 60 cm) are used for small tasks, but bigger (typically 44 by 60 inches (112 by 152 cm)) freestanding table digitizers are preferred for larger tasks (Figure 9.7). Both types of digitizer usually have cursors with cross hairs mounted in glass and buttons to control capture. Box 9.1 describes the process of table digitizing.

Manual digitizing is still the simplest, easiest, and cheapest method of capturing vector data from existing maps.
Manual digitizing

Manual digitizing involves five basic steps.
1. The map document is attached to the center of the digitizing table using sticky tape.
2. Because a digitizing table uses a local rectilinear coordinate system, the map and the digitizer must be registered so that vector data can be captured in real-world coordinates. This is achieved by digitizing a series of four or more well-distributed control points (also called reference points or tick marks) and then entering their real-world values. The digitizer control software (usually the GIS) will calculate a transformation and then automatically apply this to any future coordinates that are captured.
3. Before proceeding with data capture it is useful to spend some time examining a map to determine rules about which features are to be captured at what level of generalization. This type of information is often defined in a data capture project specification.
4. Data capture involves recording the shape of vector objects using manual or stream mode digitizing as described in Section 9.3.2.1. A common rule for vector GIS is to press Button 2 on the digitizing cursor to start a line, Button 1 for each intermediate vertex, and Button 2 to finish a line. There are other similar rules to control how points and polygons are captured.
5. Finally, after all objects have been captured it is necessary to check for any errors. Easy ways to do this include using software to identify geometric errors (such as polygons that do not close or lines that do not intersect – see Figure 9.9), and producing a test plot that can be overlaid on the original document.
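The registration in step 2 can be illustrated with a small sketch. It assumes that the transformation is a six-parameter affine fitted by least squares to the control points; the box does not say which transformation a given GIS calculates, so this choice, the function names, and the control-point values are all hypothetical.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented copy
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def fit_affine(table_pts, world_pts):
    """Least-squares affine fit from digitizer-table to real-world coordinates.

    Assumes the control points are not all collinear.
    """
    # Normal equations (A^T A) p = A^T t, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (wx, wy) in zip(table_pts, world_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * wx
            aty[i] += row[i] * wy
    return solve3(ata, atx), solve3(ata, aty)


def apply_affine(params, pt):
    """Transform one digitized table coordinate into real-world coordinates."""
    (a, b, c), (d, e, f) = params
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical control points: table inches -> real-world metres.
table = [(1.0, 1.0), (11.0, 1.0), (11.0, 9.0), (1.0, 9.0)]
world = [(500000.0, 4100000.0), (505000.0, 4100000.0),
         (505000.0, 4104000.0), (500000.0, 4104000.0)]
params = fit_affine(table, world)
print(apply_affine(params, (6.0, 5.0)))
```

With more than three control points the fit is over-determined, so the residuals at the control points give a quick check on how well the map has been registered.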
Vertices defining point, line, and polygon objects are captured using manual or stream digitizing methods. Manual digitizing involves placing the center point of the cursor cross hairs at the location for each object vertex and then clicking a button on the cursor to record the location of the vertex. Stream-mode digitizing partially automates this process by instructing the digitizer control to collect vertices automatically every time a distance or time threshold is crossed (e.g., every 0.02 inch (0.5 mm) or 0.25 second). Stream-mode digitizing is a much faster method, but it typically produces larger files with many redundant coordinates.

9.3.2.2 Heads-up digitizing and vectorization

One of the main reasons for scanning maps (see Section 9.3.1) is as a prelude to vectorization – the process of converting raster data into vector data. The simplest way to create vectors from raster layers is to digitize vector objects manually straight off a computer screen using a mouse or digitizing cursor. This method is called heads-up digitizing because the map is vertical and can be viewed without bending the head down. It is widely used for the selective capture of, for example, land parcels, buildings, and utility assets.

Vectorization is the process of converting raster data into vector data. The reverse is called rasterization.

A faster and more consistent approach is to use software to perform automated vectorization in either batch or semi-interactive mode. Batch vectorization takes an entire raster file and converts it to vector objects in a single operation. Vector objects are created using software algorithms that build simple (spaghetti) line strings from the original pixel values. The lines can then be further processed to create topologically correct polygons (Figure 9.8). A typical map will take only a few minutes to vectorize using modern hardware and software systems. See Section 10.7.1 for further discussion on structuring geographic data.
Unfortunately, batch vectorization software is far from perfect and post-vectorization editing is usually required to clean up errors. To avoid large amounts of vector editing, it is useful to undertake a little raster editing of the original raster file prior to vectorization to remove unwanted noise that may affect the vectorization process. For example, text that overlaps lines should be deleted and dashed lines are best converted into solid lines. Following vectorization, topological relationships are usually created for the vector objects. This process may also highlight some previously unnoticed errors that require additional editing.
Batch vectorization is best suited to simple bi-level maps of, for example, contours, streams, and highways. For more complicated maps and where selective vectorization is required (for example, digitizing electric conductors and devices, or water mains and fittings off topographic maps), interactive vectorization (also called semi-automatic vectorization, line following, or tracing) is preferred. In interactive vectorization, software is used to automate digitizing. The operator snaps the cursor to a pixel, indicates a direction for line following, and
Figure 9.8 Batch vectorization of a scanned map: (A) original raster file; (B) vectorized polygons. Adjacent raster cells with the
same attribute values are aggregated. Class boundaries are then created at the intersection between adjacent classes in the form of
vector lines
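The boundary-creation step described in the caption of Figure 9.8 can be sketched as follows. This is a minimal illustration, not the algorithm of any particular vectorization package: it emits one unit line segment wherever two edge-adjacent cells belong to different classes, and leaves the joining of segments into longer lines and polygons to later processing. The function name and the sample raster are hypothetical.

```python
def class_boundaries(grid):
    """Return unit line segments separating edge-adjacent cells of different class.

    Cell (row, col) covers x in [col, col + 1] and y in [row, row + 1].
    """
    segments = []
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Vertical edge shared with the cell to the right.
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                segments.append(((c + 1, r), (c + 1, r + 1)))
            # Horizontal edge shared with the cell below.
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                segments.append(((c, r + 1), (c + 1, r + 1)))
    return segments


# Hypothetical 3 x 4 classified raster with two classes, "A" and "B".
raster = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", "A", "A", "B"],
]
for seg in class_boundaries(raster):
    print(seg)
```

The output is the kind of unstructured (spaghetti) line work that would then need to be chained and given topology before it forms clean polygons.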
Figure 9.10 Error induced by data cleaning. If the tolerance level is set large enough to correct the errors at A and B, the loop at C will also (incorrectly) be closed
Figure 9.11 Mismatches of adjacent spatial data sources that require rubber-sheeting

Many errors in digitizing can be remedied by appropriately designed software.

Further classes of problems arise when the products of digitizing adjacent map sheets are merged together. Stretching of paper base maps, coupled with errors in rectifying them on a digitizing table, give rise to the kinds of mismatches shown in Figure 9.11. Rubber-sheeting is the term used to describe methods for removing such errors on the assumption that strong spatial autocorrelation exists among errors. If errors tend to be spatially autocorrelated up to a distance of x, say, then rubber-sheeting will be successful at removing them, at least partially, provided control points can be found that are

9.3.2.4 Photogrammetry

Photogrammetry is the science and technology of making measurements from pictures, aerial photographs, and images. Although in the strict sense it includes 2-D measurements taken from single aerial photographs, today in GIS it is almost exclusively concerned with capturing 2.5-D and 3-D measurements from models derived from stereo-pairs of photographs and images. In the case of aerial photographs, it is usual to have 60% overlap along each flight line and 30% overlap between flight lines. Similar layouts are used by remote sensing satellites. The amount of overlap defines the area for which a 3-D model can be created.

Photogrammetry is used to capture measurements from photographs and other image sources.

To obtain true georeferenced Earth coordinates from a model, it is necessary to georeference photographs using control points (the procedure is essentially analogous to that described for manual digitizing in Box 9.1). Control points can be defined by ground survey or nowadays more usually with GPS (see Section 9.2.2.1 for discussion of these techniques).
Measurements are captured from overlapping pairs of photographs using stereoplotters. These build a model and allow 3-D measurements to be captured, edited, stored, and plotted. Stereoplotters have undergone three major generations of development: analog (optical), analytic, and digital. Mechanical analog devices are seldom used today, whereas analytical (combined mechanical and digital) and digital (entirely computer-based) are much more common. It is likely that digital (soft-copy) photogrammetry will eventually replace mechanical devices entirely.
There are many ways to view stereo models, including a split screen with a simple stereoscope, and the use of special glasses to observe a red/green display or polarized light. To manipulate 3-D cursors in the x, y, and z planes, photogrammetry systems offer free-moving hand controllers, hand wheels and foot disks, and 3-D mice. The options for extracting vector objects from 3-D models are directly analogous to those available for manual digitizing as described above: namely batch, interactive, and manual (Sections 9.3.2.1 and 9.3.2.2). The obvious
difference, however, is that there is a requirement for capturing z (elevation) values.

Figure 9.12 Typical photogrammetry workflow (after Tao C.V. 2002 ‘Digital photogrammetry: the future of spatial data collection’, GeoWorld. www.geoplace.com/gw/2002/0205/0205dp.asp). (Reproduced by permission of GeoTec Media)

9.3.2.4.1 Digital photogrammetry workflow

Figure 9.12 shows a typical workflow in digital photogrammetry. There are three main parts to digital photogrammetry workflows: data input, processing, and product generation. Data can be obtained directly from sensors or by scanning secondary sources.

Orientation and triangulation are fundamental photogrammetry processing tasks. Orientation is the process of creating a stereo model suitable for viewing and extracting 3-D vector coordinates that describe geographic objects. Triangulation (also called ‘block adjustment’) is used to assemble a collection of images into a single model so that accurate and consistent information can be obtained from large areas.

Photogrammetry workflows yield several important product outputs including digital elevation models (DEMs), contours, orthoimages, vector features, and 3-D scenes. DEMs – regular arrays of height values – are created by ‘matching’ stereo image pairs together using a series of control points. Once a DEM has been created it is relatively straightforward to derive contours using a choice of algorithms. Orthoimages are images corrected for variations in terrain using a DEM. They have become popular because of their relatively low cost of creation (when compared with topographic maps) and ease of interpretation as base maps. They can also be used as accurate data sources for heads-up digitizing (see Section 9.3.2.2). Vector feature extraction is still an evolving field and there are no widely applicable fully automated methods. The most successful methods use a combination of spectral analysis and spatial rules that define context, shape, proximity, etc. Finally, 3-D scenes can be created by merging vector features with a DEM and an orthoimage (Figure 9.13).

In summary, photogrammetry is a very cost-effective data capture technique that is sometimes the only practical method of obtaining detailed topographic data about an area of interest. Unfortunately, the complexity and high cost of equipment have restricted its use to large-scale primary data capture projects and specialist data capture organizations.

9.3.2.5 COGO data entry

COGO, a contraction of the term coordinate geometry, is a methodology for capturing and representing geographic data. COGO uses survey-style bearings and distances to define each part of an object in much the same way as described in Section 9.2.2.1. Some examples of COGO object construction tools are shown in Figure 9.14. The Construct Along tool creates a point along a curve using a distance along the curve. The Line Construct Angle Bisector tool constructs a line that bisects an angle defined by a from-point, through-point, to-point, and a length. The Construct Fillet tool creates a circular arc tangent from two segments and a radius.

The COGO system is widely used in North America to represent land records and property parcels (also called lots). Coordinates can be obtained from COGO measurements by geometric transformation (i.e., bearings and distances are converted into x, y coordinates). Although COGO data obtained as part of a primary data capture activity are used in some projects, it is more often the case that secondary measurements are captured from hardcopy maps and documents. Source data may be in the form of legal descriptions, records of survey, tract (housing estate) maps, or similar documents.

COGO stands for coordinate geometry. It is a vector data structure and method of data entry.

COGO data are very precise measurements and are often regarded as the only legally acceptable definition of land parcels. Measurements are usually very detailed and data capture is often time consuming. Furthermore, commonly occurring discrepancies in the data must be manually resolved by highly qualified individuals.
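
As a minimal sketch of the geometric transformation mentioned above, the following Python fragment converts a sequence of bearings and distances into x, y coordinates. It assumes whole-circle bearings measured clockwise from north in decimal degrees and a planar coordinate system (legal descriptions often use quadrant bearings instead, which would need converting first). The function names, the starting coordinate, and the sample parcel are hypothetical and are not taken from any particular COGO package.

```python
import math


def traverse(start, legs):
    """Convert (bearing_degrees, distance) legs into a list of x, y points.

    Assumption: bearings are whole-circle bearings, clockwise from north,
    so easting uses sin(bearing) and northing uses cos(bearing).
    """
    x, y = start
    points = [(x, y)]
    for bearing_deg, distance in legs:
        theta = math.radians(bearing_deg)
        x += distance * math.sin(theta)  # change in easting
        y += distance * math.cos(theta)  # change in northing
        points.append((x, y))
    return points


def misclosure(points):
    """Distance between the first and last point of a supposedly closed traverse."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    return math.hypot(xn - x0, yn - y0)


# Hypothetical rectangular lot: north 100, east 50, south 100, west 50.
parcel_legs = [(0.0, 100.0), (90.0, 50.0), (180.0, 100.0), (270.0, 50.0)]
corners = traverse((1000.0, 2000.0), parcel_legs)
gap = misclosure(corners)
```

For a closed parcel the misclosure should be close to zero; a noticeably larger value points to the kind of discrepancy in the source measurements that, as noted above, has to be resolved manually.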
Figure 9.14 Example COGO construction tools used to represent geographic features (panels: Construct Along; Line Construct Angle Bisector; Construct Fillet)

9.4 Obtaining data from external sources (data transfer)

One major decision that needs to be faced at the start of a GIS project is whether to build or buy part or all of a database. All the preceding discussion has been concerned with techniques for building databases from primary and secondary sources. This section focuses on how to import or transfer data into a GIS that has been captured by others. Some datasets are freely available, but many of them are sold as a commodity from a variety of outlets including, increasingly, Internet sites.

There are many sources and types of geographic data. Space does not permit a comprehensive review of all geographic data sources here, but a small selection of key sources is listed in Table 9.3. In any case, the characteristics and availability of datasets are constantly changing so those seeking an up-to-date list should consult one of the good online sources described below. Section 18.4.3 also discusses the characteristics of geographic information and highlights several issues to bear in mind when using data collected by others.
Table 9.3 Examples of some digital data sources that can be imported into a GIS. NMOs = National Mapping Organizations, USGS = United States Geological Survey, NGA = US National Geospatial-Intelligence Agency, NASA = National Aeronautics and Space Administration, DEM = Digital Elevation Model, EPA = US Environmental Protection Agency, WWF = World Wildlife Fund for Nature, FEMA = Federal Emergency Management Agency, EBIS = ESRI Business Information Solutions

Type | Source | Details

Basemaps
Geodetic framework | Many NMOs, e.g., USGS and Ordnance Survey | Definition of framework, map projections, and geodetic transformations
General topographic map data | NMOs and military agencies, e.g., NGA | Many types of data at detailed to medium scales
Elevation | NMOs, military agencies, and several commercial providers, e.g., USGS, SPOT Image, NASA | DEMs, contours at local, regional, and global levels
Transportation | National governments, and several commercial vendors, e.g., TeleAtlas and NAVTEQ | Highway/street centerline databases at national levels
Hydrology | NMOs and government agencies | National hydrological databases are available for many countries
Toponymy | NMOs, other government agencies and commercial providers | Gazetteers of placenames at global and national levels
Satellite images | Commercial and military providers, e.g., Landsat, SPOT, IRS, IKONOS, Quickbird | See Figure 9.2 for further details
Aerial photographs | Many private and public agencies | Scales vary widely, typically from 1:500–1:20 000

Environmental
Wetlands | National agencies, e.g., US National Wetlands Inventory | Government wetlands inventory
Toxic release sites | National Environmental Protection Agencies, e.g., EPA | Details of thousands of toxic sites
World eco-regions | World Wildlife Fund for Nature (WWF) | Habitat types, threatened areas, biological distinctiveness
Flood zones | Many national and regional government agencies, e.g., FEMA | National flood risk areas

Socio-economic
Population census | National governments, with value added by commercial providers | Typically every 10 years with annual estimates
Lifestyle classifications | Private agencies (e.g., CACI and Experian) | Derived from population censuses and other socio-economic data
Geodemographics | Private agencies (e.g., Claritas and EBIS) | Many types of data at many scales and prices
Land and property ownership | National governments | Street, property, and cadastral data
Administrative areas | National governments | Obtained from maps at scales of 1:5000–1:750 000
The best way to find geographic data is to search the Internet. Several types of resources and technologies are available to assist searching, and are described in detail in Section 11.2. These include specialist geographic data catalogs and stores, as well as the sites of specific geographic data vendors (some websites are shown in Table 9.4 and the history of one vendor is described in Box 9.2). Particularly good sites are the Data Store (www.datastore.co.uk/) and the AGI (Association for Geographic Information) Resource List (www.geo.ed.ac.uk/home/giswww.html). These sites provide access to information about the characteristics and availability of geographic data. Some also have facilities to purchase and download data directly. Probably the most useful resources for locating geographic data are the geolibraries and geoportals (see Section 11.2) that have been created as part of national and global spatial data infrastructure initiatives (SDI).

The best way to find geographic data is to search the Internet using one of the specialist geolibraries or SDI geographic data geoportals.

9.4.1 Geographic data formats

One of the biggest problems with data obtained from external sources is that they can be encoded in many different formats. There are so many different geographic data formats because no single format is appropriate for all tasks and applications. It is not possible to design a format that supports, for example, both fast rendering in police
command and control systems, and sophisticated topological analysis in natural resource information systems: the two are mutually incompatible. Also, given the great diversity of geographic information a single comprehensive format would simply be too large and cumbersome. The many different formats that are in use today have evolved in response to diverse user requirements.

Given the high cost of creating databases, many tools have been developed to move data between systems and to reuse data through open application programming interfaces (APIs). In the former case, the approach has been to develop software that is able to translate data (Figure 9.16), either by a direct read into memory, or via an intermediate file format. In the latter case, software developers have created open interfaces to allow access to data.

Many GIS software systems are now able to read directly AutoCAD DWG and DXF, Microstation DGN, and Shapefile, VPF, and many image formats. Unfortunately, direct read support can only easily be provided for relatively simple product-oriented formats. Complex formats, such as SDTS, were designed for exchange purposes and require more advanced processing before they can be viewed (e.g., multi-pass read and feature assembly from several parts).

Data can be transferred between systems by direct read into memory or via an intermediate file format.

More than 25 organizations are involved in the standardization of various aspects of geographic data and geoprocessing; several of them are country and domain
Table 9.4 Selected websites containing information about geographic data sources

Source | URL | Description
AGI GIS Resource List | www.geo.ed.ac.uk/home/giswww.html | Indexed list of several hundred sites
The Data Store | www.data-store.co.uk/ | UK, European, and worldwide data catalog
Geospatial One-Stop | www.geodata.gov | Geoportal providing metadata and direct access to over 50 000 datasets
MapMart | www.mapmart.com/ | Extensive data and imagery provider
EROS Data Center | edc.usgs.gov/ | US government data archive
Terraserver | www.terraserver-usa.com/ | High-resolution aerial imagery and topo maps
Geography Network | www.GeographyNetwork.com | Global online data and map services
National Geographic Society | www.nationalgeographic.com | Worldwide maps
GeoConnections | www.connect.gc.ca/en/692-e.asp | Canadian government’s geographic data over the Web
EuroGeographics | www.eurogeographics.org/eng/01 about.asp | Coalition of European NMOs offering topographic map data
GEOWorld Data Directory | www.geoplace.com | List of GIS data companies
The Data Depot | www.gisdatadepot.com | Extensive collection of mainly free geographic data depot
Figure 9.17 Relationship between quality, speed, and price in data collection (Source: after Hohl 1997)

9.5 Capturing attribute data