GIS Data Collection: Geographic Information Systems and Science, 2nd Edition

This document discusses GIS data collection, including primary and secondary data capture techniques. Primary data involves direct measurement through remote sensing or surveying. Secondary data is derived from other sources and involves processes like scanning, digitizing, and photogrammetry. The document outlines important practical issues for managing data capture projects, and notes that data collection is time-consuming and expensive but important for GIS. Effective planning is needed to execute data collection projects successfully.


9 GIS data collection

Data collection is one of the most time-consuming and expensive, yet important, of GIS tasks. There are many diverse sources of geographic data and many methods available to enter them into a GIS. The two main methods of data collection are data capture and data transfer. It is useful to distinguish between primary (direct measurement) and secondary (derivation from other sources) data capture for both raster and vector data types. Data transfer involves importing digital data from other sources. There are many practical issues associated with planning and executing an effective GIS data collection plan. This chapter reviews the main methods of GIS data capture and transfer and introduces key practical management issues.

Geographic Information Systems and Science, 2nd edition. Paul Longley, Michael Goodchild, David Maguire, and David Rhind.
© 2005 John Wiley & Sons, Ltd. ISBNs: 0-470-87000-1 (HB); 0-470-87001-X (PB)
200 PART III TECHNIQUES

Learning Objectives

After reading this chapter you will be able to:

■ Describe data collection workflows;

■ Understand the primary data capture techniques in remote sensing and surveying;

■ Be familiar with the secondary data capture techniques of scanning, manual digitizing, vectorization, photogrammetry, and COGO feature construction;

■ Understand the principles of data transfer, sources of digital geographic data, and geographic data formats;

■ Analyze practical issues associated with managing data capture projects.

9.1 Introduction

GIS can contain a wide variety of geographic data types originating from many diverse sources. For the purposes of organizing the material in this chapter, data collection activities are split into data capture (direct data input) and data transfer (input of data from other systems). From the perspective of creating geographic databases, it is convenient to classify raster and vector geographic data as primary and secondary (Table 9.1). Primary data sources are those collected in digital format specifically for use in a GIS project. Typical examples of primary GIS sources include raster SPOT and IKONOS Earth satellite images, and vector building-survey measurements captured using a total survey station. Secondary sources are digital and analog datasets that were originally captured for another purpose and need to be converted into a suitable digital format for use in a GIS project. Typical secondary sources include raster scanned color aerial photographs of urban areas and United States Geological Survey (USGS) or Institut Géographique National, France (IGN) paper maps that can be scanned and vectorized. This classification scheme is a useful organizing framework for this chapter and, more importantly, it highlights the number of processing-stage transformations that a dataset goes through, and therefore the opportunities for errors to be introduced. However, the distinctions between primary and secondary, and raster and vector, are not always easy to determine. For example, is digital satellite remote sensing data obtained on a DVD primary or secondary? Clearly the commercial satellite sensor feeds do not run straight into GIS databases, but to ground stations where the data are pre-processed onto digital media. Here it is considered primary because the data has usually undergone only minimal transformation since being collected by the satellite sensors and because the characteristics of the data make them suitable for virtually direct use in GIS projects.

Primary geographic data sources are captured specifically for use in GIS by direct measurement. Secondary sources are those reused from earlier studies or obtained from other systems.

Table 9.1 Classification of geographic data for data collection purposes, with examples of each type

             Raster                              Vector
Primary      Digital satellite                   GPS measurements
             remote-sensing images               Survey measurements
             Digital aerial photographs
Secondary    Scanned maps or photographs         Topographic maps
             Digital elevation models from       Toponymy (placename)
             topographic map contours            databases

Both primary and secondary geographic data may be obtained in either digital or analog format (see Section 3.7 for a definition of analog). Analog data must always be digitized before being added to a geographic database. Analog-to-digital transformation may involve the scanning of paper maps or photographs, optical character recognition (OCR) of text describing geographic object properties, or the vectorization of selected features from an image. Depending on the format and characteristics of the digital data, considerable reformatting and restructuring may be required prior to importing into a GIS. Each of these transformations alters the original data and will introduce further uncertainty into the data (see Chapter 6 for discussion of uncertainty).

This chapter describes the data sources, techniques, and workflows involved in GIS data collection. The processes of data collection are also variously referred to as data capture, data automation, data conversion, data transfer, data translation, and digitizing. Although there are subtle differences between these terms, they essentially describe the same thing, namely, adding geographic data to a database. Data capture refers to direct entry. Data transfer is the importing of existing digital data across a network connection (Internet, wide area network (WAN), or local area network (LAN)) or from physical media such as CD-ROMs, zip disks, or diskettes. This chapter focuses on the techniques of data collection; of equal, perhaps more, importance to a real-world GIS implementation are project management, cost, legal, and organization issues. These are covered briefly in Section 9.6 of this chapter as a prelude to more detailed treatment in Chapters 17 through 20.
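The two-way scheme in Table 9.1 can be captured in a small lookup structure. The entries below are the table's own examples; the structure and function names are illustrative only, not part of any GIS package.

```python
# A minimal lookup mirroring Table 9.1's classification
# (primary/secondary x raster/vector), keyed by both dimensions.
DATA_SOURCES = {
    ("primary", "raster"): ["digital satellite remote-sensing images",
                            "digital aerial photographs"],
    ("primary", "vector"): ["GPS measurements",
                            "survey measurements"],
    ("secondary", "raster"): ["scanned maps or photographs",
                              "digital elevation models from topographic map contours"],
    ("secondary", "vector"): ["topographic maps",
                              "toponymy (placename) databases"],
}

def classify(capture, model):
    """Return Table 9.1's example sources for a capture type and data model."""
    return DATA_SOURCES[(capture.lower(), model.lower())]

print(classify("Primary", "Raster"))
```

Keying on the (capture type, data model) pair makes the point of the classification explicit: every dataset entering a GIS sits in exactly one of the four cells.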
Table 9.2 Breakdown of costs (in $1000s) for two typical client-server GIS as estimated by the authors

              10 seats            100 seats
              $        %          $        %
Hardware      30       3.4        250      8.6
Software      25       2.8        200      6.9
Data          400      44.7       450      15.5
Staff         440      49.1       2000     69.0
Total         895      100        2900     100

[Figure 9.1: a cycle of stages: planning → preparation → digitizing/transfer → editing/improvement → evaluation.]

Figure 9.1 Stages in data collection projects

Table 9.2 shows a breakdown of costs (in $1000s) for two typical client-server GIS implementations: one with 10 seats (systems) and the other with 100. The hardware costs include desktop clients and servers only (i.e., not network infrastructure). The data costs assume the purchase of a landbase (e.g., streets, parcels, and landmarks) and digitizing assets such as pipes and fittings (water utility), conductors and devices (electrical utility), or land and property parcels (local government). Staff costs assume that all core GIS staff will be full-time, but that users will be part-time.

In the early days of GIS, when geographic data were very scarce, data collection was the main project task and typically it consumed the majority of the available resources. Even today data collection still remains a time-consuming, tedious, and expensive process. Typically it accounts for 15–50% of the total cost of a GIS project (Table 9.2). Data capture costs can in fact be much more significant because in many organizations (especially those that are government funded) staff costs are often assumed to be fixed and are not used in budget accounting. Furthermore, as the majority of data capture effort and expense tends to fall at the start of projects, data capture costs often receive greater scrutiny from senior managers. If staff costs are excluded from a GIS budget, then in cash expenditure terms data collection can be as much as 60–85% of costs.

Data capture costs can account for up to 85% of the cost of a GIS.

After an organization has completed basic data collection tasks, the focus of a GIS project moves on to data maintenance. Over the multi-year lifetime of a GIS project, data maintenance can turn out to be a far more complex and expensive activity than initial data collection. This is because of the high volume of update transactions in many systems (for example, changes in land parcel ownership, maintenance work orders on a highway transport network, or logging military operational activities) and the need to manage multi-user access to operational databases. For more information about data maintenance, see Chapter 10.

9.1.1 Data collection workflow

In all but the simplest of projects, data collection involves a series of sequential stages (Figure 9.1). The workflow commences with planning, followed by preparation, digitizing/transfer (here taken to mean a range of primary and secondary techniques such as table digitizing, survey entry, scanning, and photogrammetry), editing and improvement, and, finally, evaluation.

Planning is obviously important to any project and data collection is no exception. It includes establishing user requirements, garnering resources (staff, hardware, and software), and developing a project plan. Preparation is especially important in data collection projects. It involves many tasks such as obtaining data, redrafting poor-quality map sources, editing scanned map images, and removing noise (unwanted data such as speckles on a scanned map image). It may also involve setting up appropriate GIS hardware and software systems to accept data. Digitizing and transfer are the stages where the majority of the effort will be expended. It is naïve to think that data capture is really just digitizing, when in fact it involves very much more, as discussed below. Editing and improvement follows digitizing/transfer. This covers many techniques designed to validate data, as well as correct errors and improve quality. Evaluation, as the name suggests, is the process of identifying project successes and failures. These may be qualitative or quantitative. Since all large data projects involve multiple stages, this workflow is iterative, with earlier phases (especially a first, pilot, phase) helping to improve subsequent parts of the overall project.

9.2 Primary geographic data capture

Primary geographic capture involves the direct measurement of objects. Digital data measurements may be input directly into the GIS database, or can reside in a temporary file prior to input. Although the former is preferable as it minimizes the amount of time and the possibility of errors, close coupling of data collection devices and GIS databases is not always possible. Both raster and vector GIS primary data capture methods are available.
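Returning briefly to the costs discussed in Section 9.1: the percentage columns in Table 9.2 follow directly from the $ figures, as a few lines of arithmetic confirm. This is an illustrative sketch; rounding can differ by 0.1% from the printed table.

```python
# Recompute the percentage columns of Table 9.2 from the $ figures
# (all amounts in $1000s, taken from the table).
costs = {
    "10 seats":  {"Hardware": 30,  "Software": 25,  "Data": 400, "Staff": 440},
    "100 seats": {"Hardware": 250, "Software": 200, "Data": 450, "Staff": 2000},
}

for config, items in costs.items():
    total = sum(items.values())
    shares = {name: round(100 * amount / total, 1) for name, amount in items.items()}
    print(config, "total =", total, shares)
```

Note how the data share falls from 44.7% to 15.5% as seat count grows while the staff share rises, which is the pattern the surrounding text describes.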

9.2.1 Raster data capture

Much the most popular form of primary raster data capture is remote sensing. Broadly speaking, remote sensing is a technique used to derive information about the physical, chemical, and biological properties of objects without direct physical contact (Section 3.6). Information is derived from measurements of the amount of electromagnetic radiation reflected, emitted, or scattered from objects. A variety of sensors, operating throughout the electromagnetic spectrum from visible to microwave wavelengths, are commonly employed to obtain measurements (see Section 3.6.1). Passive sensors are reliant on reflected solar radiation or emitted terrestrial radiation; active sensors (such as synthetic aperture radar) generate their own source of electromagnetic radiation. The platforms on which these instruments are mounted are similarly diverse. Although Earth-orbiting satellites and fixed-wing aircraft are by far the most common, helicopters, balloons, masts, and booms are also employed (Figure 9.2). As used here, the term remote sensing subsumes the fields of satellite remote sensing and aerial photography.

Remote sensing is the measurement of physical, chemical, and biological properties of objects without direct contact.

[Figure 9.2: a log-log chart plotting nominal spatial resolution in meters (x axis, roughly 0.3 m to 100 km) against temporal resolution in minutes (y axis, roughly 10 minutes to 15 years) for commonly used systems, including SPOT HRV/HRG, IRS-1, JERS-1, LANDSAT 4/5 MSS and TM, LANDSAT 7 ETM+, SPIN-2, Quickbird, IKONOS, OrbView, ASTER, MODIS, AVHRR, SeaWiFS, RADARSAT, METEOSAT, GOES, NWS WSR-88D Doppler radar, and aerial photography.]

Figure 9.2 Spatial and temporal characteristics of commonly used remote sensing systems and their sensors (Source: after Jensen J.R. and Cowen D.C. 1999 'Remote sensing of urban/suburban infrastructure and socioeconomic attributes', Photogrammetric Engineering and Remote Sensing 65, 611–622)

From the GIS perspective, resolution is a key physical characteristic of remote sensing systems. There are three aspects to resolution: spatial, spectral, and temporal. All sensors need to trade off spatial, spectral, and temporal properties because of storage, processing, and bandwidth considerations. For further discussion of the important topic of resolution see also Sections 3.4, 3.6.1, 4.1, 6.4.2, 7.1, and 16.1.

Three key aspects of resolution are: spatial, spectral, and temporal.

Spatial resolution refers to the size of object that can be resolved, and the most usual measure is the pixel size. Satellite remote sensing systems typically provide data with pixel sizes in the range 0.5 m–1 km. The resolution of cameras used for capturing aerial photographs usually ranges from 0.1 m–5 m. Image (scene) sizes vary quite widely between sensors; typical ranges include 900 by 900 to 3000 by 3000 pixels. The total coverage of remote sensing images is usually in the range 9 by 9 to 200 by 200 km.

Spectral resolution refers to the parts of the electromagnetic spectrum that are measured. Since different objects emit and reflect different types and amounts of radiation, selecting which part of the electromagnetic spectrum to measure is critical for each application area. Figure 9.3 shows the spectral signatures of water, green vegetation, and dry soil. Remote sensing systems may capture data in one part of the spectrum (referred to as a single band) or simultaneously from several parts (multi-band or multi-spectral). The radiation values are usually normalized and resampled to give a range of integers from 0–255 for each band (part of the electromagnetic spectrum measured), for each pixel, in each image. Until recently, remote sensing satellites typically measured a small number of bands in the visible part of the spectrum. More recently a number of hyperspectral systems have come into operation that measure very large numbers of bands across a much wider part of the spectrum.

[Figure 9.3: reflectance (%) plotted against wavelength (0.4–2.6 µm) for water, green vegetation, and dry bare soil, spanning the visible (blue, green, red), near-infrared, and middle-infrared regions.]

Figure 9.3 Typical reflectance signatures for water, green vegetation, and dry soil (Source: after Jones C. 1997 Geographic Information Systems and Computer Cartography. Reading, MA: Addison-Wesley Longman)

Temporal resolution, or repeat cycle, describes the frequency with which images are collected for the same area. There are essentially two types of commercial remote sensing satellite: Earth-orbiting and geostationary. Earth-orbiting satellites collect information about different parts of the Earth surface at regular intervals. To maximize utility, orbits are typically polar, at a fixed altitude and speed, and Sun-synchronous.

The French SPOT (Système Probatoire d'Observation de la Terre) 5 satellite launched in 2002, for example, passes virtually over the poles at an altitude of 822 km, sensing the same location on the Earth surface during daylight every 26 days. The SPOT platform carries multiple sensors: a panchromatic sensor measuring radiation in the visible part of the electromagnetic spectrum at a spatial resolution of 2.5 by 2.5 m; a multi-spectral sensor measuring green, red, and reflected infrared radiation at a spatial resolution of 10 by 10 m; a shortwave near-infrared sensor with a resolution of 20 by 20 m; and a vegetation sensor measuring four bands at a spatial resolution of 1000 m. The SPOT system is also able to provide stereo images from which digital terrain models and 3-D measurements can be obtained. Each SPOT scene covers an area of about 60 by 60 km.

Much of the discussion so far has focused on commercial satellite remote sensing systems. Of equal importance, especially in medium- to large (coarse)-scale GIS projects, is aerial photography. Although the data products resulting from remote sensing satellites and aerial photography systems are technically very similar (i.e., they are both images), there are some significant differences in the way data are captured and can, therefore, be interpreted. The most notable difference is that aerial photographs are normally collected using analog optical cameras (although digital cameras are becoming more widely used) and then later rasterized, usually by scanning a film negative. The quality of the optics of the camera and the mechanics of the scanning process both affect the spatial and spectral characteristics of the resulting images. Most aerial photographs are collected on an ad hoc basis using cameras mounted in airplanes flying at low altitudes (3000–9000 m) and are either panchromatic (black and white) or color, although multi-spectral cameras/sensors operating in the non-visible parts of the electromagnetic spectrum are also used. Aerial photographs are very suitable for detailed surveying and mapping projects.

An important feature of satellite and aerial photography systems is that they can provide stereo imagery from overlapping pairs of images. These images are used to create a 3-D analog or digital model from which 3-D coordinates, contours, and digital elevation models can be created (see Section 9.3.2.4).

Satellite and aerial photograph data offer a number of advantages for GIS projects. The consistency of the data and the availability of systematic global coverage make satellite data especially useful for large-area, small-scale projects (for example, mapping landforms and geology at the river catchment-area level) and for mapping inaccessible areas. The regular repeat cycles of commercial systems and the fact that they record radiation in many parts of the spectrum make such data especially suitable for assessing the condition of vegetation (for example, the moisture stress of wheat crops). Aerial photographs in particular are very useful for detailed surveying and mapping of, for example, urban areas and archaeological sites, especially those applications requiring 3-D data (see Chapter 12).

On the other hand, the spatial resolution of commercial satellites is too coarse for many large-scale projects, and the data collection capability of many sensors is restricted by cloud cover. Some of this is changing, however, as the new generation of satellite sensors now provide data at 0.6 m spatial resolution and better, and radar data can be obtained that are not affected by cloud cover. The data volumes from both satellites and aerial cameras can be very large and create storage and processing problems for all but the most modern systems. The cost of data can also be prohibitive for a single project or organization.

9.2.2 Vector data capture

Primary vector data capture is a major source of geographic data. The two main branches of vector data capture are ground surveying and GPS (which is covered in Section 5.8), although as more surveyors use GPS routinely the distinction between the two is becoming increasingly blurred.

Ground surveying is based on the principle that the 3-D location of any point can be determined by measuring angles and distances from other known points. Surveys begin from a benchmark point. If the coordinate system of this point is known, all subsequent points can be collected in this coordinate system. If it is unknown then the survey will use a local or relative coordinate system (see Section 5.7).

Since all survey points are obtained from survey measurements, their known locations are always relative to other points. Any measurement errors need to be apportioned between multiple points in a survey. For example, when surveying a field boundary, if the last and first points are not identical in survey terms (within the tolerance employed in the survey) then errors need to be apportioned between all points that define the boundary (see Section 6.3.4). As new measurements are obtained these may change the locations of points.

Traditionally, surveyors used equipment like transits and theodolites to measure angles, and tapes and chains to measure distances. Today these have been replaced by electro-optical devices called total stations that can measure both angles and distances to an accuracy of 1 mm (Figure 9.4). Total stations automatically log data and the most sophisticated can create vector point, line, and area objects in the field, thus providing direct validation.

The basic principles of surveying have changed very little in the past 100 years, although new technology has considerably improved accuracy and productivity. Two people are usually required to perform a survey, one to operate the total station and the other to hold a reflective prism that is placed at the object being measured. On some remote-controlled systems a single person can control both the total station and the prism.

Ground survey is a very time-consuming and expensive activity, but it is still the best way to obtain highly accurate point locations. Surveying is typically used for capturing buildings, land and property boundaries, manholes, and other objects that need to be located accurately. It is also employed to obtain reference marks for use in other data capture projects. For example, large-scale aerial photographs and satellite images are frequently georeferenced using points obtained from ground survey.
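The surveying principle described above (fixing each new point by an angle and a distance from a known point, then apportioning the closure error among the points) can be sketched in two dimensions. The function names, the even linear apportionment, and the square example traverse are illustrative assumptions; production survey adjustments use least-squares methods.

```python
import math

def traverse(start, legs):
    """Fix each point from the previous one by a (bearing, distance) leg.
    Bearings are degrees clockwise from north; returns the point list."""
    x, y = start
    pts = []
    for bearing, dist in legs:
        b = math.radians(bearing)
        x += dist * math.sin(b)   # easting component
        y += dist * math.cos(b)   # northing component
        pts.append((x, y))
    return pts

def adjust_closure(start, pts):
    """Spread the misclosure (gap between the closing point and the true
    start) linearly along the traverse, a simple stand-in for least squares."""
    ex = pts[-1][0] - start[0]
    ey = pts[-1][1] - start[1]
    n = len(pts)
    return [(x - ex * (i + 1) / n, y - ey * (i + 1) / n)
            for i, (x, y) in enumerate(pts)]

# A square field surveyed back to its starting corner; the last leg
# deliberately overshoots by 0.2 units to create a misclosure.
legs = [(90, 100), (180, 100), (270, 100), (0, 100.2)]
pts = traverse((0.0, 0.0), legs)
adj = adjust_closure((0.0, 0.0), pts)
print(adj[-1])  # the closing point is pulled back onto the start
```

The key behavior matches the text: because every location is relative to other measured points, correcting the closure moves every point in the boundary, not just the last one.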

Figure 9.4 A tripod-mounted Leica TPS1100 Total Station (Courtesy: Leica Geosystems)

Figure 9.5 A large-format roll-feed image scanner (Reproduced by permission of GTCO Calcomp, Inc.)

9.3 Secondary geographic data capture

Geographic data capture from secondary sources is the process of creating raster and vector files and databases from maps, photographs, and other hard-copy documents. Scanning is used to capture raster data. Table digitizing, heads-up digitizing, stereo-photogrammetry, and COGO data entry are used for vector data.

9.3.1 Raster data capture using scanners

A scanner is a device that converts hard-copy analog media into digital images by scanning successive lines across a map or document and recording the amount of light reflected from a local data source (Figure 9.5). The differences in reflected light are normally scaled into bi-level black and white (1 bit per pixel), or multiple gray levels (8, 16, or 32 bits). Color scanners output data into 8-bit red, green, and blue color bands. The spatial resolution of scanners varies widely from as little as 200 dpi (8 dots per mm) to 2400 dpi (96 dots per mm) and beyond. Most GIS scanning is in the range 400–900 dpi (16–40 dots per mm). Depending on the type of scanner and the resolution required, it can take from 30 seconds to 30 minutes or more to scan a map.

Scanned maps and documents are used extensively in GIS as background maps and data stores.

There are three main reasons to scan hardcopy media for use in GIS:

■ Documents, such as building plans, CAD drawings, property deeds, and equipment photographs, are scanned to reduce wear and tear, improve access, provide integrated database storage, and to index them geographically (e.g., building plans can be attached to building objects in geographic space).

■ Film and paper maps, aerial photographs, and images are scanned and georeferenced so that they provide geographic context for other data (typically vector layers). This type of unintelligent image or background geographic wallpaper is very popular in systems that manage equipment and land and property assets (Figure 9.6).

■ Maps, aerial photographs, and images are scanned prior to vectorization (see below), and sometimes as a prelude to spatial analysis.

An 8-bit (256 gray level), 400 dpi (16 dots per mm) scanner is a good choice for scanning maps for use as a background GIS reference layer. For a color aerial photograph that is to be used for subsequent photo-interpretation and analysis, a color (8-bit for each of three bands), 900 dpi (40 dots per mm) scanner is more appropriate. The quality of data output from a scanner is determined by the nature of the original source material, the quality of the scanning device, and the type of preparation prior to scanning (e.g., redrafting key features or removing unwanted marks will improve output quality).

Figure 9.6 An example of raster background data (black and white aerial photography) underneath vector data (land parcels)
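The scanning parameters quoted above translate directly into file sizes. A rough sketch, assuming an uncompressed image and an illustrative 40 by 40 inch map sheet (the sheet size is an assumption for the example, not a figure from the text):

```python
def scan_megabytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in MB of a scan: pixel count times bit depth."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8 / 2**20

# 400 dpi, 8-bit grayscale: the background reference layer case.
print(round(scan_megabytes(40, 40, 400, 8)), "MB")
# 900 dpi color (three 8-bit bands): the photo-interpretation case.
print(round(scan_megabytes(40, 40, 900, 24)), "MB")
```

The jump from roughly 244 MB to roughly 3.7 GB for the same sheet shows why resolution and bit depth should be chosen for the intended use rather than simply maximized.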

9.3.2 Vector data capture

Secondary vector data capture involves digitizing vector objects from maps and other geographic data sources. The most popular methods are manual digitizing, heads-up digitizing and vectorization, photogrammetry, and COGO data entry.

Figure 9.7 Digitizing equipment: (A) digitizing table, (B) cursor (Reproduced by permission of GTCO Calcomp, Inc.)

9.3.2.1 Manual digitizing

Manually operated digitizers are much the simplest, cheapest, and most commonly used means of capturing vector objects from hardcopy maps. Digitizers come in several designs, sizes, and shapes. They operate on the principle that it is possible to detect the location of a cursor or puck passed over a table inlaid with a fine mesh of wires. Digitizing table accuracies typically range from 0.0004 inch (0.01 mm) to 0.01 inch (0.25 mm). Small digitizing tablets up to 12 by 24 inches (30 by 60 cm) are used for small tasks, but bigger (typically 44 by 60 inches (112 by 152 cm)) freestanding table digitizers are preferred for larger tasks (Figure 9.7). Both types of digitizer usually have cursors with cross hairs mounted in glass and buttons to control capture. Box 9.1 describes the process of table digitizing.

Manual digitizing is still the simplest, easiest, and cheapest method of capturing vector data from existing maps.
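Vertex capture rules such as those in Box 9.1 are enforced by the digitizer control software. In stream mode (described in Section 9.3.2.1 below), the software records a vertex only after the cursor has moved a threshold distance from the last recorded vertex. A minimal sketch of that rule, with illustrative names and made-up coordinates:

```python
import math

def stream_filter(track, threshold):
    """Keep a cursor position only when it lies at least `threshold`
    units from the last kept vertex (stream-mode distance rule)."""
    kept = [track[0]]
    for p in track[1:]:
        if math.dist(p, kept[-1]) >= threshold:
            kept.append(p)
    return kept

raw = [(0.0, 0.0), (0.1, 0.0), (0.3, 0.1), (0.6, 0.1), (1.2, 0.2)]
print(stream_filter(raw, 0.5))
```

A small threshold keeps nearly every cursor position, which is exactly why stream-mode files tend to contain many redundant coordinates.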

Technical Box 9.1

Manual digitizing

Manual digitizing involves five basic steps.

1. The map document is attached to the center of the digitizing table using sticky tape.

2. Because a digitizing table uses a local rectilinear coordinate system, the map and the digitizer must be registered so that vector data can be captured in real-world coordinates. This is achieved by digitizing a series of four or more well-distributed control points (also called reference points or tick marks) and then entering their real-world values. The digitizer control software (usually the GIS) will calculate a transformation and then automatically apply this to any future coordinates that are captured.

3. Before proceeding with data capture it is useful to spend some time examining a map to determine rules about which features are to be captured at what level of generalization. This type of information is often defined in a data capture project specification.

4. Data capture involves recording the shape of vector objects using manual or stream mode digitizing as described in Section 9.3.2.1. A common rule for vector GIS is to press Button 2 on the digitizing cursor to start a line, Button 1 for each intermediate vertex, and Button 2 to finish a line. There are other similar rules to control how points and polygons are captured.

5. Finally, after all objects have been captured it is necessary to check for any errors. Easy ways to do this include using software to identify geometric errors (such as polygons that do not close or lines that do not intersect – see Figure 9.9), and producing a test plot that can be overlaid on the original document.

Vertices defining point, line, and polygon objects are captured using manual or stream digitizing methods. Manual digitizing involves placing the center point of the cursor cross hairs at the location for each object vertex and then clicking a button on the cursor to record the location of the vertex. Stream-mode digitizing partially automates this process by instructing the digitizer control to collect vertices automatically every time a distance or time threshold is crossed (e.g., every 0.02 inch (0.5 mm) or 0.25 second). Stream-mode digitizing is a much faster method, but it typically produces larger files with many redundant coordinates.

9.3.2.2 Heads-up digitizing and vectorization

One of the main reasons for scanning maps (see Section 9.3.1) is as a prelude to vectorization – the process of converting raster data into vector data. The simplest way to create vectors from raster layers is to digitize vector objects manually straight off a computer screen using a mouse or digitizing cursor. This method is called heads-up digitizing because the map is vertical and can be viewed without bending the head down. It is widely used for the selective capture of, for example, land parcels, buildings, and utility assets.

Vectorization is the process of converting raster data into vector data. The reverse is called rasterization.

A faster and more consistent approach is to use software to perform automated vectorization in either batch or semi-interactive mode. Batch vectorization takes an entire raster file and converts it to vector objects in a single operation. Vector objects are created using software algorithms that build simple (spaghetti) line strings from the original pixel values. The lines can then be further processed to create topologically correct polygons (Figure 9.8). A typical map will take only a few minutes to vectorize using modern hardware and software systems. See Section 10.7.1 for further discussion on structuring geographic data.

Unfortunately, batch vectorization software is far from perfect and post-vectorization editing is usually required to clean up errors. To avoid large amounts of vector editing, it is useful to undertake a little raster editing of the original raster file prior to vectorization to remove unwanted noise that may affect the vectorization process. For example, text that overlaps lines should be deleted and dashed lines are best converted into solid lines. Following vectorization, topological relationships are usually created for the vector objects. This process may also highlight some previously unnoticed errors that require additional editing.

Batch vectorization is best suited to simple bi-level maps of, for example, contours, streams, and highways. For more complicated maps and where selective vectorization is required (for example, digitizing electric conductors and devices, or water mains and fittings off topographic maps), interactive vectorization (also called semi-automatic vectorization, line following, or tracing) is preferred. In interactive vectorization, software is used to automate digitizing. The operator snaps the cursor to a pixel, indicates a direction for line following, and
208 PART III TECHNIQUES

the software then automatically digitizes lines. Typically, many parameters can be tuned to control the density of points (level of generalization), the size of gaps (blank pixels in a line) that will be jumped, and whether to pause at junctions for operator intervention or always to trace in a specific direction (most systems require that all polygons are ordered either clockwise or counterclockwise). Interactive vectorization is still quite labor intensive, but generally it results in much greater productivity than manual or heads-up digitizing. It also produces high-quality data, as software is able to represent lines more accurately and consistently than can humans. It is for these reasons that specialized data capture groups much prefer vectorization to manual digitizing.

Figure 9.8 Batch vectorization of a scanned map: (A) original raster file; (B) vectorized polygons. Adjacent raster cells with the same attribute values are aggregated. Class boundaries are then created at the intersection between adjacent classes in the form of vector lines.

9.3.2.3 Measurement error

Data capture, like all geographic workflows, is likely to generate errors. Because digitizing is a tedious and hence error-prone practice, it presents a source of measurement errors – as when the operator fails to position the cursor correctly, or fails to record line segments. Figure 9.9 presents some examples of human errors that are commonly introduced in the digitizing procedure. They are: overshoots and undershoots, where line intersections are inexact (Figure 9.9A); invalid polygons, which are topologically inconsistent because of omission of one or more lines, or omission of tag data (Figure 9.9B); and sliver polygons, in which multiple digitizing of the common boundary between adjacent polygons leads to the creation of additional polygons (Figure 9.9C).

Most GIS packages include standard software functions, which can be used to restore integrity and clean (or rather obscure, depending upon your viewpoint!) obvious measurement errors. Such operations are best carried out immediately after digitizing, in order that omissions may be easily rectified. Data cleaning operations require sensitive setting of threshold values, or else damage can be done to real-world features, as Figure 9.10 shows.

Figure 9.9 Examples of human errors in digitizing: (A) undershoots and overshoots; (B) invalid polygons; and (C) sliver polygons
CHAPTER 9 GIS DATA COLLECTION 209
Figure 9.10 Error induced by data cleaning. If the tolerance level is set large enough to correct the errors at A and B, the loop at C will also (incorrectly) be closed.

Figure 9.11 Mismatches of adjacent spatial data sources that require rubber-sheeting.

Many errors in digitizing can be remedied by appropriately designed software.

Further classes of problems arise when the products of digitizing adjacent map sheets are merged together. Stretching of paper base maps, coupled with errors in rectifying them on a digitizing table, gives rise to the kinds of mismatches shown in Figure 9.11. Rubber-sheeting is the term used to describe methods for removing such errors on the assumption that strong spatial autocorrelation exists among errors. If errors tend to be spatially autocorrelated up to a distance of x, say, then rubber-sheeting will be successful at removing them, at least partially, provided control points can be found that are spaced less than x apart. For the same reason, the shapes of features that are less than x across will tend to have little distortion, while very large shapes may be badly distorted. The results of calculating areas (Section 14.3), or other geometric operations that rely only on relative position, will be accurate as long as the areas are small, but errors will grow rapidly with feature size. Thus it is important for the user of a GIS to know which operations depend on relative position, and over what distance; and where absolute position is important (of course, the term absolute simply means relative to the Earth frame, defined by the Equator and the Greenwich Meridian, or relative over a very long distance: see Section 5.6). Analogous procedures and problems characterize the rectification of raster datasets – be they scanned images of paper maps or satellite measurements of the curved Earth surface.

9.3.2.4 Photogrammetry

Photogrammetry is the science and technology of making measurements from pictures, aerial photographs, and images. Although in the strict sense it includes 2-D measurements taken from single aerial photographs, today in GIS it is almost exclusively concerned with capturing 2.5-D and 3-D measurements from models derived from stereo-pairs of photographs and images. In the case of aerial photographs, it is usual to have 60% overlap along each flight line and 30% overlap between flight lines. Similar layouts are used by remote sensing satellites. The amount of overlap defines the area for which a 3-D model can be created.

Photogrammetry is used to capture measurements from photographs and other image sources.

To obtain true georeferenced Earth coordinates from a model, it is necessary to georeference photographs using control points (the procedure is essentially analogous to that described for manual digitizing in Box 9.1). Control points can be defined by ground survey or, nowadays more usually, with GPS (see Section 9.2.2.1 for discussion of these techniques).

Measurements are captured from overlapping pairs of photographs using stereoplotters. These build a model and allow 3-D measurements to be captured, edited, stored, and plotted. Stereoplotters have undergone three major generations of development: analog (optical), analytic, and digital. Mechanical analog devices are seldom used today, whereas analytical (combined mechanical and digital) and digital (entirely computer-based) devices are much more common. It is likely that digital (soft-copy) photogrammetry will eventually replace mechanical devices entirely. There are many ways to view stereo models, including a split screen with a simple stereoscope, and the use of special glasses to observe a red/green display or polarized light. To manipulate 3-D cursors in the x, y, and z planes, photogrammetry systems offer free-moving hand controllers, hand wheels and foot disks, and 3-D mice. The options for extracting vector objects from 3-D models are directly analogous to those available for manual digitizing as described above: namely batch, interactive, and manual (Sections 9.3.2.1 and 9.3.2.2). The obvious

Figure 9.12 Typical photogrammetry workflow (after Tao C.V. 2002 'Digital photogrammetry: the future of spatial data collection', GeoWorld. www.geoplace.com/gw/2002/0205/0205dp.asp). (Reproduced by permission of GeoTec Media) [Workflow stages: Input (photograph, digital imagery, scanner); Processing (orientation and triangulation; DEM, orthoimagery, feature extraction); Product generation (contour map, vectors, 3-D scene).]

difference, however, is that there is a requirement for capturing z (elevation) values.

9.3.2.4.1 Digital photogrammetry workflow

Figure 9.12 shows a typical workflow in digital photogrammetry. There are three main parts to digital photogrammetry workflows: data input, processing, and product generation. Data can be obtained directly from sensors or by scanning secondary sources.

Orientation and triangulation are fundamental photogrammetry processing tasks. Orientation is the process of creating a stereo model suitable for viewing and extracting 3-D vector coordinates that describe geographic objects. Triangulation (also called 'block adjustment') is used to assemble a collection of images into a single model so that accurate and consistent information can be obtained from large areas.

Photogrammetry workflows yield several important product outputs including digital elevation models (DEMs), contours, orthoimages, vector features, and 3-D scenes. DEMs – regular arrays of height values – are created by 'matching' stereo image pairs together using a series of control points. Once a DEM has been created it is relatively straightforward to derive contours using a choice of algorithms. Orthoimages are images corrected for variations in terrain using a DEM. They have become popular because of their relatively low cost of creation (when compared with topographic maps) and ease of interpretation as base maps. They can also be used as accurate data sources for heads-up digitizing (see Section 9.3.2.2). Vector feature extraction is still an evolving field and there are no widely applicable fully automated methods. The most successful methods use a combination of spectral analysis and spatial rules that define context, shape, proximity, etc. Finally, 3-D scenes can be created by merging vector features with a DEM and an orthoimage (Figure 9.13).

In summary, photogrammetry is a very cost-effective data capture technique that is sometimes the only practical method of obtaining detailed topographic data about an area of interest. Unfortunately, the complexity and high cost of equipment have restricted its use to large-scale primary data capture projects and specialist data capture organizations.

9.3.2.5 COGO data entry

COGO, a contraction of the term coordinate geometry, is a methodology for capturing and representing geographic data. COGO uses survey-style bearings and distances to define each part of an object in much the same way as described in Section 9.2.2.1. Some examples of COGO object construction tools are shown in Figure 9.14. The Construct Along tool creates a point along a curve using a distance along the curve. The Line Construct Angle Bisector tool constructs a line that bisects an angle defined by a from-point, through-point, to-point, and a length. The Construct Fillet tool creates a circular arc tangent to two segments, given a radius.

The COGO system is widely used in North America to represent land records and property parcels (also called lots). Coordinates can be obtained from COGO measurements by geometric transformation (i.e., bearings and distances are converted into x, y coordinates). Although COGO data obtained as part of a primary data capture activity are used in some projects, it is more often the case that secondary measurements are captured from hardcopy maps and documents. Source data may be in the form of legal descriptions, records of survey, tract (housing estate) maps, or similar documents.

COGO stands for coordinate geometry. It is a vector data structure and method of data entry.

COGO data are very precise measurements and are often regarded as the only legally acceptable definition of land parcels. Measurements are usually very detailed and data capture is often time consuming. Furthermore, commonly occurring discrepancies in the data must be manually resolved by highly qualified individuals.
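The bearing-and-distance arithmetic behind COGO reduces to elementary trigonometry. As an illustrative sketch (not from the book; the function name is invented), a traverse of azimuths measured in degrees clockwise from grid north, each paired with a distance, can be converted to x, y coordinates like this:

```python
import math

def cogo_traverse(start, legs):
    """Convert (bearing_degrees, distance) legs into x, y coordinates.
    Bearings are azimuths measured clockwise from grid north."""
    x, y = start
    coords = [(x, y)]
    for bearing, distance in legs:
        az = math.radians(bearing)
        x += distance * math.sin(az)   # easting grows with sin(azimuth)
        y += distance * math.cos(az)   # northing grows with cos(azimuth)
        coords.append((round(x, 3), round(y, 3)))
    return coords
```

A square parcel traversed as four 100 m legs at bearings 90, 180, 270, and 0 degrees returns to its starting corner, which is the closure check surveyors apply to detect discrepancies in the record.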

Figure 9.13 Example 3-D scene as generated from a photogrammetry workflow

Figure 9.14 Example COGO construction tools (Construct Along; Line Construct Angle Bisector; Construct Fillet) used to represent geographic features

9.4 Obtaining data from external sources (data transfer)

One major decision that needs to be faced at the start of a GIS project is whether to build or buy part or all of a database. All the preceding discussion has been concerned with techniques for building databases from primary and secondary sources. This section focuses on how to import or transfer data into a GIS that has been captured by others. Some datasets are freely available, but many of them are sold as a commodity from a variety of outlets.

There are many sources and types of geographic data. Space does not permit a comprehensive review of all geographic data sources here, but a small selection of key sources is listed in Table 9.3. In any case, the characteristics and availability of datasets are constantly changing so those seeking an up-to-date list should consult one of the good online sources described below. Section 18.4.3 also discusses the characteristics of geographic information and highlights several issues to bear in mind when using data collected by others.
Table 9.3 Examples of some digital data sources that can be imported into a GIS. NMOs = National Mapping Organizations, USGS = United States Geological Survey, NGA = US National Geospatial-Intelligence Agency, NASA = National Aeronautics and Space Administration, DEM = Digital Elevation Model, EPA = US Environmental Protection Agency, WWF = World Wildlife Fund for Nature, FEMA = Federal Emergency Management Agency, EBIS = ESRI Business Information Solutions

Type | Source | Details

Basemaps
Geodetic framework | Many NMOs, e.g., USGS and Ordnance Survey | Definition of framework, map projections, and geodetic transformations
General topographic map data | NMOs and military agencies, e.g., NGA | Many types of data at detailed to medium scales
Elevation | NMOs, military agencies, and several commercial providers, e.g., USGS, SPOT Image, NASA | DEMs, contours at local, regional, and global levels
Transportation | National governments, and several commercial vendors, e.g., TeleAtlas and NAVTEQ | Highway/street centerline databases at national levels
Hydrology | NMOs and government agencies | National hydrological databases are available for many countries
Toponymy | NMOs, other government agencies and commercial providers | Gazetteers of placenames at global and national levels
Satellite images | Commercial and military providers, e.g., Landsat, SPOT, IRS, IKONOS, Quickbird | See Figure 9.2 for further details
Aerial photographs | Many private and public agencies | Scales vary widely, typically from 1:500–1:20 000

Environmental
Wetlands | National agencies, e.g., US National Wetlands Inventory | Government wetlands inventory
Toxic release sites | National Environmental Protection Agencies, e.g., EPA | Details of thousands of toxic sites
World eco-regions | World Wildlife Fund for Nature (WWF) | Habitat types, threatened areas, biological distinctiveness
Flood zones | Many national and regional government agencies, e.g., FEMA | National flood risk areas

Socio-economic
Population census | National governments, with value added by commercial providers | Typically every 10 years with annual estimates
Lifestyle classifications | Private agencies (e.g., CACI and Experian) | Derived from population censuses and other socio-economic data
Geodemographics | Private agencies (e.g., Claritas and EBIS) | Many types of data at many scales and prices
Land and property ownership | National governments | Street, property, and cadastral data
Administrative areas | National governments | Obtained from maps at scales of 1:5000–1:750 000

The best way to find geographic data is to search the Internet. Several types of resources and technologies are available to assist searching, and are described in detail in Section 11.2. These include specialist geographic data catalogs and stores, as well as the sites of specific geographic data vendors (some websites are shown in Table 9.4 and the history of one vendor is described in Box 9.2). Particularly good sites are the Data Store (www.datastore.co.uk/) and the AGI (Association for Geographic Information) Resource List (www.geo.ed.ac.uk/home/giswww.html). These sites provide access to information about the characteristics and availability of geographic data. Some also have facilities to purchase and download data directly. Probably the most useful resources for locating geographic data are the geolibraries and geoportals (see Section 11.2) that have been created as part of national and global spatial data infrastructure (SDI) initiatives.

The best way to find geographic data is to search the Internet using one of the specialist geolibraries or SDI geographic data geoportals.

9.4.1 Geographic data formats

One of the biggest problems with data obtained from external sources is that they can be encoded in many different formats. There are so many different geographic data formats because no single format is appropriate for all tasks and applications. It is not possible to design a format that supports, for example, both fast rendering in police

Biographical Box 9.2

Don Cooke, geographic data provider


Don Cooke (Figure 9.15) took a part-time job with the New Haven Census Use Study while finishing his senior year at Yale in 1967. Cooke's three years of Army artillery survey plus an introductory Fortran class gave him GIS credentials typical of most people in the field at the time. Cooke and Bill Maxfield were charged with making computer maps of census and local data. It quickly became apparent that computerized base maps linking census geometry, street addressing, and coordinates were a prerequisite to computer mapping. DIME (Dual Independent Map Encoding) was their solution, probably the first implementation of a topological data structure with redundant encoding for error correction. Cooke, Maxfield, and Jack Sweeney founded Urban Data Processing, Inc. (UDP) in 1968 to bring geocoding, computer mapping, and demographic analysis to the private sector. The Census Bureau adopted DIME, which evolved into the nationwide TIGER database during the 1980s.

Figure 9.15 Don Cooke, geographic data provider
When Harte-Hanks bought UDP in 1980, Cooke founded Geographic Data Technology (GDT) to
commercialize Census DIME and later TIGER files. By the late 1990s, GDT had grown to 500 employees
and in 2004 was acquired by TeleAtlas. Cooke remains in his role as Founder, and the TeleAtlas North
America operation (effectively a combination of ETAK and GDT) faces NAVTEQ as a competitor in GIS and
Navigation markets.
Cooke served on the National Academy of Sciences Mapping Science Committee for four years, and
on the Board of the Urban and Regional Information Systems Association (URISA), where he founded the
first Special Interest Group (SIG) focusing on GIS. He is an active proponent of GIS and GPS technology in
education at all levels. His leadership in this area helped GDT win ‘School-to-Careers Company of the Year’
recognition from the National Alliance of Business. He is the author of ‘Fun with GPS’, a GIS primer written
for owners of consumer GPS receivers.
On the subject of the current state of GIS Don says: ‘Suddenly it seems really easy to explain what GIS
is. Most people have some contact or context; they’ve used MapQuest or know someone who has a GPS.
People with GPS think nothing of finding mapping services through Google; they overlay their GPS tracks
and points on USGS Digital Raster Graphics and Digital Ortho Quarter Quads without even knowing those
terms or messing with GeoSpatial One-Stop (see Box 11.4).’
Thinking about the future, he muses: ‘I like to picture a near-term future where every high-school
graduate has collected GPS data for a project and mapped it with GIS; we’re already there in some schools.
The best thing about this is more often than not their mapping has been for a community project and
they’ve seen through the experience how they can participate in and contribute to their community.’
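The redundant encoding that made DIME notable can be sketched in a much-simplified, hypothetical form (this is not from the book, and the record layout shown is an illustration, not the actual DIME specification). Each record carries a from-node, a to-node, and the block identifiers on its left and right; because the edges bounding a block must chain into a closed loop, every node on that boundary appears an even number of times, so a simple parity check exposes missing or miscoded segments.

```python
from collections import Counter

def block_closes(edges, block):
    """DIME-style consistency check. Each edge is (from_node, to_node,
    left_block, right_block). The edges bounding `block` should form a
    closed loop, so every boundary node must occur an even number of times."""
    counts = Counter()
    for frm, to, left, right in edges:
        if block in (left, right):
            counts[frm] += 1
            counts[to] += 1
    return bool(counts) and all(c % 2 == 0 for c in counts.values())
```

For a square block bounded by four edges the check passes; drop one edge and the two orphaned nodes have odd counts, flagging the error for correction.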

command and control systems, and sophisticated topological analysis in natural resource information systems: the two are mutually incompatible. Also, given the great diversity of geographic information a single comprehensive format would simply be too large and cumbersome. The many different formats that are in use today have evolved in response to diverse user requirements.

Given the high cost of creating databases, many tools have been developed to move data between systems and to reuse data through open application programming interfaces (APIs). In the former case, the approach has been to develop software that is able to translate data (Figure 9.16), either by a direct read into memory, or via an intermediate file format. In the latter case, software developers have created open interfaces to allow access to data.

Many GIS software systems are now able to read directly AutoCAD DWG and DXF, Microstation DGN, Shapefile, VPF, and many image formats. Unfortunately, direct read support can only easily be provided for relatively simple product-oriented formats. Complex formats, such as SDTS, were designed for exchange purposes and require more advanced processing before they can be viewed (e.g., multi-pass read and feature assembly from several parts).

Data can be transferred between systems by direct read into memory or via an intermediate file format.

More than 25 organizations are involved in the standardization of various aspects of geographic data and geoprocessing; several of them are country and domain
Table 9.4 Selected websites containing information about geographic data sources

Source | URL | Description
AGI GIS Resource List | www.geo.ed.ac.uk/home/giswww.html | Indexed list of several hundred sites
The Data Store | www.data-store.co.uk/ | UK, European, and worldwide data catalog
Geospatial One-Stop | www.geodata.gov | Geoportal providing metadata and direct access to over 50 000 datasets
MapMart | www.mapmart.com/ | Extensive data and imagery provider
EROS Data Center | edc.usgs.gov/ | US government data archive
Terraserver | www.terraserver-usa.com/ | High-resolution aerial imagery and topo maps
Geography Network | www.GeographyNetwork.com | Global online data and map services
National Geographic Society | www.nationalgeographic.com | Worldwide maps
GeoConnections | www.connect.gc.ca/en/692-e.asp | Canadian government's geographic data over the Web
EuroGeographics | www.eurogeographics.org/eng/01_about.asp | Coalition of European NMOs offering topographic map data
GEOWorld Data Directory | www.geoplace.com | List of GIS data companies
The Data Depot | www.gisdatadepot.com | Extensive collection of mainly free geographic data

specific. At the global level, the ISO (International Standards Organization) is responsible for coordinating efforts through the work of technical committees TC 211 and 287. In Europe, CEN (Comité Européen de Normalisation) is engaged in geographic standardization. At the national level, there are many complementary bodies. One other standards-forming organization of particular note is OGC (Open Geospatial Consortium: www.opengeospatial.org), a group of vendors, academics, and users interested in the interoperability of geographic systems (see Box 11.1). To date there have been promising OGC-coordinated efforts to standardize on simple feature access (simple geometric object types), metadata catalogs, and Web access.

Figure 9.16 Comparison of data access by translation and direct read

Having obtained a potentially useful source of geographic information the next task is to import it into a GIS database. If the data are already in the native format of the target GIS software system, or the software has a direct read capability for the format in question, then this is a relatively straightforward task. If the data are not compatible with the target GIS software then the alternatives are to ask the data supplier to convert the data to a compatible format, or to use a third-party translation software system, such as the Feature Manipulation Engine from Safe Software (www.safe.com lists over 60 supported geographic data formats), to convert the data. Geographic data translation software must address both syntactic and semantic translation issues. Syntactic translation involves converting specific digital symbols (letters and numbers) between systems. Semantic translation is concerned with converting the meaning inherent in geographic information. While the former is relatively simple to encode and decode, the latter is much more difficult and has seldom met with much success to date.

Although the task of translating geographic information between systems was described earlier as relatively straightforward, those that have tried this in practice will realize that things on the ground are seldom quite so simple. Any number of things can (and do!) go wrong. These range from corrupted media, to incomplete data files, wrong versions of translators, and different interpretations of a format specification, to basic user error.

The most efficient way to translate data between systems is usually via a common intermediate file format.

There are two basic strategies used for data translation: one is direct and the other uses a neutral intermediate format. For small systems that involve the translation of a small number of formats, the first is the simplest. Directly translating data back and forth between the internal structures of two systems requires two new translators (A to B, B to A). Adding two further systems will require 12 translators to share data between all systems (A to B, A to C, A to D, B to A, B to C, B to D, C to A, C to B, C to D, D to A, D to B, and D to C). A more efficient way of solving this problem is to use the
concept of a data switchyard and a common intermediate file format. Systems now need only to translate to and from the common format. The four systems will now need only eight translators instead of 12 (A to Neutral, B to Neutral, C to Neutral, D to Neutral, Neutral to A, Neutral to B, Neutral to C, and Neutral to D). The more systems there are the more efficient this becomes. This is one of the key principles underlying the need for common file interchange formats.
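The translator arithmetic quoted above (n(n-1) pairwise programs versus 2n through a neutral hub) and the switchyard idea itself can be sketched as follows. This is a hypothetical Python illustration, not from the book; the toy formats 'A' and 'B' are invented.

```python
def translators_needed(n_formats, via_neutral):
    """Pairwise translation needs n(n-1) programs; a neutral hub needs 2n."""
    return 2 * n_formats if via_neutral else n_formats * (n_formats - 1)

class Switchyard:
    """Data switchyard: every format is translated to and from one
    neutral in-memory representation, never directly to another format."""
    def __init__(self):
        self.readers = {}   # format name -> function(data) -> neutral form
        self.writers = {}   # format name -> function(neutral form) -> data

    def translate(self, data, src, dst):
        return self.writers[dst](self.readers[src](data))

# Two invented toy formats: 'A' stores a point as "x;y" text,
# 'B' stores it as an (x, y) tuple; the neutral form is the tuple.
yard = Switchyard()
yard.readers['A'] = lambda s: tuple(float(v) for v in s.split(';'))
yard.writers['A'] = lambda p: f"{p[0]};{p[1]}"
yard.readers['B'] = lambda p: p
yard.writers['B'] = lambda p: p
```

Adding a fifth format to the switchyard means writing one reader and one writer, not eight new pairwise translators, which is exactly why common interchange formats pay off as the number of systems grows.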

Figure 9.17 Relationship between quality, speed, and price in data collection (Source: after Hohl 1997)

9.5 Capturing attribute data

All geographic objects have attributes of one type or another. Although attributes can be collected at the same time as vector geometry, it is usually more cost-effective to capture attributes separately. In part, this is because attribute data capture is a relatively simple task that can be undertaken by lower-cost clerical staff. It is also because attributes can be entered by direct data loggers, manual keyboard entry, optical character recognition (OCR) or, increasingly, voice recognition, which do not require expensive hardware and software systems. Much the most common method is direct keyboard data entry into a spreadsheet or database. For some projects, a custom data entry form with in-built validation is preferred. On small projects single entry is used, but for larger, more complex projects data are entered twice and then compared as a validation check.

An essential requirement for separate data entry is a common identifier (also called a key) that can be used to relate object geometry and attributes together following data capture (see Figure 10.2 for a diagrammatic explanation of relating geometry and attributes).

Metadata are a special type of non-geometric data that are increasingly being collected. Some metadata are derived automatically by the GIS software system (for example, length and area, extent of data layer, and count of features), but some must be explicitly collected (for example, owner name, quality estimate, and original source). Explicitly collected metadata can be entered in the same way as other attributes, as described above. For further information about metadata see Section 11.2.

9.6 Managing a data collection project

The subject of managing a GIS project is given extensive treatment later in this book in Chapters 17–20. The management of data capture projects is discussed briefly here both because of its critical importance and because there are several unique issues. That said, most of the general principles for any GIS project apply to data collection: the need for a clearly articulated plan, adequate resources, appropriate funding, and sufficient time.

In any data collection project there is a fundamental tradeoff between quality, speed, and price. Collecting high-quality data quickly is possible, but it is also very expensive. If price is a key consideration then lower-quality data can be collected over a longer period (Figure 9.17).

GIS data collection projects can be carried out intensively or over a longer period. A key decision facing managers of such projects is whether to pursue a strategy of incremental or very rapid collection. Incremental data collection involves breaking the data collection project into small manageable subprojects. This allows data collection to be undertaken with lower annual resource and funding levels (although total project resource requirements may be larger). It is a good approach for inexperienced organizations that are embarking on their first data collection project because they can learn and adapt as the project proceeds. On the other hand, these longer-term projects run the risk of employee turnover and burnout, as well as changing data, technology, and organizational priorities.

Whichever approach is preferred, a pilot project carried out on part of the study area and a selection of the data types can prove to be invaluable. A pilot project can identify problems in workflow, database design, personnel, and equipment. A pilot database can also be used to test equipment and to develop procedures for quality assurance. Many projects require a test database for hardware and software acceptance tests, as well as to facilitate software customization. It is essential that project managers are prepared to discard all the data obtained during a pilot data collection project, so that the main phase can proceed unconstrained.

A further important decision is whether data collection should use in-house or external resources. It is now increasingly common to outsource geographic data collection to specialist companies that usually undertake the work in areas of the world with very low labor costs (e.g., India and Thailand). Three factors influencing this decision are: cost/schedule, quality, and long-term ramifications. Specialist external data collection agencies can often perform work faster, cheaper, and with higher quality than in-house staff, but because of the need for real cash to pay external agencies this may not be possible. In the short term, project costs, quality, and time are the main considerations, but over time dependency on external groups may become a problem.
216 PART III TECHNIQUES
Questions for further study

1. Using the websites listed in Table 9.4 as a starting point, evaluate the suitability of free geographic data for your home region or country for use in a GIS project of your choice.
2. What are the advantages of batch vectorization over manual table digitizing?
3. What quality assurance steps would you build into a data collection project designed to construct a database of land parcels for tax assessment?
4. Why do so many geographic data formats exist? Which ones are most suitable for selling vector data?

Further reading

Hohl P. (ed) 1997 GIS Data Conversion: Strategies, Techniques and Management. Santa Fe, NM: OnWord Press.
Jones C. 1997 Geographic Information Systems and Computer Cartography. Reading, MA: Addison-Wesley Longman.
Lillesand T.M., Kiefer R.W. and Chipman J.W. 2003 Remote Sensing and Image Interpretation (5th edn). Hoboken, NJ: Wiley.
Paine D.P. and Kiser J.D. 2003 Aerial Photography and Image Interpretation (2nd edn). Hoboken, NJ: Wiley.
Walford N. 2002 Geographical Data: Characteristics and Sources. Hoboken, NJ: Wiley.