Gis 5 Units Notes


UNIT 1 FUNDAMENTALS OF GIS 9

Introduction to GIS - Basic spatial concepts - Coordinate Systems - GIS and Information Systems
– Definitions – History of GIS - Components of a GIS – Hardware, Software, Data, People,
Methods – Proprietary and open source Software – Types of data – Spatial, Attribute data – Types
of attributes – Scales / levels of measurement.

1.1 Introduction to GIS:


What is GIS?
Today we shall be talking about what GIS is and how it is structured. The use of GIS is now
widespread in archaeology, but in order to understand why and what it can do for us as
archaeologists, we first need to understand what it actually is. As an acronym, GIS is usually
taken to stand for either Geographic Information Systems or Geographic Information Science.
Unfortunately, there is some disagreement over what actually defines a GIS, with some people
even arguing that the term itself is not useful at all. However, the software exists and performs
many tasks of great use when studying the past. As such, we can follow Wheatley and Gillings’
definition of GIS as “computer systems whose main purpose is to store, manipulate, analyse and
present information about geographic space.” This is a definition that could also describe other
technologies, such as CAD: however, the key difference with GIS is in its abilities to both
integrate multiple sources of data and to analyse space.
What does GIS do?
What, therefore, does GIS actually do? Several things, in fact:
It allows users to map multiple different sources of geographic data within a single computerized
environment. Different data sources are usually treated as layers, which may be reordered and
switched on and off at will, set to varying transparencies, and manipulated through tools such as
zooming, panning, and sometimes rotating.
It allows users to employ many different and powerful tools to analyse the spatial distribution of
their data. This spatial analysis can provide a route into discovering and unlocking previously
unseen patterns in our data, shedding new light on unknown aspects of the past.
It also allows users to produce paper and electronic maps for inclusion in their work and for the
dissemination of their results to the wider archaeological, historical and public communities.
Depending on the GIS software used, this might include animations or interactive maps delivered
over the internet.
1.2 Basic spatial concepts:

To understand the basic concepts of geographic information systems (GIS) and their relation
to environmental management, we first need to see what GIS actually means; we will then put
forward some evidence of how it can benefit environmental management.
This is part of a series of articles explaining the brief history, technological aspects, best practices
and practical applications of GIS for environmental studies.
The history of GIS shows that the term came into existence in the 1960s, though at first few
people and professionals were involved. By the 1990s more researchers were using GIS as a research
tool, but the real boost came in 2005 when Google launched the Google Maps and Google Earth
web applications; this is when the wider public came to appreciate the importance of GIS.
Google Maps and Google Earth gave people ready-made maps, but data preparation and data
interpretation were still not included, and GIS professionals were still required for high-level
analysis and decision making.
GIS is the science of location-based services: knowing what is where, and why. The process is to
collect data from different sources, display it over maps, and then perform spatial analysis on
that data to support decisions and predictions.
Three major and basic geographic information (GI) technologies have changed and
revolutionized the handling of locations and spatial data. We will discuss these technologies in
upcoming articles, but for now we will go through their basic concepts:
 Global Positioning System (GPS)
 Remote Sensing (RS)
 Geographic Information System (GIS)
GPS, as we all know, is a system that determines geographic location on the earth's surface using
satellites. It saves time and money and is more accurate than other methods. Previously,
companies had to hire expensive surveyors who physically visited locations to gather the desired
information; this was a great hassle, and it was sometimes impossible to gather accurate and
precise measurements. With technological advances, GPS is now accessible in every part of the
world.
Remote Sensing (RS) is the collection and measurement of data without direct contact with the
objects of interest; satellites, aircraft and now drones are used to capture this information about
the earth's surface. It avoids the time and cost of expensive physical field surveys, and it is the
technology most commonly used in environmental studies.

GIS is a robust set of tools for collecting and retrieving data, transforming it into information,
and displaying that information on maps of the real world. The integration of GPS, RS and other
data-modelling technologies provides information that helps in dealing with the changes that are
integral to environmental protection, surveillance and disaster management.

A geographic information system (GIS) is software that converts data into productive
information: it takes data from GPS and RS, analyzes it, and displays the results. It offers an
inexpensive way to produce maps, display information on them, and make analysis easier.
In conclusion, GIS integrates GPS and RS, and the core concept of GIS application development
is to make decisions based on data gathered from different sources, converting it into
information that fulfils business, environmental and technological needs.

1.3 Coordinate Systems


A geographic coordinate system (GCS) uses a three-dimensional spherical surface to define
locations on the earth. A GCS is often incorrectly called a datum, but a datum is only one part of
a GCS. A GCS includes an angular unit of measure, a prime meridian, and a datum (based on a
spheroid).
A point is referenced by its longitude and latitude values. Longitude and latitude are angles
measured from the earth's center to a point on the earth's surface. The angles often are measured
in degrees (or in grads). The following illustration shows the world as a globe with longitude and
latitude values.

In the spherical system, horizontal lines, or east–west lines, are lines of equal latitude, or parallels.
Vertical lines, or north–south lines, are lines of equal longitude, or meridians. These lines
encompass the globe and form a gridded network called a graticule.
The line of latitude midway between the poles is called the equator. It defines the line of zero
latitude. The line of zero longitude is called the prime meridian. For most geographic coordinate
systems, the prime meridian is the longitude that passes through Greenwich, England. Other
countries use longitude lines that pass through Bern, Bogota, and Paris as prime meridians. The
origin of the graticule (0,0) is defined by where the equator and prime meridian intersect. The
globe is then divided into four geographical quadrants that are based on compass bearings from
the origin. North and south are above and below the equator, and west and east are to the left and
right of the prime meridian.

This illustration shows the parallels and meridians that form a graticule.
Latitude and longitude values are traditionally measured either in decimal degrees or in degrees,
minutes, and seconds (DMS). Latitude values are measured relative to the equator and range from
-90° at the South Pole to +90° at the North Pole. Longitude values are measured relative to the
prime meridian. They range from -180° when traveling west to 180° when traveling east. If the
prime meridian is at Greenwich, then Australia, which is south of the equator and east of
Greenwich, has positive longitude values and negative latitude values.
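The DMS convention described above can be sketched in a few lines of Python. The helper name and sign convention here are illustrative, not from any particular GIS library: latitudes south of the equator and longitudes west of the prime meridian are made negative.

```python
# Convert degrees-minutes-seconds (DMS) to decimal degrees.
# Sign convention: 'S' latitude and 'W' longitude are negative.

def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """hemisphere is one of 'N', 'S', 'E', 'W'."""
    dd = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -dd if hemisphere in ('S', 'W') else dd

# The Greenwich Observatory is roughly at 51 deg 28' 40" N, 0 deg 0' 5" W
lat = dms_to_decimal(51, 28, 40, 'N')
lon = dms_to_decimal(0, 0, 5, 'W')
print(round(lat, 4), round(lon, 4))
```

Because a minute is 1/60 of a degree and a second is 1/3600, the conversion is a simple weighted sum; going the other way just reverses the arithmetic.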

It may be helpful to equate longitude values with X and latitude values with Y. Data defined on
a geographic coordinate system is displayed as if a degree is a linear unit of measure. This method
is basically the same as the Plate Carrée projection.
Although longitude and latitude can locate exact positions on the surface of the globe, they are
not uniform units of measure. Only along the equator does the distance represented by one
degree of longitude approximate the distance represented by one degree of latitude. This is
because the equator is the only parallel as large as a meridian. (Circles with the same radius as
the spherical earth are called great circles. The equator and all meridians are great circles.)
Above and below the equator, the circles defining the parallels of latitude get gradually smaller
until they become a single point at the North and South Poles where the meridians converge. As
the meridians converge toward the poles, the distance represented by one degree of longitude
decreases to zero. On the Clarke 1866 spheroid, one degree of longitude at the equator equals
111.321 km, while at 60° latitude it is only 55.802 km. Because degrees of latitude and longitude
don't have a standard length, you can’t measure distances or areas accurately or display the data
easily on a flat map or computer screen.
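The shrinking length of a degree of longitude can be approximated with a simple spherical model, scaling the equatorial value by the cosine of the latitude. This sketch pins the equatorial value to the 111.321 km figure quoted above; note that the Clarke 1866 value of 55.802 km at 60° differs slightly from the spherical estimate, because that figure comes from an ellipsoid rather than a sphere.

```python
import math

# Approximate length of one degree of longitude at a given latitude,
# using a spherical earth model scaled to the equatorial value above.

KM_PER_DEGREE_AT_EQUATOR = 111.321

def longitude_degree_km(latitude_deg):
    return KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))

print(round(longitude_degree_km(0), 3))    # 111.321 at the equator
print(round(longitude_degree_km(60), 3))   # about half that at 60 degrees
```

A degree of latitude, by contrast, stays close to 111 km everywhere, which is exactly why degrees cannot be treated as uniform units of distance.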

1.4 GIS and Information Systems


A geographic information system (GIS) is a computer system for capturing, storing, checking,
and displaying data related to positions on Earth’s surface. By relating seemingly unrelated data,
GIS can help individuals and organizations better understand spatial patterns and relationships.

GIS technology is a crucial part of spatial data infrastructure, which the White House defines as
“the technology, policies, standards, human resources, and related activities necessary to acquire,
process, distribute, use, maintain, and preserve spatial data.”

GIS can use any information that includes location. The location can be expressed in many
different ways, such as latitude and longitude, address, or ZIP code.

Many different types of information can be compared and contrasted using GIS. The system can
include data about people, such as population, income, or education level. It can include
information about the landscape, such as the location of streams, different kinds of vegetation,
and different kinds of soil. It can include information about the sites of factories, farms, and
schools; or storm drains, roads, and electric power lines.

With GIS technology, people can compare the locations of different things in order to discover
how they relate to each other. For example, using GIS, a single map could include sites that
produce pollution, such as factories, and sites that are sensitive to pollution, such as wetlands and
rivers. Such a map would help people determine where water supplies are most at risk.

Data Capture
Data Formats
GIS applications include both hardware and software systems. These applications may include
cartographic data, photographic data, digital data, or data in spreadsheets.
Cartographic data are already in map form, and may include such information as the location of
rivers, roads, hills, and valleys. Cartographic data may also include survey data, mapping
information which can be directly entered into a GIS.

Photographic interpretation is a major part of GIS. Photo interpretation involves analyzing
aerial photographs and assessing the features that appear.

Digital data can also be entered into GIS. An example of this kind of information is computer
data collected by satellites that show land use—the location of farms, towns, and forests.

Remote sensing provides another tool that can be integrated into a GIS. Remote sensing
includes imagery and other data collected from satellites, balloons, and drones.

Finally, GIS can also include data in table or spreadsheet form, such as population demographics.
Demographics can range from age, income, and ethnicity to recent purchases and Internet
browsing preferences.

GIS technology allows all these different types of information, no matter their source or original
format, to be overlaid on top of one another on a single map. GIS uses location as the key index
variable to relate these seemingly unrelated data.

Putting information into GIS is called data capture. Data that are already in digital form, such as
most tables and images taken by satellites, can simply be uploaded into GIS. Maps, however,
must first be scanned, or converted to digital format.

The two major types of GIS file formats are raster and vector. Raster formats are grids of cells or
pixels. Raster formats are useful for storing GIS data that vary, such as elevation or satellite
imagery. Vector formats are polygons that use points (called nodes) and lines. Vector formats are
useful for storing GIS data with firm borders, such as school districts or streets.
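The two storage models can be sketched in plain Python. These structures are illustrative only, not a real GIS file format: a raster is a grid of cell values, while a vector feature pairs a list of coordinate nodes with attributes.

```python
# Minimal sketch of the raster and vector storage models.

raster_elevation = [        # 3 x 3 grid of cells (elevation in metres)
    [120, 125, 130],
    [118, 122, 128],
    [115, 119, 124],
]

vector_district = {         # a polygon feature with firm borders
    "type": "Polygon",
    "nodes": [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)],  # closed ring
    "attributes": {"name": "School District 7"},        # invented attribute
}

# A raster is addressed by row and column; a vector by its geometry.
print(raster_elevation[1][2])              # value of one cell: 128
print(len(vector_district["nodes"]) - 1)   # number of polygon edges: 4
```

The trade-off follows from the structures: the raster represents a value everywhere in its extent (good for continuous surfaces), while the vector stores only the boundary (good for discrete features).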

Spatial Relationships
GIS technology can be used to display spatial relationships and linear networks. Spatial
relationships may display topography, such as agricultural fields and streams. They may also
display land-use patterns, such as the location of parks and housing complexes.

Linear networks, sometimes called geometric networks, are often represented by roads, rivers,
and public utility grids in a GIS. A line on a map may indicate a road or highway. With GIS
layers, however, that road may indicate the boundary of a school district, public park, or other
demographic or land-use area. Using diverse data capture, the linear network of a river may be
mapped on a GIS to indicate the stream flow of different tributaries.

GIS must make the information from all the various maps and sources align, so they fit together
on the same scale. A scale is the relationship between the distance on a map and the actual
distance on Earth.

Often, GIS must manipulate data because different maps have different projections. A projection
is the method of transferring information from Earth’s curved surface to a flat piece of paper or
computer screen. Different types of projections accomplish this task in different ways, but all
result in some distortion. To transfer a curved, three-dimensional shape onto a flat surface
inevitably requires stretching some parts and squeezing others.

A world map can show either the correct sizes of countries or their correct shapes, but it can’t do
both. GIS takes data from maps that were made using different projections and combines them
so all the information can be displayed using one common projection.
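To make the idea of a projection concrete, here is a sketch of one of the simplest: the equirectangular (Plate Carrée) mapping mentioned earlier, which treats degrees as linear units scaled by the earth's radius. The radius used is the conventional mean spherical radius; real GIS software supports many projections and reprojects each layer onto whichever common one is chosen.

```python
import math

# Plate Carree (equirectangular) projection: longitude/latitude in
# degrees to planar x/y in kilometres on a spherical earth model.

EARTH_RADIUS_KM = 6371.0

def plate_carree(lon_deg, lat_deg):
    x = EARTH_RADIUS_KM * math.radians(lon_deg)
    y = EARTH_RADIUS_KM * math.radians(lat_deg)
    return x, y

x, y = plate_carree(10.0, 50.0)
print(round(x), round(y))   # roughly 1112 km east, 5560 km north of (0, 0)
```

The distortion discussed above is visible here: this formula stretches east-west distances away from the equator, since a degree of longitude covers less ground at high latitudes than the projection pretends.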

GIS Maps

Once all of the desired data have been entered into a GIS system, they can be combined to produce
a wide variety of individual maps, depending on which data layers are included. One of the most
common uses of GIS technology involves comparing natural features with human activity.

For instance, GIS maps can display what manmade features are near certain natural features, such
as which homes and businesses are in areas prone to flooding.

GIS technology also allows users to “dig deep” into a specific area with many kinds of information. Maps
of a single city or neighborhood can relate such information as average income, book sales, or
voting patterns. Any GIS data layer can be added or subtracted to the same map.
GIS maps can be used to show information about numbers and density. For example,
GIS can show how many doctors there are in a neighborhood compared with the area’s population.
With GIS technology, researchers can also look at change over time. They can use
satellite data to study topics such as the advance and retreat of ice cover in polar regions, and how
that coverage has changed through time. A police precinct might study changes in crime data to
help determine where to assign officers.
One important use of time-based GIS technology involves creating time-lapse
photography that shows processes occurring over large areas and long periods of time. For
example, data showing the movement of fluid in ocean or air currents help scientists better
understand how moisture and heat energy move around the globe.
GIS technology sometimes allows users to access further information about specific areas
on a map. A person can point to a spot on a digital map to find other information stored in the
GIS about that location. For example, a user might click on a school to find how many students
are enrolled, how many students there are per teacher, or what sports facilities the school has.
GIS systems are often used to produce three-dimensional images. This is useful, for
example, to geologists studying earthquake faults.
GIS technology makes updating maps much easier than updating maps created manually.
Updated data can simply be added to the existing GIS program. A new map can then
be printed or displayed on screen. This skips the traditional process of drawing a map, which can
be time-consuming and expensive.

GIS Jobs
People working in many different fields use GIS technology. GIS technology can be used for
scientific investigations, resource management, and development planning.

Many retail businesses use GIS to help them determine where to locate a new store. Marketing
companies use GIS to decide to whom to market those stores and restaurants, and where that
marketing should be.

Scientists use GIS to compare population statistics to resources such as drinking water. Biologists
use GIS to track animal migration patterns.

City, state, or federal officials use GIS to help plan their response in the case of a natural disaster
such as an earthquake or hurricane. GIS maps can show these officials what neighborhoods are
most in danger, where to locate emergency shelters, and what routes people should take to reach
safety.

Engineers use GIS technology to support the design, implementation, and management of
communication networks for the phones we use, as well as the infrastructure necessary for
Internet connectivity. Other engineers may use GIS to develop road networks and transportation
infrastructure.

There is no limit to the kind of information that can be analyzed using GIS technology, which
allows multiple layers of information to be displayed on a single map.

1.5 What is GIS – Definition?


A Geographic Information System (GIS) is a computer system built to capture, store, manipulate,
analyze, manage and display all kinds of spatial or geographic data. GIS applications are tools
that allow end users to perform spatial queries and analysis, edit spatial data, and create hard-copy
maps. Put simply, a GIS layer can be described as an image that is referenced to the earth (it has
x and y coordinates) with its attribute values stored in a table. These x and y coordinates are based
on a projection system, of which there are various types. Much of the time GIS is used to create
and print maps; to perform these basic tasks, layers are combined, edited and designed.

GIS can be used to answer location-based questions such as “What is located here?” or “Where
can a particular feature be found?”. Users can retrieve values from the map, such as how much
forest area appears on a land-use map; this is done using the query builder tool. The next important
feature of GIS is the capability to combine different layers to show new information. For example,
you can combine elevation data, river data, land-use data and more to describe the landscape of
an area. From such a map you can tell where the high ground is, or where the best place to build
a house with a river view might be. GIS helps to uncover new information.
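The forest-area question above can be sketched as a simple attribute query. The feature records and field names here are invented for illustration; a real query builder generates comparable logic (often as SQL) against the attribute table.

```python
# Sketch of an attribute query: total forest area on a land-use layer.
# The records and field names below are invented for illustration.

land_use = [
    {"class": "forest",      "area_ha": 340.0},
    {"class": "agriculture", "area_ha": 820.5},
    {"class": "forest",      "area_ha": 125.5},
    {"class": "urban",       "area_ha": 210.0},
]

forest_area = sum(f["area_ha"] for f in land_use if f["class"] == "forest")
print(forest_area)   # 465.5
```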
How GIS Works:
Visualizing data: the geographic data stored in databases is displayed in the GIS software.
Combining data: layers are combined to form the desired map.
Querying: searching for values in a layer, or making geographic queries.
Definition by others:
A geographic information system (GIS) lets us visualize, question, analyze, and interpret data to
understand relationships, patterns, and trends. (ESRI)
In the strictest sense, a GIS is a computer system capable of assembling, storing, manipulating,
and displaying geographically referenced information (that is data identified according to their
locations). (USGS)
Advantages of GIS:
 Better decision making by government
 Improved decision making with the help of layered information
 Greater citizen engagement through better systems
 Helps identify communities that are at risk or lacking infrastructure
 Helps in investigating crime
 Better management of natural resources
 Better communication during emergencies
 Cost savings from better decisions
 Reveals trends within a community
 Supports planning for demographic change
1.6 History of GIS:
Modern GIS is the product of a series of developments and has evolved alongside the computer.
Here are the key events in the development of GIS.
The first application of the concept came in 1832, when Charles Picquet created a map representing
the cholera outbreak across the 48 districts of Paris. This map was an early version of a heat map,
a technique that would later revolutionize several industries.

Year 1854 – John Snow applied a scientific method to map-making when he plotted the cholera
outbreak as points on a map of residential London, an early landmark of spatial analysis.

Year 1960 – Modern computerized GIS began in the 1960s.

Year 1962 – Dr. Roger Tomlinson created and developed the Canadian Geographic Information
System (CGIS) to store, analyze and manipulate data collected for the Canada Land Inventory
(CLI). The software supported overlay, measurement, and digitizing (converting scanned
hardcopy maps to digital data). It was never released commercially, but Dr. Tomlinson is
regarded as the father of GIS.
Dr. Roger Tomlinson (1933-2014)

Year 1980 – This period saw the rise of commercial GIS software from vendors such as M&S
Computing, Environmental Systems Research Institute (ESRI) and Computer Aided Resource
Information System (CARIS). These packages were similar to CGIS but offered more functionality
and user-friendliness. The most popular today are ESRI products such as ArcGIS and ArcView,
which hold almost 80% of the global market.

1.7 Components of GIS:

1.7.1 Hardware:
Hardware is the computer on which GIS software runs. Nowadays there is a wide range of
computers; a GIS may be desktop or server based. ArcGIS Server, for example, runs GIS software
on networked or cloud-based computers. For the computer to perform well, all hardware
components must have adequate capacity. The hardware components include the motherboard,
hard drive, processor, graphics card and printer, all of which function together to run GIS
software smoothly.
1. Motherboard: the board on which the major hardware parts are installed; the place where
all components are hooked up.
2. Hard Drive: also called the hard disk; the place where data is stored.
3. Processor: the central processing unit (CPU), the major component of the computer; it
performs the calculations.
4. RAM: Random Access Memory, where all running programs are loaded temporarily.
5. Printer: an output device used to print images, maps or documents. Various types of
printer are available on the market.
6. External Disk: portable storage such as a USB drive, DVD, CD or external hard disk.
7. Monitor: the screen for displaying output. Various types exist: CRT (cathode ray tube),
LCD (liquid crystal display), LED (light-emitting diode) and more.

1.7.2 Software:
GIS software provides the tools and functions to input, store, edit and display spatial
(geographic) data. It provides tools to perform geographic queries, run analysis models and
display geographic data in map form. GIS software uses a Relational Database Management
System (RDBMS) to store the geographic data, and talks to the database to perform geographic
queries. A few examples of GIS software: ArcGIS, ArcView 3.2, QGIS, SAGA GIS.
Software Components:
1. GIS Tools: key tools to support browsing of the GIS data.
2. RDBMS: the Relational Database Management System that stores the GIS data; the GIS
software retrieves data from, and inserts data into, the RDBMS.
3. Query Tools: tools that work with the database management system for querying,
insertion, deletion and other SQL (Structured Query Language) operations.
4. GUI: the Graphical User Interface that lets the user and the software interact effectively.

5. Layout: a good layout window for designing maps.
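To illustrate the RDBMS role named above, here is a sketch using Python's built-in sqlite3 module. Real GIS databases (for example PostGIS or SpatiaLite) add dedicated geometry types and spatial functions; plain SQLite is used here only to show the store-and-query step, and the table and values are invented.

```python
import sqlite3

# Sketch: store attribute records in an RDBMS and query them with SQL.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wells (id INTEGER, x REAL, y REAL, depth_m REAL)")
conn.executemany("INSERT INTO wells VALUES (?, ?, ?, ?)", [
    (1, 34.1, 71.2, 45.0),
    (2, 34.6, 71.9, 120.0),
    (3, 35.0, 72.3, 80.0),
])

# The query tools component issues SQL like this on the user's behalf.
rows = conn.execute(
    "SELECT id FROM wells WHERE depth_m > 60 ORDER BY id"
).fetchall()
print(rows)   # [(2,), (3,)]
```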

1.7.3 Data:
The most important and most expensive component of a geographic information system is data,
generally described as the fuel of GIS. GIS data is a combination of graphic and tabular data;
the graphic part can be vector or raster. Both types can be created in-house using GIS software
or purchased. The process of creating GIS data from analog data or paper sources is called
digitization. Digitization involves registering a raster image using a few GCPs (ground control
points) or known coordinates, a process widely known as rubber sheeting or georeferencing.
Polygons, lines and points are then created by digitizing the raster image. The raster image itself
can be registered to coordinates, which is known as rectifying the image; registered images are
usually exported in TIFF format. As mentioned above, GIS data can be raster or vector.
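The registration step can be sketched as fitting a per-axis scale-and-offset transform from ground control points and applying it to other pixel coordinates. Production georeferencing fits a full affine (or higher-order) transform from several GCPs; the coordinates below are invented for illustration.

```python
# Sketch: derive a scale-and-offset mapping for one axis from two
# ground control points, then apply it to other pixel coordinates.

def fit_axis(pixel0, world0, pixel1, world1):
    """Return a function mapping a pixel coordinate to a world coordinate."""
    scale = (world1 - world0) / (pixel1 - pixel0)
    return lambda pixel: world0 + (pixel - pixel0) * scale

# GCPs: pixel column 0 -> easting 500000, pixel column 1000 -> easting 530000
to_easting = fit_axis(0, 500000.0, 1000, 530000.0)
# GCPs: pixel row 0 -> northing 4200000, pixel row 800 -> northing 4176000
# (northing decreases as the row number increases, so the scale is negative)
to_northing = fit_axis(0, 4200000.0, 800, 4176000.0)

print(to_easting(500), to_northing(400))   # 515000.0 4188000.0
```

Rubber sheeting goes further than this rigid fit, warping the image locally so every GCP lands exactly on its known coordinate.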
GIS Data Types:
 Raster: raster images store information cell by cell. Examples include aerial photos,
satellite images and digital elevation models (DEMs). Raster images normally store
continuous data.
 Vector: vector data is discrete. It stores information in x, y coordinate format. There are
three types of vector data: points, lines and areas.

1.7.4 People:
People are the users of the GIS system; they use the three components above to run it. Today's
computers are fast and user-friendly, which makes it easy to perform geographic queries and
analysis and to display maps. Because hardware and software have developed tremendously and
computers are affordable, many people now use GIS in their daily work. Their tasks may range
from creating a simple map to performing advanced GIS analysis. People are the main component
of a successful GIS.

1.7.5 Methods: A well-designed plan and business operating rules are important for successful
GIS operation, and methods vary between organizations. An organization should document its
process plan for GIS operation; such a document addresses a number of questions about the GIS
methods: the number of GIS experts required, the GIS software and hardware, the process for
storing the data, the type of DBMS (database management system), and more. A well-designed
plan will address all of these questions.

1.8 Proprietary and Open source Software


The term open source refers to software whose source code — the medium in which programmers
create and modify software — is freely available on the Internet; by contrast, the source code for
proprietary commercial software is usually a closely guarded secret.

The most well-known example of open source software is the Linux operating system, but there
are open source software products available for every conceivable purpose.

Open source software is distributed under a variety of licensing terms, but almost all have two
things in common: the software can be used without paying a license fee, and anyone can modify
the software to add capabilities not envisaged by its originators.

A standard is a technology specification whose details are made widely available, allowing many
companies to create products that will work interchangeably and be compatible with each other.
Any modern technology product relies on thousands of standards in its design — even the gasoline
you put in your car is blended to meet several highly-detailed specifications that the car’s
designers rely on.

For a standard to be considered an open standard, the specification and rights to implement it
must be freely available to anyone without signing non-disclosure agreements or paying royalties.
The best example of open standards at work is the Internet — virtually all of the technology
specifications it depends on are open, as is the process for defining new ones.

An Application Programming Interface (API) is a feature of a software application that allows
other software to inter-operate with it, automatically invoking its functionality and exchanging
data with it. The definition of an API is a form of technology standard. The term open API doesn’t
yet have a universally accepted definition, but it’s generally expected to be “open” in the same
manner as an open standard.

The common theme of “openness” in the above definitions is the ability of diverse parties to
create technology that interoperates. When evaluating your organization’s current and
anticipated software needs, consider a solution’s capability to interoperate as an important
criterion. To extend the value of your technology investment, select a software solution that is
based on open standards and APIs that facilitate interoperability and has the capability for direct
integration between various vendors’ products.
Difference Between Open Source and Proprietary Software
There’s no easy way to find out which is the better software development model for your
business, open-source or proprietary.
Open source has no shortage of developers and programmers who are untroubled by the idea of
giving software away, but it poses a threat to a commercial software industry that is unsettled
by the notion of open-source software.
The difference between the two models is fairly clear, because each has its share of pros and
cons. However, weighing the options between open source and proprietary to decide which one
is superior is a difficult task.
As with any complex decision, the only certain answer is “it depends”. Each has an edge over
the other in certain features and characteristics, which sets them apart.
The idea that one totally contradicts the other is not exactly true. This article explains the
difference between the two.
What is Open-Source Software?
It all started with Richard Stallman who developed the GNU project in 1983 which fueled the
free software movement which eventually led to the revolutionary open-source software
movement.
The movement catapulted the notion of open-source collaboration under which developers and
programmers voluntarily agreed to share their source code openly without any restrictions.
The community working with such software allows anyone to study and modify the open-source
code for any purpose. The open-source movement broke down the barriers between
developers/programmers and software vendors, encouraging everyone to collaborate openly.
Finally, the label “open-source software” was made official at a strategy session in Palo Alto,
California in 1998, to encourage worldwide acceptance of a new term reminiscent of academic
freedom.
The idea is to release the software under the open licenses category so that anyone could see,
modify, and distribute the source code as deemed necessary.
“Open source” is a certification mark owned by the Open Source Initiative (OSI). The term
open-source software refers to software that is developed and tested through open collaboration,
meaning anyone with the required knowledge can access the source code, modify it, and
distribute their own version of the updated code.
Any software under the open source license is intended to be shared openly among users and
redistributed by others as long as the distribution terms are compliant with the OSI’s open source
definition. Programmers with access to a program’s source code are allowed to manipulate parts
of code by adding or modifying features that would not have worked otherwise.

What is Proprietary Software?


Unlike open source, some software has source code that can be modified only by the individual
or organization that created it.
The owner or publisher of the software holds intellectual property rights of the source code
exclusively. We call this type of software “proprietary software” because only the original
owner(s) of the software are legally allowed to inspect and modify the source code.
In simple terms, proprietary software is software that is solely owned by the individual or the
organization that developed it. Proprietary software, as the name suggests, is the exclusive
property of its creators or publishers, and no one outside that circle is allowed to use, modify,
copy or distribute modified versions of the software.
The owner is the exclusive copyright holder of the software, and only the owner has the right
to modify or add features to the program’s source code. The owner may sell the program under
concrete conditions, which users must follow in order to avoid legal disputes.
Unlike open source software, the internal structure of proprietary software is not exposed and the
restrictions are imposed upon the users by the End User License Agreement (EULA), the
conditions of which are to be legally followed by the end users regarding the software.
Examples of proprietary software include iTunes, Windows, macOS, Google Earth, Unix, Adobe
Flash Player, Microsoft Word, etc.

Difference between Open-Source and Proprietary Software


Control of Open-Source and Proprietary Software
The mere fact that developers and programmers are allowed to examine and modify the source
code as they see fit implies control. More control means more flexibility, which means
non-programmers can also benefit from the open collaboration. Proprietary software, on the
contrary, restricts control to the owner of the software.
Security of Open-Source and Proprietary Software
Because anyone with the required knowledge can add to or modify the program’s source code to
make it work better, the software is more sustainable: discrepancies can be found and corrected
repeatedly. As developers can work without restrictions, they can rectify errors that might have
been missed by the original developers or publishers.
Driver Support of Open-Source and Proprietary Software
Open-source software packages often have missing drivers, which is natural when an open
community of users has access to every line of code. The software may include code modified
by one or more individuals, each subject to different terms and conditions. The lack of formal
support, or the occasional use of generic drivers, can put a project at risk. Proprietary software
comes with closed-group support, which often means better performance.
Usability of Open-Source and Proprietary Software
Unlike open-source projects, proprietary ones are typically designed with a limited group of
end users with limited skills in mind. They target a close-knit circle of end users, unlike
projects accomplished within open-source communities. Users outside the programming
community need never look at the source code, let alone modify it.
Opacity of Open-Source and Proprietary Software
Viewing restrictions bar end users from modifying the code, let alone debugging it effectively,
leaving them no control over possible workarounds. The internal structure of proprietary
software is strictly closed, and this lack of transparency makes it virtually impossible for users
even to suggest modifications or optimizations to the software. Open source, on the other hand,
promotes open collaboration, which means fewer bugs and faster bug fixes with fewer
complexities.

1.9 Types of GIS Data:
A geodatabase is a database that is in some way referenced to locations on the earth. Coupled
with this spatial data is usually attribute data, generally defined as additional information that
can be tied to the spatial data.
What types of GIS Data are there?
GIS data can be separated into two categories: spatially referenced data, which is represented in
vector and raster forms (including imagery), and attribute data, which is represented in tabular
format. Within the spatially referenced group, GIS data can be further classified into two types:
vector and raster. Most GIS software applications mainly focus on the use and manipulation of
vector geodatabases, with added components to work with raster-based geodatabases.

Vector data
Vector data is split into three types: polygon, line (or arc) and point data. Polygons are used to
represent areas such as the boundary of a city (on a large scale map), lake, or forest. Polygon
features are two dimensional and therefore can be used to measure the area and perimeter of a
geographic feature. Polygon features are most commonly distinguished using a thematic
mapping symbology (color schemes), patterns, or, in the case of numeric gradation, a color
gradation scheme.
In this view of a polygon based dataset, frequency of fire in an area is depicted showing a graduate
color symbology.


Line (or arc) data is used to represent linear features. Common examples are rivers, trails, and
road centerlines. Line features have only one dimension and therefore can only be used to
measure length. Every line feature has a starting and an ending point. The symbology most
commonly used to distinguish arc features from one another includes line types (solid versus
dashed lines) and combinations of colors and line thicknesses. In the example below, roads are
distinguished from the stream network by designating the roads as a solid black line and the
hydrology as a dashed blue line.

Point data is most commonly used to represent nonadjacent features and discrete data points.
Points have zero dimensions; therefore you can measure neither length nor area with this
dataset. Examples are schools, points of interest, and, in the example below, bridge and culvert
locations. Point features are also used to represent abstract points. For instance, point locations
could represent city locations or place names.
GIS point data showing the location of bridges and culverts.

Both line and point feature data often represent what is really polygon data, but at a much
smaller scale. They help reduce clutter by simplifying data locations. As the map is zoomed in,
the point location of a school is more realistically represented by a series of building footprints
showing the physical location of the campus. Line features of a street centerline file represent
only the approximate physical location of the street. If a higher degree of spatial resolution is
needed, a street curb-width file would be used to show the width of the road as well as features
such as medians and rights-of-way (or sidewalks).
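The dimensionality of each vector type determines which measurements make sense: points have none, lines have length, polygons have perimeter and area. A minimal Python sketch of these measurements (the coordinates are invented and planar units are assumed; area uses the standard shoelace formula):

```python
import math

def line_length(coords):
    """Sum of straight-line segment lengths along a polyline."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def polygon_area(coords):
    """Planar area of a simple (non-self-intersecting) polygon, by the shoelace formula."""
    n = len(coords)
    s = sum(coords[i][0] * coords[(i + 1) % n][1] -
            coords[(i + 1) % n][0] * coords[i][1] for i in range(n))
    return abs(s) / 2.0

river = [(0, 0), (3, 4)]                   # one segment: a 3-4-5 triangle's hypotenuse
parcel = [(0, 0), (4, 0), (4, 3), (0, 3)]  # a 4 x 3 rectangle

print(line_length(river))               # 5.0 (length of a line feature)
print(polygon_area(parcel))             # 12.0 (area of a polygon feature)
print(line_length(parcel + parcel[:1])) # 14.0 (its perimeter, closing the ring)
```

A point feature, having zero dimensions, would return nothing meaningful from either function, which is exactly the distinction the text draws.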

Raster Data

Raster data (also known as grid data) represents the fourth type of feature: surfaces. Raster data
is cell-based and this data category also includes aerial and satellite imagery. There are two types
of raster data: continuous and discrete. An example of discrete raster data is a land-cover
classification, where each cell holds a single category. Continuous data examples are
temperature and elevation measurements. There are also three
types of raster datasets: thematic data, spectral data, and pictures (imagery).

Digital Elevation Model (DEM) showing elevation.

This example of a thematic raster dataset is called a Digital Elevation Model (DEM). Each cell
represents a 30 m pixel with an elevation value assigned to it. The area shown is the Topanga
Watershed in California, and the dataset gives the viewer an understanding of the topography of
the region.

Each cell contains one value representing the dominant value within that cell. Raster datasets
are intrinsic to most spatial analysis. Data analysis such as extracting slope and aspect from
Digital Elevation Models occurs with raster datasets. Spatial hydrology modeling, such as
extracting watersheds and flow lines, also uses a raster-based system. Spectral data presents
aerial or satellite imagery, which is then often used to derive vegetation or geologic information
by classifying the spectral signatures of each type of feature.

Raster data showing vegetation classification. The vegetation data was derived from NDVI
classification of a satellite image.

The effect of converting spatial location information into a cell-based raster format is called
stairstepping. The name derives from exactly that image: the square cells along the borders of
different value types look like a staircase viewed from the side.
Unlike vector data, raster data is formed by each cell receiving the value of the feature that
dominates the cell. The stairstepping look comes from the transition of the cells from one value
to another. In the image above, the dark green cells represent chamise vegetation, meaning that
chamise was the dominant feature in those cell areas. Other features such as developed land,
water or other vegetation types may also be present on the ground in the same area. As the
feature mix in a cell becomes dominantly urban, the cell is attributed the value for developed
land, hence the pink shading.

Spatial and Geographical data


Spatial data support in databases is important for efficiently storing, indexing and querying
data on the basis of spatial location. For example, suppose that we want to store a set of
polygons in a database and to query the database to find all polygons that intersect a given
polygon. We cannot use standard index structures, such as B-trees or hash indices, to answer
such a query efficiently. Efficient processing of the above query requires special-purpose index
structures, such as R-trees.
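One reason structures such as R-trees work is that they index minimum bounding rectangles: a cheap rectangle-overlap test filters out most candidates before any expensive polygon-intersection test runs. A sketch of that filter step (the geometries and names here are invented for illustration):

```python
def bbox(coords):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a geometry."""
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))

def bboxes_overlap(a, b):
    """Two geometries can intersect only if their bounding boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

query = bbox([(2, 2), (5, 2), (5, 5), (2, 5)])
candidates = {
    "lake":   bbox([(4, 4), (8, 4), (8, 8)]),
    "forest": bbox([(10, 10), (12, 10), (11, 13)]),
}
# Filter step: keep only candidates whose boxes overlap the query box.
hits = [name for name, b in candidates.items() if bboxes_overlap(query, b)]
print(hits)  # ['lake']
```

An R-tree organises these rectangles hierarchically so that whole groups of non-overlapping candidates can be skipped at once, rather than testing every feature as this linear scan does.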

Two types of Spatial data are particularly important:

Computer-aided-design (CAD) data, which includes spatial information about how objects,
such as buildings, cars, or aircraft, are constructed. Other important examples of
computer-aided-design databases are integrated-circuit and electronic-device layouts.

CAD systems traditionally stored data in memory during editing or other processing, and wrote
the data back to a file at the end of an editing session. The drawbacks of such a scheme include
the cost (programming complexity, as well as time) of transforming data from one form to
another, and the need to read in an entire file even if only parts of it are required. For a large
design, such as that of an entire airplane, it may be impossible to hold the complete design in
memory. Designers of object-oriented databases were motivated in large part by the database
requirements of CAD systems. Object-oriented databases represent components of the design
as objects, and the connections between the objects indicate how the design is structured.

Geographic data include road maps, land-usage maps, topographic elevation maps, political
maps showing boundaries, land-ownership maps, and so on. Geographical information systems
are special-purpose databases for storing geographical data. Geographical data differ from
design data in certain ways. Maps and satellite images are typical examples of geographic data.
Maps may provide not only location information but also attributes associated with locations,
such as elevation, soil type, land type and annual rainfall.
1.10 Spatial vs Attributes data
GIS data is the key component of a GIS and has two general types: spatial and attribute data.
Spatial data provide the visual representation of a geographic space and are stored in raster and
vector forms. Such data are thus a combination of location data and value data, used, for
example, to render a map.

Attribute data are descriptions, measurements, and/or classifications of geographic features in a


map. Attribute data can be classified into 4 levels of measurement: nominal, ordinal, interval
and ratio. The nominal level is the lowest level of measurement, distinguishing features
qualitatively by type or class (e.g. tree species). Ordinal data are ranked into hierarchies but do
not show any magnitude of difference (e.g. city hierarchy). The interval measurement indicates
the distance between the ranks of measured elements, but its starting point is arbitrarily
assigned (e.g. Celsius temperature). Ratio measurements, the highest level of measurement,
include an absolute starting point. Data of this category include property value and distance.

Attribute data is the detailed data used in combination with spatial data to create a GIS. The more
available and appropriate attribute data used with spatial data, the more complete a GIS is as a
management reporting and analysis tool.
Sources of Spatial & Attribute Data
 Spatial data can be obtained from satellite images or scanned maps and similar resources.
This data can then be digitised into vector data or maintained as raster graphic data.
Essentially, any format of a geographical image with location or co-ordinate points can be
used as spatial data.
 Attribute data can be obtained from a number of sources, or data can be captured
specifically for your application. Some popular sources of attribute data are town
planning and management departments, policing and fire departments, environmental
groups, and online media.
What is Attribute Data?
Attribute data are descriptions or measurements of geographic features in a map. The term
refers to detailed data that are combined with spatial data. Attribute data help to obtain
meaningful information from a map. Every feature has characteristics that we can describe. For
example, consider a building: it has a year of construction, a number of floors, and so on.
Those are attributes. Attributes are facts we know about a feature but cannot see directly, such
as the year it was built. An attribute can also record the absence of a feature.
Usually, a table helps to display attribute data. Each row represents a single feature. In a GIS,
clicking on the row will highlight the corresponding feature on the map.
What is Spatial Data?
Spatial data consists of points, lines, polygons or other geographic and geometric data primitives
that we can map by location. It is possible to maintain spatial data as vector data or raster data.
Each provides information connected to geographical locations. Vector data consist of
sequential points or vertices that define a linear segment; each vertex has an x coordinate and a
y coordinate. Raster data, in contrast, consist of a matrix of cells or pixels arranged into rows
and columns, where each cell contains a value representing information.

Difference Between Attribute Data and Spatial Data.

 Definition:Attribute data refers to the characteristics of geographical features that are


quantitative and/or qualitative in nature while spatial data refers to all types of data
objects or elements that are present in a geographical space or horizon. Thus, this is the
main difference between attribute data and spatial data.
 Method of obtaining: Town planning and management departments, fire departments,
environmental groups and online media help to obtain attribute data, while satellite
images and scanned maps help to obtain spatial data.

 Usage:Attribute data describes the characteristics of a geographical feature while spatial
data describes the absolute and relative location of a geographic feature. Hence, this is
another difference between attribute data and spatial data.
 Conclusion: GIS helps to analyze resources such as water, urban areas, roads, coasts,
vegetation, etc. It also helps solve problems related to pollution, forestry, health,
agriculture and many other areas. The main difference between attribute data and spatial
data is that attribute data describe the characteristics of a geographical feature while
spatial data describe the absolute and relative location of geographic features.
1.11 Types of attributes:
There are two components to GIS data: spatial information (coordinate and projection
information for spatial features) and attribute data. Attribute data is information appended in
tabular format to spatial features. The spatial data is the where, and the attribute data can
contain information about the what, when, and why. Attribute data provides characteristics
about spatial data.
Types of Attribute Data
Attribute data can be stored as one of five different field types in a table or database: character,
integer, floating point, date, and BLOB.
1. Character Data
The character type (or string) is for text-based values, such as the name of a street, or
descriptive values, such as the condition of a street. Character attribute data is stored as a series
of alphanumeric symbols.
Aside from descriptors, character fields can contain other attribute values such as categories and
ranks. For example, a character field may contain the categories for a street: avenue, boulevard,
lane, or highway. A character field could also contain the rank, which is a relative ordering of
features. For example, a ranking of the traffic load of the street with “1” being the street with the
highest traffic.
Character data can be sorted in ascending (A to Z) and descending (Z to A) order. Since
numbers are treated as text in this field type, they are sorted alphabetically, which means that
the number sequence 1, 2, 9, 11, 13, 22 would be sorted in ascending order as 1, 11, 13, 2, 22, 9.
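This behaviour is easy to demonstrate; in Python, for example, sorting the same values as strings and as numbers gives different orders:

```python
values = ["1", "2", "9", "11", "13", "22"]

# Text sort: character by character, so "11" comes before "2".
print(sorted(values))           # ['1', '11', '13', '2', '22', '9']

# Converting to numbers first restores the expected numeric order.
print(sorted(values, key=int))  # ['1', '2', '9', '11', '13', '22']
```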
Because character data is not numeric, calculations (sum, average, median, etc.) can’t be
performed on this type of field, even if the values stored in the field are numbers (to do that, the
field type would need to be converted to a numeric field). Character fields can, however, be
summarized to produce counts (e.g. the number of features that have been categorized as
“avenue”).
2. Numeric Data

Integer and floating-point fields hold numerical values. Within the integer type, there is a
further division between short and long integer values. As would be expected, short integers
store whole numbers over a smaller range than long integers. Floating-point fields store
numeric values with fractional parts, i.e. numbers with digits to the right of the decimal point,
as opposed to whole values.
Numeric values are sorted sequentially, either in ascending (1 to 10) or descending (10 to 1)
order.
Numerical value fields can have operations performed such as calculating the sum or average
value. Numerical field values can be a count (e.g. the total number of students at a school) or be
a ratio (e.g. the percentage of students that are girls at a school).
3. Date/Time Data
Date fields contain date and time values.
4. BLOB Data
BLOB stands for binary large object, and this attribute type is used for storing information such
as images, multimedia, or bits of code in a field. The field stores object linking and embedding
(OLE) objects, which are created in other applications (such as images and multimedia) and
linked from the BLOB field.

Attribute data for a road in GIS.


1.12 Scales of Measurement / Level of Measurement
The Four Scales of Measurement
Data can be classified as being on one of four scales: nominal, ordinal, interval or ratio. Each
level of measurement has some important properties that are useful to know. For example, only
the ratio scale has meaningful zeros.
1. Nominal Scale. Nominal variables (also called categorical variables) can be placed into
categories. They don’t have a numeric value and so cannot be added, subtracted, divided or
multiplied. They also have no order; if they appear to have an order then you probably have
ordinal variables instead. A pie chart displays groups of nominal variables (i.e. categories).
Survey on why people travel.

2. Ordinal Scale. The ordinal scale contains things that you can place in order. For example,
hottest to coldest, lightest to heaviest, richest to poorest. Basically, if you can rank data by 1st,
2nd, 3rd place (and so on), then you have data that’s on an ordinal scale.
Ordinal scale: The ordinal scale classifies according to rank.

3. Interval Scale. An interval scale has ordered numbers with meaningful divisions. Temperature
is on the interval scale: a difference of 10 degrees between 90 and 100 means the same as 10
degrees between 150 and 160. Compare that to high school ranking (which is ordinal), where the
difference between 1st and 2nd might be .01 and between 10th and 11th .5. If you have
meaningful divisions, you have something on the interval scale.
Measurement scales

4. Ratio Scale. The ratio scale is exactly the same as the interval scale, with one major
difference: zero is meaningful. For example, a height of zero is meaningful (it means you don’t
exist). Compare that to a temperature of zero degrees Celsius, which, while it exists, doesn’t
represent an absence of temperature (it is merely the freezing point of water).
Weight is measured on the ratio scale.
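The interval/ratio distinction can be shown numerically. On the Celsius (interval) scale only differences are meaningful, while on the Kelvin scale, which has an absolute zero and is therefore a ratio scale, ratios are meaningful too; the figures below follow the standard conversion:

```python
# Interval scale (Celsius): differences are meaningful, ratios are not.
c_hot, c_cold = 100.0, 50.0
print(c_hot - c_cold)  # 50.0 – a meaningful difference
print(c_hot / c_cold)  # 2.0 – but 100 °C is NOT "twice as hot" as 50 °C

# On a ratio scale (Kelvin, absolute zero at -273.15 °C) the ratio is meaningful:
k_hot, k_cold = c_hot + 273.15, c_cold + 273.15
print(round(k_hot / k_cold, 3))  # 1.155 – the physically meaningful ratio
```

The arbitrary zero of the interval scale is exactly what makes the 2.0 above misleading.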

UNIT II SPATIAL DATA MODELS 9
Database Structures – Relational, Object Oriented – ER diagram - spatial data models – Raster
Data Structures – Raster Data Compression - Vector Data Structures - Raster vs Vector Models-
TIN and GRID data models - OGC standards - Data Quality.

2.1 What are Data?


Data are basic facts or values. Every task a computer carries out works with data in some way.
Without data, computers would be pretty useless. It is therefore important to understand what
data are and how to represent and organize data. The term 'data' is considered plural in the
scientific community, as in 'The data are collected,' not 'The data is collected.' However, not
everyone follows this, so sometimes you will see 'data' used as singular.
Database Structure
A database is an organized collection of data. Instead of having all the data in a list with a random
order, a database provides a structure to organize the data. One of the most common data
structures is a database table. A database table consists of rows and columns. A database table is
also called a two-dimensional array. An array is like a list of values, and each value is identified
by a specific index. A two-dimensional array uses two indices, which correspond to the rows and
columns of a table.
In database terminology, each row is called a record. A record is also called an object or an entity.
In other words, a database table is a collection of records. The records in a table are the objects
you are interested in, such as the books in a library catalog or the customers in a sales database.
A field corresponds to a column in the table and represents a single value for each record. A field
is also called an attribute. In other words, a record is a collection of related attributes that make
up a single database entry.
The example shows a simple database table of customers. Each customer has a unique identifier
(Customer ID), a name, and a telephone number. These are the fields. The first row is called the
header row and indicates the name of each field. Following the header row, each record is a unique
customer.
Notice a few things about the table. First, all the data values in a single field or column are of the
same kind - they are the same data type. Second, the data values in a single record or row can
consist of different types, such as numbers and text. Third, there are no empty rows or columns.
Individual data values can be missing, but there are no blank records or fields. These properties
make a database table quite different from a table in a word processing or spreadsheet application.
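This row/column structure can be sketched directly in Python, modelling the customer table as a list of records in which each record maps field names to values (the customer values here are invented):

```python
# A table as a list of records; each record maps field names to values.
customers = [
    {"customer_id": 1, "name": "Ada Lovelace", "phone": "555-0101"},
    {"customer_id": 2, "name": "Alan Turing",  "phone": "555-0102"},
]

# Every record shares the same set of fields (the columns)...
fields = set(customers[0])
assert all(set(record) == fields for record in customers)

# ...and a (record, field) pair addresses one value, like a two-dimensional array:
print(customers[1]["name"])  # Alan Turing
```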

The database structure imposes certain constraints on the data values, which makes it more
reliable. For example, for the phone number, you cannot enter text, since that wouldn't make
sense.
While this example is quite simple, you can easily imagine what else could be stored in such a
database. For example, you could store the customer's mailing address, billing information,
history of past purchases, etc. For an organization with many thousands of customers, this quickly
becomes a large database. To use a large database effectively, you can use a database management
system (DBMS). A DBMS is specialized software to input, store, retrieve and manage all the
data.

2.2 Relational and Object Oriented spatial data models:


The Relational Model
The main data storage concept in the relational model is a table of records, referred to as a relation,
or simply a table. The records in a table contain a fixed number of fields, which must all be
different from each other, and all records are of identical format. There is, therefore, a simple row
and column structure. In relational database terminology the rows, or records, are also referred to
as tuples, while the columns of fields are sometimes referred to as domains. Each record of a table
stores an entity or a relationship and is uniquely identified by means of a primary key which
consists of one field, or a combination of two or more fields in the record. The need for composite
keys, consisting of more than one field, arises if no one field can be guaranteed unique. The fields
of an entity table store attributes of the entity to which the table corresponds. Table 1 illustrates
an example for Settlement.
Settlement name   Settlement status   Settlement population   County name
Gittings          Village                   243               Downshire
Bogton            Town                    31520               Downshire
Puffings          Village                   412               Binglia
Pondside          City                   112510               Mereshire
Craddock          Town                    21940               Binglia
Bonnet            Town                    28266               Binglia
Drain             Village                   940               Mereshire

Relational operators:
Retrieval from a relational database involves creating, perhaps temporarily, new relations which
are subsets or combinations of the permanently stored relations. There are several relational
algebra operators that can be used to search and manipulate relations in order to perform such
retrievals. Some of these operators are selection, projection, union and join. Other standard operators
include product, divide and intersection. From the user's point of view, the operators are not
named as such but are implemented by means of the standard Structured Query Language
(SQL) using a number of commands and key words. For example, the command
SELECT settlement-name, county-name
FROM Settlement
will create a new table which consists only of the settlement name and county fields of the
Settlement table.
The selection (or restrict) operation is concerned with retrieving a subset of the records of a table
on the basis of retrieval criteria expressed in terms of the contents of one or more of the fields in
each record. For example, to retrieve all settlements in the county of Mereshire with a population
greater than 20000, the SQL command would be
SELECT *
FROM Settlement
WHERE county-name = Mereshire AND settlement-population > 20000
Note that the WHERE condition consists of a logical expression. This query could have been
combined with a projection operation by specifying field names after the SELECT command.
The join operator is more complicated than projection and selection in that its purpose is to
combine fields from two or more tables. The operator depends on the tables being related to each
other by means of a common field.
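As a concrete illustration, the projection and selection queries above can be run against the Settlement table of Table 1 using Python’s built-in sqlite3 module. (Column names use underscores here, since hyphens are not legal identifiers in real SQL.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE settlement (
    settlement_name TEXT PRIMARY KEY,
    settlement_status TEXT,
    settlement_population INTEGER,
    county_name TEXT)""")
conn.executemany("INSERT INTO settlement VALUES (?, ?, ?, ?)", [
    ("Gittings", "Village", 243, "Downshire"),
    ("Bogton", "Town", 31520, "Downshire"),
    ("Puffings", "Village", 412, "Binglia"),
    ("Pondside", "City", 112510, "Mereshire"),
    ("Craddock", "Town", 21940, "Binglia"),
    ("Bonnet", "Town", 28266, "Binglia"),
    ("Drain", "Village", 940, "Mereshire"),
])

# Projection: a new relation containing only two of the columns.
rows = conn.execute(
    "SELECT settlement_name, county_name FROM settlement").fetchall()

# Selection: a new relation containing only rows matching a logical condition.
big = conn.execute("""SELECT settlement_name FROM settlement
                      WHERE county_name = 'Mereshire'
                        AND settlement_population > 20000""").fetchall()
print(big)  # [('Pondside',)]
```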
Important features of Relational Data Bases
 Primary and Foreign keys
 Relational joins
 Normal forms
The Primary and Foreign Keys
The relational approach has important implications for the design of database tables. First,
since each table or relation represents a set, it cannot have any rows whose entire contents are
duplicated. Second, as each row must differ from every other, it follows that a value in a single
column, or a combination of values in multiple columns, can be used to define a primary key
for the table, which allows each row to be uniquely identified. The uniqueness property allows
the primary key to serve as the sole row-level addressing mechanism in the relational database
model.

A field that stores the key of another table is called a foreign key. It is important to realize that
the primary key of a table and any foreign keys that it may store consist of logical data items
which may be attributes such as names or some allocated numerical identifier. They do not consist
of physical addresses in the database. They will, however, be used as the basis of indexing
mechanisms which the database management system uses to provide efficient query processing.
Relational joins
The mechanism for linking data in different tables is called a relational join. Values in a column
or columns in one table are matched to corresponding values in a column or columns in a second
table. Matching is frequently based on a primary key in one table linked to a column in the second,
which is termed a foreign key. An example of the join mechanism is shown below :

Figure: Example of the join mechanism
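The same primary-key/foreign-key join can be sketched with sqlite3; the county table and its region attribute here are invented purely to illustrate the matching:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE settlement (name TEXT PRIMARY KEY,
                             population INTEGER,
                             county_name TEXT);          -- foreign key
    CREATE TABLE county (county_name TEXT PRIMARY KEY,   -- primary key
                         region TEXT);
    INSERT INTO settlement VALUES ('Bogton', 31520, 'Downshire'),
                                  ('Pondside', 112510, 'Mereshire');
    INSERT INTO county VALUES ('Downshire', 'North'),
                              ('Mereshire', 'South');
""")

# The join matches the foreign key in settlement to the primary key in county:
rows = conn.execute("""SELECT s.name, c.region
                       FROM settlement AS s
                       JOIN county AS c ON s.county_name = c.county_name
                       ORDER BY s.name""").fetchall()
print(rows)  # [('Bogton', 'North'), ('Pondside', 'South')]
```

Note that the match is on logical values ('Downshire', 'Mereshire'), not on physical addresses, exactly as the preceding paragraphs describe.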


Normal Forms
A certain amount of necessary data redundancy is implicit in the relational model because the
join mechanism matches column values between tables. Without careful design, unnecessary
redundancy may be introduced into the database. One of the tasks of a database designer is to
reduce all information to normalised form. A relational database table can thus be regarded as
representing a set of entities, each of which is stored in a record of the table. Alternatively, a
table can represent a relationship which links key fields of associated entities.
There are several degrees of normalisation. They differ in various respects, including the extent
to which data items within a record are dependent upon each other, as opposed to having an
independent existence. The first normal form requires that all tables contain rows and columns
and that column values be atomic, that is, they do not contain repeating groups of data, such as
multiple values of a census variable for different years.
The second normal form requires that every column that is not part of the primary key be fully
dependent on the primary key. The third normal form requires that every non-primary-key
column be non-transitively dependent on the primary key.
Nevertheless, the fundamental working rule for most circumstances is to ensure that each
attribute of a table represents a fact about the primary key, the whole primary key, and nothing
but the primary key. While this is entirely valid from the design viewpoint, practical
implementation requirements may, on occasion, override theoretical considerations and lead to
tables being merged and denormalised, usually for performance reasons.
Advantages and disadvantages of relational systems
The advantages can be summarized as follows:
 Rigorous design methodology based on sound theoretical foundations
 All other database structures can be reduced to a set of relational tables, so they are the most general form of data representation
 Ease of use and implementation compared to other types of system
 Modifiability, which allows new tables and new rows of data within tables to be added without difficulty
 Flexibility in ad hoc data retrieval because of the relational join mechanism and powerful query language facility
The disadvantages are as follows:
 A greater requirement for processing resources with increasing numbers of users on a given system than with the other types of database
 On heavily loaded systems, queries involving multiple relational joins may give slower response times than are desirable. This problem can largely be mitigated by effective use of indexing and other optimization strategies, together with continued improvements in the price/performance of computing hardware, from mainframes to PCs.
The DBMS provides a wide range of ready-made data manipulation tools, so programming effort can be concentrated on algorithms for spatial analysis and user interface requirements. Though a database approach has several advantages over a file-system approach, GIS system designers often prefer the latter for storage of digital map coordinates. This has led to the development of two different approaches to implementation, based on either a hybrid or an integrated data model.

Object Oriented Databases
A recent trend in both software engineering and in database design is towards the use of object-
oriented techniques. For the purposes of geographical databases these techniques are of great
interest since they hold the promise to overcome significant shortcomings, from the point of view
of GIS, of the widely used relational database methods.
Normal queries to a GIS require spatial data processing operations which standard query languages such as SQL cannot currently handle. Object-oriented techniques provide the tools for building
databases which, unlike relational databases, model complex spatial objects. The database
representations of objects include, in addition to stored data, specialized procedures for spatial
searching and for executing queries which may require geometric and topological data processing.
Objects in an object-oriented database are intended to correspond to classes of real-world objects and are implemented by combining data, which describe the object attributes, with the procedures, or methods, which operate on them.
Accessing an object involves sending a message to it, which results in the addressed object using
its internal methods to respond to the message. A variety of types of message may be sent to an
individual object, depending upon its properties and the methods that it has implemented.
Examples of the types of message that might be sent to a polygon class of object would be to
return its coordinates, to return the result of a measurement, such as area or perimeter calculation,
or to display the polygon on a graphics device.
An individual object is an instantiation, or a particular example, of a class of objects, and as such
it is uniquely identified within the database with an object identifier. An object class may inherit
the properties, data attributes and methods of one or more other object classes. Thus having
defined typical object classes, new ones may be created which are combinations of or subclasses
of existing ones.
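A minimal sketch of these ideas, using the polygon example from the text (the class names, coordinates, and the `describe` method are invented for illustration): a `Polygon` object combines its data (the vertex coordinates) with the methods that answer messages such as "return your area" or "return your perimeter", and inherits shared behaviour from a parent class.

```python
import math

class Geometry:
    """Base object class; subclasses inherit its methods."""
    def describe(self):
        return f"{type(self).__name__} with area {self.area():.1f}"

class Polygon(Geometry):
    """Combines data (vertex coordinates) with the methods that
    respond to messages such as area or perimeter requests."""
    def __init__(self, coords):
        self.coords = list(coords)          # (x, y) vertices, unclosed

    def area(self):
        # Shoelace formula over the vertex ring.
        n = len(self.coords)
        s = sum(self.coords[i][0] * self.coords[(i + 1) % n][1]
                - self.coords[(i + 1) % n][0] * self.coords[i][1]
                for i in range(n))
        return abs(s) / 2.0

    def perimeter(self):
        n = len(self.coords)
        return sum(math.dist(self.coords[i], self.coords[(i + 1) % n])
                   for i in range(n))

square = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
```

Here `square` is an instantiation of the `Polygon` class, and `Polygon` is a subclass inheriting from `Geometry`.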

2.3 ER diagram

An entity relationship diagram (ERD) shows the relationships of entity sets stored in a database.
An entity in this context is an object, a component of data. An entity set is a collection of similar
entities. These entities can have attributes that define its properties. By defining the entities, their
attributes, and showing the relationships between them, an ER diagram illustrates the logical
structure of databases. ER diagrams are used to sketch out the design of a database.

Documenting an Existing Database Using Data

There are two reasons to create a database diagram: you are either designing a new schema or you need to document your existing structure. If you have an existing database you need to document, you can create a database diagram using data directly from your database. You can export your database structure as a CSV file, then have a program generate the ERD automatically. This will be the most accurate portrait of your database and will require no drawing on your part. Here's an example of a very basic database structure generated from data.

If you want to create a new plan, you can also edit the generated diagram and collaborate with
your team on what changes to make.

Common Entity Relationship Diagram Symbols

An ER diagram is a means of visualizing how the information a system produces is related. There
are five main components of an ERD:

 Entities, which are represented by rectangles. An entity is an object or concept about which you want to store information. A weak entity is an entity that must be defined by a foreign-key relationship with another entity, as it cannot be uniquely identified by its own attributes alone.


 Actions, which are represented by diamond shapes, show how two entities share information in the database. In some cases, entities can be self-linked. For example, employees can supervise other employees.

 Attributes, which are represented by ovals. A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute. A multivalued attribute can have more than one value; for example, an employee entity can have multiple skill values. A derived attribute is based on another attribute; for example, an employee's monthly salary is based on the employee's annual salary.


 Connecting lines, solid lines that connect attributes to show the relationships of entities
in the diagram.
 Cardinality specifies how many instances of an entity relate to one instance of another
entity. Ordinality is also closely linked to cardinality. While cardinality specifies the
occurrences of a relationship, ordinality describes the relationship as either mandatory or
optional. In other words, cardinality specifies the maximum number of relationships and
ordinality specifies the absolute minimum number of relationships.

 There are many notation styles that express cardinality.

Information Engineering Style

ER Diagram Uses

When documenting a system or process, looking at the system in multiple ways increases the understanding of that system. ER diagrams are commonly used in conjunction with a data flow diagram to display the contents of a data store. They help us to visualize how data is connected in a general way, and are particularly useful for constructing a relational database.

Entity Relationship Diagram Tutorial


Here are some best practice tips for constructing an ERD:

 Identify the entities. The first step in making an ERD is to identify all of the entities you
will use. An entity is nothing more than a rectangle with a description of something that
your system stores information about. This could be a customer, a manager, an invoice, a
schedule, etc. Draw a rectangle for each entity you can think of on your page. Keep
them spaced out a bit.

 Identify relationships. Look at two entities, are they related? If so draw a solid line
connecting the two entities.
 Describe the relationship. How are the entities related? Draw an action diamond between
the two entities on the line you just added. In the diamond write a brief description of how
they are related.
 Add attributes. Any key attributes of entities should be added using oval-shaped
symbols.
 Complete the diagram. Continue to connect the entities with lines, and adding diamonds
to describe each relationship until all relationships have been described. Some of your entities may have no relationships, while others may have multiple relationships. That is okay.

2.4 Spatial Data Structures models


Spatial data structures describe the rules that are used to represent geographic data in geographic
information systems (GIS). Geographic data includes information about the location, size and
shape of objects or phenomena on or near the surface of the earth, as well as their non-spatial
characteristics.
Geographic data in GIS is represented at several different levels of abstraction, each level
depending on those beneath it:
• Conceptual spatial data models describe how geographic objects (for example, rivers) or
phenomena are represented in GIS.
• Logical spatial data models describe how geographic data are represented in a database
management system (for example, as database tables).

• Spatial data structures describe the methods and formats for physical storage and processing of
geographic information in GIS.
Spatial data structures are the core of a GIS and fundamentally affect its performance and
capabilities. Thus an understanding of spatial data structures is important in the study of
geographic data management.
Data structures can be divided into two groups. Vector data structures represent geographic objects or phenomena as distinct geometries with specific characteristics and may also include topology.
Raster data structures represent geographic objects or phenomena as a grid over which a given characteristic varies continuously. Generally, raster data structures are suitable for continuously varying phenomena like temperature, while vector data structures are suitable for the representation of conceptually distinct objects like land ownership parcels.

2.5 Raster Data Structures


Grid (or raster) data structures represent the world as a grid of cells that have a location and an
attribute value or set of values for that cell. There are a number of different ways in which the
grid may be physically represented within a GIS.
Most simply, a grid may be represented as a list of coordinates (cell row and column) and an
attribute value or set of values. However, such an approach is not usually efficient for data retrieval, because adjacent points in the world are not adjacent to each other in the representation, and it does not support convenient aggregation of cells to allow a large area to be viewed efficiently.
For this reason, hierarchical (or pyramidal if three dimensional) data structures have been
developed that progressively divide the gridded region.
The quad-tree data structure (Figure (a)) is a hierarchical structure with a single cell representing
the entire region at the root, which is then progressively subdivided into four blocks (typically
NW, NE, SW and SE). If a quad-tree block is filled with cells that have the same attribute value,
that value is stored. Otherwise the subdivision process is repeated. Irregular quad-tree data
structures (Figure (b)) may also be used. In this case, the area is still divided into four blocks
successively, but the position of the boundaries between the blocks is determined by the contents
of the area. The r-tree (or rectangle-tree) is similar to the irregular quad-tree but with more
flexibility. Any number of rectangles may be formed around features, objects or clusters, areas
that are empty may be ignored and rectangles may overlap (Figure (c)). Raster data structures are
sometimes used to improve the efficiency of vector data retrieval by

representing objects as geometries, but also referencing the geometries against cells in a
hierarchy. This approach allows geographically neighbouring geometries to be found more
efficiently.
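The regular quad-tree subdivision described above can be sketched as follows. This is a simplified illustration assuming a square grid whose side is a power of two; the grid values are invented, and a real implementation would also record block positions and sizes:

```python
def build_quadtree(grid, x=0, y=0, size=None):
    """Recursively subdivide a square 2-D grid. A block whose cells all
    share one value is stored as a leaf value; otherwise it splits into
    NW, NE, SW and SE children."""
    if size is None:
        size = len(grid)
    first = grid[y][x]
    if all(grid[y + dy][x + dx] == first
           for dy in range(size) for dx in range(size)):
        return first                      # uniform block: store the value
    half = size // 2
    return [build_quadtree(grid, x,        y,        half),   # NW
            build_quadtree(grid, x + half, y,        half),   # NE
            build_quadtree(grid, x,        y + half, half),   # SW
            build_quadtree(grid, x + half, y + half, half)]   # SE

grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]
tree = build_quadtree(grid)
```

For this grid, three quadrants are uniform and are stored as single values, while the mixed SW quadrant is subdivided once more.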

Figure: Raster Data Structures

Raster data model

A raster is an array of cells, where each cell has a value representing a specific portion of an
object or a feature.
A point may be represented by a single cell, a line by a sequence of neighbouring cells and a
polygon by a collection of contiguous cells.

All cells in a raster must be the same size, determining the resolution. The cells can be any size,
but they should be small enough to accomplish the most detailed analysis. A cell can represent a
square kilometer, a square meter, or even a square centimeter.

Cells are arranged in rows and columns, an arrangement that produces a Cartesian matrix. The rows of the matrix are parallel to the x-axis of the Cartesian plane, and the columns to the y-axis. Each cell has a unique row and column address.

Resolution and storage size

Resolution affects data storage size: storage requirements grow with the square of the linear image dimension, so halving the cell size quadruples the number of cells.

800 x 545 pixels: 325 KB
400 x 272 pixels: 91 KB
200 x 136 pixels: 26 KB
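The square-law growth can be checked with a quick calculation. One byte per cell is an assumption made here for simplicity; the file sizes quoted above include compression and headers, so they will not match this exactly:

```python
def raster_storage_cells(ncols, nrows):
    """Cell count for an uncompressed raster: doubling the resolution
    (halving the cell size) doubles both dimensions, quadrupling storage."""
    return ncols * nrows

coarse = raster_storage_cells(200, 136)   # 27,200 cells
fine   = raster_storage_cells(400, 272)   # four times as many cells
```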

2.6 Raster Data Compression

We can distinguish different ways of storing raster data, which basically vary in storage size and
consequently in the geometric organisation of the storage. The following types of geometric
elements are identified:

 Lines
 Stripes
 Tiles
 Areas (e.g. Quad-trees)
 Hierarchy of resolution

Raster data are managed easily in computers; all commonly used programming languages support array handling well. However, a raster stored in a raw state with no compression can be extremely inefficient in terms of computer storage space. As already said, the way to improve raster space efficiency is data compression.

Illustrations and short texts are used to describe different methods of raster data storage and raster
data compression techniques.

Full raster coding (no-compression)

By convention, raster data is normally stored row by row from the top left corner.

Example: The Swiss Digital elevation model (DHM25-Matrixmodell in decimeters)

The header file stores the following information:

 The dimension of the matrix (number of columns and rows)


 The x-coordinates of the South-West (lower left) corner
 The y-coordinates of the South-West (lower left) corner
 The cell size
 The code used for no data (i.e. missing) values

ncols 480
nrows 450
xllcorner 878923
yllcorner 207345
cellsize 25
nodata_value -9999
6855 6855 6855 6851 6851 6837 6824 6815 6808
6855 6857 6858 6858 6850 6839 6826 6814 6809
6854 6863 6865 6865 6849 6840 6826 6812 6803
6853 6852 6873 6886 6886 6853 6822 6804 6748
6847 6848 6886 6902 6904 6855 6808 6762 6686
6850 6859 6903 6903 6881 6806 6739 6681 6615
6845 6857 6879 6856 6795 6706 6638 6589 6539
6801 6827 6825 6769 6670 6597 6562 6522 6497
6736 6760 6735 6661 6592 6546 6517 6492 6487 ...
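A header like this can be parsed with a few lines of code. This sketch handles only the six keywords listed above and is not a full ASCII grid reader; the sample values are the ones shown in the text:

```python
def read_ascii_grid_header(lines):
    """Parse the keyword/value header lines of an ESRI-style ASCII grid;
    stops at the first row of cell values."""
    keys = {"ncols", "nrows", "xllcorner", "yllcorner",
            "cellsize", "nodata_value"}
    header = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].lower() in keys:
            header[parts[0].lower()] = float(parts[1])
        else:
            break  # first data row reached
    return header

sample = [
    "ncols 480",
    "nrows 450",
    "xllcorner 878923",
    "yllcorner 207345",
    "cellsize 25",
    "nodata_value -9999",
    "6855 6855 6855 6851 6851 6837 6824 6815 6808",
]
header = read_ascii_grid_header(sample)
```

With the header in hand, any cell's real-world coordinates can be computed from its row/column address, the lower-left corner, and the cell size.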

Runlength coding (lossless)

Geographical data tends to be "spatially autocorrelated", meaning that objects which are close to
each other tend to have similar attributes:

"All things are related, but nearby things are more related than distant things" (Tobler 1970)
Because of this principle, we expect neighboring pixels to have similar values. Therefore, instead
of repeating pixel values, we can code the raster as pairs of numbers - (run length, value).

The runlength coding is a widely used compression technique for raster data. The primary data
elements are pairs of values or tuples, consisting of a pixel value and a repetition count which
specifies the number of pixels in the run. Data are built by reading successively row by row
through the raster, creating a new tuple every time the pixel value changes or the end of the row
is reached.

Run-length coding describes the interior of an area by run lengths, instead of the boundary.

In the multiple-attribute case there are more options available.

We can note in Codes - III that if a run is not required to break at the end of each line, we can compress the data even further.
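The (run length, value) tuple scheme described above can be sketched for a single raster row as follows (the pixel values are invented):

```python
def rle_encode(row):
    """Run-length encode one raster row into (run_length, value) tuples."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            runs[-1] = (runs[-1][0] + 1, value)   # extend the current run
        else:
            runs.append((1, value))               # start a new run
    return runs

def rle_decode(runs):
    """Expand the tuples back into the original row: the coding is lossless."""
    return [value for length, value in runs for _ in range(length)]

row = [3, 3, 3, 3, 7, 7, 1, 1, 1]
runs = rle_encode(row)
```

Nine pixel values collapse to three tuples here; the gain grows with the spatial autocorrelation of the data, exactly as Tobler's observation suggests.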

Chain coding (lossless)

Blockwise coding (lossless)

This method is a generalization of run-length encoding to two dimensions. Instead of sequences of 0s or 1s, square blocks are counted. For each square, the position, the size, and the contents of the pixels are stored.

Quadtree coding (lossless)

The quadtree compression technique is the most common compression method applied to raster data. Quadtree coding stores the information by subdividing a square region into quadrants, each of which may be further subdivided into squares until the contents of the cells have the same values.

The accompanying figure shows how an area is represented on a map and the corresponding quadtree representation.

Huffman coding (lossless compression)

The Huffman coding compression technique involves a preliminary analysis of the frequency of occurrence of symbols. The Huffman technique creates, for each symbol, a binary data code whose length is inversely related to the frequency of occurrence.
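A sketch of the frequency analysis and code-length assignment follows. To keep it short, this computes only the code length per symbol rather than the full bit codes; the pixel values are invented:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Assign each symbol a Huffman code length: the more frequent the
    symbol, the shorter its code. Returns {symbol: bit_length}."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate single-symbol case
        return {next(iter(freq)): 1}
    # Heap entries: (frequency, tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Invented pixel run: value 5 dominates, so it gets the shortest code.
pixels = [5, 5, 5, 5, 5, 5, 2, 2, 2, 9]
lengths = huffman_code_lengths(pixels)
```

The dominant value 5 receives a one-bit code, while the rarer values 2 and 9 receive two-bit codes.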

LZ77 method (lossless compression)

LZ77 compression is a lossless compression method, meaning that the values in your raster are
not changed. Abraham Lempel and Jacob Ziv first introduced this compression method in 1977.
The theory behind this compression method is relatively simple: when you find a match (a data
value that has already been seen in the input file) instead of writing the actual value, the position
and length (number of bytes) of the value is written to the output (the length and offset
- where it is and how long it is).

Some image-compression methods are often referred to as LZ (Lempel-Ziv) methods, along with variants such as LZW (Lempel-Ziv-Welch). With this method, a previous analysis of the data is not required, which makes the LZ77 method applicable to all raster data types.
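The offset/length idea can be sketched as follows. This is a deliberately simplified textbook variant: real LZ77 implementations use efficient search structures and byte-level output, whereas this one does a naive linear search and emits (offset, length, next_char) triples:

```python
def lz77_encode(data, window=255):
    """Emit (offset, length, next_char) triples: offset/length point back
    into already-seen text; (0, 0, c) means 'no match, literal c'."""
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Reserve one character to emit as the literal next_char.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    buf = []
    for off, length, nxt in triples:
        for _ in range(length):
            buf.append(buf[-off])  # copies may overlap their own output
        buf.append(nxt)
    return "".join(buf)
```

Repetitive input collapses well: `"abababab"` encodes as two literals followed by a single back-reference that overlaps its own output.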

JPEG-Compression (lossy compression)

The JPEG-compression process:

 The representation of the colors in the image is converted from RGB to YCbCr, consisting of one luma component (Y), representing brightness, and two chroma components (Cb and Cr), representing color. This step is sometimes skipped.
 The resolution of the chroma data is reduced, usually by a factor of 2. This reflects the
fact that the eye is less sensitive to fine color details than to fine brightness details.
 The image is split into blocks of 8×8 pixels, and for each block, each of the Y, Cb, and Cr
data undergoes a discrete cosine transform (DCT). A DCT is similar to a Fourier transform
in the sense that it produces a kind of spatial frequency spectrum.
 The amplitudes of the frequency components are quantized. Human vision is much more
sensitive to small variations in color or brightness over large areas than to the strength of

high-frequency brightness variations. Therefore, the magnitudes of the high-frequency
components are stored with a lower accuracy than the low-frequency components. The
quality setting of the encoder (for example 50 or 95 on a scale of 0–100 in the Independent
JPEG Group's library) affects to what extent the resolution of each frequency component
is reduced. If an excessively low quality setting is used, the high-frequency components are discarded altogether.
 The resulting data for all 8×8 blocks is further compressed with a lossless algorithm, a variant of Huffman encoding.

JPEG compression (left to right: a decreasing quality setting produces visible 8×8 pixel blocks)

2.7 Vector Data Structures: Geometry

Vector data structures explicitly store the geometries that represent geographic objects. They are
referred to as vector data structures because they represent geometries using a series of points and
lines (vectors). In vector data structures, geometries may have different dimensions.
• A zero-dimensional geometry is a point. It is usually represented with x and y coordinates.
• A one-dimensional geometry is a line, arc or string of line segments (sometimes called a
polyline) connecting point geometries, sometimes requiring additional information about arc
properties.
• A two-dimensional geometry is a polygon, defined by a sequence of one-dimensional
geometries with the same start and end point. Polygons may be simple or complex. Complex
polygons may be aggregations of more than one polygon and may have holes, possibly with island
geometries inside the holes.

• A three-dimensional geometry is a solid, defined by a collection of two-dimensional geometries with a z coordinate (usually representing height relative to a reference point). The solid may also contain holes or be an aggregation of multiple solids.

Figure: Geometries
Geometries may have attributes attached to them. For example, a land ownership parcel may be
represented by a two-dimensional geometry (polygon), with attributes for the name of the parcel
owner and the land-use controls that apply over it.
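The parcel example can be sketched in code. The coordinates and attribute values are invented; the closed-ring test reflects the "same start and end point" rule above:

```python
import math

# Hypothetical features; geometries are lists of (x, y) vertex coordinates.
point    = (2.0, 3.0)                                  # zero-dimensional
polyline = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]        # one-dimensional
parcel = {                                             # two-dimensional
    "geometry": [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0),
                 (0.0, 5.0), (0.0, 0.0)],              # closed ring
    "attributes": {"owner": "Smith", "land_use": "residential"},
}

def is_closed_ring(coords):
    """A polygon boundary must have the same start and end point."""
    return len(coords) >= 4 and coords[0] == coords[-1]

def polyline_length(coords):
    """Sum the straight-line segment lengths along the vertex list."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))
```

The attributes travel with the geometry, so a query such as "find all residential parcels" combines the attribute table with the stored coordinates.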

Vector Data Structures: Topology


Some spatial data structures represent topology as well as geometry. Topology describes how
objects are connected to one another. The topology of a set of objects can be represented by a
graph of nodes and links. Formally, topology describes the set of relationships between objects
that are not changed when object geometries are deformed in various ways (twisted, stretched or
resized). In topological spatial data structures, the connections between objects are stored in
addition to their geometries, and thus when there is a change in the geometry of one object, the
geometry of a connected object may also be affected. For example, in Figure 2(a), although
points A and B coincide, when B is moved, A is not affected. In figure 2(b), point A is stored as
a node of both polygons, and there is no point B. When point A is moved, this affects the
geometries of both polygons.

Figure: Topology

2.8 Raster vs Vector models

The main difference between vector and raster graphics is that raster graphics are composed of
pixels, while vector graphics are composed of paths. A raster graphic, such as a gif or jpeg, is an
array of pixels of various colors, which together form an image.

In GIS, vector and raster are two different ways of representing spatial data. However, the
distinction between vector and raster data types is not unique to GIS: here is an example from the
graphic design world which might be clearer.

Raster data is made up of pixels (or cells), and each pixel has an associated value. Simplifying
slightly, a digital photograph is an example of a raster dataset where each pixel value corresponds
to a particular colour. In GIS, the pixel values may represent elevation above sea level, or
chemical concentrations, or rainfall etc. The key point is that all of this data is represented as a
grid of (usually square) cells. The difference between a digital elevation model (DEM) in GIS
and a digital photograph is that the DEM includes additional information describing where the
edges of the image are located in the real world, together with how big each cell is on the ground.
This means that your GIS can position your raster images (DEM, hillshade, slope map etc.)
correctly relative to one another, and this allows you to build up your map.

Vector data consists of individual points, which (for 2D data) are stored as pairs of (x, y) coordinates. The points may be joined in a particular order to create lines, or joined into closed rings to create polygons, but all vector data fundamentally consists of lists of coordinates that define vertices, together with rules to determine whether and how those vertices are joined.

Note that whereas raster data consists of an array of regularly spaced cells, the points in a vector dataset need not be regularly spaced. In many cases, both vector and raster representations of the same data are possible:

At this scale, there is very little difference between the vector representation and the "fine" (small
pixel size) raster representation. However, if you zoomed in closely, you'd see the polygon edges
of the fine raster would start to become pixelated, whereas the vector representation would remain
crisp. In the "coarse" raster the pixelation is already clearly visible, even at this scale.

Vector and raster datasets have different strengths and weaknesses. When performing GIS analysis, it's important to think about the most appropriate data format for your needs. In particular, careful use of raster algebra can often produce results much, much faster than the equivalent vector workflow.

2.9 TIN and Grid data models

TIN data models

A triangulated irregular network, most commonly referred to as a TIN, is a network of triangles connected together to create a 3-D surface in which the triangles do not cross. Triangulated irregular networks are more complex than rasters; however, they are more efficient space-wise. Additionally, triangulated irregular networks can easily accommodate different sampling densities, where rasters cannot. Lastly, triangulated irregular networks preserve each input measurement point, again, where rasters cannot.

If we look at the anatomy of a TIN, it is composed of points, edges, and faces. A point represents an input data value that is preserved and defines an endpoint for a triangle. The edge is the line drawn between each pair of points, which creates the outline of the triangles. The face is the area, or surface, inside each of the triangles.

Triangulated irregular networks can be quite large and look quite complex. However, if you think about what each point of the triangle represents (in this case, an elevation point), and you know that faces are the flat faces of triangles, you can decipher some of the features in a TIN without seeing it colorized. For instance, the very large triangles in the top left corner of this triangulated irregular network represent a lake behind a dam. The very dense triangles flowing from the top right of the triangulated irregular network to the bottom right represent a river.

When we apply colors to this triangulated irregular network based on the elevation values, it becomes clearer what it is representing: the continuous spatial phenomenon known as elevation. In this particular case, it shows a lake behind the dam, the surrounding terrain, and the river leading away from the dam.

GRID Data Models:

A data grid is an architecture or set of services that gives individuals or groups of users the ability
to access, modify and transfer extremely large amounts of geographically distributed data for
research purposes. Data grids make this possible through a host of middleware applications and
services that pull together data and resources from multiple administrative domains and then
present it to users upon request. The data in a data grid can be located at a single site or multiple
sites where each site can be its own administrative domain governed by a set of security
restrictions as to who may access the data. Likewise, multiple replicas of the data may be
distributed throughout the grid outside their original administrative domain and the security
restrictions placed on the original data for who may access it must be equally applied to the
replicas. Specifically developed data grid middleware is what handles the integration between
users and the data they request by controlling access while making it available as efficiently as
possible. The adjacent diagram depicts a high level view of a data grid.

Data grids have been designed with multiple topologies in mind to meet the needs of the scientific community. On the right are four diagrams of various topologies that have been used in data grids. Each topology has a specific purpose in mind for where it will be best utilized. Each of these topologies is further explained below.

Federation topology is the choice for institutions that wish to share data from already existing
systems. It allows each institution control over their data. When an institution with proper
authorization requests data from another institution it is up to the institution receiving the request
to determine if the data will go to the requesting institution. The federation can be loosely
integrated between institutions, tightly integrated or a combination of both.

Monadic topology has a central repository that all collected data is fed into. The central
repository then responds to all queries for data. There are no replicas in this topology as compared
to others. Data is only accessed from the central repository which could be by way of a web portal.
One project that uses this data grid topology is the Network for Earthquake Engineering
Simulation (NEES) in the United States. This works well when all access to the data is local or
within a single region with high speed connectivity.

Hierarchical topology lends itself to collaboration where there is a single source for the data and
it needs to be distributed to multiple locations around the world. One such project that will benefit
from this topology would be CERN that runs the Large Hadron Collider that generates enormous
amounts of data. This data is located at one source and needs to be distributed around the world
to organizations that are collaborating in the project.

Hybrid topology is simply a configuration that contains an architecture consisting of any combination of the previously mentioned topologies. It is used mostly in situations where researchers working on projects want to share their results to further research by making them readily available for collaboration.

2.10 OGC Standards
What is the OGC?
The Open Geospatial Consortium (OGC) is a not-for-profit organisation focused on developing and defining open standards for the geospatial community to allow interoperability between various software and data services. The OGC provides open standard specifications with the aim to facilitate and encourage the use of these standards when organisations develop their own geospatial software or online geoportals offering data and software services. The collection of geoportals and various other complementary services creates a Spatial Data Infrastructure (SDI).
Open Geospatial Consortium Standards: Introduction
The Open Geospatial Consortium (OGC) was founded in 1994 to make geographic information
an integral part of the world’s information infrastructure. OGC members – technology providers
and technology users – collaboratively develop open interface standards and associated encoding
standards, and also best practices, that enable developers to create information systems that can
easily exchange “geospatial” information and instructions with other information systems.
Requirements range from complex scheduling and control of Earth observation satellites to displaying simple map images on the Web and encoding location in just a few bytes for geo-tagging and messaging. A look at the OGC Domain Working Groups shows the wide scope of current activity in the OGC.
The OGC Baseline and OGC Reference Model
The OGC Standards Baseline consists of the OGC standards for interfaces, encodings, profiles, application schemas, and best practice documents. The OGC Reference Model (ORM) describes these standards and the relationships between them and related ISO standards. The ORM provides an overview of OGC standards and serves as a useful resource for defining architectures for specific applications.

In developing a Web services application using OGC standards (and in learning about the
relationships between OGC standards) it helps to think of publish, find and bind as the key
functions for applications in a Web services environment.
 Publish: Resource providers advertise their resources.
 Find: End users and their applications can discover resources that they need at run-time.
 Bind: End users and their applications can access and exercise resources at run-time.

Most of the OGC standards developed in recent years are standards for the Web services
environment, and these standards are collectively referred to as OGC Web Services (OWS). The
figure below provides a general architectural schema for OGC Web Services. This schema
identifies the generic classes of services that participate in various geoprocessing and location
activities.
Acronyms in the figure are defined below. Some of these are “OGC standards” and others are
publicly available "Discussion Papers", "Requests" and "Recommendation Papers". (Note that
some in-work candidate standards are not yet public, but are accessible by OGC members.)
• Catalogue Service for the Web (CSW)
• Filter Encoding (FE)
• Geography Markup Language (GML)
• KML Encoding Standard (KML)
• Sensor Model Language (SensorML)
• Style Layer Descriptor (SLD)
• Sensor Observation Service (SOS)
• Web Coverage Service (WCS)
• Web Feature Service (WFS)
• Web Map Service (WMS)
• Web Processing Service (WPS)
• Sensor Planning Service (SPS)
• Web Terrain Service (WTS)
• Grid Coverage Service
• Coordinate Transformation Service
• Web Coverage Processing Service (WCPS)
• Web Map Tile Service (WMTS)
• Simple Features (SF)
• Sensor Web Enablement (SWE)
• XML for Image and Map Annotation (XIMA)
• CityGML
• GeosciML
• GML in JPEG 2000
• Observations and Measurements (O&M)
• Symbology Encoding
• Transducer Markup Language (TML)
2.11 Data Quality
Data quality is a perception or an assessment of data's fitness to serve its purpose in a given
context. The quality of data is determined by factors such as accuracy, completeness, reliability,
relevance and how up to date it is. As data has become more intricately linked with the operations
of organizations, the emphasis on data quality has gained greater attention.
Why data quality is important
Poor-quality data is often pegged as the source of inaccurate reporting and ill-conceived strategies
in a variety of companies, and some have attempted to quantify the damage done. Economic
damage due to data quality problems can range from added miscellaneous expenses when
packages are shipped to wrong addresses, all the way to steep regulatory compliance fines for
improper financial reporting.
An oft-cited estimate originating from IBM suggests the yearly cost of data quality issues in the
U.S. during 2016 alone was about $3.1 trillion. Lack of trust by business managers in data quality
is commonly cited among chief impediments to decision-making.
Poor data quality was particularly common in the early days of corporate computing, when most
data was entered manually. Even as more automation took hold, data quality issues rose in
prominence. For years, the stereotypical image of deficient data quality was a meeting at which
department heads sorted through differing spreadsheet numbers that ostensibly described the
same activity.
Determining data quality
Aspects, or dimensions, important to data quality include: accuracy, or correctness; completeness,
which determines if data is missing or unusable; conformity, or adherence to a standard format;
consistency, or lack of conflict with other data values; and duplication, or repeated records.
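Several of these dimensions can be measured mechanically. A minimal sketch, assuming a toy list-of-records dataset (the field names and values here are invented for illustration):

```python
# Toy records -- the None address marks missing data.
records = [
    {"id": 1, "name": "Asha", "address": "12 Main St"},
    {"id": 2, "name": "Ravi", "address": None},
    {"id": 1, "name": "Asha", "address": "12 Main St"},  # exact duplicate
]

# Completeness: share of fields that are actually populated.
total_fields = sum(len(r) for r in records)
filled_fields = sum(1 for r in records for v in r.values() if v is not None)
completeness = filled_fields / total_fields

# Duplication: share of records that exactly repeat an earlier record.
seen, duplicates = set(), 0
for r in records:
    key = tuple(sorted(r.items()))
    if key in seen:
        duplicates += 1
    seen.add(key)
duplication = duplicates / len(records)

print(f"completeness={completeness:.2f}, duplication={duplication:.2f}")
```

On this toy dataset the completeness ratio is 8/9 and one of the three records is an exact duplicate; a real assessment would add checks for conformity and consistency in the same style, then compare the scores against a known-good baseline as described above.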
As a first step toward data quality, organizations typically perform data asset inventories in which
the relative value, uniqueness and validity of data can undergo baseline studies. Established
baseline ratings for known good data sets are then used for comparison against data in the
organization going forward.
Methodologies for such data quality projects include the Data Quality Assessment Framework
(DQAF), which was created by the International Monetary Fund (IMF) to provide a common
method for assessing data quality. The DQAF provides guidelines for measuring data dimensions
that include timeliness, in which actual times of data delivery are compared to anticipated data
delivery schedules.
UNIT III DATA INPUT AND TOPOLOGY 9

Scanner - Raster Data Input – Raster Data File Formats – Vector Data Input –Digitiser – Topology
- Adjacency, connectivity and containment – Topological Consistency rules – Attribute Data
linking – ODBC – GPS - Concept GPS based mapping.

3.1 SCANNER:

Scanners are used to convert analog maps or photographs into digital image data in raster
format. Digital image data are usually integer-based, with a one-byte gray scale (256 gray tones,
from 0 to 255) for black-and-white images and a set of three gray scales of red (R), green (G),
and blue (B) for color images.

The following four types of scanners are commonly used in GIS and remote sensing.

a. Mechanical Scanner
It is called a drum scanner, since a map or an image placed on a drum is digitized
mechanically with the rotation of the drum and the shift of the sensor, as shown in
Figure 3.4(a). It is accurate but slow.

b. Video Camera
A video camera with a CRT (cathode ray tube) is often used to digitize a small part of a map
or film (see Figure 3.4(b)). This is not very accurate but cheap.

c. CCD Camera
An area CCD camera (a digital still camera) used instead of a video camera is also convenient
for acquiring digital image data (see Figure 3.4(c)). It is more stable and accurate than a
video camera.

d. CCD Scanner
A flat-bed or roll-feed scanner with a linear CCD (charge coupled device) is now commonly
used to digitize analog maps in raster format, in either mono-tone or color mode. It is
accurate but expensive.

Table 3.2 shows the performance of major scanners.

3.2 RASTER DATA INPUT:

In its simplest form, a raster consists of a matrix of cells (or pixels) organized into rows and
columns (a grid), where each cell contains a value representing information, such as
temperature. Examples of rasters include digital aerial photographs, satellite imagery, digital
pictures, and scanned maps.

Data stored in a raster format represents real-world phenomena:

• Thematic data (also known as discrete) represents features such as land-use or soils data.
• Continuous data represents phenomena such as temperature, elevation, or spectral data
such as satellite images and aerial photographs.
• Pictures include scanned maps or drawings and building photographs.

Thematic and continuous rasters may be displayed as data layers along with other geographic data
on your map but are often used as the source data for spatial analysis with the ArcGIS Spatial
Analyst extension. Picture rasters are often used as attributes in tables—they can be displayed
with your geographic data and are used to convey additional information about map features.


While the structure of raster data is simple, it is exceptionally useful for a wide range of
applications. Within a GIS, the uses of raster data fall under four main categories:

• Rasters as basemaps

A common use of raster data in a GIS is as a background display for other feature layers.
For example, orthophotographs displayed underneath other layers provide the map user
with confidence that map layers are spatially aligned and represent real objects, as well as
additional information. Three main sources of raster basemaps are orthophotos from aerial
photography, satellite imagery, and scanned maps. Below is a raster used as a
basemap for road data.
• Rasters as surface maps

Rasters are well suited for representing data that changes continuously across a landscape
(surface). They provide an effective method of storing the continuity as a surface. They
also provide a regularly spaced representation of surfaces. Elevation values measured
from the earth's surface are the most common application of surface maps, but other
values, such as rainfall, temperature, concentration, and population density, can also
define surfaces that can be spatially analyzed. The raster below displays elevation—using
green to show lower elevation and red, pink, and white cells to show higher elevations.

• Rasters as thematic maps

Rasters representing thematic data can be derived from analyzing other data. A common
analysis application is classifying a satellite image by land-cover categories. Basically,
this activity groups the values of multispectral data into classes (such as vegetation type)
and assigns a categorical value. Thematic maps can also result from geoprocessing
operations that combine data from various sources, such as vector, raster, and terrain data.
For example, you can process data through a geoprocessing model to create a
raster dataset that maps suitability for a specific activity. Below is an example of a
classified raster dataset showing land use.

• Rasters as attributes of a feature

Rasters used as attributes of a feature may be digital photographs, scanned documents, or


scanned drawings related to a geographic object or location. A parcel layer may have
scanned legal documents identifying the latest transaction for that parcel, or a layer
representing cave openings may have pictures of the actual cave openings associated with
the point features. Below is a digital picture of a large, old tree that could be used as an
attribute to a landscape layer that a city may maintain.

Why store data as a raster?

Sometimes you don't have the choice of storing your data as a raster; for example, imagery is only
available as a raster. However, there are many other features (such as points) and measurements
(such as rainfall) that could be stored as either a raster or a feature (vector) data type.

The advantages of storing your data as a raster are as follows:

• A simple data structure—a matrix of cells with values representing a coordinate and
sometimes linked to an attribute table
• A powerful format for advanced spatial and statistical analysis
• The ability to represent continuous surfaces and perform surface analysis
• The ability to uniformly store points, lines, polygons, and surfaces
• The ability to perform fast overlays with complex datasets

There are other considerations for storing your data as a raster that may convince you to use a
vector-based storage option. For example:

• There can be spatial inaccuracies due to the limits imposed by the raster dataset cell
dimensions.
• Raster datasets are potentially very large. Resolution increases as the size of the cell
decreases; however, cost also normally increases in both disk space and processing
speed. For a given area, changing cells to one-half the current size requires as much as
four times the storage space, depending on the type of data and storage techniques used.


• There is also a loss of precision that accompanies restructuring data to a regularly spaced
raster-cell boundary.
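The four-times storage growth mentioned above follows directly from the geometry: halving the cell edge doubles both the number of rows and the number of columns. A quick check, assuming an uncompressed single-band raster with 4-byte cells:

```python
def raster_size_bytes(extent_m, cell_size_m, bytes_per_cell=4):
    """Uncompressed size of a square, single-band raster covering extent_m per side."""
    cells_per_side = extent_m // cell_size_m
    return cells_per_side * cells_per_side * bytes_per_cell

# A 10 km x 10 km area at 10 m cells, then at 5 m cells.
coarse = raster_size_bytes(10_000, 10)  # 1000 x 1000 cells
fine = raster_size_bytes(10_000, 5)     # 2000 x 2000 cells

print(fine / coarse)  # halving the cell size quadruples the storage
```

The same reasoning compounds quickly: quartering the cell size multiplies the storage by sixteen, which is why the choice of cell size is a deliberate trade-off between detail and cost.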


General characteristics of raster data

In raster datasets, each cell (which is also known as a pixel) has a value. The cell values represent
the phenomenon portrayed by the raster dataset such as a category, magnitude, height, or spectral
value. The category could be a land-use class such as grassland, forest, or road. A magnitude
might represent gravity, noise pollution, or percent rainfall. Height (distance) could represent
surface elevation above mean sea level, which can be used to derive slope, aspect, and watershed
properties. Spectral values are used in satellite imagery and aerial photography to represent light
reflectance and color.

Cell values can be either positive or negative, integer, or floating point. Integer values are best
used to represent categorical (discrete) data and floating-point values to represent continuous
surfaces. For additional information on discrete and continuous data, see Discrete and
continuous data. Cells can also have a NoData value to represent the absence of data. For
information on NoData, see NoData in raster datasets.

Rasters are stored as an ordered list of cell values, for example, 80, 74, 62, 45, 45, 34, and so on.

The area (or surface) represented by each cell consists of the same width and height and is an
equal portion of the entire surface represented by the raster. For example, a raster representing
elevation (that is, digital elevation model) may cover an area of 100 square kilometers. If there
were 100 cells in this raster, each cell would represent 1 square kilometer of equal width and
height (that is, 1 km x 1 km).

The dimension of the cells can be as large or as small as needed to represent the surface conveyed
by the raster dataset and the features within the surface, such as a square kilometer, square foot,
or even square centimeter. The cell size determines how coarse or fine the patterns or features in
the raster will appear. The smaller the cell size, the smoother or more detailed the raster will be.
However, the greater the number of cells, the longer it will take to process, and it will increase
the demand for storage space. If a cell size is too large, information may be lost or subtle patterns
may be obscured. For example, if the cell size is larger than the width of a road, the road may not
exist within the raster dataset. In the diagram below, you can see how this simple polygon feature
will be represented by a raster dataset at various cell sizes.

The location of each cell is defined by the row and column where it is located within the raster
matrix. Essentially, the matrix is represented by a Cartesian coordinate system, in which the rows
of the matrix are parallel to the x-axis and the columns to the y-axis of the Cartesian plane. Row
and column values begin with 0. In the example below, if the raster is in a Universal Transverse
Mercator (UTM) projected coordinate system and has a cell size of 100, the cell location at 5,1
would be 300,500 East, 5,900,600 North.
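The row/column-to-coordinate conversion can be written out directly. A sketch assuming the origin is the raster's top-left corner and that coordinates are reported at cell centers (the origin values below are invented, not taken from the example above):

```python
def cell_center(col, row, origin_x, origin_y, cell_size):
    """Map coordinates of the center of cell (col, row).

    origin_x, origin_y locate the raster's top-left corner; rows count
    downward, matching the matrix layout described above.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y

# Invented UTM-like origin with 100 m cells.
x, y = cell_center(5, 1, 300_000.0, 5_900_800.0, 100.0)
print(x, y)  # 300550.0 5900650.0
```

Real rasters carry this origin and cell size in their georeferencing metadata (for example, in a world file), so the same arithmetic applies to any of the formats listed later in this section.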


Often you need to specify the extent of a raster. The extent is defined by the top, bottom, left, and
right coordinates of the rectangular area covered by a raster, as shown below.

3.3 RASTER DATA FILE FORMATS

The geodatabase is the native data model in ArcGIS for storing geographic information, including
raster datasets, mosaic datasets, and raster catalogs. However, there are many file formats you
can work with that are maintained outside a geodatabase. The following table gives a description
of the supported raster formats (raster datasets) and their extensions and identifies if they are read-
only or if they can also be written by ArcGIS.
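One of the simplest formats in the table that follows is the ASCII Grid (*.asc): a six-line text header (columns, rows, lower-left corner, cell size, NODATA value) followed by the cell values row by row. A minimal sketch of such a file and a parser for it (the coordinate values are invented for illustration):

```python
asc_text = """ncols 3
nrows 2
xllcorner 300000.0
yllcorner 5900000.0
cellsize 100.0
NODATA_value -9999
12 15 -9999
10 11 14
"""

def parse_ascii_grid(text):
    """Return (header dict, row-major cell values) from ASCII Grid text."""
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:6]:          # the six fixed header lines
        key, value = line.split()
        header[key.lower()] = float(value)
    rows = [[float(v) for v in line.split()] for line in lines[6:]]
    return header, rows

header, rows = parse_ascii_grid(asc_text)
print(header["cellsize"], rows[0])  # 100.0 [12.0, 15.0, -9999.0]
```

Because the whole dataset is plain text, this format is easy to exchange between packages, at the cost of much larger files than the binary formats in the table.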

The entries below list, for each format: a description; its file extensions; whether ArcGIS can read and/or write it; the supported data types; and whether multiband data is supported.

Airborne Synthetic Aperture Radar (AIRSAR) Polarimetric
  Description: AIRSAR is an instrument designed and managed by NASA's Jet Propulsion Laboratory (JPL). ArcGIS supports the polarimetric AIRSAR data (POLSAR).
  Extensions: Multiple files with an L, C, or P in the file name followed by .dat. For example: mission_l.dat (L-Band) and mission_c.dat (C-Band).
  Read/Write: Read-only
  Data types: 64-bit complex
  Multiband: Yes (six bands)

ARC Digitized Raster Graphics (ADRG), including ADRG Legend and ADRG Overview
  Description: Distributed on CD-ROM by the National Geospatial-Intelligence Agency (NGA). ADRG is geographically referenced using the equal arc-second raster chart/map (ARC) system, in which the globe is divided into 18 latitudinal bands, or zones. The data consists of raster images and other graphics generated by scanning source documents.
  Extensions: Multiple files. Data file: *.img or *.ovr; legend file: *.lgg
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes (always three bands)

ASCII Grid
  Description: The ArcGIS Desktop Advanced ASCII Grid format is an ArcGIS Desktop Advanced Grid exchange file.
  Extensions: Single file: *.asc
  Read/Write: Read-only (write requires the ArcGIS Spatial Analyst extension)
  Data types: 16-bit signed integer; 32-bit floating point
  Multiband: No

Band interleaved by line (BIL), band interleaved by pixel (BIP), band sequential (BSQ)
  Description: This format provides a method for reading and displaying decompressed BIL, BIP, and BSQ image data. By creating an ASCII description file that describes the layout of the image data, black-and-white, grayscale, pseudo color, and multiband image data can be displayed without translation into a proprietary format.
  Extensions: Multiple files. Data file: *.bil, *.bip, or *.bsq; header file: *.hdr; color map file: *.clr; statistics file: *.stx
  Read/Write: Read and write
  Data types: 1- and 4-bit unsigned integer; 8- and 16-bit signed and unsigned integer; 32-bit signed and unsigned integer
  Multiband: Yes
Bathymetric Attributed Grid (BAG)
  Description: A nonproprietary file format for storing bathymetric data.
  Extensions: Single file: *.bag
  Read/Write: Read-only
  Data types: 32-bit floating point
  Multiband: Yes (always two bands)

Binary Terrain (BT)
  Description: Created by the Virtual Terrain Project (VTP) to store elevation data in a more flexible file format. The BT format is flexible in terms of file size and spatial reference system.
  Extensions: Single file: *.bt; projection file: *.prj
  Read/Write: Read-only (write: developer only)
  Data types: 16- and 32-bit signed integer; 32-bit floating point
  Multiband: No

Bitmap (BMP), device-independent bitmap (DIB) format, or Microsoft Windows bitmap
  Description: BMP files are Windows bitmap images. They are usually used to store pictures or clip art that can be moved between different applications on Windows platforms.
  Extensions: Single file: *.bmp; world file: *.bpw
  Read/Write: Read and write
  Data types: 8-bit unsigned integer
  Multiband: Yes (limited to one or three bands)

BSB
  Description: A compressed raster format used in the distribution of raster nautical charts by MapTech and NOAA.
  Extensions: Multiple files: *.bsb, *.cap, and *.kap
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes

Cloud raster format (CRF)
  Description: Optimized for writing and reading large files in a distributed processing and storage environment. In a CRF file, large rasters are broken down into smaller bundles of tiles, which allows multiple processes to write simultaneously to a single raster.
  Extensions: Directory: *.crf; bundle files: *.bundle
  Read/Write: Read and write
  Data types: 8-, 16-, and 32-bit unsigned/signed integer; 32-bit floating point; 64-bit complex
  Multiband: Yes

Cloud Optimized GeoTIFF (COG)
  Description: A COG is a regular GeoTIFF that has been optimized for being hosted and worked with on an HTTP file server. Optimization depends on the ability of a COG to store and organize raw pixel data and to use HTTP GET requests so that only selected portions of imagery are obtained at a time.
  Extensions: Single file; possible extensions: *.tif, *.tiff, and *.tff
  Read/Write: Read and write
  Data types: 8-, 16-, and 32-bit unsigned/signed integer; 32-bit floating point; 64-bit complex
  Multiband: Yes
Committee on Earth Observing Sensors (CEOS) Synthetic Aperture Radar (SAR)
  Description: This format reads CEOS SAR image files, specifically those radar images provided from Radarsat and ERS data products.
  Extensions: *.raw
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes

Compressed ARC Digitized Raster Graphics (CADRG)
  Description: Distributed by the NGA. CADRG/ECRG is geographically referenced using the ARC system, in which the globe is divided into 18 latitudinal bands, or zones. The data consists of raster images and other graphics generated by scanning source documents. CADRG achieves a nominal compression ratio of 55:1; ECRG uses JPEG 2000 compression with a compression ratio of 20:1.
  Extensions: File extension is based on the specific product.
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: No

Controlled Image Base (CIB)
  Description: Panchromatic (grayscale) images that have been georeferenced and corrected for distortion due to topographic relief, distributed by NGA. They are similar to digital orthophoto quads and have similar applications, such as serving as a base or backdrop for other data or as a simple map.
  Extensions: File extension is based on the specific product.
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: No
Digital Geographic Information Exchange Standard (DIGEST): Arc Standard Raster Product (ASRP) and UTM/UPS Standard Raster Product (USRP)
  Description: DIGEST datasets are digital replicas of graphic products designed for seamless worldwide coverage. ASRP data is transformed into the ARC system and divides the earth's surface into latitudinal zones; USRP data is referenced to UTM or UPS coordinate systems. Both are based on the WGS84 datum.
  Extensions: Multiple files. Main raster image: *.img; general information file: *.gen; georeference file: *.ger; source file: *.sou; quality file: *.qal; transmission header file: *.thf
  Read/Write: Read-only
  Data types: 1-, 4-, and 8-bit unsigned integer
  Multiband: No

Digital Image Map (DIMAP)
  Description: An open format in the public domain; however, its primary purpose was the distribution of data from the SPOT satellite. The format is composed of a GeoTIFF file and a metadata file.
  Extensions: Directory
  Read/Write: Read-only
  Data types: 8- and 16-bit unsigned integer; 16-bit signed integer
  Multiband: Yes
Digital Terrain Elevation Data (DTED) Level 0, 1, and 2
  Description: A simple, regularly spaced grid of elevation points based on 1 degree latitude and longitude extents. Created by NGA.
  Extensions: Single file: *.dt0, *.dt1, or *.dt2 (all possible extensions are available by default)
  Read/Write: Read; write using the Raster to DTED tool
  Data types: 16-bit signed integer
  Multiband: No

Earth Resources Laboratory Applications Software (ELAS)
  Description: This format is from the ELAS remote sensing system used within NASA.
  Extensions: Single file: *.elas
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes (always three bands)

Enhanced Compressed ARC Raster Graphics (ECRG)
  Description: Distributed by the NGA. CADRG/ECRG is geographically referenced using the ARC system, in which the globe is divided into 18 latitudinal bands, or zones. The data consists of raster images and other graphics generated by scanning source documents. CADRG achieves a nominal compression ratio of 55:1; ECRG uses JPEG 2000 compression with a compression ratio of 20:1.
  Extensions: File extension is based on the specific product.
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes (always three bands)

Enhanced Compressed Wavelet (ECW)
  Description: ECW is a proprietary format. It is a wavelet-based, lossy compression, similar to JPEG 2000.
  Extensions: Single file: *.ecw
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes

Envisat
  Description: Envisat (Environmental Satellite) is an Earth-observing satellite operated by the European Space Agency (ESA). This format supports Advanced Synthetic Aperture Radar (ASAR) Level 1 and above products, and some Medium Resolution Imaging Spectrometer (MERIS) and Advanced Along Track Scanning Radiometer (AATSR) products.
  Extensions: Multiple data files
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes
ENVI Header
  Description: When ENVI works with a raster dataset, it creates a header file containing the information the software requires. This header file can be created for multiple raster file formats.
  Extensions: Header file: *.hdr; multiple data files: *.raw, *.img, *.dat, *.bsq, etc.
  Read/Write: Read and write (.dat via the UI; .bsq, .img, and .raw: developer only)
  Data types: 8-bit unsigned integer; 16- and 32-bit unsigned/signed integer; 32- and 64-bit floating point
  Multiband: Yes

EOSAT FAST
  Description: The Earth Observation Satellite (EOSAT) FAST format support consists of FAST-L7A (Landsat TM) and FAST Rev. C (IRS).
  Extensions: Single file: *.fst
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes

ER Mapper ERS
  Description: A proprietary raster format from ER Mapper, produced using the ER Mapper image processing software.
  Extensions: Multiple files. Header file: *.ers; data file: usually the same name as the header file without the *.ers extension, but it can be anything and is defined in the header file.
  Read/Write: Read-only
  Data types: 8-, 16-, and 32-bit unsigned/signed integer; 32-bit floating point
  Multiband: Yes

ERDAS 7.5 GIS
  Description: Single-band thematic images produced by the ERDAS 7.5 image processing software.
  Extensions: Multiple files. Data file: *.GIS; color map file: *.trl
  Read/Write: Read-only
  Data types: 1-, 2-, 4-, 8-, and 16-bit unsigned integer
  Multiband: No

ERDAS 7.5 LAN
  Description: Single- or multiband continuous images produced by the ERDAS 7.5 image processing software.
  Extensions: Single file: *.lan
  Read/Write: Read-only
  Data types: 8- and 16-bit unsigned integer
  Multiband: Yes
ERDAS 7.5 RAW
  Description: Provides a method for reading and displaying files that are not otherwise supported by another format but are formatted in such a way that the arrangement of the data can be described by a relatively small number of parameters. By creating an ASCII file that describes the layout of the raster data, it can be displayed without translation into a proprietary format. The format is defined in the ERDAS IMAGINE software.
  Extensions: Single file: *.raw
  Read/Write: Read-only
  Data types: 1-, 2-, 4-, 8-, and 16-bit unsigned integer
  Multiband: No

ERDAS IMAGINE
  Description: Produced using IMAGINE image processing software created by ERDAS. IMAGINE files can store both continuous and discrete, single-band and multiband data.
  Extensions: Single file: *.img; if the image is bigger than 2 GB: *.ige; world file: *.igw
  Read/Write: Read and write
  Data types: 1-, 2-, and 4-bit unsigned integer; 8- and 16-bit unsigned/signed integer; 32-bit unsigned/signed integer; 32-bit floating point; 64-bit double precision
  Multiband: Yes
Esri Grid
  Description: A proprietary Esri format that supports 32-bit integer and 32-bit floating-point raster grids. Grids are useful for representing geographic phenomena that vary continuously over space and for performing spatial modeling and analysis of flows, trends, and surfaces such as hydrology.
  Extensions: Directory; color map file: *.clr
  Read/Write: Read and write
  Data types: 32-bit signed integer; 32-bit floating point
  Multiband: No

Esri Grid stack
  Description: Used to reference multiple Esri Grids as a multiband raster dataset. A stack is stored in a directory structure similar to a grid or coverage.
  Extensions: Directory
  Read/Write: Read and write
  Data types: 32-bit signed integer; 32-bit floating point
  Multiband: Yes

Esri Grid stack file
  Description: Used to reference multiple Esri Grids as a multiband raster dataset. A stack file is a simple text file that stores the path and name of each Esri Grid contained within it on a separate line.
  Extensions: Single file; possible extension: *.stk
  Read/Write: Read and write
  Data types: 32-bit signed integer; 32-bit floating point
  Multiband: Yes

Extensible N-Dimensional Data Format (NDF)
  Description: Format used for storing data representing n-dimensional arrays of numbers, such as images. Uses container files (directories containing files and directories) to manage the data objects.
  Extensions: Directory: *.sdf
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes
Floating point file
  Description: A floating-point file is a binary file of floating-point values that represent raster data.
  Extensions: Single file: *.flt
  Read/Write: Read-only (write only via the Raster To Float tool or developer code)
  Data types: 32-bit floating point
  Multiband: No

GDAL Virtual Format (VRT)
  Description: A file format created by the Geospatial Data Abstraction Library (GDAL). It allows a virtual dataset to be derived from other datasets that GDAL can read.
  Extensions: Single file: *.vrt
  Read/Write: Read-only (write: developer only)
  Data types: 8-, 16-, and 32-bit unsigned integer; 64-bit complex integer
  Multiband: Yes

Geodatabase Raster
  Description: The geodatabase is the native data structure for ArcGIS and is the primary data format for representing and managing geographic information, including raster datasets and mosaic datasets. The geodatabase is a collection of various types of GIS datasets held in a file system folder.
  Extensions: Raster datasets stored within a *.gdb folder
  Read/Write: Read and write
  Data types: 1- and 4-bit unsigned integer; 8-bit unsigned/signed integer; 16-bit unsigned/signed integer; 32-bit unsigned/signed integer or floating point; 64-bit double precision
  Multiband: Yes

Golden Software Grid (.grd)
  Description: Three types of Golden Software Grids are supported: Golden Software ASCII GRID (GSAG), Golden Software Binary Grid (GSBG), and Golden Software Surfer 7 Binary Grid (GS7BG).
  Extensions: Single file: *.grd
  Read/Write: Read-only (write: developer only, for GSAG and GSBG)
  Data types: 32-bit floating point; 64-bit double precision
  Multiband: No

Graphic Interchange Format (GIF)
  Description: A bitmap image format generally used for small images.
  Extensions: Single file: *.gif; world file: *.gfw
  Read/Write: Read and write
  Data types: 8-bit unsigned integer
  Multiband: No
GRIB
  Description: The gridded binary format is used for the storage, transmission, and manipulation of meteorological archived data and forecast data. The World Meteorological Organization (WMO) is responsible for the design and maintenance of this format standard.
  Extensions: Single file: *.grb
  Read/Write: Read-only
  Data types: 64-bit double precision
  Multiband: Yes

GRIB 2
  Description: The next-generation standard for GRIB. Subdataset support is not available.
  Extensions: Single file: *.grb2
  Read/Write: Read-only
  Data types: 64-bit double precision
  Multiband: No

Grid eXchange File (GXF)
  Description: An ASCII format, primarily used in Geosoft. GXF Revision 3 (GXF-3) is supported.
  Extensions: Single file: *.gxf
  Read/Write: Read-only
  Data types: 32- and 64-bit floating point
  Multiband: No

Heightfield
  Description: A compressed heightfield format used to support terrain data as a raster.
  Extensions: Single file: *.hf2; gzip file: *.hfz
  Read/Write: Read-only
  Data types: 32-bit floating point
  Multiband: No

HGT
  Description: Raw SRTM height files containing elevation measured in meters above sea level, in a geographic projection (latitude and longitude array), with voids indicated using -32768.
  Extensions: Single file: *.hgt
  Read/Write: Read-only
  Data types: 16-bit signed integer
  Multiband: No

Hierarchical Data Format (HDF) 4
  Description: A self-defining file format used for storing arrays of multidimensional data. Subdataset support is not available.
  Extensions: Single file: *.h4 or *.hdf
  Read/Write: Read-only (write: developer only)
  Data types: 8- and 16-bit signed integer; 8- and 16-bit unsigned integer; 32-bit unsigned/signed integer or floating point
  Multiband: Yes
Hierarchical Data Format (HDF) 5
  Description: The next-generation standard for HDF. Subdataset support is not available.
  Extensions: Single file: *.h5 or *.hdf5
  Read/Write: Read-only
  Data types: 8- and 16-bit signed integer; 8- and 16-bit unsigned integer; 32-bit unsigned/signed integer or floating point
  Multiband: Yes

High Resolution Elevation (HRE)
  Description: HRE data is intended for a wide variety of National Geospatial-Intelligence Agency (NGA) and National System for Geospatial Intelligence (NSG) partners and members, and customers external to the NSG, to access and exploit standardized data products. HRE data replaces the current non-standard High Resolution Terrain Elevation/Information (HRTE/HRTI) products and also replaces non-standard products referred to as DTED levels 3 through 6. This data format is similar to NITF.
  Extensions: Multiple files. Raw image: *.hr*; metadata: *.xml
  Read/Write: Read-only
  Data types: 16-bit signed integer; 32-bit floating point
  Multiband: No

IDRISI Raster Format (RST)
  Description: File format native to IDRISI.
  Extensions: Multiple files. Raw image: *.rst; descriptor: *.rdc; color map: *.smp; georeference file: *.ref
  Read/Write: Read-only (write: developer only)
  Data types: 8-bit unsigned integer; 16-bit signed integer; 32-bit floating point
  Multiband: Yes
Raster Byte, 16-, and 32-
ILWIS format for
Map—*.mpr, bit unsigned
ILWIS raster maps and map Read-only Yes
Maplist— integer, and 64-bit
lists.
*.mpl floating point
Image Display Single file—
File format used by
and Analysis extension Read-only 8-bit binary No
WinDisp 4.
(IDA) *.img
ISIS Cube format as
created by the United
Integrated
States Geological
Software for Single file— 8-bit unsigned
Survey (USGS) for
Imagers and extension Read-only integer and 32-bit No
the mapping of
Spectrometers *.cub floating point
planetary imagery.
(ISIS)
Versions 2 and 3 are
supported.
Binary
Intergraph's
imagery—
Intergraph CIT proprietary format for Read-only 1-bit No
extension
16-bit imagery (CIT).
*.cit
Intergraph's Grayscale
Intergraph proprietary format for imagery— 8-bit unsigned
Read-only No
COT 8-bit unsigned extension integer
imagery (COT). *.cot
Japanese Aerospace eXploration Agency (JAXA) PALSAR
  Description: This format was created by the Japanese Aerospace eXploration Agency (JAXA) to store data from processed PALSAR data. Level 1.1 and Level 1.5 are supported.
  Extensions: Single file—*1.5GUD or *1.1 A
  Read/Write: Read-only
  Data types: 16-bit signed integer
  Multiband: Yes

Joint Photographic Experts Group (JPEG) File Interchange Format (JFIF)
  Description: A standard compression technique for storing full-color and grayscale images. Support for JPEG compression is provided through the JFIF file format.
  Extensions: Single file—*.jpg, *.jpeg, *.jpc, or *.jpe; world file—*.jgw
  Read/Write: Read and write
  Data types: 8-bit unsigned integer
  Multiband: Yes (limited to one or three bands)

JPEG 2000
  Description: A compression technique especially for maintaining the quality of large imagery. Allows for a high-compression ratio and fast access to large amounts of data at any scale.
  Extensions: Single file—*.jp2, *.j2c, *.j2k, or *.jpx
  Read/Write: Read and write
  Data types: 8- and 16-bit unsigned integer
  Multiband: Yes

Magellan Mapsend BLX/XLB
  Description: Magellan's BLX/XLB file format is primarily used for storing topographic data. The tile size for these files must be a multiple of 128 by 128 pixels. The projection for these files is WGS84. When the file is ordered using little-endian, the file extension is BLX; if big-endian is used, the file extension is XLB.
  Extensions: Single file—*.blx or *.xlb
  Read/Write: Read-only (Write—developer only)
  Data types: 16-bit signed integer
  Multiband: No

MAP
  Description: PCRaster's raster format.
  Extensions: Single file—*.map
  Read/Write: Read-only (Write—developer only)
  Data types: 8-bit unsigned integer, 32-bit signed integer, and 32-bit floating point
  Multiband: No (single band for each data type)

Map service cache
  Description: A map cache created by ArcGIS Server can be viewed as a single raster dataset. You cannot build pyramids or calculate statistics. It should not be used for any analysis or processing.
  Extensions: Directory
  Read/Write: Read-only (in Desktop); write using ArcGIS Server
  Data types: 8-bit unsigned integer
  Multiband: Yes

Meta Raster Format (MRF)
  Description: MRF is a technology developed by NASA that combines raster storage with cloud computing for tiling, indexing and multiple-resolution support. It is a raster storage format, but also a tile cache format for web services and a dynamic tile cache for another raster.
  Extensions: Multiple files; raw image—*.mrf; metadata—*.xml; index—*.idx
  Read/Write: Read and write
  Data types: 8-, 16-, and 32-bit unsigned/signed integer, 32-bit floating point, and 64-bit complex
  Multiband: Yes

Multiresolution Seamless Image Database (MrSID)
  Description: A proprietary compression technique especially for maintaining the quality of large images. Allows for a high compression ratio and fast access to large amounts of data at any scale. The MrSID Encoder is developed and supported by LizardTech, Inc. Supports generations 2, 3, and 4.
  Extensions: Single file—*.sid; world file—*.sdw
  Read/Write: Read-only
  Data types: 8- and 16-bit unsigned integer
  Multiband: Yes (generations 2 and 3—limited to 1 or 3 bands; generation 4—unlimited)

MrSID Lidar
  Description: MrSID generation 4 (MG4) format, used to support point cloud (lidar) data. Will be rendered as a raster. Optional view files can be used to define how the point cloud data will be viewed.
  Extensions: Single file—*.sid; optional view file—*.view
  Read/Write: Read-only
  Data types: 64-bit double precision
  Multiband: No

National Imagery Transmission Format (NITF) 2.0 and NITF 2.1/NSIF 1.0
  Description: A collection of standards and specifications that allow interoperability in the dissemination of imagery and its metadata among various computer systems. Developed by the NGA. Subdataset support is not available.
  Extensions: Single file—*.ntf
  Read/Write: Read-only
  Data types: NITF 2.0—1-, 8-, and 16-bit unsigned integer; NITF 2.1/NSIF 1.0—1-, 2-, 4-, 8-, and 16-bit unsigned integer and 1-, 2-, 4-, 8-, and 16-bit signed integer
  Multiband: Yes

National Land Archive Production System (NLAPS)
  Description: The NLAPS data format (NDF) is used by the USGS to distribute their Landsat MSS and TM data.
  Extensions: Multiple files; main (header) file—*.H1, *.H2, or *.HD; image data—*.I1, *.I2, etc.
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes

NOAA Polar Orbiter Level 1b
  Description: Support for NOAA's Polar Orbiter Data (POD), specifically for the Advanced Very High Resolution Radiometer (AVHRR) Level 1b digital data.
  Extensions: Multiple files—*.1b, *.sv, *.gc
  Read/Write: Read-only
  Data types: 16-bit unsigned integer
  Multiband: Yes

PCI .aux
  Description: This is a simple ASCII file used and created by PCI to read raw binary raster data.
  Extensions: Single file—*.aux
  Read/Write: Read-only
  Data types: 8-bit and 16-bit unsigned integer, 16-bit signed integer, and 32-bit floating point
  Multiband: Yes

PCIDSK
  Description: PCI Geomatics raster dataset format.
  Extensions: Single file—*.pix
  Read/Write: Read-only (Write—developer only)
  Data types: 8- and 16-bit unsigned integer, 16-bit signed integer, and 32-bit floating point
  Multiband: Yes

Planetary Data System (PDS)
  Description: The Planetary Data System (PDS) is managed by NASA to archive and distribute data from its planetary missions. PDS version 3 is supported.
  Extensions: Single file—possible extensions *.img and *.lbl
  Read/Write: Read-only
  Data types: 16-bit signed integer
  Multiband: No

Portable Network Graphics (PNG)
  Description: Provides a well-compressed, lossless compression for raster files. It supports a large range of bit depths from monochrome to 64-bit color. Its features include indexed color images of up to 256 colors and effective 100 percent lossless images of up to 16 bits per pixel. Note: ArcGIS is able to read an alpha band from an existing PNG; however, it will only write one- or three-band PNG files.
  Extensions: Single file—*.png
  Read/Write: Read and write
  Data types (multiband): 1-, 2-, and 4-bit unsigned integer (No); 8-bit unsigned integer (Yes, limited to one or three bands only, no alpha channel); 16-bit unsigned integer (Yes, limited to one or three bands only, no alpha channel)

RADARSAT-2
  Description: The RADARSAT-2 satellite produces imagery using the C-band SAR and X-band frequencies.
  Extensions: Directory
  Read/Write: Read-only
  Data types: 16-bit unsigned integer and 32-bit complex integer
  Multiband: Yes

Raster Product Format (RPF), including RPF (CIB), RPF (CADRG), and RPF (ADRG)
  Description: The underlying format of CADRG and CIB.
  Extensions: Single file—no standard file extension
  Read/Write: Read-only
  Data types: 8-bit unsigned integer (all RPF products)
  Multiband: RPF—No; RPF (CIB)—No; RPF (CADRG)—Yes (always three bands); RPF (ADRG)—No

SAGA GIS Binary Grid
  Description: SAGA binary grid datasets are composed of an ASCII header (.sgrd) and a binary data (.sdat) file with a common basename. Select the .sdat file to access the dataset.
  Extensions: Multiple files—*.sdat and *.sgrd
  Read/Write: Read-only
  Data types: 8-, 16-, and 32-bit unsigned integer, 16- and 32-bit signed integer, and 32- and 64-bit floating point
  Multiband: Yes

Sandia Synthetic Aperture Radar (GFF)
  Description: Sandia National Laboratories created a complex image format to accommodate the data from its synthetic aperture radar.
  Extensions: Single file—*.gff
  Read/Write: Read-only
  Data types: 16- and 32-bit complex integer
  Multiband: Unknown

Shuttle Radar Topography Mission (SRTM)
  Description: The HGT format is used to store elevation data from the Shuttle Radar Topography Mission (SRTM). SRTM-3 and SRTM-1 v2 files can be displayed.
  Extensions: Single file—*.hgt
  Read/Write: Read-only (Write—developer only)
  Data types: 32-bit signed integer
  Multiband: No

Spatial Data Transfer Standard (SDTS) digital elevation model (DEM)
  Description: The Spatial Data Transfer Standard (SDTS) was created by the USGS. The purpose of this format was to transfer digital geospatial data between various computer systems in a compatible format that would not lose any information.
  Extensions: Multiple files—*.ddf
  Read/Write: Read-only
  Data types: 16-bit signed integer or 32-bit floating point
  Multiband: No

Tagged Image File Format (TIFF) (GeoTIFF tags are supported)
  Description: Widespread use in the desktop publishing world. It serves as an interface to several scanners and graphic arts packages. TIFF supports black-and-white, grayscale, pseudo color, and true color images, all of which can be stored in a compressed or decompressed format. BigTIFF is supported.
  Extensions: Single file—possible extensions *.tif, *.tiff, and *.tff; world file—*.tfw
  Read/Write: Read and write
  Data types (multiband): 1-bit unsigned integer (No); 4-bit signed integer (No); 8-bit unsigned integer (Yes); 8-bit signed integer (No); 16-bit unsigned integer (Yes); 16-bit signed integer (No); 32-bit unsigned/signed integer or floating point (No)

Terragen terrain
  Description: The Terragen Terrain file was created by Planetside Software. It stores elevation data.
  Extensions: Single file—possible extensions *.ter and *.terrain
  Read/Write: Read-only (Write—developer only)
  Data types: 16-bit signed integer
  Multiband: No

TerraSAR-X
  Description: The TerraSAR-X radar satellite produces earth observation data using the X-band SAR.
  Extensions: Directory
  Read/Write: Read-only
  Data types: 16-bit unsigned integer and 32-bit complex integer
  Multiband: No

Transformation Grids
  Description: National Oceanic and Atmospheric Administration's (NOAA's) files used for shifting a vertical datum.
  Extensions: Single file—*.gtx
  Read/Write: Read-only
  Data types: 32-bit floating point
  Multiband: No

United States Geological Survey (USGS) digital elevation model (DEM)
  Description: This format consists of a raster grid of regularly spaced elevation values derived from the USGS topographic map series. In their native format, they are written as ANSI-standard ASCII characters in fixed-block format.
  Extensions: Single file—*.dem (the .dat extension needs to be changed to .dem)
  Read/Write: Read-only
  Data types: 16-bit signed integer
  Multiband: No

USGS Digital Orthophoto Quadrangles (DOQ)
  Description: This is the new, labeled DOQ (DOQ2) format from the USGS.
  Extensions: Single file—possible extensions *.doq, *.nes, *.nws, *.ses, *.sws
  Read/Write: Read-only
  Data types: 8-bit unsigned integer
  Multiband: Yes

XPixMap (XPM)
  Description: Stores color images in a format consisting of an ASCII image and a C library.
  Extensions: Single file—*.xpm
  Read/Write: Read-only (Write—developer only)
  Data types: 8-bit unsigned integer
  Multiband: No

Bit depth capacity for supported raster export formats

✓ = is supported
NA = not applicable

IMG TIFF GRID JPEG JP2 BMP GIF PNG BIL/BIP/BSQ DAT Note
1-bit: NA NA; converts to 8-bit for JPEG, JP2, BMP, GIF, PNG, BIL, BIP, BSQ, and DAT.
2-bit: NA NA NA NA NA NA NA NA NA
4-bit: NA NA; converts to 8-bit for JPEG, JP2, BMP, GIF, PNG, BIL, BIP, BSQ, and DAT.
4-bit color map: NA NA NA NA; converts to 8-bit for BMP, GIF, PNG, BIL, BIP, and BSQ.
8-bit unsigned
8-bit signed: NA NA NA NA NA NA
8-bit color map: NA NA NA
8-bit 3 band: NA NA
8-bit > 3 band: NA NA NA NA
16-bit unsigned: NA NA NA
16-bit signed: NA NA NA NA
32-bit unsigned: NA NA NA NA NA
32-bit signed: NA NA NA NA NA
32-bit floating point: NA NA NA NA

3.4 VECTOR DATA INPUT

Tablet digitizers with a free cursor connected to a personal computer are the most common devices for digitizing spatial features with planimetric coordinates from analog maps. The analog map is placed on the surface of the digitizing tablet, as shown in Figure 3.2. The size of a digitizer usually ranges from A3 to A0.

The digitizing operation is as follows.


Step 1 : a map is affixed to the digitizing table.
Step 2 : control points or tics at the four corners of the map sheet are digitized and input to the PC together with the map coordinates of those corners.
Step 3 : map contents are digitized according to the map layers and map code system, in either point mode or stream mode at a short time interval.
Step 4 : editing errors such as small gaps at line junctions, overshoots, duplicates, etc. are corrected to produce a clean, error-free dataset.
Step 5 : the digitizer coordinates are converted to map coordinates for storage in a spatial database.
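Steps 2 and 5 above can be sketched in code: given the digitizer coordinates of the corner tics and their known map coordinates, a six-parameter affine transformation can be fitted by least squares and then applied to every digitized point. A minimal sketch (the tic values below are hypothetical):

```python
import numpy as np

def fit_affine(digitizer_pts, map_pts):
    """Least-squares fit of a six-parameter affine transform
    mapping digitizer (table) coordinates to map coordinates."""
    A = np.array([[dx, dy, 1.0] for dx, dy in digitizer_pts])
    B = np.array(map_pts, dtype=float)
    coeffs, *_ = np.linalg.lstsq(A, B, rcond=None)  # 3x2 coefficient matrix
    return coeffs

def to_map(coeffs, pt):
    """Apply the fitted transform to one digitized point."""
    dx, dy = pt
    return tuple(np.array([dx, dy, 1.0]) @ coeffs)

# Four corner tics: digitizer units -> map coordinates (hypothetical values)
tics_dig = [(0, 0), (10, 0), (10, 10), (0, 10)]
tics_map = [(500000, 4000000), (510000, 4000000),
            (510000, 4010000), (500000, 4010000)]
c = fit_affine(tics_dig, tics_map)
print(to_map(c, (5, 5)))  # centre of the sheet
```

With four or more tics the fit is over-determined, which is why "the more control points, the better": extra tics average out small placement errors.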

Major problems of map digitization are :


- the map stretches or shrinks from day to day, which leaves newly digitized points slightly offset from previously digitized ones
- the map itself contains errors
- discrepancies across neighbouring map sheets produce disconnectivity

3.5 DIGITIZERS

 Digitizers are the most common device for extracting spatial information from maps and
photographs
o the map, photo, or other document is placed on the flat surface of the digitizing
tablet

Hardware

 the position of an indicator as it is moved over the surface of the digitizing tablet is
detected by the computer and interpreted as pairs of x,y coordinates
o the indicator may be a pen-like stylus or a cursor (a small flat plate the size of a
hockey puck with a cross-hair)
 frequently, there are control buttons on the cursor which permit control of the system
without having to turn attention from the digitizing tablet to a computer terminal
 digitizing tablets can be purchased in sizes from 25x25 cm to 200x150 cm, at
approximate costs from $500 to $5,000
 early digitizers (ca. 1965) were backlit glass tables
o a magnetic field generated by the cursor was tracked mechanically by an arm
located behind the table
o the arm's motion was encoded, coordinates computed and sent to a host
processor
o some early low-cost systems had mechanically linked cursors - the free-cursor
digitizer was initially much more expensive
 the first solid-state systems used a spark generated by the cursor and detected by linear
microphones
o problems with errors generated by ambient noise
 contemporary tablets use a grid of wires embedded in the tablet to generate a magnetic
field which is detected by the cursor
o accuracies are typically better than 0.1 mm
o this is better than the accuracy with which the average operator can position the
cursor
o functions for transforming coordinates are sometimes built into the tablet and
used to process data before it is sent to the host

The digitizing operation

 the map is affixed to a digitizing table


 three or more control points ("reference points", "tics", etc.) are digitized for each map
sheet
o these will be easily identified points (intersections of major streets, major peaks,
points on coastline)
o the coordinates of these points will be known in the coordinate system to be used
in the final database, e.g. lat/long, State Plane Coordinates, military grid
o the control points are used by the system to calculate the necessary mathematical
transformations to convert all coordinates to the final system
o the more control points, the better
 digitizing the map contents can be done in two different modes:
o in point mode, the operator identifies the points to be captured explicitly by
pressing a button
o in stream mode points are captured at set time intervals (typically 10 per second)
or on movement of the cursor by a fixed amount
 advantages and disadvantages:
o in point mode the operator selects points subjectively
 two point mode operators will not code a line in the same way
o stream mode generates large numbers of points, many of which may be
redundant
o stream mode is more demanding on the user while point mode requires some
judgement about how to represent the line
 most digitizing is currently done in point mode
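Because stream mode captures many redundant points, digitized lines are often thinned afterwards. A common approach (not specific to any one GIS package) is the Douglas-Peucker algorithm, which keeps only the points that deviate from a straight-line trend by more than a tolerance. A minimal sketch with made-up coordinates:

```python
def perp_dist(p, a, b):
    # Perpendicular distance from point p to the line through a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * px - dx * py + bx * ay - by * ax) / (dx * dx + dy * dy) ** 0.5

def douglas_peucker(pts, tol):
    """Thin a digitized line, keeping only points deviating more than tol."""
    if len(pts) < 3:
        return list(pts)
    # Find the point farthest from the chord joining the endpoints.
    imax, dmax = 0, 0.0
    for i in range(1, len(pts) - 1):
        d = perp_dist(pts[i], pts[0], pts[-1])
        if d > dmax:
            imax, dmax = i, d
    if dmax <= tol:
        return [pts[0], pts[-1]]          # everything in between is redundant
    left = douglas_peucker(pts[:imax + 1], tol)
    right = douglas_peucker(pts[imax:], tol)
    return left[:-1] + right              # avoid duplicating the split point

stream = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0), (5, 2), (6, 4)]
print(douglas_peucker(stream, 0.1))  # -> [(0, 0), (4, 0), (6, 4)]
```

The near-collinear jitter captured in stream mode collapses to a few vertices, while the genuine bend at (4, 0) is preserved.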

Problems with digitizing maps

 arise since most maps were not drafted for the purpose of digitizing
o paper maps are unstable: each time the map is removed from the digitizing table,
the reference points must be re-entered when the map is affixed to the table again
o if the map has stretched or shrunk in the interim, the newly digitized points will
be slightly off in their location when compared to previously digitized points

o errors occur on these maps, and these errors are entered into the GIS database as
well
o the level of error in the GIS database is directly related to the error level of the
source maps
 maps are meant to display information, and do not always accurately record locational
information
o for example, when a railroad, stream and road all go through a narrow mountain
pass, the pass may actually be depicted wider than its actual size to allow for the
three symbols to be drafted in the pass
 discrepancies across map sheet boundaries can cause discrepancies in the total GIS
database
o e.g. roads or streams that do not meet exactly when two map sheets are placed next
to each other
 user error causes overshoots, undershoots (gaps) and spikes at intersection of lines


 user fatigue and boredom


 for a complete discussion of the manual digitizing process, see Marble et al., 1984

Editing errors from digitizing

 some errors can be corrected automatically


o small gaps at line junctions
o overshoots and sudden spikes in lines
 error rates depend on the complexity of the map, are high for small scale, complex maps
 these topics are explored in greater detail in later Units
o Unit 13 looks at the process of editing digitized data
o Units 45 and 46 discuss digitizing error

Digitizing costs

 a common rule of thumb in the industry is one digitized boundary per minute
o e.g. it would take 99/60 = 1.65 hours to digitize the boundaries of the 99 counties
of Iowa

3.6 TOPOLOGY

Topology expresses explicitly the spatial relationships between connecting or adjacent vector
features (points, polylines and polygons) in a GIS, such as two lines meeting perfectly at a point
or a directed line having an explicit left and right side.

Topological (topology-based) data are useful for detecting and correcting digitizing errors in a
geographic dataset and are necessary for some GIS analyses.

Topologic data structures help ensure that information is not unnecessarily repeated. The database
stores only one line to represent a boundary (as opposed to two lines, one for each polygon), and
records that the line is the "left side" of one polygon and the "right side" of the adjacent
polygon.

Topology is the study of those properties of geometric objects that remain invariant under certain
transformations such as bending or stretching.

Topology is often explained through graph theory. Topology has at least two main advantages:

(i) Assurance of data quality

(ii) Enhanced GIS analysis

Topological relationships are built from simple elements into complex elements: points (simplest
elements), arcs (sets of connected points), areas (sets of connected arcs), and routes (sets of
sections, which are arcs or portions of arcs).

Components of Topology
Topology has three basic components:

1. Connectivity (Arc – Node Topology):

 Points along an arc that define its shape are called Vertices.
 Endpoints of the arc are called Nodes.
 Arcs join only at the Nodes.
2. Area Definition / Containment (Polygon – Arc Topology):

 An enclosed polygon has a measurable area.


 Lists of arcs that define boundaries and closed areas are maintained.
 Polygons are represented as a series of (x, y) coordinates that connect to define an area.
3. Contiguity:

 Every arc has a direction


 A GIS maintains a list of Polygons on the left and right side of each arc.
 The computer then uses this information to determine which features are next to one
another.
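The three components above can be represented with a simple arc-node table in which each arc records its from-node, its to-node, and the polygons on its left and right; contiguity queries then reduce to scanning that table. A sketch with hypothetical arc, node, and polygon IDs:

```python
# Each arc: (from_node, to_node, left_polygon, right_polygon).
# Arc direction (from -> to) is what gives "left" and "right" meaning.
arcs = {
    "a1": ("n1", "n2", "P1", "P2"),
    "a2": ("n2", "n3", "P1", "P3"),
    "a3": ("n3", "n1", "P1", None),   # None marks the outside "world" polygon
}

def neighbours(polygon):
    """Polygons contiguous with `polygon` via a shared arc."""
    out = set()
    for _, _, left, right in arcs.values():
        if left == polygon and right is not None:
            out.add(right)
        elif right == polygon and left is not None:
            out.add(left)
    return out

print(neighbours("P1"))  # -> {'P2', 'P3'}
```

Note that each boundary arc is stored once, yet adjacency for both of its polygons can be answered from it, which is exactly the storage saving described above.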

1. Topology in different GIS Format

Coverage

Coverage is a topology-based vector data format. A coverage can be a point coverage, line
coverage, or polygon coverage.

The coverage model supports three basic topological relationships.

· Connectivity: Arc connects to each other at nodes.

· Area definition: An Area is defined by a series of connected arcs.

· Contiguity: Arcs have directions and left and right polygon.

Figure: Diagram showing the coverage data structure for storing vector data.

Shapefile

The shapefile is a standard non-topological data format. Shapefiles were a first attempt at
representing spatial features as objects; they store very simple floating-point geometries. A
shapefile is a digital vector storage format for storing geometric location and associated
attribute information.

A Shapefile is actually a set of several files

· .shp — shape format; the feature geometry itself


· .shx — shape index format; a positional index of the feature geometry to allow seeking
forwards and backwards quickly
· .dbf — attribute format; columnar attributes for each shape, in dBase III format
· .prj — projection format; the coordinate system and projection information, a plain text file
describing the projection using well-known text format
· .sbn — a binary spatial index file, used only by ESRI software
· .sbx — a spatial index of the features
· .shp.xml — metadata in XML format
The geometry of a shapefile is stored in two basic files .shp and .shx:
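Because a shapefile is a set of several files, a dataset is unusable if any mandatory member (.shp, .shx, .dbf) is missing. A small sketch that checks which component files are present; the file name is hypothetical, and an injected `exists` function stands in for a real file system so the example is self-contained:

```python
import os

REQUIRED = [".shp", ".shx", ".dbf"]
OPTIONAL = [".prj", ".sbn", ".sbx", ".shp.xml"]

def shapefile_components(basename, exists=os.path.exists):
    """Report which member files of a shapefile set are present."""
    report = {ext: exists(basename + ext) for ext in REQUIRED + OPTIONAL}
    missing = [ext for ext in REQUIRED if not report[ext]]
    return report, missing

# Simulate a dataset where only the .shp and .dbf files were copied:
fake_fs = {"roads.shp", "roads.dbf"}
report, missing = shapefile_components("roads", exists=fake_fs.__contains__)
print(missing)  # -> ['.shx']
```

A missing .prj only loses the coordinate system description, but a missing .shx or .dbf breaks the geometry index or the attribute table respectively.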

DXF (Drawing exchange format)

It maintains data in separate layers, but it does not support topology. It is an AutoCAD format.

Geodatabase

A geodatabase is a relational database that stores geographic information. It follows an
object-based model, not the georelational model.

A relational database is a collection of tables logically associated with each other by common
key attribute fields.

A geodatabase can store geographic information because, besides storing a number or a string in
an attribute field, tables in a geodatabase can also store geometric coordinates to define the
shape and location of points, lines or polygons.

ArcGIS supports five physical implementations of the geodatabase: a file geodatabase, an
Access-based personal geodatabase, and personal, workgroup and enterprise server geodatabases.

A personal geodatabase is a file with the extension .mdb, the file extension used by Microsoft
Access. A file geodatabase is a folder with the extension .gdb.

                  Georelational    Object-based
Topological       Coverage         Geodatabase
Non-topological   Shapefile        Geodatabase

Topological Error

Nowadays, the production, storage and use of maps in digital environments, whether by
photogrammetric or classical methods, is quite common (Burrough, 1997). The most common
method of producing vector maps is precise scanning of analog maps into raster formats,
followed by digitizing into vector form (Grimshaw, 1994).

During the digitization process, topological errors can be introduced into vector data.

Topological errors with polygon features can include unclosed polygons, gaps between polygon
borders or overlapping polygon borders. A common topological error with polyline features is
that they do not meet perfectly at a point (node). This type of error is called an undershoot if a
gap exists between the lines, and an overshoot if a line ends beyond the line it should connect to.
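Undershoots of this kind are typically repaired by snapping: any two line endpoints that fall within a chosen tolerance of each other are moved onto the same coordinate. A minimal sketch with hypothetical coordinates:

```python
def snap_endpoints(lines, tol):
    """Snap line endpoints that fall within `tol` of each other,
    closing small undershoot gaps left by digitizing."""
    def close(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= tol * tol

    snapped = [[list(pt) for pt in line] for line in lines]
    for i in range(len(snapped)):
        for j in range(i + 1, len(snapped)):
            for ei in (0, -1):            # first and last vertex of line i
                for ej in (0, -1):        # first and last vertex of line j
                    p, q = snapped[i][ei], snapped[j][ej]
                    if p != q and close(p, q):
                        snapped[j][ej] = list(p)   # move q onto p
    return [[tuple(pt) for pt in line] for line in snapped]

# An undershoot: the second line ends 0.05 units short of the node (1, 1).
lines = [[(0, 0), (1, 1)], [(2, 2), (1, 1.05)]]
print(snap_endpoints(lines, 0.1))
```

Choosing the tolerance is a judgement call: too small leaves undershoots unrepaired, too large can snap endpoints that were genuinely meant to be separate.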

3.7 ADJACENCY, CONNECTIVITY AND CONTAINMENT

Containment is the property that defines one entity as being within another. For example,
if an isolated node (representing a household) is located inside a face (representing a
congressional district) in the MAF/TIGER database, you can count on it remaining inside that
face no matter how you transform the data. Topology is vitally important to the Census Bureau,
whose constitutional mandate is to accurately associate population counts and characteristics with
political districts and other geographic areas.
Connectedness refers to the property of two or more entities being connected. Recall the
visual representation of the geometric primitives in Figure 6.3. Topologically, node N14 is not
connected to any other nodes. Nodes N9 and N21 are connected because they are joined by edges
E10, E1, and E10. In other words, nodes can be considered connected if and only if they are
reachable through a set of nodes that are also connected; if a node is a destination, we must have
a path to reach it.
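Reachability of this kind can be tested mechanically with a breadth-first search over the node/edge graph. A sketch using a made-up edge list in the spirit of Figure 6.3 (the node names here are hypothetical):

```python
from collections import defaultdict, deque

def connected(edges, start, goal):
    """Is `goal` reachable from `start` through the undirected edge list?"""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

edges = [("N9", "N10"), ("N10", "N21")]   # N14 shares no edge with anything
print(connected(edges, "N9", "N21"), connected(edges, "N9", "N14"))
```

N9 reaches N21 through the intermediate node, while the isolated N14 is unreachable from every other node.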
Connectedness is not as intuitive as it may seem. A famous problem related to
topology is the Königsberg bridge puzzle (Figure 6.5).
Try This: Can you solve the Königsberg bridge problem?

Figure 6.5: The seven bridges of Königsberg puzzle.


Credit: Euler, L. "Solutio problematis ad geometriam situs pertinentis." Comment. Acad. Sci.
U. Petrop. 8, 128-140, 1736. Reprinted in Opera Omnia Series Prima, Vol. 7. pp. 1-10, 1766.

The challenge of the puzzle is to find a route that crosses all seven bridges, while respecting the
following criteria:

1. Each bridge must be crossed;


2. A bridge is a directional edge and can only be crossed once (no backtracking);
3. Bridges must be fully crossed in one attempt (you cannot turn around halfway, and then do
the same on the other side to consider it “crossed”).
4. Optional: You must start and end at the same location. (It has been said that this was a
traditional requirement of the problem, though it turns out that it doesn’t actually matter – try
it with and without this requirement to see if you can discover why.)
Take some time to see if you can figure out the solution. When you’ve found the answer or
given up, scroll down the page to see the correct solution to the problem.
Did you find the route that crosses all seven bridges and meets the above criteria? If not, you got
the right answer; there is no such route. Euler proved, in 1736, that there was no solution to this
problem. In fact, his techniques paved the way for graph theory, an important area of
mathematics and computer science that deals with graphs and connections. Graph theory is
beyond the scope of this course, but it does have applications to geography. Interested readers
can learn more about graph theory at Diestel Graph Theory.
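Euler's result can be checked mechanically: an undirected graph has a walk crossing every edge exactly once only if it is connected and has zero or two odd-degree nodes. In Königsberg all four land masses touch an odd number of bridges, so no such walk exists. A sketch (connectivity is assumed rather than checked, to keep it short):

```python
from collections import Counter

def euler_walk_possible(edges):
    """An undirected multigraph has a walk crossing every edge exactly once
    iff it is connected and has 0 or 2 odd-degree nodes (Euler, 1736)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    odd = sum(1 for d in deg.values() if d % 2)
    return odd in (0, 2)   # connectivity assumed for this sketch

# Königsberg: land masses A, B, C, D and the seven bridges between them.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]
print(euler_walk_possible(bridges))  # -> False
```

Degrees here are A: 5, B: 3, C: 3, D: 3; four odd-degree nodes means no route can cross each bridge exactly once.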
The property of adjacency relates to entities being directly next to one another. In Figure
6.3, all of the faces are adjacent. This is easy to determine: if two faces share an edge, they are
adjacent. Adjacency becomes less intuitive with other entities, however. See Figure 6.6 for an
example of adjacency and why it cannot be simply assessed from a visual perspective:

Figure 6.6: Because nodes are zero-dimensional, they cannot be adjacent.


Credit: Joshua Stevens, Department of Geography, The Pennsylvania State University.

At first, the two nodes in Figure 6.6 might look like they are adjacent. Zooming in or tilting the
plane of view reveals otherwise. This is because nodes, as points made from coordinate pairs, do
not have a length or width; they are size-less and shapeless. Without any size or dimensionality,
it is impossible for nodes to be adjacent. The only way for two nodes to ‘touch’ would be for them
to have the exact same coordinates – which then means that there aren’t really two nodes, just
one that has been duplicated.

This is exactly why features in the MAF/TIGER database are represented only once. As David
Galdi (2005) explains in his white paper “Spatial Data Storage and Topology in the Redesigned

MAF/TIGER System,” the “TI” in TIGER stands for “Topologically Integrated.” This means that
the various features represented in the MAF/TIGER database—such as streets, waterways,
boundaries, and landmarks (but not elevation!)—are not encoded on separate “layers.” Instead,
features are made up of a small set of geometric primitives — including 0-dimensional nodes
and vertices, 1-dimensional edges, and 2-dimensional faces — without redundancy. That means
that where a waterway coincides with a boundary, for instance, MAF/TIGER represents them
both with one set of edges, nodes and vertices. The attributes associated with the geometric
primitives allow database operators to retrieve feature sets efficiently with simple spatial queries.
To accommodate this efficient design and eliminate the need for visual or mental exercises in
order to determine topological states, the MAF/TIGER structure abides by very specific rules that
define the relations of entities in the database (Galdi 2005):

1. Every edge must be bounded by two nodes (start and end nodes).
2. Every edge has a left and right face.
3. Every face has a closed boundary consisting of an alternating sequence of nodes and edges.
4. There is an alternating closed sequence of edges and faces around every node.
5. Edges do not intersect each other, except at nodes.
Compliance with these topological rules is an aspect of data quality called logical consistency. In
addition, the boundaries of geographic areas that are related hierarchically — such as blocks,
block groups, tracts, and counties (all defined in Chapter 3 ) — are represented with common,
non-redundant edges. Features that do not conform to the topological rules can be identified
automatically, and corrected by the Census geographers who edit the database. Given that the
MAF/TIGER database covers the entire U.S. and its territories, and includes many millions of
primitives, the ability to identify errors in the database efficiently is crucial.
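Rules 1 and 2 above lend themselves to automated checking, which is how topologically inconsistent edges can be flagged for an editor's attention. A small sketch with hypothetical node, face, and edge IDs (not the actual MAF/TIGER schema):

```python
def check_edges(edges, nodes, faces):
    """Flag edges violating two MAF/TIGER-style rules:
    every edge must be bounded by two known nodes, and
    every edge must carry both a left and a right face."""
    errors = []
    for eid, (start, end, left, right) in edges.items():
        if start not in nodes or end not in nodes:
            errors.append((eid, "unknown bounding node"))
        if left not in faces or right not in faces:
            errors.append((eid, "missing left/right face"))
    return errors

nodes = {"N1", "N2", "N3"}
faces = {"F0", "F1"}          # F0 stands in for the unbounded outside face
edges = {
    "E1": ("N1", "N2", "F0", "F1"),
    "E2": ("N2", "N9", "F0", "F1"),   # N9 was never defined
    "E3": ("N3", "N1", "F1", None),   # right face missing
}
print(check_edges(edges, nodes, faces))
```

Running such checks over every primitive is what makes logical consistency testable at the scale of a national database.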

So how does topology help the Census Bureau assure the accuracy of population data needed for
reapportionment and redistricting? To do so, the Bureau must aggregate counts and characteristics
to various geographic areas, including blocks, tracts, and voting districts. This involves a process
called “address matching” or “address geocoding” in which data collected by household is
assigned a topologically-correct geographic location. The following pages explain how that
works.

3.8 TOPOLOGICAL CONSISTENCY RULES

Each topology rule below is listed with its rule description, potential fixes, and an example where available.
Must Be Larger Than Cluster Tolerance
  Rule description: Requires that a feature does not collapse during a validate process. This rule is mandatory for a topology and applies to all line and polygon feature classes. In instances where this rule is violated, the original geometry is left unchanged.
  Potential fixes: Delete: removes polygon features that would collapse during the validate process based on the topology's cluster tolerance. This fix can be applied to one or more Must Be Larger Than Cluster Tolerance errors.
  Example: Any polygon feature, such as the one in red, that would collapse when validating the topology is an error.

Must Not Overlap
  Rule description: Requires that the interior of polygons not overlap. The polygons can share edges or vertices. This rule is used when an area cannot belong to two or more polygons. It is useful for modeling administrative boundaries, such as ZIP Codes or voting districts, and mutually exclusive area classifications, such as land cover or landform type.
  Potential fixes: Subtract: removes the overlapping portion of geometry from each feature that is causing the error and leaves a gap or void in its place; it can be applied to one or more selected Must Not Overlap errors. Merge: adds the portion of overlap from one feature and subtracts it from the others that are violating the rule; you need to pick the feature that receives the portion of overlap using the Merge dialog box, and the fix can be applied to one Must Not Overlap error only. Create Feature: creates a new polygon feature out of the error shape and removes the portion of overlap from each of the features causing the error, to create a planar representation of the feature geometry; it can be applied to one or more selected Must Not Overlap errors.


Must Not Have Gaps
  Rule description: This rule requires that there are no voids within a single polygon or between adjacent polygons. All polygons must form a continuous surface. An error will always exist on the perimeter of the surface; you can either ignore this error or mark it as an exception. Use this rule on data that must completely cover an area. For example, soil polygons cannot include gaps or form voids; they must cover an entire area.
  Potential fixes: Create Feature: creates new polygon features using a closed ring of the line error shapes that form a gap; it can be applied to one or more selected Must Not Have Gaps errors. If you select two errors and use the Create Feature fix, the result will be one polygon feature per ring; if you want one multipart feature as a result, you will need to select each new feature and click Merge from the Editor menu. Note that the ring that forms the outer bounds of your feature class will be in error, and using the Create Feature fix for this specific error can create overlapping polygons; remember that you can mark this error as an exception.
  Example: You can use Create Feature to create a new polygon in the void in the center. You can also use Create Feature, or mark the error on the outside boundary as an exception.

Must Not Overlap With
  Rule description: Requires that the interior of polygons in one feature class (or subtype) must not overlap with the interior of polygons in another feature class (or subtype). Polygons of the two feature classes can share edges or vertices or be completely disjointed. This rule is used when an area cannot belong to two separate feature classes. It is useful for combining two mutually exclusive systems of area classification, such as zoning and water body type, where areas defined within the zoning class cannot also be defined in the water body class and vice versa.
  Potential fixes: Subtract: removes the overlapping portion of each feature that is causing the error and leaves a gap or void in its place; it can be applied to one or more selected Must Not Overlap With errors. Merge: adds the portion of overlap from one feature and subtracts it from the others that are violating the rule; you need to pick the feature that receives the portion of overlap using the Merge dialog box, and the fix can be applied to one Must Not Overlap With error only.

Rule: Must Be Covered By Feature Class Of
Description: Requires that a polygon in one feature class (or subtype) must share all of its area with polygons in another feature class (or subtype). An area in the first feature class that is not covered by polygons from the other feature class is an error. This rule is used when an area of one type, such as a state, should be completely covered by areas of another type, such as counties.
Fix (Subtract): The Subtract fix removes the overlapping portion of each feature that is causing the error so the boundary of each feature from both feature classes is the same. This fix can be applied to one or more selected Must Be Covered By Feature Class Of errors.
Fix (Create Feature): The Create Feature fix creates a new polygon feature out of the portion of overlap from the existing polygon so the boundary of each feature from both feature classes is the same. This fix can be applied to one or more selected Must Be Covered By Feature Class Of errors.

Rule: Must Cover Each Other
Description: Requires that the polygons of one feature class (or subtype) must share all of their area with the polygons of another feature class (or subtype). Polygons may share edges or vertices. Any area defined in either feature class that is not shared with the other is an error. This rule is used when two systems of classification are used for the same geographic area, and any given point defined in one system must also be defined in the other. One such case occurs with nested hierarchical datasets, such as census blocks and block groups or small watersheds and large drainage basins. The rule can also be applied to nonhierarchically related polygon feature classes, such as soil type and slope class.
Fix (Subtract): The Subtract fix removes the overlapping portion of each feature that is causing the error so the boundary of each feature from both feature classes is the same. This fix can be applied to one or more selected Must Cover Each Other errors.
Fix (Create Feature): The Create Feature fix creates a new polygon feature out of the portion of overlap from the existing polygon so the boundary of each feature from both feature classes is the same. This fix can be applied to one or more selected Must Cover Each Other errors.

Rule: Must Be Covered By
Description: Requires that polygons of one feature class (or subtype) must be contained within polygons of another feature class (or subtype). Polygons may share edges or vertices. Any area defined in the contained feature class must be covered by an area in the covering feature class. This rule is used when area features of a given type must be located within features of another type. It is useful when modeling areas that are subsets of a larger surrounding area, such as management units within forests or blocks within block groups.
Fix (Create Feature): The Create Feature fix creates a new polygon feature out of the portion of overlap from the existing polygon so the boundary of each feature from both feature classes is the same. This fix can be applied to one or more selected Must Be Covered By errors.

Rule: Boundary Must Be Covered By
Description: Requires that boundaries of polygon features must be covered by lines in another feature class. This rule is used when area features need to have line features that mark the boundaries of the areas. This is usually when the areas have one set of attributes and their boundaries have other attributes. For example, parcels might be stored in the geodatabase along with their boundaries. Each parcel might be defined by one or more line features that store information about their length or the date surveyed, and every parcel should exactly match its boundaries.
Fix (Create Feature): The Create Feature fix creates a new line feature from the boundary segments of the polygon feature generating the error. This fix can be applied to one or more selected Boundary Must Be Covered By errors.

Rule: Area Boundary Must Be Covered By Boundary Of
Description: Requires that boundaries of polygon features in one feature class (or subtype) be covered by boundaries of polygon features in another feature class (or subtype). This is useful when polygon features in one feature class, such as subdivisions, are composed of multiple polygons in another class, such as parcels, and the shared boundaries must be aligned.
Potential fixes: None.

Rule: Contains Point
Description: Requires that a polygon in one feature class contain at least one point from another feature class. Points must be within the polygon, not on the boundary. This is useful when every polygon should have at least one associated point, such as when parcels must have an address point.
Fix (Create Feature): The Create Feature fix creates a new point feature at the centroid of the polygon feature that is causing the error. The point feature that is created is guaranteed to be within the polygon feature. This fix can be applied to one or more selected Contains Point errors.
Example: The top polygon is an error because it does not contain a point.

Rule: Contains One Point
Description: Requires that each polygon contains one point feature and that each point feature falls within a single polygon. This is used when there must be a one-to-one correspondence between features of a polygon feature class and features of a point feature class, such as administrative boundaries and their capital cities. Each point must be properly inside exactly one polygon and each polygon must properly contain exactly one point. Points must be within the polygon, not on the boundary.
Potential fixes: None.
Example: The top polygon is an error because it contains more than one point; points are errors when they are outside a polygon.
Polygon rules

Line rules
Topology rule / Rule description / Potential fixes / Examples
Rule: Must Be Larger Than Cluster Tolerance
Description: Requires that a feature does not collapse during a validate process. This rule is mandatory for a topology and applies to all line and polygon feature classes. In instances where this rule is violated, the original geometry is left unchanged.
Fix (Delete): The Delete fix removes line features that would collapse during the validate process based on the topology's cluster tolerance. This fix can be applied to one or more Must Be Larger Than Cluster Tolerance errors.
Example: Any line feature, such as these lines in red, that would collapse when validating the topology is an error.

Rule: Must Not Overlap
Description: Requires that lines not overlap with lines in the same feature class (or subtype). This rule is used where line segments should not be duplicated, for example, in a stream feature class. Lines can cross or intersect but cannot share segments.
Fix (Subtract): The Subtract fix removes the overlapping line segments from the feature causing the error. You must select the feature from which the error will be removed. If you have duplicate line features, select the line feature you want to delete from the Subtract dialog box. Note that the Subtract fix will create multipart features, so if the overlapping segments are not at the end or start of a line feature, you might want to use the Explode command on the Advanced Editing toolbar to create single-part features. This fix can be applied to one selected Must Not Overlap error only.

Rule: Must Not Intersect
Description: Requires that line features from the same feature class (or subtype) not cross or overlap each other. Lines can share endpoints. This rule is used for contour lines that should never cross each other or in cases where the intersection of lines should only occur at endpoints, such as street segments and intersections.
Fix (Subtract): The Subtract fix removes the overlapping line segments from the feature causing the error. You must select the feature from which the error will be removed. If you have duplicate line features, select the line feature you want to delete from the Subtract dialog box. Note that the Subtract fix will create multipart features, so if the overlapping segments are not at the end or start of a line feature, you might want to use the Explode command on the Advanced Editing toolbar to create single-part features. This fix can be applied to one Must Not Intersect error only.
Fix (Split): The Split fix splits the line features that cross one another at their point of intersection. If two lines cross at a single point, applying the Split fix at that location will result in four features. Attributes from the original features will be maintained in the split features. If a split policy is present, the attributes will be updated accordingly. This fix can be applied to one or more Must Not Intersect errors.

Rule: Must Not Intersect With
Description: Requires that line features from one feature class (or subtype) not cross or overlap lines from another feature class (or subtype). Lines can share endpoints. This rule is used when there are lines from two layers that should never cross each other or in cases where the intersection of lines should only occur at endpoints, such as streets and railroads.
Fix (Subtract): The Subtract fix removes the overlapping line segments from the feature causing the error. You must select the feature from which the error will be removed. If you have duplicate line features, select the line feature you want to delete from the Subtract dialog box. Note that the Subtract fix will create multipart features, so if the overlapping segments are not at the end or start of a line feature, you might want to use the Explode command on the Advanced Editing toolbar to create single-part features. This fix can be applied to one Must Not Intersect With error only.
Fix (Split): The Split fix splits the line features that cross one another at their point of intersection. If two lines cross at a single point, applying the Split fix at that location will result in four features. Attributes from the original features will be maintained in the split features. If a split policy is present, the attributes will be updated accordingly. This fix can be applied to one or more Must Not Intersect With errors.

Rule: Must Not Have Dangles
Description: Requires that a line feature must touch lines from the same feature class (or subtype) at both endpoints. An endpoint that is not connected to another line is called a dangle. This rule is used when line features must form closed loops, such as when they are defining the boundaries of polygon features. It may also be used in cases where lines typically connect to other lines, as with streets. In this case, exceptions can be used where the rule is occasionally violated, as with cul-de-sac or dead-end street segments.
Fix (Extend): The Extend fix will extend the dangling end of line features if they snap to other line features within a given distance. If no feature is found within the distance specified, the feature will not be extended. Also, if several errors were selected, the fix will simply skip the features that it cannot extend and attempt to extend the next feature in the list. The errors of features that could not be extended remain in the Error Inspector dialog box. If the distance value is 0, lines will extend until they find a feature to snap to. This fix can be applied to one or more Must Not Have Dangles errors.
Fix (Trim): The Trim fix will trim dangling line features if a point of intersection is found within a given distance. If no feature is found within the distance specified, the feature will not be trimmed, nor will it be deleted if the distance is greater than the length of the feature in error. If the distance value is 0, lines will be trimmed back until they find a point of intersection. If no intersection is located, the feature will not be trimmed and the fix will attempt to trim the next feature in error. This fix can be applied to one or more Must Not Have Dangles errors.
Fix (Snap): The Snap fix will snap dangling line features to the nearest line feature within a given distance. If no line feature is found within the distance specified, the line will not be snapped. The Snap fix will snap to the nearest feature found within the distance. It searches for endpoints to snap to first, then vertices, and finally the edges of line features within the feature class. This fix can be applied to one or more Must Not Have Dangles errors.

Rule: Must Not Have Pseudo Nodes
Description: Requires that a line connect to at least two other lines at each endpoint. Lines that connect to one other line (or to themselves) are said to have pseudo nodes. This rule is used where line features must form closed loops, such as when they define the boundaries of polygons, or when line features logically must connect to two other line features at each end, as with segments in a stream network, with exceptions being marked for the originating ends of first-order streams.
Fix (Merge To Largest): The Merge To Largest fix will merge the geometry of the shorter line into the geometry of the longest line. The attributes of the longest line feature will be retained. This fix can be applied to one or more Must Not Have Pseudo Nodes errors.
Fix (Merge): The Merge fix adds the geometry of one line feature into the other line feature causing the error. You must pick the line feature into which to merge. This fix can be applied to one selected Must Not Have Pseudo Nodes error.
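Both the dangle check and the pseudo-node check above reduce to counting how many line endpoints meet at each node: a node of degree 1 is a dangle, and a node of degree 2 is a pseudo node. Here is a minimal Python sketch of that idea (not ArcGIS code; it treats only the first and last vertices of each polyline as endpoints and ignores any cluster tolerance or closed-loop special cases).

```python
from collections import Counter

def node_degrees(lines):
    """Count how many line endpoints meet at each node.

    lines is a list of polylines, each a list of (x, y) vertices.
    """
    degrees = Counter()
    for line in lines:
        degrees[line[0]] += 1
        degrees[line[-1]] += 1
    return degrees

def dangles(lines):
    """Endpoints touched by exactly one line (Must Not Have Dangles errors)."""
    return [node for node, d in node_degrees(lines).items() if d == 1]

def pseudo_nodes(lines):
    """Endpoints touched by exactly two lines (Must Not Have Pseudo Nodes errors)."""
    return [node for node, d in node_degrees(lines).items() if d == 2]

streets = [
    [(0, 0), (1, 0)],          # meets two other lines at (1, 0)
    [(1, 0), (1, 1)],
    [(1, 0), (2, 0), (3, 0)],  # (3, 0) is a dangle
]
print(dangles(streets))       # [(0, 0), (1, 1), (3, 0)]
print(pseudo_nodes(streets))  # []
```

In the sample network, (1, 0) has degree 3 and is a proper junction, while the loose ends would be flagged as dangles, just as the rule describes for dead-end street segments.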

Rule: Must Not Intersect Or Touch Interior
Description: Requires that a line in one feature class (or subtype) must only touch other lines of the same feature class (or subtype) at endpoints. Any line segment in which features overlap, or any intersection not at an endpoint, is an error. This rule is useful where lines must only be connected at endpoints, such as in the case of lot lines, which must split (only connect to the endpoints of) back lot lines and cannot overlap each other.
Fix (Subtract): The Subtract fix removes the overlapping line segments from the feature causing the error. You must select the feature from which the error will be removed. If you have duplicate line features, choose the line feature you want to delete from the Subtract dialog box. The Subtract fix creates multipart features, so if the overlapping segments are not at the end or start of a line feature, you might want to use the Explode command on the Advanced Editing toolbar to create single-part features. This fix can be applied to one selected Must Not Intersect Or Touch Interior error only.
Fix (Split): The Split fix splits the line features that cross one another at their point of intersection. If two lines cross at a single point, applying the Split fix at that location will result in four features. Attributes from the original features will be maintained in the split features. If a split policy is present, the attributes will be updated accordingly. This fix can be applied to one or more Must Not Intersect Or Touch Interior errors.

Rule: Must Not Intersect Or Touch Interior With
Description: Requires that a line in one feature class (or subtype) must only touch other lines of another feature class (or subtype) at endpoints. Any line segment in which features overlap, or any intersection not at an endpoint, is an error. This rule is useful where lines from two layers must only be connected at endpoints.
Fix (Subtract): The Subtract fix removes the overlapping line segments from the feature causing the error. You must select the feature from which the error will be removed. If you have duplicate line features, choose the line feature you want to delete from the Subtract dialog box. The Subtract fix creates multipart features, so if the overlapping segments are not at the end or start of a line feature, you might want to use the Explode command on the Advanced Editing toolbar to create single-part features. This fix can be applied to one selected Must Not Intersect Or Touch Interior With error only.
Fix (Split): The Split fix splits the line features that cross one another at their point of intersection. If two lines cross at a single point, applying the Split fix at that location will result in four features. Attributes from the original features will be maintained in the split features. If a split policy is present, the attributes will be updated accordingly. This fix can be applied to one or more Must Not Intersect Or Touch Interior With errors.

Rule: Must Not Overlap With
Description: Requires that a line from one feature class (or subtype) not overlap with line features in another feature class (or subtype). This rule is used when line features cannot share the same space. For example, roads must not overlap with railroads, or depression subtypes of contour lines cannot overlap with other contour lines.
Fix (Subtract): The Subtract fix removes the overlapping line segments from the feature causing the error. You must select the feature from which the error will be removed. If you have duplicate line features, choose the line feature you want to delete from the Subtract dialog box. The Subtract fix creates multipart features, so if the overlapping segments are not at the end or start of a line feature, you might want to use the Explode command on the Advanced Editing toolbar to create single-part features. This fix can be applied to one selected Must Not Overlap With error only.
Example: Where the purple lines overlap is an error.

Rule: Must Be Covered By Feature Class Of
Description: Requires that lines from one feature class (or subtype) must be covered by the lines in another feature class (or subtype). This is useful for modeling logically different but spatially coincident lines, such as routes and streets. A bus route feature class must not depart from the streets defined in the street feature class.
Potential fixes: None.
Example: Where the purple lines do not overlap is an error.

Rule: Must Be Covered By Boundary Of
Description: Requires that lines be covered by the boundaries of area features. This is useful for modeling lines, such as lot lines, that must coincide with the edge of polygon features, such as lots.
Fix (Subtract): The Subtract fix removes line segments that are not coincident with the boundary of polygon features. If the line feature does not share any segments in common with the boundary of a polygon feature, the feature will be deleted. This fix can be applied to one or more Must Be Covered By Boundary Of errors.

Rule: Must Be Inside
Description: Requires that a line is contained within the boundary of an area feature. This is useful for cases where lines may partially or totally coincide with area boundaries but cannot extend beyond polygons, such as state highways that must be inside state borders and rivers that must be within watersheds.
Fix (Delete): The Delete fix removes line features that are not within polygon features. Note that you can use the Edit tool and move the line inside the polygon feature if you do not want to delete it. This fix can be applied to one or more Must Be Inside errors.

Rule: Endpoint Must Be Covered By
Description: Requires that the endpoints of line features must be covered by point features in another feature class. This is useful for modeling cases where a fitting must connect two pipes or a street intersection must be found at the junction of two streets.
Fix (Create Feature): The Create Feature fix adds a new point feature at the endpoint of the line feature that is in error. This fix can be applied to one or more Endpoint Must Be Covered By errors.
Example: The square at the bottom indicates an error where no point is covering the endpoint of the line.

Rule: Must Not Self-Overlap
Description: Requires that line features not overlap themselves. They can cross or touch themselves but must not have coincident segments. This rule is useful for features, such as streets, where segments might touch in a loop but where the same street should not follow the same course twice.
Fix (Simplify): The Simplify fix removes self-overlapping line segments from the feature in error. Applying the Simplify fix can result in multipart features, which you can detect using the Must Be Single Part rule. The Simplify fix can be applied to one or more Must Not Self-Overlap errors.
Example: The individual line feature overlaps itself (the coral line in the figure).
Rule: Must Not Self-Intersect
Description: Requires that line features not cross or overlap themselves. This rule is useful for lines, such as contour lines, that cannot cross themselves.
Fix (Simplify): The Simplify fix removes self-overlapping line segments from the feature in error. Note that applying the Simplify fix can result in multipart features. You can detect multipart features using the Must Be Single Part rule. This fix can be applied to one or more Must Not Self-Intersect errors.

Rule: Must Be Single Part
Description: Requires that lines have only one part. This rule is useful where line features, such as highways, may not have multiple parts.
Fix (Explode): The Explode fix creates single-part line features from each part of the multipart line feature that is in error. This fix can be applied to one or more Must Be Single Part errors.
Example: Multipart lines are created from a single sketch.
Line rules

Point rules
Topology rule / Rule description / Potential fixes / Examples
Rule: Must Coincide With
Description: Requires that points in one feature class (or subtype) be coincident with points in another feature class (or subtype). This is useful for cases where points must be covered by other points, such as transformers, which must coincide with power poles in electric distribution networks, and observation points, which must coincide with stations.
Fix (Snap): The Snap fix will move a point feature in the first feature class or subtype to the nearest point in the second feature class or subtype that is located within a given distance. If no point feature is found within the tolerance specified, the point will not be snapped. The Snap fix can be applied to one or more Must Coincide With errors.
Example: Where a red point is not coincident with a blue point is an error.

Rule: Must Be Disjoint
Description: Requires that points be separated spatially from other points in the same feature class (or subtype). Any points that overlap are errors. This is useful for ensuring that points are not coincident or duplicated within the same feature class, such as in layers of cities, parcel lot ID points, wells, or streetlamp poles.
Potential fixes: None.
Example: Where a red point and a blue point overlap is an error.

Rule: Must Be Covered By Boundary Of
Description: Requires that points fall on the boundaries of area features. This is useful when the point features help support the boundary system, such as boundary markers, which must be found on the edges of certain areas.
Potential fixes: None.
Example: The square on the right indicates an error because it is a point that is not on the boundary of the polygon.

Rule: Must Be Properly Inside
Description: Requires that points fall within area features. This is useful when the point features are related to polygons, such as wells and well pads or address points and parcels.
Fix (Delete): The Delete fix removes point features that are not properly within polygon features. Note that you can use the Edit tool and move the point inside the polygon feature if you do not want to delete it. This fix can be applied to one or more Must Be Properly Inside errors.
Example: The squares are errors where there are points that are not inside the polygon.

Rule: Must Be Covered By Endpoint Of
Description: Requires that points in one feature class must be covered by the endpoints of lines in another feature class. This rule is similar to the line rule Endpoint Must Be Covered By except that, in cases where the rule is violated, it is the point feature that is marked as an error rather than the line. Boundary corner markers might be constrained to be covered by the endpoints of boundary lines.
Fix (Delete): The Delete fix removes point features that are not coincident with the endpoint of line features. Note that you can snap the point to the line by setting edge snapping to the line layer, then moving the point with the Edit tool. This fix can be applied to one or more Must Be Covered By Endpoint Of errors.
Example: The square indicates an error where the point is not on an endpoint of a line.

Rule: Must Be Covered By Line
Description: Requires that points in one feature class be covered by lines in another feature class. It does not constrain the covering portion of the line to be an endpoint. This rule is useful for points that fall along a set of lines, such as highway signs along highways.
Potential fixes: None.
Example: The squares are points that are not covered by the line.

3.9 ATTRIBUTE DATA LINKING


In the GIS Attribute Data Sets window, select New to define a new link. In the resulting Select a
Member window, select MAPS.USAAC. You must next specify the values that are common to
both the attribute and spatial data, because the common values provide the connection between
the spatial data and the attribute data. The spatial database and the MAPS.USAAC data set share
compatible state and county codes, so first select STATE in both the Data Set Vars and
Composites lists, and then select COUNTY in both lists. Select Save to save the link definition
to the Links list. Finally, select Continue to close the GIS Attribute Data Sets window.
After the GIS Attribute Data Sets window closes, the Var window automatically opens for you.
Select which variable in the attribute data provides the theme data for your theme. Select the
CHANGE variable to have the counties colored according to the level of change in the county
population. Select OK to close the Var window.
The counties in the spatial data are colored according to the demographic values in the attribute
data set, as shown in the following display.

Linking the Attribute Data as a Theme
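The link just defined is, in essence, a relational join of the attribute table to the spatial data on the shared STATE and COUNTY values. A minimal Python sketch of that join (the STATE, COUNTY, and CHANGE field names follow the example above, but the data values here are invented):

```python
# Spatial records (one per county polygon) and an attribute table
# sharing STATE and COUNTY codes, as in the MAPS.USAAC example.
spatial = [
    {"STATE": 37, "COUNTY": 1},
    {"STATE": 37, "COUNTY": 3},
]
attributes = [
    {"STATE": 37, "COUNTY": 1, "CHANGE": 12.5},
    {"STATE": 37, "COUNTY": 3, "CHANGE": -4.2},
]

# Index the attribute rows by the common (STATE, COUNTY) key ...
by_key = {(row["STATE"], row["COUNTY"]): row for row in attributes}

# ... then pull the theme variable into each spatial record.
for feature in spatial:
    match = by_key.get((feature["STATE"], feature["COUNTY"]))
    feature["CHANGE"] = match["CHANGE"] if match else None

print([f["CHANGE"] for f in spatial])  # [12.5, -4.2]
```

Once each polygon carries a CHANGE value, the counties can be symbolized (colored) by that variable, which is exactly what linking the attribute data as a theme accomplishes.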

3.10 ODBC

ODBC DATABASE DRIVER

Communication between GRASS and an ODBC database for attribute management:

GRASS module <--> GRASS DBMI driver <--> unixODBC <--> ODBC driver <--> RDBMS (PostgreSQL, Oracle, ...)

Supported SQL commands

All SQL commands supported by ODBC.

Operators available in conditions

All SQL operators supported by ODBC.

EXAMPLE

In this example we copy the dbf file of a SHAPE map into ODBC, then connect GRASS to the ODBC DBMS. Usually the table will already be present in the DBMS.

Defining the ODBC connection

MS-Windows

On MS-Windows, the ODBC connection needs to be configured with a dedicated tool (the "ODBC Data Source Administrator"), which assigns a name to the connection. This name is then used as the database name when accessing the database from a client via ODBC.

Linux

Configure the ODBC driver for the selected database (manually or with 'ODBCConfig'). ODBC drivers are defined in /etc/odbcinst.ini. Here is an example:

[PostgreSQL]
Description = ODBC for PostgreSQL
Driver      = /usr/lib/libodbcpsql.so
Setup       = /usr/lib/libodbcpsqlS.so
FileUsage   = 1

Create a DSN (data source name). The DSN is used as the database name in db.* modules. The DSN must be defined in $HOME/.odbc.ini (for this user only) or in /etc/odbc.ini (for all users). Watch out for the database name, which appears twice, and for the PostgreSQL protocol version. Omit blanks at the beginning of lines:

[grass6test]
Description = PostgreSQL
Driver = PostgreSQL
Trace = No
TraceFile =

Database = grass6test
Servername = localhost
UserName = neteler
Password =
Port = 5432
Protocol = 8.0

ReadOnly = No
RowVersioning = No
ShowSystemTables = No
ShowOidColumn = No
FakeOidIndex = No
ConnSettings =
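The DSN file above is plain INI syntax, so its structure can be inspected with Python's standard configparser. This sketch only parses the example fragment to show where the DSN name and driver live; it does not open an actual ODBC connection.

```python
import configparser

# A trimmed copy of the odbc.ini fragment above (the values are the
# example's, not real credentials).
odbc_ini = """
[grass6test]
Description = PostgreSQL
Driver      = PostgreSQL
Database    = grass6test
Servername  = localhost
UserName    = neteler
Port        = 5432
Protocol    = 8.0
"""

config = configparser.ConfigParser()
config.read_string(odbc_ini)

# The section name is the DSN, which db.* modules use as the database name.
dsn = config.sections()[0]
print(dsn)                         # grass6test
print(config[dsn]["Driver"])       # PostgreSQL
print(config[dsn].getint("Port"))  # 5432
```

Note how the DSN name and the Database entry match in this example, which is the duplication the text warns you to watch out for.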
Configuration of a DSN without a GUI is described at http://www.unixodbc.org/odbcinst.html, but odbc.ini and .odbc.ini may be created by the 'ODBCConfig' tool. You can easily view your DSN structure with 'DataManager'. Configuration with a GUI is described at http://www.unixodbc.org/doc/UserManual/

To find out about your PostgreSQL protocol, run:

psql -V
Using the ODBC driver
Now create a new database if it does not yet exist:

db.createdb driver=odbc database=grass6test

To store a table 'mytable.dbf' (here: in current directory) into PostgreSQL through ODBC, run:

db.connect driver=odbc database=grass6test


db.copy from_driver=dbf from_database=./ from_table=mytable \
to_driver=odbc to_database=grass6test to_table=mytable

Next, link the map to the attribute table (now the ODBC table is used, not the dbf file):

v.db.connect map=mytable.shp table=mytable key=ID \
  database=grass6test driver=odbc
v.db.connect -p

Finally a test: Here we should see the table columns (if the ODBC connection works):

db.tables -p
db.columns table=mytable

Now the table name 'mytable' should appear.


Doesn't work? Check with 'isql <databasename>' if the ODBC-PostgreSQL connection is really
established.

Note that you can also connect MySQL, Oracle, etc. through ODBC to GRASS.

You can also check the vector map itself concerning a current link to a table:

v.db.connect -p mytable.shp

which should print the database connection through ODBC to the defined RDBMS.

3.11 GPS

GPS or Global Positioning System is a constellation of 27 satellites orbiting the earth at about
12000 miles. These satellites are continuously transmitting a signal and anyone with a GPS
receiver on earth can receive these transmissions at no charge. By measuring the travel time of
signals transmitted from each satellite, a GPS receiver can calculate its distance from the satellite.
Satellite positions are used by receivers as precise reference points to determine the location of
the GPS receiver. If a receiver can receive signals from at least 4 satellites, it can determine
latitude, longitude, altitude and time. If it can receive signals from 3 satellites, it can determine
latitude, longitude and time. The satellites are in orbits such that at any time anywhere on the
planet one should be able to receive signals from at least 4 satellites. The basic GPS service
provides commercial users with an accuracy of 100 meters, 95% of the time anywhere on the
earth. Since May of 2000, this has improved to about 10 to 15 meters due to the removal of
selective availability.
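The range measurement described above is simply distance = speed of light x signal travel time. A small sketch with an illustrative travel time (real receivers must also solve for their own clock error, which is one reason a fourth satellite is needed):

```python
C = 299_792_458.0  # speed of light in m/s

def pseudorange(travel_time_s):
    """Distance implied by a signal travel time (ignoring clock error)."""
    return C * travel_time_s

# A GPS signal takes roughly 0.067 s to reach the ground from a
# ~20,200 km orbit; 0.0674 s is an illustrative value.
t = 0.0674
print(round(pseudorange(t) / 1000))  # 20206 (km)
```

Because a timing error of even a microsecond corresponds to about 300 m of range error, the accurate clocks mentioned in the text are essential.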

GPS technology offers several advantages: First and foremost, the service is free worldwide and
anyone with a receiver can receive the signals and locate a position. Second, the system supports
unlimited users simultaneously. Third, one of the great advantages of GPS is the fact that it
provides navigation capability.

Limitations of GPS
As with any technology, GPS also has some limitations. It is essential that the users are aware
of these limitations.

3.12 CONCEPT BASED GPS MAPPING


GPS consists of a constellation of radio navigation satellites and a ground control segment, which manages satellite operation, and users with specialized receivers who use the satellite data to satisfy a broad range of positioning requirements.

In brief, the following are the key features of GPS:

1. The basis of GPS is 'triangulation', more precisely trilateration from satellites.
2. A GPS receiver measures distance using the travel time of radio signals.
3. To measure travel time, GPS needs very accurate timing, which is achieved with special techniques.
4. Along with distance, one needs to know exactly where the satellites are in space.
5. Finally, one must correct for any delays the signal experiences as it travels through the atmosphere.

The whole idea behind GPS is to use satellites in space as reference points for location here on earth. By very accurately measuring the distances from at least three satellites, we can 'triangulate' our position anywhere on the earth by the resection method.
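The trilateration idea can be sketched in two dimensions: subtracting the circle equations pairwise yields a linear system in (x, y) that can be solved directly. Real GPS works in 3D with a receiver clock-bias unknown; this toy example assumes exact distances to known reference points, with made-up coordinates.

```python
import math

def trilaterate_2d(p1, r1, p2, r2, p3, r3):
    """Position from three known points and exact distances (2D toy case).

    Subtracting the circle equations pairwise gives two linear
    equations in (x, y), solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# Three reference points and measured distances to an unknown position (3, 4).
refs = [(0, 0), (10, 0), (0, 10)]
truth = (3, 4)
dists = [math.dist(truth, r) for r in refs]
pos = trilaterate_2d(refs[0], dists[0], refs[1], dists[1], refs[2], dists[2])
print(pos)  # approximately (3.0, 4.0)
```

With noisy ranges, as in real GPS, the equations are solved in a least-squares sense instead, and the fourth satellite supplies the extra equation needed for the receiver clock offset.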

GPS Elements

GPS has three parts: the space segment, the user segment, and the control segment; Figure 1.2 illustrates them. The space segment consists of a constellation of 24 satellites, each in its own orbit, about 11,000 nautical miles above the Earth. The user segment consists of receivers, which can be handheld or mounted in a vehicle. The control segment consists of ground stations (six of them, located around the world) that make sure the satellites are working properly. More details on each of these elements can be found in any standard book or online literature on GPS.

Figure 1.2: GPS segments

GPS Satellite Navigation System

GPS is funded and controlled by the U.S. Department of Defense (DOD). While there are many
thousands of civil users of GPS worldwide, the system was designed for and is operated by
the U.S. military. It provides specially coded satellite signals that can be processed in a GPS
receiver, enabling the receiver to compute position, velocity and time. Four GPS satellite signals
are used to compute positions in three dimensions and the time offset in the receiver clock.

GPS Positioning Techniques

GPS positioning techniques may be categorized as being predominantly based on code or carrier
measurements. Code techniques are generally simple and produce low accuracy, while carrier
techniques are more complex and produce higher accuracy. There exist a variety of positioning
methods for both code and carrier measurements. The suitability of each for a specific
application is dependent on the desired accuracy, logistical constraints and costs. Many
variables affect this accuracy, such as the baseline lengths, ionospheric conditions, magnitude
of selective availability, receiver types used, and processing strategies adopted.

Differential GPS (DGPS)

The technique used to augment GPS is known as "differential". The basic idea is to locate one
or more reference GPS receivers at known, accurately surveyed locations in users' vicinities and
calibrate ranging errors as they occur. Because the reference station's position is exactly known,
the deviation of the measured position from the actual position, and more importantly the error
in the measured pseudorange to each individual satellite, can be calculated. These corrections
are transmitted to users by radio in near real time, or applied afterwards in post-processing.
The errors are highly correlated across tens of kilometres and across many minutes, so applying
the corrections greatly improves accuracy and integrity; man-made errors such as selective
availability are largely cancelled out as well.
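The differencing at the heart of DGPS can be shown with a minimal numeric sketch. All distances here are invented round numbers; a real system computes and broadcasts a correction per satellite and also handles timing and latency.

```python
# The reference station sits at a surveyed point, so its true distance to a
# given satellite is known exactly (invented value below).
surveyed_range = 20_200_000.0       # true station-to-satellite distance (m)
measured_at_station = 20_200_042.0  # station's measured pseudorange (m)

# The difference is the common-mode ranging error for that satellite.
correction = measured_at_station - surveyed_range   # +42 m of shared error

# A nearby rover subtracts the broadcast correction from its own measurement.
measured_at_rover = 20_195_367.0
corrected_rover_range = measured_at_rover - correction
```

Because station and rover see nearly the same atmospheric and clock errors, subtracting the station's observed error removes most of the rover's error too.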

GPS applications in Transportation

Due to the high accuracy, usability, ease and economy of operations in all weather offered by
GPS, it has found numerous applications in many fields, ranging from the millimetre accuracy
required for high-precision geodesy to the several metres sufficient for navigational positioning.
Some of the applications in the urban and transportation field are: i) establishment of ground
control points for imagery / map registration, ii) determination of a precise geoid using GPS
data, iii) survey control for topographical and cadastral surveys, iv) air, road, rail, and marine
navigation, v) intelligent traffic management systems, vi) vehicle tracking systems, etc.

UNIT IV DATA ANALYSIS 9
Vector Data Analysis tools - Data Analysis tools - Network Analysis - Digital Education models
- 3D data collection and utilisation.

4.1 Vector Data Analysis tools

Analysis is often considered to be "the heart" of GIS. Through analysis new information is gained.
As a GIS stores both attribute and spatial data, analysis can be conducted on both types of data –
however, it is the spatial analysis capability that sets GIS apart from database applications.
There are a great many GIS analyses that can be conducted. For convenience's sake we often group
the analyses into categories. For the purpose of this course I have decided to create a bit of a hierarchy.
This is done to reflect your lab experiences using GIS. Our first encounter with GIS analyses is with
vector data. Later we will (hopefully) move into analysis of raster data. The focus of this module
is vector data. The breakout of vector analysis is as follows:
Attribute
Attribute Query - Select by Attribute
Arithmetic Calculation
Statistical Summary
Reclassification
Relating Tables
Spatial Join
Spatial
Spatial Query - Select by Location
Spatial Calculation
Spatial Join
Overlay
Buffer
Dissolve
Network Analysis
ATTRIBUTE DATA OPERATIONS
There are many operations that can be conducted on the attribute database (the data tables). These
can be divided into 4 categories: query (or logical), arithmetic, statistical and reclass operations. I
included a short note about relating tables as it enhances our analytical capabilities. There is also the
spatial join function, which straddles attribute and spatial analysis.
Queries = Select by Attributes
Queries include both comparison (=, >, <, >=, <=, <>) and Boolean (AND, OR and NOT) operators
(for a simple but effective visual, check out the Boolean Machine). These operators are used to
perform queries.
Example 1: a simple comparison query,
Forest_Age >= 250 (years)
would query for a subset of forest polygons that could be considered ‘old growth’.
Note the query has 3 parts: field name, operator and value.
Example 2: in Pacific Northwest critical deer winter range has old growth Douglas-fir trees. The
query has two criteria and would look like,
Forest_Age >= 250 AND Fir% >= 50.
This second query operates on two attribute fields and is more specific (restrictive) than the
first – it would yield a smaller subset of polygons as both conditions would have to be met.
Example 3: if deer simply liked old growth and/or Douglas-fir (a fictitious example), then the query
would be
Forest_Age >= 250 OR Fir% >= 50.
This query is more inclusive as a stand can be either old OR composed of Douglas-fir.
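The three example queries can be mimicked in plain Python against a small table of forest stands. The field names follow the text; the stand IDs and values are invented for illustration.

```python
# A toy attribute table: one dict per forest stand (data invented).
stands = [
    {"id": 1, "Forest_Age": 300, "Fir%": 60},
    {"id": 2, "Forest_Age": 300, "Fir%": 20},
    {"id": 3, "Forest_Age": 100, "Fir%": 80},
    {"id": 4, "Forest_Age": 100, "Fir%": 10},
]

# Example 1: simple comparison.
old_growth = [s for s in stands if s["Forest_Age"] >= 250]

# Example 2: AND is more restrictive -- both conditions must hold.
winter_range = [s for s in stands if s["Forest_Age"] >= 250 and s["Fir%"] >= 50]

# Example 3: OR is more inclusive -- either condition suffices.
either = [s for s in stands if s["Forest_Age"] >= 250 or s["Fir%"] >= 50]
```

The AND query selects only stand 1, while the OR query selects stands 1, 2 and 3, illustrating how the two operators shrink or grow the selected subset.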
Arithmetic
Arithmetic operators perform simple mathematical functions on values in the attribute database;
operators include:
· + (addition)
· - (subtraction)
· / (division)
· * (multiplication)
· ^n (raised to the power of)
· √ (square root)
· Sin
· Cos
· Tan
These operators can be utilized to calculate values to be placed in a new field:
· convert square metres (m2) to hectares (ha) [e.g. divide by 10,000]
· convert driving distance to driving time [e.g. divide by average driving speed]
· determine total volume (m3) [e.g. multiply area (ha) by inventory volume (m3/ha)]
In each case the results would be placed in a new field in the table.
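Two of the calculations above can be sketched on an invented polygon record; the field names are hypothetical, standing in for columns in an attribute table.

```python
# One record of a toy attribute table (values invented).
polygons = [{"area_m2": 25_000.0, "vol_m3_per_ha": 400.0}]

for p in polygons:
    p["area_ha"] = p["area_m2"] / 10_000                   # m2 -> hectares
    p["total_vol_m3"] = p["area_ha"] * p["vol_m3_per_ha"]  # area x volume/ha

# 25,000 m2 becomes 2.5 ha, giving a total volume of 1,000 m3.
```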

Statistics
Statistical operations can also be performed on the attribute data. There are 2 options available when
you right-click on a field name: 'statistics' and 'summarize'. 'Statistics' provides a temporary pop-up
table with the typical parameters:
· count
· minimum
· maximum
· sum
· mean
· standard deviation
Plus the data are plotted in a histogram (frequency distribution).
'Summarize' creates an output data table. Statistics are based on unique values in a chosen field;
the selected statistics from these operations are placed in a summary table. In the example below,
the field Group was chosen and the statistics count, sum and mean were calculated.

Data Table:

Group   Value
A       100
A       300
C       50
B       80
B       70
A       300

Summary Table:

Group   Count   Sum_Value   Mean_Value
A       3       700         233
B       2       150         75
C       1       50          50
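The Summarize operation on the table above can be sketched in plain Python; integer means are shown to match the table.

```python
from collections import defaultdict

# The Data Table rows: (Group, Value).
rows = [("A", 100), ("A", 300), ("C", 50), ("B", 80), ("B", 70), ("A", 300)]

# Collect values per unique group, then compute count, sum and mean.
groups = defaultdict(list)
for group, value in rows:
    groups[group].append(value)

summary = {g: {"count": len(v), "sum": sum(v), "mean": sum(v) // len(v)}
           for g, v in groups.items()}
# A -> count 3, sum 700, mean 233; B -> count 2, sum 150, mean 75; C -> 50
```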

Reclassification
Reclassification is another operation that can be conducted on attribute data. Reclassification results
in a generalization (i.e. a simplification) of the original data set. For instance, raw property values
in a data set can be put in 3 classes: lower, middle, and upper class. We typically use the legend editor
(in ArcGIS the Symbology tab of the Layer Properties dialog box) to classify the data
- altering the legend is temporary and we can change the colouring at anytime. (If we want a
"permanent reclassification", e.g. a new map, then we would use Merge / Dissolve - this is described
in section 3.5 below).

Table Relations
Relating tables to each other involves joining or linking records between two tables. This may not
be considered ‘an operation’, but it does allow us to relate outside source data to our themes to allow
the features in our themes to be analyzed based on ‘outside’ data. Refer to the database lecture notes.

Spatial Join
As with relating tables, a spatial join will relate records between two tables. But the records
are not joined based on a common attribute value (usually ID); instead records are joined based on
‘common location’ (as defined by the coordinates of the spatial features). This type of operation is
a combination of spatial and attribute; it is described in more detail in section 3.2 below.

SPATIAL/ GEOMETRIC OPERATIONS


The spatial characteristics of map features (points, lines, polygons) can also be analyzed. The
location, size and shape of the map features, as defined by their coordinates, are the basis for
these operations. Spatial operations can be categorized as follows:

Spatial Query - Select by Location


This is where features in one theme are selected based on their spatial relation (connectivity,
containment, intersection, or nearness) to features in a second theme (i.e. select forest stands
that contain an eagle’s nest); new data is not created, just a set of features are selected. A few
examples:
intersect – share geographic space (roads that cross creeks)
within a distance of – as the name implies, select features within the “buffer area” (wildlife trees
within 20m of river)
contain – feature has to be within (e.g. select forest stands that contain wildlife trees)

Spatial Calculations
Simple spatial calculations determine areas, perimeters, and distances based on the coordinates
(in ArcView these are accessed through the ‘shape’ field as it contains the vertices); the calculations
utilize the coordinates that define the features, but the results are stored in the database table (so this
operation also straddles both attribute and geometric).

Spatial Join
As previously stated, this operation is a mix of spatial and attribute operations. The end result is a
join of two database tables, but the basis for the join is ‘coincident space’. As with a ‘regular join’
the relation has to be one-to-one or many-to-one between records in the ‘destination-to-source tables’.
As an example we could have two themes: Cities and Countries of the world. A spatial join could
be done for cities as the destination theme, as it yields a many-to-one relation (many cities to one
country). A spatial join would thus bring data from the Countries theme to the Cities theme.
A spatial join could not be done with Countries as the destination table as the relation would be
one-to-many.
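A toy version of this many-to-one spatial join can be written with invented country rectangles and city points. Real GIS software tests true polygon containment; bounding boxes and all names here are illustrative only.

```python
# Invented "countries" as axis-aligned rectangles and "cities" as points.
countries = [
    {"name": "Freedonia", "bbox": (0, 0, 10, 10)},   # (xmin, ymin, xmax, ymax)
    {"name": "Sylvania",  "bbox": (10, 0, 20, 10)},
]
cities = [{"name": "Alpha", "xy": (3, 4)},
          {"name": "Beta",  "xy": (15, 2)}]

def contains(bbox, xy):
    """True if the point falls inside the rectangle."""
    xmin, ymin, xmax, ymax = bbox
    x, y = xy
    return xmin <= x < xmax and ymin <= y < ymax

# Cities are the destination theme: each city picks up attributes from the
# one country containing it (many cities to one country).
for city in cities:
    match = next((c for c in countries if contains(c["bbox"], city["xy"])), None)
    city["country"] = match["name"] if match else None
```

Each city row ends up carrying its country's attributes, which is exactly the many-to-one direction the text describes.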

Overlays – bringing together two themes – the line work of the two themes is combined (lines are
broken, new nodes and links/arcs are created, topology is rebuilt; note that sliver polygons may
have to be eliminated) and the fields from both theme databases are combined into one new database.
Three common types of overlays:

Union – is a complete merging of two themes where the new theme is composed of the entire map
area of both themes and all the fields from both theme data tables

Intersect – is a merging of two themes but only where they share space such that the ‘map area’
of the new theme is the area that was in common for both themes and the attribute database is
composed of all the fields from both theme data tables

Clip – is akin to pressing a cookie cutter onto a theme such that the new theme is a miniature
version of the first (a mini-me); the map area is defined by the overlay (cookie cutter) theme
and the database comes only from the input (cookie dough) theme.

Update - features from the 'update layer' descend upon the input theme and replace whatever was
underneath. An example would be updating a timber type map (input layer) with a cut block (update
layer) where the cut block shape supersedes the timber types it overlaps.

Erase - the polygons from the 'erase layer' descend upon the input theme and eliminate that area.
An example would be if land were expropriated from a woodlot owner to create a park. The 'park
area' would be erased from the woodlot area.
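The difference between the Union and Intersect extents can be illustrated with two axis-aligned rectangles. This is a deliberate simplification: real overlays handle arbitrary polygons and merge the attribute tables as described above.

```python
def rect_intersection(a, b):
    """Shared extent of rectangles a, b = (xmin, ymin, xmax, ymax), or None."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    return (xmin, ymin, xmax, ymax) if xmin < xmax and ymin < ymax else None

def rect_union_extent(a, b):
    """Full extent covered by a Union of the two themes."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

theme_a = (0, 0, 10, 10)
theme_b = (5, 5, 15, 15)

# Intersect keeps only the shared space; Union keeps everything;
# Clip would keep theme_a's attributes within theme_b's shape.
shared = rect_intersection(theme_a, theme_b)   # (5, 5, 10, 10)
full = rect_union_extent(theme_a, theme_b)     # (0, 0, 15, 15)
```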

Buffers
Buffering creates a new theme with new polygon features (geometric objects) based on a constant
measure from features in a source theme; buffers around points are circles, around lines are 'corridors'
(snake-like with rounded ends) and around polygons are 'donuts'; buffers can be created based on:
a single set width (i.e. all features by 50m)
multiple widths where more than one buffer is created around each feature (i.e. a 50 and a 150m
buffer created around each feature – gives a “bull’s eye” effect)
varying width based on an attribute field (i.e. width for each feature is stored in the data table,
buffer width depends on the value in this field)

other factors include:


dissolve boundaries between overlapping buffers
buffer outside / inside / both
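A point buffer can be sketched as a circle approximated by a ring of vertices at the buffer distance; geometry libraries such as Shapely expose the same idea as `geometry.buffer(distance)`. The coordinates below are invented.

```python
import math

def buffer_point(x, y, dist, segments=64):
    """Approximate circular buffer around a point as a ring of vertices."""
    return [(x + dist * math.cos(2 * math.pi * i / segments),
             y + dist * math.sin(2 * math.pi * i / segments))
            for i in range(segments)]

# A 50 m buffer around a point at the origin.
ring = buffer_point(0.0, 0.0, 50.0)
# Every vertex sits exactly 50 m from the source point.
```

Line and polygon buffers work the same way conceptually, offsetting each feature's boundary by the buffer distance.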

Dissolve
With dissolve, also known as merge polygons, boundaries between adjacent polygons with the same
attribute value (i.e. class = poor) are ‘dissolved’ and the two (or more) polygons are merged into one
larger polygon; a new map layer (theme) results with the generalized data. This is the ‘spatial
equivalent’ to reclassification of attribute data.

Network Routing – as the name implies, this type of analysis assesses movement through a network.
Consider the difference between the shortest route and the fastest route. During the middle of the
night the ‘shortest route’ is likely the ‘fastest route’, however, during rush hour I would consider
traffic and use the ‘fastest route’ (which may have a longer distance). The network is modeled using
lines (arcs) and intersections (nodes). Arc-node topology provides information regarding
connectivity. The attribute database would provide additional information regarding impedance to
flow (or movement). Examples would include speed limit and traffic loads at different times of the
day. There is a ‘cost’ to making turns at intersections – i.e. you have to slow down rather than use
just two wheels to make the turn. One-way streets would provide for an absolute barrier. As well,
making a turn off of an overpass onto a highway below would be prohibited. Other routing examples
include most efficient route (for making several stops or deliveries) and location-allocation (where
school catchment areas can be determined based on road network and not just a straight-line distance).
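The shortest-versus-fastest distinction can be sketched with Dijkstra's algorithm on a tiny invented road network, where each arc carries both a length and a travel time as impedances.

```python
import heapq

# Each arc carries two impedances: (length_km, travel_time_min).
# The network and numbers are invented for illustration.
edges = {
    "A": [("B", (4, 12)), ("C", (6, 6))],
    "B": [("D", (4, 12))],
    "C": [("D", (6, 6))],
    "D": [],
}

def dijkstra(start, goal, weight_index):
    """Least-cost route; weight_index 0 minimizes distance, 1 minimizes time."""
    pq, seen = [(0, start)], set()
    while pq:
        cost, node = heapq.heappop(pq)
        if node == goal:
            return cost
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in edges[node]:
            heapq.heappush(pq, (cost + w[weight_index], nbr))
    return None

shortest = dijkstra("A", "D", 0)   # A-B-D: 8 km (12 km via C)
fastest = dijkstra("A", "D", 1)    # A-C-D: 12 min (24 min via B)
```

The slow arterial route A-B-D is shortest by distance, while the faster highway route A-C-D wins on travel time, just as in the rush-hour example above.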
4.2 Data Analysis tools

Data analytics tools can help deliver that value and bring that data to life. A lot of hard work goes
into extracting and transforming data into a usable format, but once that's done, data analytics can
provide users with greater insights into their customers, business, and industry.

There are three broad categories of data analytics that offer different levels of insight:

 Traditional Business Intelligence (BI) provides traditional, recurring reports.

 Self-Service Analytics enable end users to structure their own analyses within the context of
IT-provided data and tools.
 Embedded Analytics provide business intelligence within the confines of a traditional
business application, like an HR system, CRM, or ERP. These analytics provide context-
sensitive decision support within users’ normal workflow.

If you’re not using an analytics tool, you should be. Gartner predicts that by 2019, self-service
analytics and BI users will actually produce more analysis than data scientists. No matter what level
of insight you need, here are 15 of the best data analytics tools to get you started on your journey, in
no particular order.

Data Analytics Tools:

Tableau

Tableau features robust functionality with fast speed to insight. With connectivity to many different
local and cloud-based data sources, Tableau’s intuitive interface combines data sourcing, preparation,
exploration, analysis, and presentation in a streamlined workflow.

Tableau’s flexibility makes it well-suited to the three types of analytics discussed above. Tableau
Server can easily house recurring reports. Power users will appreciate the integrated statistical and
geospatial functionality for advanced self-service. And finally, Tableau uses application integration
technologies like JavaScript APIs and single sign-on functionality to seamlessly embed Tableau
analytics into common business applications.

Looker

Looker strives to provide a unified data environment and centralized data governance with heavy
emphasis on reusable components for data-savvy users. Using an extract/load/transform (ELT)
approach, Looker gives users the ability to model and transform data as they need it.

Looker also features proprietary LookML language, which harnesses SQL in a visual and reusable
way. The reusability concept extends to Looker’s Blocks components, which are reusable utilities
for data connections, analysis, visualization, and distribution. Finally, Looker is designed to easily
integrate with popular collaboration and workflow tools such as Jira, Slack, and Segment.

Solver

BI360 offers modern, dynamic reporting with out-of-the-box integrations to many of the world’s most
popular on-premise and cloud-based ERP systems. This easy-to-use report writer offers Excel, web,
and mobile interfaces, and provides finance professionals with powerful financial and operational
reporting capabilities in a variety of layouts and presentation formats.

BI360 also offers integrated budgeting workflow and analytics, including industry-specific templates.
Once you connect data sources to the BI360 Suite, use these templates to access data, collaboratively
develop a budget, and display results on predefined dashboards.

Available for cloud and on-premise deployment.

Dataiku

Dataiku DSS combines much of the data analysis lifecycle into one tool. It enables analysts to source
and prep data, build predictive models, integrate with data mining tools, develop visualizations for
end users and set up ongoing data flows to keep visualizations fresh. DSS’ collaborative environment
enables different users to work together and share knowledge, all within the DSS platform.

With its focus on data science, DSS tends to serve deeply analytical use cases like churn analytics,
demand forecasting, fraud detection, spatial analytics, and lifetime value optimization.

KNIME

An open-source, enterprise class analytics platform, KNIME is designed with the data scientist in
mind. KNIME’s visual interface includes nodes for everything from extracting to presenting data,
with an emphasis on statistical models. KNIME integrates with several other data science tools
including R, Python, Hadoop, and H2O, as well as many structured and unstructured data types.

KNIME supports leading edge, data science use cases such as social media sentiment analysis,

medical claim outlier detection, market basket analysis, and text mining.

RapidMiner

RapidMiner emphasizes speed to insight for complex data science. Its visual interface includes pre-
built data connectivity, workflow, and machine learning components. With R and Python integration,
RapidMiner automates data prep, model selection, predictive modeling, and what-if gaming. This
platform also accelerates “behind-the-scenes” work with a combined development and collaboration
environment and integration with Hadoop and Spark big data platforms.

Finally, RapidMiner’s unique approach to self-service utilizes machine learning to glean insight from
its 250,000-strong developer community for predictive analytics development. Its context-sensitive
recommendations, automated parameter selection, and tuning accelerate predictive model
deployment.

Pentaho

Pentaho emphasizes IoT data collection and blending with other data sources like ERP and CRM
systems, as well as big data tools like Hadoop and NoSQL. Its built-in integration with IoT end points
and unique metadata injection functionality speeds data collection from multiple sources. Pentaho’s
visualization capabilities range from basic reports to complex predictive models.

Pentaho proactively approaches embedded analytics. In addition to investing in integration
components like REST APIs, Pentaho's thorough training and project management methodology help
ensure customer success with embedded analytics.

Talend

Talend’s toolset is meant to accelerate data integration projects and speed time to value. An open-
source tool, Talend comes with wizards to connect to big data platforms like Hadoop and Spark. Its
integrated toolset and unique data fabric functionality enable self-service data preparation by business
users. By making data prep easier for users who understand the business context for the data, Talend
removes the IT bottleneck on clean and usable data, which reduces the time to merge data sources.
Domo

Domo focuses on speed to insight for less technical users. It features 500+ built-in data connectors
and a visual data prep interface to accelerate data sourcing and transformation. Its robust business
intelligence capabilities enable visualization and social commenting to facilitate collaboration. Domo
also boasts native mobile device support with the same analysis, annotation, and collaboration
experience as desktop.

Domo simplifies remotely embedding analytics using "Cards," or deployable, interactive
visualization portlets. These components integrate with web applications using JavaScript APIs and
iframes, and can track utilization by unique end point.

Sisense

Sisense offers an end-to-end analytics platform with a strong governance component. It offers a visual
data sourcing and preparation environment, plus alerts that notify users when a given metric falls
outside a configurable threshold. Sisense deploys to on-premises, private-cloud, or Sisense-managed
environments, and enables governance at the user role, object, and data levels.

Sisense's comprehensive approach to embedded analytics includes integration components like
JavaScript APIs and single sign-on. But it also enables users to customize embedded visualizations,
adding a dimension of self-service to embedded analytics.

Qlik

Qlik emphasizes speed to insight by automating data discovery and relationships between multiple
data sources during data acquisition and preparation. Instead of the traditional query-based approach
to acquiring data, Qlik’s Associative Engine automatically profiles data from all inbound sources,
identifies linkages, and presents this combined data set to the user. Multiple, concurrent users can
quickly explore large and diverse data sets because of Qlik’s in-memory processing architecture,
which includes compressed binary indexing, logical inference, and dynamic calculation.

Qlik supports RESTful APIs as well as HTML5 and JavaScript. This support enables web, business
application, and mobile platform integration for enterprise-wide embedded analytics.
Microstrategy

Founded in 1989, Microstrategy is one of the older data analytics platforms, and has the robustness
that one would expect from such a mature toolset. Microstrategy connects to numerous enterprise
assets like ERPs and cloud data vendors, and integrates with multiple common user clients like
Android, iOS, and Windows. It also provides a variety of common services such as alerts, distribution,
and security, and enables many BI functions like data enrichment, visualization, and user
administration.

Microstrategy enhances data governance by using end-point telemetry to manage user access. By
gathering location, access, authentication, timestamp, and authorization data, this functionality can
help analyze utilization and strengthen security practices.

Thoughtspot

Thoughtspot features a search engine-like interface and AI to enable users to take a conversational
approach to data exploration and analytics. Its SpotIQ engine parses search requests such as “revenue
by country for 2014,” and produces a compelling visualization showing a bar chart ordered least to
greatest.

The Thoughtspot platform helps companies quickly deploy this unique approach to analytics with a
visual data sourcing and preparation pane, extensive in-memory processing, back-end cluster
management for big data environments, centralized row-level security, and built-in embeddable
components.

Birst

Birst focuses on solving one of the most vexing challenges in data analytics: establishing trust in data
from many different sources within the enterprise. Birst’s user data tier automatically sources, maps,
and integrates data sources and provides a unified view of the data to users.

Then, using Birst’s Adaptive User Experience, which breaks down the silo between data discovery
and dashboarding, users can access the unified data sources to develop analytics with no coding or IT
intervention. Finally, Birst enables distribution to multiple platforms and other analytics tools
like R and Tableau.

Microsoft SQL Server Reporting Services

SQL Server Reporting Services (SSRS) is a business intelligence and reporting tool that tightly
integrates with the Microsoft data management stack, SQL Server Management Services, and SQL
Server Integration Services. This toolset enables a smooth transition from database to business
intelligence environment. SSRS in particular offers a visual authoring environment, basic self- service
analytics, and the ability to output spreadsheet versions of reports and visualizations.

SSRS and the Microsoft data management stack are the workhorses of traditional BI. They are a
mature tool set that performs very well with recurring reports and user-entered parameters.

The well-known vendors above support multiple use cases across many industries. However, the
volume of data generated by traditional business activity, social media, and IoT technology continues
to explode every year, so data analytics options continue to evolve. With so many options, choosing
a vendor can be daunting.

The key to making an informed choice is to understand the unique analytics needs of your
organization and industry. Knowing where your needs fall on the analytics spectrum will help you
productively engage with vendors – and make the most of the analytics you produce.

4.3 Network Analysis

The Network Analysis tool generates an interactive dashboard of a network, to explore
relationships between the various nodes. The tool provides a visual representation of the network
along with key summary statistics that characterize the network.

The Network Analysis tool uses the JavaScript library vis.js to generate the interactive force
diagrams. For more information, go to: http://visjs.org/docs/network/.

This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx
Downloads and Licenses portal to install R and the packages used by the R Tool.

Connect inputs
 N anchor: A Nodes data stream that contains a field named _name_ that uniquely identifies each
node in the network.

 E anchor: An Edges data stream that contains fields named "from" and "to" identifying nodes
that are connected by an edge. Note that the "from" and "to" fields must use the same unique
node identifiers as described in (1) above.

Configure the tool

Nodes

 Shape: Select the shape of nodes.

 Size: Select how to size the nodes.

o Fixed: Enter a fixed size for all nodes.

o By Variable: Select a field to scale the nodes by.

o By Statistic: Select a statistical measure to scale the nodes by. For a description of different
centrality measures, go to: https://en.wikipedia.org/wiki/Centrality.

 Group Nodes:

o Variable: Select a field to group the nodes by.

o Statistic: Select a statistical algorithm to group the nodes by.

Edges

 Directed: Select the check box to indicate if the network is a directed network.

 Opacity: Enter the opacity of the edges when moused over.

Layout

 Specify Layout: Select the check box to manually specify a layout algorithm.


 Choose Layout Algorithm: Select a layout algorithm. This tool uses the R package, igraph, to
compute the layouts. For more information on the layouts supported, go
to: https://mran.microsoft.com/package/igraph.

Outputs

 D anchor: An Alteryx data stream with network centrality measures for each node.

 I anchor: An interactive dashboard of the network that consists of:

o An interactive force diagram to visualize the network.

o Aggregate statistical measures that characterize the entire network.

o A histogram of node centrality measures.
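As a flavour of the node centrality measures such a dashboard reports, degree centrality can be computed by hand for a small undirected network. The edges below are invented; node A, touching every other node, scores the maximum of 1.0.

```python
from collections import Counter

# An invented undirected network as a list of (from, to) edges.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

# Count each node's degree (number of incident edges).
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree centrality: degree normalized by the n-1 possible neighbours.
n = len(degree)
centrality = {node: d / (n - 1) for node, d in degree.items()}
```

Libraries such as networkx compute this and the other common measures (betweenness, closeness, eigenvector) directly on a graph object.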

4.4 Digital Education models

Two Models of Digital Education

Mal Lee and Roger Broadie

From the introduction of the World Wide Web in 1993, the young of the world have experienced
two models of digital education: that outside the school walls and that within. Outside, the young
and the digitally connected families of the world employed, unseen, the naturally evolving
laissez faire model. Within the school, the young worked within the traditional, highly structured
model.
It is time the difference is understood, the global success and benefits of the laissez faire model
recognised and lauded, and the serious shortcomings of the highly structured model understood
and addressed. For much of the period the two models ran in parallel, with most schools showing
little or no interest in the out-of-school digital education.

Around 2010 – 2012 the scene began to change when a handful of digitally mature schools began
genuinely collaborating with their families in the 24/7/365 digital education of the children. Those
schools had reached the evolutionary stage where their teaching model and culture closely mirrored
that of the families. They revealed what was possible with collaboration.

That said, it took time for that collaboration to take hold more widely, and for the most part the
parallel models continue in operation today, with the difference between the in-school and
out-of-school teaching growing apace.

It is surely time for schools and government to question the retention of the parallel modes and to
ask if taxpayers are getting value for the millions upon millions spent solely on schools when the
digitally connected families receive no support. Might it be time to employ a more collaborative
approach where the schools complement and add value to the contribution of the families?

Without going into detail, it bears reflecting on the distinguishing features of the learning
environment and digital education model of both the digitally connected family and the school,
and asking what is the best way forward.

The learning environments.

 Digitally connected families

That of the families we know well. It has been built around the home’s warmth and support, and
the priority the parents attached to their children having a digital education that would improve
their education and life chances. The focus has always been on the child – the individual learner –
with the children from the outset being provided the current technology by their family and
empowered to use that technology largely unfettered.

Importantly, the family as a small regulating unit, with direct responsibility for a small number of
children, could readily trust each child, and monitor, guide and value their learning from birth
onwards, helping ensure each child had use of the current technology and that the use was wise
and balanced.

The learning occurred within a freewheeling, dynamic, market-driven, naturally evolving
environment: anywhere, anytime, just in time and invariably in context. Those interested could
operate at the cutting edge and to the depth desired. Very early on, the young's use of the digital
was normalised, with the learning occurring as a natural part of life, totally integrated, with no
regard for boundaries.

The time available to the digitally connected family was – and continues to be – at least four or
five times greater than that in the school. To many it seemed chaotic, but it was also naturally
evolving.

Very quickly the family learning environment became collaborative, socially networked, global in
its outlook, highly enjoyable and creative, where the young believed anything was possible. By the
late 2000s most families had created – largely unwittingly – their own increasingly integrated
and sophisticated digital ecosystem, operating in the main on the personal mobile devices that
connected all in the family to all manner of other ecosystems globally.

 Digital learning in the school.

The general feature of the school digital learning environment has invariably been one of unilateral
control, where the ICT experts controlled every facet of the technology and its teaching. They
chose, configured and controlled the use of both the hardware and software, invariably opting for
one device, one operating system and a standard suite of applications.

The students were taught within class groups, using highly structured, sequential, teacher-directed,
regularly assessed instructional programs. The school knew best. The clients – the parents and
students – were expected to acquiesce. There was little or no recognition of the out-of-school
learning or technology, or desire to collaborate with the digitally connected families. The teaching
was insular, inward-looking and highly site-fixated.
In reflecting on schools’ teaching with the digital between 1993 and 2016, there was an all-
pervasive sense of constancy and continuity, with no real rush to change. There was little sense
that the schools were readying the total student body to thrive within a rapidly evolving digitally
based world.

Significantly, by 2016 only a relatively small proportion of schools globally were operating as
mature digital organisations, growing increasingly integrated, powerful, higher-order digitally
based ecosystems.

The reality was that while the learning environment of the digitally connected families evolved
naturally at pace, that of most schools changed only a little, with most schools struggling to
accommodate rapid digital evolution and transformation.

The teaching models

With the advantage of hindsight, it is quite remarkable how hidden the laissez-faire model has
remained for twenty-plus years, bearing in mind it has been employed globally since the advent of
the WWW.

For years, it was seen simply as a different, largely chaotic approach used by the kids – with the
focus being on the technological breakthroughs and the changing practices rather than on the
underlying model of learning that was being employed.

It wasn’t until the authors identified and documented the lead role of the digitally connected
families of the world that we appreciated all were using basically the same learning approach. The
pre-primary developments of the last few years affirmed the global application of the model. We
saw at play a natural model that was embraced by the diverse families of the world. All were using
the same model – a naturally evolving model where the parents were ‘letting things take their own
course’ (OED).

The learning was highly individualized, with no controls other than the occasional parental nudge.
That said, the learning was simultaneously highly collegial, with the young calling upon and
collaborating with their siblings, family members, peers and social networks when desired.

Interestingly, from early on the young often found themselves knowing more about the technology
in some areas than their elders – experiencing what Tapscott (1998) termed an ‘inverted authority’
– and being able to assist them in using the technology. Each child was free to learn how to use
and apply those aspects of the desired technologies they wanted, and to draw upon any resources
or people as needed.

In the process the children worldwide – from as young as two – directed their own learning, opting
usually for a discovery-based approach, where the learning occurred anytime, anywhere, 24/7/365.
Most of the learning was just in time, done in context, and was current, relevant, highly appealing
and intrinsically motivating. Invariably it was highly integrated, with no thought given to old
boundaries – such as whether it was education, entertainment, communication, social science or history.

In contrast the school digital teaching model has always been highly structured and focused on
what the school or education authority ‘experts’ believed to be appropriate. Throughout the period
the teaching has been unilaterally controlled, directed by the classroom teacher, with the students
disempowered, distrusted and obliged to do as told.

The teaching built upon linear, sequential instructional programs where the digital education was
invariably treated like all other subjects, shoehorned into an already crowded curriculum and
continually assessed. Some authorities made the ‘subject’ compulsory, others made it optional.

The focus – in keeping with the other ‘subjects’ in the curriculum – was academic. There was little
interest in providing the young with the digital understanding needed for everyday life. The teaching
took place within a cyber-walled community, at times determined by the teaching program.
Increasingly, the courses taught and assessed became dated and irrelevant.

In considering why the young and the digitally connected families of the world have embraced the
laissez-faire model of digital education, aside from the young’s innate curiosity and desire to learn,
we might do well to examine the model of digital learning we have used over the last twenty-plus
years and reflect on how closely it approximates that adopted by the young. Might they be
following that ancient practice of modelling the behaviour of their parents?

The way forward.

Nearly a quarter of a century on from the introduction of the WWW, and after an era of profound
technological and social change, it is surely time for governments and educators globally to:

 Publicly recognise the remarkable success of the digitally connected families and the
laissez-faire teaching model in the 24/7/365 digital education of both the children and the
wider family
 Understand that the digitally connected families are on trend to play an even greater lead role
 Identify how best to support the families’ efforts without damaging the very successful
teaching model employed
 Consider how best to enhance the educational contribution of all the digitally connected
families in the nation, including the educationally disadvantaged
 Rethink the existing, somewhat questionable contribution of most schools, and the
concept of schools as the sole provider of digital education for the young
 Examine where scarce taxpayer monies can best be used to improve digital education
in the networked world.

4.5 3D data collection and utilisation

3D Geographical Information Systems need 3D representations of objects and, hence, 3D data
acquisition and reconstruction methods. Developments in these two areas, however, are not
compatible. While numerous operational sensors for 3D data acquisition are readily available
on the market (optical, laser scanning, radar, thermal, acoustic, etc.), 3D reconstruction
software offers predominantly manual and semi-automatic tools (e.g. Cyclone, Leica
Photogrammetry Suite, PhotoModeler or SketchUp). The ultimate 3D reconstruction
algorithm is still a challenge and a subject of intensive research. Many 3D reconstruction
approaches have been investigated, and they can be classified into two large groups, optical
image-based and point cloud-based, with respect to the sensor used, which can be mounted on
different platforms. Optical image-based sensors produce sets of single or multiple images
which, combined appropriately, can be used to create 3D polyhedral models. This approach
can deliver accurate, detailed, realistic 3D models, but many components of the process
remain manual or semi-manual. It is a technique which has been well-studied and documented
(see Manuals of Photogrammetry, 2004; Henricsson and Baltsavias, 1997; Tao and Hu, 2001).
Active scanning techniques, such as laser and acoustic methods, have been an enormous
success in recent years because they can produce very dense and accurate 3D point clouds.
Applications that need terrain or seabed surfaces regularly make use of the 2.5D grids obtained
from airborne or acoustic point clouds. The integration of direct geo-referencing (using GPS
and inertial systems) into laser scanning technologies has given a further boost to 3D
modelling. Although extraction of height (depth) information is largely automated, complete
3D object reconstruction and textures (for visualisation) are often weak, and the amount of data
to be processed is huge (Maas and Vosselman, 1999; Wang and Schenk, 2000; Rottensteiner
et al 2005). Hybrid approaches overcome the disadvantages mentioned above by using
combinations of optical images, point cloud data and other data sources (e.g. existing maps or
GIS/CAD databases) (Tao, 2006). The combination of images, laser scanning point clouds and
existing GIS maps is considered to be the most successful approach to automatically creating
low resolution, photo-textured models. There are various promising studies and publications
focused on hybrid methods (Schwalbe et al, 2005; Pu and Vosselman, 2006) and even on
operational solutions. These approaches are generally more flexible, robust and successful but
require additional data sources, which may influence the quality of the model. In summary, 3D
data acquisition has become ubiquitous, fast and relatively cheap over the last decade.
However, the automation of 3D reconstruction remains a big challenge. There are various
approaches for 3D reconstruction from a diverse array of data sources, and each of them has
some limitations in producing fully automated, detailed models. However, as the cost of
sensors, platforms and processing hardware decreases, simultaneous and integrated 3D data
collection using multiple sensing technologies should allow for more effective and efficient
3D object reconstruction. Designing integrated sensor platforms, processing and integrating
sensor measurements and developing algorithms for 3D reconstruction are among the topics
which should be addressed in the near future. Besides these, I expect several more general
issues to emerge:

1. Levels of Detail (LOD). Presently, a 3D reconstruction algorithm is often created for a
given application (e.g. cadastre, navigation, visualisation, analysis, etc.), responding to
specific requirements for detail and realism. Indeed, 3D reconstruction is closely related to the
application that uses the model, but such a chaotic creation of 3D models may become a major
bottleneck for mainstream use of 3D data in the very near future. Early attempts to specify
LOD are already being made by the CityGML team, but this work must be further tested and
refined (Döllner et al, 2006).

2. Standard outputs. Formalising and standardising the outputs of the reconstruction
processes with respect to formal models and schemas as defined by OGC is becoming
increasingly important. Currently, most of the algorithms for 3D reconstruction result in
proprietary formats and models, both with specific feature definitions, which frequently disturb
import/export and often lead to loss of data (e.g. geometry detail or texture).

3. Integrated 3D data acquisition and 3D modelling, including subsurface objects such as
geologic bodies, seabed, utilities and underground construction. Traditionally, the objects of
interest for modelling in GIS have been visible, natural and man-made, usually above the
surface. As the convergence of applications increases, various domains (e.g. civil engineering,
emergency response, urban planning, cadastre, etc.) will look towards integrated 3D models.
With advances in underground detection technologies (e.g. sonic/acoustic, ground-penetrating
radar), already developed algorithms can be reapplied to obtain models of underground objects.

4. Change detection. Detection of changes is going to play a crucial role in the maintenance
and update of 3D models. Assuming that automated 3D acquisition mechanisms will be
available, the initial high costs of acquiring multiple data sources can be balanced and justified.
Changes can then be detected against existing data from previous periods or initial design
models (e.g. CAD). In both cases, robust and efficient 3D computational geometry algorithms
must be studied.

5. Monitoring dynamic processes. The focus of 3D reconstruction is still on static objects.
Although most sensors produce 3D data, hardly any dynamic 3D reconstruction is presently
being done. Most dynamic software relies on geovisualisation tools (e.g. flood monitoring;
Jern, 2005) for analysis and decision making.

Three-dimensional GIS data incorporates an extra dimension—a z-value—into its definition
(x,y,z). Z-values have units of measurement and allow the storage and display of more information
than traditional 2D GIS data (x,y). Even though z-values are most often real-world elevation
values—such as the height above sea level or geological depth—there is no rule that enforces this
convention. Z-values can be used to represent many things, such as chemical concentrations, the
suitability of a location, or even purely representative values for hierarchies. There are two basic
types of 3D GIS data: feature data and surface data.

3D feature data

Feature data represents discrete objects, and the 3D information for each object is stored in the
feature's geometry. Three-dimensional feature data can support potentially many different z-values
for each x,y location. For example, a vertical line has an upper vertex and a lower vertex, each
with the same 2D coordinate, but each having different z-values. Another example of 3D feature
data would be a 3D multipatch building, whose roof, interior floors, and foundation would all
contain different z-values for the same 2D coordinate. Other 3D feature data, such as an aircraft's
3D position or a walking trail up a mountain, would only have a single z-value for each x,y
location.
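As a rough illustration of this idea, a 3D feature's geometry can be sketched as a list of (x, y, z) vertices; the coordinates below are hypothetical and not tied to any particular GIS package:

```python
# A vertical line stored as 3D feature geometry: both vertices share
# the same x,y but carry different z-values (illustrative values only).
vertical_line = [
    (500000.0, 4649776.0, 10.0),  # lower vertex
    (500000.0, 4649776.0, 35.0),  # upper vertex
]

def z_values_at(feature, x, y):
    """Collect every z-value stored at a given x,y location."""
    return [vz for vx, vy, vz in feature if (vx, vy) == (x, y)]

print(z_values_at(vertical_line, 500000.0, 4649776.0))  # [10.0, 35.0]
```

The same x,y location yields two z-values, which is exactly what distinguishes 3D feature data from surface data.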

Surface data

Surface data represents height values over an area, and the 3D information for each location within
that area can be either stored as cell values or deduced from a triangulated network of 3D
faces. Surface data is sometimes referred to as 2.5D data because it supports only a single z-value
for each x,y location. For example, the height above sea level for the surface of the earth will only
return a single value.
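By contrast, a surface can be sketched as a simple grid in which each cell holds exactly one z-value; the elevations below are invented for illustration:

```python
# A 2.5D surface as a raster-like grid: one z-value (elevation in
# metres, hypothetical values) per cell, i.e. per x,y location.
surface = [
    [120.0, 121.5, 123.0],
    [119.0, 120.5, 122.0],
    [118.0, 119.5, 121.0],
]

def height_at(grid, row, col):
    """Return the single z-value stored for this location."""
    return grid[row][col]

print(height_at(surface, 1, 2))  # 122.0
```

Querying any cell returns one value only, which is why surface data is called 2.5D rather than fully 3D.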

When to model GIS data in 3D?

Since 3D GIS data can be more difficult to create and maintain than 2D data, modeling your data
in three dimensions should only be done when the extra effort will add value to your work. While
some GIS features, such as aircraft locations or underground wells, naturally lend themselves to
being modeled in 3D, other data can be just as effective in 2D as in 3D. For
example, having a road network modeled in 3D might seem useful for investigating gradients, but
the additional effort to maintain z-values might outweigh the benefits.

These are some important considerations when deciding to model your data in 3D:
 GIS data does not have to be modeled in 3D to be displayed inside a 3D view.
 Height values from a surface can easily be added to 2D objects, when you need them,
through the use of geoprocessing tools.
 If the source of your z-values is a surface, consider how often that underlying surface
changes. The more it changes, the less useful it will be to store z-values for features
generated against it.

If you decide to model some or all of your data in three dimensions, the most important decision
will be the units of the z-values. A solid understanding of what your z-values represent will be
critical when you start editing and maintaining them. A general rule to follow whenever possible
is that the z-units should match your x,y units. For example, if your data is in a (meter-based) UTM
zone, you should also model your z-values as meters. This will help you interact with the data in
an intuitive way, such as when you measure 3D distances or move objects in x, y, and z.
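The benefit of matching units can be sketched with a simple 3D distance computation; the UTM-style coordinates below are hypothetical:

```python
import math

# With x, y and z all in metres, 3D distance is plain Euclidean
# geometry and needs no unit conversion.
def distance_3d(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

a = (500000.0, 4649776.0, 100.0)  # hypothetical UTM point, metres
b = (500003.0, 4649780.0, 112.0)  # 3 m east, 4 m north, 12 m up
print(distance_3d(a, b))  # 13.0
```

Had z been stored in feet while x and y were metres, the same computation would silently give a wrong answer, which is the point of the rule above.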

UNIT V APPLICATIONS 9

GIS Applications - Natural Resource Management - Engineering - Navigation - Vehicle tracking and
fleet management - Marketing and Business applications - Case studies.

5.1 GIS APPLICATIONS

GIS is a convergence of technological fields and traditional disciplines. GIS has been called an
"enabling technology" because of the potential it offers to the wide variety of disciplines dealing
with spatial data. Many related fields of study provide techniques which make up GIS. These
related fields emphasise data collection, while GIS brings them together by emphasising
integration, modelling and analysis. Thus GIS often claims to be the science of spatial information.
Fig. 17.1 shows the technical and conceptual development of GIS. The list of contributing
disciplines can be classified according to (1) Heritage (2) Data Collection (3) Data Analysis (4)
Data Reporting.

An important distinction between GIS applications is whether the geographic phenomena studied
are man-made or natural. Clearly, setting up a cadastral information system, or using GIS for urban
planning purposes involves a study of man-made things mostly: the parcels, roads, sidewalks, and
at larger scale, suburbs and transportation routes are all man-made, those entities often have clear-
cut boundaries. On the other hand, geomorphologists, ecologists and soil scientists often have
natural phenomena as their study objects. They may be looking at rock formations, plate tectonics,
distribution of natural vegetation or soil units. Often, these entities do not have clear-cut
boundaries, and there exist transition zones where one vegetation type, for instance, is gradually
replaced by another. (de et al, 2001).

The applications of GIS include mapping locations, quantities and densities, finding distances, and
mapping and monitoring change. The function of an information system is to improve one’s ability
to make decisions. An information system is a chain of operations, starting from planning the
observation and collection of data, through storage and analysis of the data, to the use of the derived
information in some decision-making process. A GIS is an information system that is designed to
work with data referenced to spatial or geographic coordinates. GIS is both a database system with
specific capabilities for spatially referenced data, as well as a set of operations for working with the
data. There are three basic types of GIS applications, which might also represent stages of
development of a single GIS application.

5.2 NATURAL RESOURCE MANAGEMENT

The major application of GIS in natural resource management is in confronting environmental
issues such as floods, landslides, soil erosion, drought and earthquakes. GIS in natural resource
management also addresses current problems of climate change, habitat loss, population growth
and pollution. GIS is a powerful tool for managing natural resources, and its introduction has
helped solve many problems related to the natural environment. Some applications of GIS in
major fields are discussed below:

 Hazard and risk assessment
GIS in natural resource management is used in the reduction of natural hazards such as floods,
landslides, soil erosion, forest fires, earthquakes and drought. One cannot totally prevent these
natural disasters, but their impact can be minimized through early planning, preparation and
strategies. GIS is used in analyzing, organizing, managing and monitoring natural hazards, and
provides spatial data on disasters that have taken place before or might occur, so that risk can be
addressed early. This is indicated through GIS-based maps.

Risk Assessment
 Change detection
GIS in natural resource management provides information about land-area change between time
periods. Land changes are detected through satellite imagery or aerial photographs. This is a
useful application in land-change assessment, deforestation assessment, urbanization, habitat
fragmentation etc. The information obtained helps in studying a specific area, and monitoring can
be done in and around that area. It is a way of studying the variations taking place in the landscape
and managing the environment.
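A minimal sketch of the idea, assuming two already-classified land-cover grids from different dates (the classes and values are invented for illustration):

```python
# Compare two classified land-cover grids cell by cell and report
# every cell whose class changed between the two dates.
t1 = [["forest", "forest", "water"],
      ["forest", "crop",   "water"]]
t2 = [["crop",   "forest", "water"],
      ["urban",  "crop",   "water"]]

def changed_cells(before, after):
    """Return (row, col, old_class, new_class) for each changed cell."""
    changes = []
    for r, (row_b, row_a) in enumerate(zip(before, after)):
        for c, (old, new) in enumerate(zip(row_b, row_a)):
            if old != new:
                changes.append((r, c, old, new))
    return changes

print(changed_cells(t1, t2))
# [(0, 0, 'forest', 'crop'), (1, 0, 'forest', 'urban')]
```

Real change detection works on georeferenced rasters and must handle registration and classification error, but the cell-by-cell comparison is the core operation.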

Change detection in land use
 Natural resource inventory
A natural resource inventory is a statistical survey of the condition of natural resources. It provides
relevant information about environmental conditions and policy, including conservation programs,
that is obtained through GIS in natural resource management. The maps in GIS show the location
and current state of resources.

Forest Inventory

 Environmental Monitoring

GIS in natural resource management provides graphical data that helps in monitoring the
environment. It provides qualitative and quantitative data about environmental issues
such as pollution, land degradation and soil erosion. GIS detects these problems and predicts
future hazards, and thus supports the monitoring of all these environmental problems.
GIS application in Natural Resource Management:
 GIS helps in the management of land by providing resourceful data for construction or
agricultural works, allowing a suitable site to be selected before any change is made.
 GIS in natural resource management helps conserve a wide range of biodiversity through the
information obtained from it. Many biological habitats are protected, and further planning for
the protection of flora and fauna is promoted.
 GIS in natural resource management provides hydrological data for watershed management
and watershed analysis.
 GIS in natural resource management now extends to mineral exploration in various developed
countries such as the USA, Canada and Australia.
Further applications are briefly pointed as below:
 Facility management
 Topographic analysis
 Network analysis
 Transportation modeling
 Engineering design
 Demographic analysis
 Geo process modeling

Therefore, GIS is a suitable technology for understanding natural resource management. It
is an effective technique for learning about the factors affecting the environment and their
consequences. The geospatial data obtained through GIS support the sustainable use of natural
resources. Thus, GIS in natural resource management guides the proper and wise management of
resources for present and future generations, helping to manage natural resources effectively and
efficiently.
5.3 ENGINEERING

An advanced information system like GIS plays a vital role and serves as a complete platform in
every phase of the infrastructure life cycle. The advancement and availability of technology have
set new benchmarks for professionals in the infrastructure development areas. More and more
professionals are seeking the help of technologically smart and improved information systems
like GIS for infrastructure development, and each phase of the infrastructure life cycle is greatly
affected and enhanced by the adoption of GIS.

 Planning: In planning, the major contribution of GIS is to provide an organized set of data
which helps professionals combat complex scenarios relating to site selection,
environmental impact, study of ecosystems, managing risk regarding the use of natural
resources, sustainability issues, managing traffic congestion, routing of roads and pipelines,
etc.

 Data Collection: Precise and accurate data is the core driving factor of any successful
project. GIS is equipped with almost all the tools and functions that enable users to
access the required data within a reasonable time.
 Analysis: Analysis is one of the major and most influential phases of the infrastructure life
cycle. Analysis establishes the validity or correctness of a design; in other words, it is the
method which supports the design. Some of the analyses that can be performed
by GIS are:
 Water distribution analysis
 Traffic management analysis
 Soil analysis
 Site feasibility analysis
 Environment impact analysis
 Volume or Area analysis of catchment
 River or canals pattern analysis
 Temperature and humidity analysis
Construction: This is the stage when all layout plans and paper designs come into
existence in the real world. GIS helps professionals to understand the site conditions
that affect the schedule baseline and cost baseline. To keep the construction within budget
and schedule, GIS guides the efficient use of on-site resources through:
 Timely usage of construction equipment.
 Working Hours
 Effects of seasonal fluctuations.
 Optimizing routes for dumpers and concrete trucks
 Earth filling and cutting
 Calculation of volumes and areas of constructed phase thereby helping in Estimation
and Valuation.
Operations: Operations are controlled by modeling site data and comparing it against the
baselines prepared in the planning phase. The site model may take the form of raster images
or CAD drawings. These help to keep track of the timely execution of activities.

GIS can keep a record of the work that has been completed and provide visualization
in the form of thematic maps, which indicate the rate of operations, completed
operations and pending operations.

In short we can say that GIS will prove to be the foundation of next generation civil
engineering.

5.4 NAVIGATION (ROUTING AND SCHEDULING)

Web-based navigation maps encourage safe navigation in waterways. Ferry paths and shipping
routes are identified for better routing. ArcGIS supports safe navigation systems and provides
accurate topographic and hydrographic data. Recently, the DNR's Coastal Resources Division
began the task of locating, documenting, and cataloging these wrecks with GIS. The division
provides public information that makes citizens aware of these vessel locations through a web
map. The web map will be regularly updated to keep the boating public informed of these coastal
hazards, to minimize the risk of collision and injury.
5.5 VEHICLE TRACKING AND FLEET MANAGEMENT
The core of a Fleet Management Tracking system is a GNSS tracking system used in conjunction
with data transmission by means of the selected communications system, for instance
GSM or GPRS.
This combination of GNSS technology with GSM/GPRS wireless coverage can keep track of the
position of all resources, such as vehicles, personnel and assets, as well as incidents. This
information is sent to a server and can be visualized using a Geographic Information System (GIS),
where the location, stops, idling and distance covered by each item can be monitored. Many
systems keep the tracking data stored locally or centrally, from where it can be retrieved for further
analysis.
The GNSS unit is essential to identify the position of the vehicle. The tracking systems usually use
one of the following architectures, which always include a GNSS receiver:

 Passive Tracking: The tracking system stores the vehicle's location, through a GNSS receiver,
and other data, such as vehicle condition or container status. This data is stored and can be
collected and analyzed at the end of the trip.

 Active Tracking: The tracking device obtains the vehicle's location, through the GNSS
receiver, and sends it through a wireless communication system to a control center at regular
intervals or if certain conditions are met.

 Real-time, cellular network: The vehicle's location and speed are transmitted periodically
over a GSM cellular network. The controller accesses the information by logging on to the
vendor's website, which requires a monthly fee, or by receiving the data directly on a cell
phone, which requires a cell phone account. The positions of trucks or goods are updated every
few minutes, according to the system specification.

 Real-time, satellite: The vehicle's data is transmitted through satellite to the vendor and the
controller accesses the data by logging on to the vendor's website. This method also requires a
monthly subscription fee.[5]
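The difference between the architectures can be sketched as a reporting rule for an active tracker; the interval and trigger condition below are assumptions, not part of any standard:

```python
# Active tracking: the on-board unit sends a position fix either on a
# regular interval or when an alert condition is met (values assumed).
REPORT_INTERVAL_S = 60    # hypothetical regular reporting interval
SPEED_LIMIT_KMH = 90      # hypothetical alert condition

def should_report(seconds_since_last_fix, speed_kmh):
    """Decide whether the unit transmits now (active tracking)."""
    return (seconds_since_last_fix >= REPORT_INTERVAL_S
            or speed_kmh > SPEED_LIMIT_KMH)

print(should_report(30, 80))  # False: neither condition met
print(should_report(75, 80))  # True: interval elapsed
print(should_report(30, 95))  # True: speed alert
```

A passive tracker, by contrast, would simply append each fix to local storage for download at the end of the trip.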
A fleet management tracking system is constituted by the following components:

 On Board Unit (OBU), which includes the GNSS receiver and other types of sensors to collect
the status of the vehicle and the cargo. This device will also have the ability to connect to a
central tracking server. The vehicle's information can include latitude, longitude, altitude,
computed odometer, door open or close, fuel amount, tire pressure, turnoff ignition, turn on
headlight, engine temperature, as well as cargo information and other vehicle's sensors.

 Driver Console: most systems include a driver console where the driver can register shift
starts/ends, route used, stops, pickups, dropoffs and other labor- and business-related information
that cannot be acquired automatically. This console can also be used to provide messages or
warnings to the drivers. Warnings can be issued if the adequate procedures or schedule are not
being followed.

 Central tracking server, which has the capability to receive, store and publish the tracking
data to a user interface, which usually encompasses a Geographic Information System.

Application Characterization
The main benefits of Fleet Management and Vehicle Tracking Applications are:

 Improved operational efficiency of the vehicle fleet - Fleet Management provides businesses
with operational data of the fleet allowing the optimization and planning of the resources,
improving response times, increasing the number of services and using the most suitable
routes.
 Improved customer care - Knowing where each vehicle of the fleet is at a given time allows
companies to provide their customers with accurate information about the location and
expected arrival time of vehicles and/or goods transported in the vehicles.
 Reduction of theft risk - In case of theft, the vehicle is easily locatable which makes it
possible to act immediately in order to recover it.
 Facilitated Fleet Maintenance - Usually fleet management systems provide tools to plan the
vehicle maintenance based on the distances run providing alarms to inspections and
maintenance activities.
 Enforcement of Transport Regulations - The transport of persons or goods normally follow
specific regulations such as forbidden areas (e.g. some areas are not allowed for vehicles
transporting dangerous goods), velocity limits, labor regulations (e.g. maximum number of
consecutive hours a driver can work). Fleet management systems allow companies to
guarantee that these regulations are being followed by their drivers.
Some of the sectors that use fleet management are:

 Public Services Fleets - Fleets providing public services (e.g. waste collection, road
maintenance, taxi fleets, etc.) use GNSS for the optimization of routes, planning of services
and determining the closest responder.
 Emergency and Assistance Fleets - Emergency and assistance fleets use GNSS to determine
which vehicle is the most suitable to respond to an assistance request.
 Car Rental Companies - Car rental companies use GNSS to determine closest available
vehicle for a client, to monitor mileage or area limits on rented vehicles and as anti-theft
system.
 Goods Transportation and Distribution - Freight transportation companies use GNSS to
monitor the transportation of goods, providing information to customers about their cargo and
determining the closest vehicle for unscheduled pickups.
 Sales Force Management - Companies with a mobile sales force can use GNSS to determine
the closest representative in case of unscheduled visits and to monitor their representatives'
activity, mileage and work hours.
 Hazardous Goods or Valuables Transportation - Hazardous goods or valuables
transportation companies use GNSS to monitor the transported goods in real time, raising
alarms when the vehicle deviates from its scheduled route or violates transportation
regulations. Fleet management systems for these companies normally include a panic-button
function that sends the position of the vehicle to the central tracking server in case of
emergency or theft.
 Public Transportation - Public transportation operators use GNSS to track the vehicle
fleet, to reroute vehicles when needed and to provide information to passengers.
The use of GNSS for Fleet Management and Vehicle Tracking in certain sectors has been driven
by transport regulations and policies. A specific example of the use of such systems is for
Livestock transportation in Europe which is detailed in the following section.
Tracking of Livestock Transportation
The application of satellite positioning to livestock traceability has become a general
objective in support of livestock transportation policies. European regulation requires an
appropriate navigation system capable of recording and providing information equivalent to
that required in the journey log, together with information on the opening and closing of the
loading doors. It also requires a temperature monitoring and recording system that alerts the
driver when the temperature in the animal compartment reaches the maximum of 30°C or the
minimum of 5°C.
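The 30°C/5°C alerting rule described above can be sketched as a simple threshold check. This is a toy illustration only; a real on-board unit would also debounce readings and log them for enforcement:

```python
TEMP_MAX_C = 30.0  # regulatory upper limit quoted in the text
TEMP_MIN_C = 5.0   # regulatory lower limit quoted in the text

def temperature_alert(reading_c):
    """Return an alert message when a compartment reading reaches a limit, else None."""
    if reading_c >= TEMP_MAX_C:
        return "HIGH temperature alert: %.1f degC" % reading_c
    if reading_c <= TEMP_MIN_C:
        return "LOW temperature alert: %.1f degC" % reading_c
    return None
```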
In livestock transportation, GNSS makes it possible to:

 Localize and continuously track and trace the vehicles transporting livestock in order to
increase the efficiency of all activities related to livestock transportation.
 Generate reports from sensor information (temperature, loading-door status, warning
signals, etc.) in order to improve animal welfare.
 Calculate optimal routes over the most suitable roads and hence ensure fast and safe
delivery of the cargo.
 Geofencing and alarming.
 Recording of data for statistical and enforcement/governmental use.
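Geofencing, listed above, reduces to a point-in-polygon test on every position report. A minimal ray-casting sketch (planar coordinates for simplicity; real systems project coordinates and buffer the fence boundary):

```python
def point_in_fence(lon, lat, fence):
    """Ray-casting point-in-polygon test; fence is a list of (lon, lat) vertices."""
    inside = False
    j = len(fence) - 1
    for i in range(len(fence)):
        xi, yi = fence[i]
        xj, yj = fence[j]
        # Count crossings of a horizontal ray cast from the point:
        # an odd number of crossings means the point is inside.
        if (yi > lat) != (yj > lat) and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside
```

An alarm is then raised whenever a fix for a vehicle carrying dangerous goods falls inside a forbidden zone (or outside an allowed one).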

Application Examples
Fleet management devices range from simple units without any user interface to devices with
graphical human-machine interfaces; some also interface with the vehicle's on-board
diagnostics or with other vehicle systems such as temperature sensors and door-opening
sensors. Although these systems are usually attached permanently to the vehicle, it is
possible (though not usual) to use GNSS cell phones running specific fleet management
applications. The devices normally used for fleet management are usually called vehicle
trackers.
These systems can be sold as a product, where the on-board device is sold to the customer and
the fleet is managed through an application or server bundled with it. The complexity of the
management application varies from a simple application that shows the position of the
vehicles and generates reports based on the data received from the on-board devices, to
customizable real-time servers that support real-time alarms and complex services such as
routing, planning and customized reporting. Alternatively, some providers offer these systems
as a service, where the equipment is rented and the centralized functions are provided as a
service. In some cases even communications costs are handled by the provider, and a monthly
fee per monitored vehicle is charged to the customer.
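The tracker-to-server exchange can be illustrated with a toy position report. The CSV layout below is hypothetical, chosen only for readability; real vehicle trackers use vendor-specific binary protocols or NMEA sentences:

```python
def parse_position_report(line):
    """Parse a toy CSV position report: vehicle_id,iso_time,lat,lon,speed_kmh.
    Hypothetical format for illustration; real trackers use vendor protocols or NMEA."""
    vehicle_id, iso_time, lat, lon, speed = line.strip().split(",")
    return {
        "vehicle": vehicle_id,
        "time": iso_time,
        "lat": float(lat),
        "lon": float(lon),
        "speed_kmh": float(speed),
    }
```

The management server would run such a parser on each incoming message before storing the fix and evaluating alarms.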

5.6 MARKETING AND BUSINESS APPLICATIONS

1. Banking: Being market-driven, banks need to provide customer-centric services built around
resource planning and marketing. GIS plays an important role in supporting planning,
organization and decision making.
2. Assets Management: GIS helps organizations to locate and store information about their
assets. It also helps operations and maintenance managers deploy their enterprise and mobile
workforce.
3. Dairy Industry: GIS is used to monitor the distribution of products, production rates, the
location of shops and their sales rates.
4. Tourism: Tourists can get all the information they need at a click: measuring distances,
finding hotels and restaurants, and even navigating to them. This information plays a vital
role in planning travel from one place to another.
5. Business: GIS is used for managing business information based on its location. GIS can keep
track of where customers are located, select business sites, target marketing campaigns,
optimize sales territories and model retail spending patterns.
6. Market Share: Examining branch locations, competitor locations and demographic
characteristics to identify areas worthy of expansion or determine market share in Maptitude.
7. ATMs: Filling in market and service gaps by understanding where customers, facilities
and competitors are, using address locating, database management and query tools.

8. World Bank Economic Statistics: Slicing and dicing raw financial data from the World Bank.
9. Mergers and Acquisitions: Using market profiling to understand where customers are and to
find opportunities to gain and build market share.

10. Supply and Demand: Identifying under-served areas and analyzing competitors' markets.
11. Community Reinvestment Act (CRA): Fulfilling obligations to lend in areas with
particular attention to low- and moderate-income households, using GIS to understand spatial
demographics.

12. Mobile Banking: Capturing the locations where existing mobile transactions occur and
assisting in mobile security infrastructure.
13. Internet of Things: Improving efficiency, accuracy and economic benefit through a
network of physical objects - devices, vehicles, buildings and other items embedded with
electronics, software, sensors and network connectivity - that collect and exchange
information with one another.
14. Market Share Analysis: Optimizing the locations of facilities so the allocated demand is
maximized in the presence of competitors using tools like location-allocation in ArcGIS.
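The location-allocation idea mentioned above can be illustrated with a toy nearest-facility assignment. This uses straight-line distance only; ArcGIS's location-allocation tools solve the problem over a street network with demand, capacity and competitor models:

```python
def allocate_demand(demand_points, facilities):
    """Assign each demand point to its nearest facility by squared planar distance.
    A toy stand-in for GIS location-allocation; coordinates are simple (x, y) pairs."""
    allocation = {}
    for name, (x, y) in demand_points.items():
        allocation[name] = min(
            facilities,
            key=lambda f: (facilities[f][0] - x) ** 2 + (facilities[f][1] - y) ** 2,
        )
    return allocation
```

For example, with two hypothetical branches and two customers, each customer is captured by whichever branch lies closest.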

15. Integrated Freight Network Model: Integrating highly detailed information about shipping
costs, transfer costs, traffic volumes and network interconnectivity properties in a GIS-based
platform.

5.7 CASE STUDIES

1. Agriculture

Background
Identification, description and mapping of Rangeland sites to a scale of 1:250,000; estimation of
the present and potential grazing productivity and load; and presentation of recommendations for
sustainable management for each site.
Solution
RMSI created a comprehensive Remote Sensing and GIS-based database using satellite data,
thematic maps, floral species inventory data and biomass estimation complemented with field
expertise. The study provided ready-to-use maps for managers and decision/ policy makers to
ensure Sustained and Secured Rangelands. The database also provided qualitative & quantitative
information related to the benefits derived from Rangelands in terms of ecosystem services such
as available biomass for livestock and other uses by humans.
Client Benefit
The comprehensive Remote Sensing and GIS database with multiple layers of information served
as a key reference for evaluation, management and monitoring of Rangelands in the arid regions
of Saudi Arabia. This in particular reflects the Kingdom’s commitment to climate change
resilience by conserving Rangelands and their vegetation.
2. Forestry
Background
Realizing the need to increase the adaptive capacity of farming communities, the World Bank
commissioned RMSI to develop an application suite of a web and a mobile-phone application to
disseminate location-specific climate/ weather information, and related agro-advisories, which are
understandable to the farmers on a real-time basis. The agro-weather tool disseminates vital
weather-forecast-linked agro-advisories through SMS, IVRS, a mobile app and a website so that
farmers can better plan and manage weather risks and maximize productivity.
Solution
RMSI experts developed web and mobile phone (i.e., IVRS, SMS, android app) based agro-
weather tool to disseminate weather forecast information and best-bet agronomic management
practices for the farmers in Ethiopia and Kenya. The tool was developed for the main cultivated
crops in Ada’a district of Ethiopia (chickpea, lentil, teff, and wheat) and in Embu district of Kenya
(bean, maize, sorghum, tea, and coffee) on a pilot basis.
Client Benefit
The key benefit is the availability of location-specific agro-advisories that help farmers
minimize crop losses and practice climate-smart agriculture.

CONTENT BEYOND SYLLABUS
GIS in the Cloud using ArcGIS Online:
Esri cloud ecosystem allows you to access, create, edit, analyze and share maps, apps, and
geospatial data from anywhere in the world. While traditional GIS is installed on your desktop or
server, Cloud GIS makes use of the flexibility of the cloud environment for data capture,
visualization, analysis and sharing.
Cloud GIS
Cloud computing technology has revolutionized the way we work. Although GIS has been a late
adopter of cloud technology, its many advantages are compelling organizations to shift their
geospatial functions to the cloud. Cloud-based tools are accessed through web-based
geographic information systems; maps generated from the data help analyze and optimize
operations in real time, and cloud apps help manage isolated silos of GIS workflows and
geodatabases. Thus, Cloud GIS can be defined as a next-generation, on-demand GIS technology
that uses a virtualized platform or infrastructure in a scalable, elastic environment.
How does Cloud GIS work?
The cloud computing environment offers three base service models – Software-as-a-Service
(SaaS); Platform-as-a-Service (PaaS); and Infrastructure-as-a-Service (IaaS).

Cloud GIS Service Models
In the geospatial environment, Cloud SaaS supports three further service models:
 GIS-as-a-Service (GaaS),
 Applications-as-a-Service (AaaS)
 Imagery-as-a-Service (IaaS), where ready-to-use GIS datasets are available as Data-as-a-
Service (DaaS)
These are accessed as private, public, hybrid or community cloud services, depending upon the
organization’s need for security, collaboration and ownership.
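Data hosted in such a cloud deployment is typically reached through a REST endpoint. The sketch below builds a read-only query URL following the ArcGIS REST API's feature-layer /query pattern; the service URL shown in the usage example is hypothetical:

```python
from urllib.parse import urlencode

def feature_query_url(layer_url, where="1=1", out_fields="*"):
    """Build a query URL in the ArcGIS REST API's /query style.
    The layer URL is supplied by the caller; no request is actually made here."""
    params = {"where": where, "outFields": out_fields, "f": "json"}
    return layer_url.rstrip("/") + "/query?" + urlencode(params)
```

A client would then issue an HTTP GET against the returned URL and receive the matching features as JSON.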

Cloud GIS Deployment Models


Key Benefits of GIS in the Cloud

Other Benefits of Cloud GIS
 On demand service of online maps, geospatial data, imagery, computing or analysis
 Large volumes of data handling, app management and geospatial analysis possible
 Supports viewing, creating, monitoring, managing, analyzing and sharing maps and data
with other users
 Facilitates inputs, validation and collaboration by a global mobile workforce in real time
 Optimization based on spatio-temporal principles enables effective geospatial
validation and analysis
 Managed services prevent data and work loss from frequent outages, minimizing
financial risks, while increasing efficiency
 Competitive advantage – shorter time to share and publish maps, with always-on,
always-available data and maps, and effective ROI
 Choice of various deployment, service and business models to best suit organization
goals
 Supports offerings of client-rich GIS software solutions as a software plus service model
– geocoding, mapping, routing, and more
Applications
 Earth observation data
 Citizen and social science
 Road infrastructure projects
 Mobile data collection and integration
 Traffic management
 E-commerce and geo-targeted advertising
 Geo-referenced Weather Service
 Crime analysis
 Web mapping
 Research
 Public safety and emergency response
Case Studies
Education and Research in the GIS Cloud, using ArcGIS Online and Mango Map
This application was constructed using the map services and application templates on ArcGIS
Online.

Relationship between rates in obesity, diabetes, and percentage of people on restricted sugar diet
(US).
Another user-contributed initiative makes use of the Mango Map interactive web map platform.

Deforestation in Cambodia 1976 – 2006
Transport Departments making use of the GIS Cloud – Maryland, Idaho, Utah
The Maryland Department of Transportation has deployed a four-level cloud-based model. The
first is a hybrid application (MUTTS) which coordinates and tracks responses to construction
work and excavations. The second is a hybrid cloud configuration of an interactive mobile
application for travelling truck drivers, highway motorists and cyclists. The third focuses on
integration with the interagency mapping and GIS data portal MD iMap. The fourth deployment
is a private cloud integrating the enterprise GIS of MDTA for access of staff members only. The
whole system seamlessly blends the functionality of Esri, with data and applications stored within
the MDOT’s firewall.
Security and Crime management in the Cloud
The US Department of Homeland Security uses ArcGIS cloud products to access and share critical
data for protection of US citizens – in aviation, border security, emergency response,
cybersecurity, and chemical facility inspection.
The Ogden Police Department operates a real-time crime center providing 24-hour support
using ArcGIS cloud services.
Cloud GIS Products and Vendors
ArcGIS Business Analyst Online – on-demand reports and maps for informed decisions.

ArcGIS Online – creation of interactive web maps and apps, shared with anyone, on any device.
Explorer for ArcGIS – access, author, share maps from any device.
ArcGIS Server on Amazon EC2 – deploy ArcGIS Server and use enterprise geodatabases on
Amazon EC2.
Vendors – GIS Cloud, Mango Map, MapBox, Map2Net, MapInfo Stratus, ThunderMaps, GIS
Direct, Spatial Vision, Interroute, Aerometric, and more.