Gis 5 Units Notes
Introduction to GIS - Basic spatial concepts - Coordinate Systems - GIS and Information Systems
– Definitions – History of GIS - Components of a GIS – Hardware, Software, Data, People,
Methods – Proprietary and open source Software - Types of data – Spatial, Attribute data- types
of attributes – scales/ levels of measurements.
To understand the basic concepts of Geographic Information Systems (GIS) and their relation to environmental management, we first need to see what GIS actually means; we will then put forward some evidence of how it can benefit environmental management.
This series of articles explains the brief history, technological aspects, best practices, and practical applications of GIS for environmental studies.
The history of GIS shows that the term came into existence in the 1960s, but at that time few people and professionals were involved with it. By the 1990s more researchers were adopting GIS as a research tool, but the real boost came in 2005, when Google launched the Google Maps and Google Earth web applications; this is when the wider public came to appreciate the importance of GIS.
Google Maps and Google Earth provided people with map solutions, but data preparation and interpretation were still not included, and GIS professionals were still required for high-level data analysis and decision making.
GIS is the science of location-based services: knowing what is where, and why. The process is to collect data from different sources, display it on maps, and then perform spatial analysis on that data to support decisions and predictions.
There are three major and basic components of geographic information (GI) technologies, which have revolutionized the handling of locations and spatial data. We will discuss these technologies in upcoming articles, but for now we will go through the basic concepts of GI technologies:
Global Positioning System (GPS)
Remote Sensing (RS)
Geographic Information System (GIS)
As we are all familiar with GPS, it is a system that determines geographic location on the Earth's surface via satellite. It saves time and money and is more accurate than other methods. Previously, companies had to hire expensive surveyors who physically visited locations to gather the desired information; this was a great hassle, and it was sometimes impossible to gather accurate and precise information. With technological advancements, GPS is now accessible in every part of the world.
Remote Sensing (RS) is the collection and measurement of data without direct contact with the objects of interest; satellites, aircraft, and now drones are used to capture this information about the Earth's surface. It saves the time and money of expensive physical field surveys. For environmental studies, RS is the more commonly used technology.
GIS is a robust set of tools for collecting and retrieving data, transforming it into information, and displaying that information on maps of the real world. The integration of GPS, RS, and other data modelling technologies provides information that helps in dealing with the changes that are integral to environmental protection, surveillance, and disaster management.
A geographic information system (GIS) is software that converts data into productive information: it takes data from GPS and RS, analyzes it, and displays the results. It offers an inexpensive way to produce maps, display information on them, and make analysis easier.
In conclusion, GIS integrates GPS and RS, and the core concept of GIS application development is to make decisions based on data gained from different sources, converting it into information that fulfils business, environmental, and technological needs.
Latitude and longitude values are measured in degrees (or in grads). The following illustration shows the world as a globe with longitude and latitude values.
In the spherical system, horizontal lines, or east–west lines, are lines of equal latitude, or parallels.
Vertical lines, or north–south lines, are lines of equal longitude, or meridians. These lines
encompass the globe and form a gridded network called a graticule.
The line of latitude midway between the poles is called the equator. It defines the line of zero
latitude. The line of zero longitude is called the prime meridian. For most geographic coordinate
systems, the prime meridian is the longitude that passes through Greenwich, England. Other
countries use longitude lines that pass through Bern, Bogota, and Paris as prime meridians. The
origin of the graticule (0,0) is defined by where the equator and prime meridian intersect. The
globe is then divided into four geographical quadrants that are based on compass bearings from
the origin. North and south are above and below the equator, and west and east are to the left and
right of the prime meridian.
This illustration shows the parallels and meridians that form a graticule.
Latitude and longitude values are traditionally measured either in decimal degrees or in degrees,
minutes, and seconds (DMS). Latitude values are measured relative to the equator and range from
-90° at the South Pole to +90° at the North Pole. Longitude values are measured relative to the
prime meridian. They range from -180° when traveling west to 180° when traveling east. If the
prime meridian is at Greenwich, then Australia, which is south of the equator and east of
Greenwich, has positive longitude values and negative latitude values.
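The DMS and decimal-degree forms described above can be converted back and forth with simple arithmetic. A minimal sketch (the Greenwich coordinate used below is approximate):

```python
# Convert between degrees-minutes-seconds (DMS) and decimal degrees (DD).
def dms_to_dd(degrees, minutes, seconds):
    """Sign of `degrees` carries the hemisphere (negative = S or W)."""
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

def dd_to_dms(dd):
    sign = -1 if dd < 0 else 1
    dd = abs(dd)
    degrees = int(dd)
    minutes = int((dd - degrees) * 60)
    seconds = (dd - degrees - minutes / 60) * 3600
    return sign * degrees, minutes, round(seconds, 4)

# The Royal Observatory at Greenwich lies at roughly 51° 28' 40" N.
print(dms_to_dd(51, 28, 40))   # ≈ 51.4778
```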
It may be helpful to equate longitude values with X and latitude values with Y. Data defined on
a geographic coordinate system is displayed as if a degree is a linear unit of measure. This method
is basically the same as the Plate Carrée projection.
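The Plate Carrée treatment of degrees as linear units can be sketched directly: longitude becomes x and latitude becomes y, scaled by the sphere's radius (a mean spherical radius is assumed here):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean spherical radius, an approximation

def plate_carree(lat_deg, lon_deg):
    """Project a latitude/longitude pair to planar x, y in kilometres."""
    x = EARTH_RADIUS_KM * math.radians(lon_deg)
    y = EARTH_RADIUS_KM * math.radians(lat_deg)
    return x, y

# 180° of longitude along the equator is half the equator's length:
print(plate_carree(0, 180))   # ≈ (20015.1, 0.0)
```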
Although longitude and latitude can locate exact positions on the surface of the globe, they are
not uniform units of measure. Only along the equator does the distance represented by one
degree of longitude approximate the distance represented by one degree of latitude. This is
because the equator is the only parallel as large as a meridian. (Circles with the same radius as
the spherical earth are called great circles. The equator and all meridians are great circles.)
Above and below the equator, the circles defining the parallels of latitude get gradually smaller
until they become a single point at the North and South Poles where the meridians converge. As
the meridians converge toward the poles, the distance represented by one degree of longitude
decreases to zero. On the Clarke 1866 spheroid, one degree of longitude at the equator equals
111.321 km, while at 60° latitude it is only 55.802 km. Because degrees of latitude and longitude
don't have a standard length, you can’t measure distances or areas accurately or display the data
easily on a flat map or computer screen.
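The shrinking degree of longitude can be approximated with a spherical model; the figures differ slightly from the Clarke 1866 spheroid values quoted above, since a sphere ignores polar flattening:

```python
import math

# Length of one degree of longitude at the equator (Clarke 1866 value).
KM_PER_DEGREE_AT_EQUATOR = 111.321

def degree_of_longitude_km(latitude_deg):
    """Spherical approximation: the parallel shrinks with cos(latitude)."""
    return KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))

for lat in (0, 30, 60, 90):
    print(f"{lat:2d}°  {degree_of_longitude_km(lat):8.3f} km")
```

At 60° latitude this gives about 55.66 km versus the spheroid's 55.802 km, which illustrates why degrees are not a uniform unit of distance.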
GIS technology is a crucial part of spatial data infrastructure, which the White House defines as
“the technology, policies, standards, human resources, and related activities necessary to acquire,
process, distribute, use, maintain, and preserve spatial data.”
GIS can use any information that includes location. The location can be expressed in many
different ways, such as latitude and longitude, address, or ZIP code.
Many different types of information can be compared and contrasted using GIS. The system can
include data about people, such as population, income, or education level. It can include
information about the landscape, such as the location of streams, different kinds of vegetation,
and different kinds of soil. It can include information about the sites of factories, farms, and
schools; or storm drains, roads, and electric power lines.
With GIS technology, people can compare the locations of different things in order to discover
how they relate to each other. For example, using GIS, a single map could include sites that
produce pollution, such as factories, and sites that are sensitive to pollution, such as wetlands and
rivers. Such a map would help people determine where water supplies are most at risk.
Data Capture
Data Formats
GIS applications include both hardware and software systems. These applications may include
cartographic data, photographic data, digital data, or data in spreadsheets.
Cartographic data are already in map form, and may include such information as the location of
rivers, roads, hills, and valleys. Cartographic data may also include survey data, mapping
information which can be directly entered into a GIS.
Digital data can also be entered into GIS. An example of this kind of information is computer
data collected by satellites that show land use—the location of farms, towns, and forests.
Remote sensing provides another tool that can be integrated into a GIS. Remote sensing
includes imagery and other data collected from satellites, balloons, and drones.
Finally, GIS can also include data in table or spreadsheet form, such as population demographics.
Demographics can range from age, income, and ethnicity to recent purchases and Internet
browsing preferences.
GIS technology allows all these different types of information, no matter their source or original
format, to be overlaid on top of one another on a single map. GIS uses location as the key index
variable to relate these seemingly unrelated data.
Putting information into GIS is called data capture. Data that are already in digital form, such as
most tables and images taken by satellites, can simply be uploaded into GIS. Maps, however,
must first be scanned, or converted to digital format.
The two major types of GIS file formats are raster and vector. Raster formats are grids of cells or
pixels. Raster formats are useful for storing GIS data that vary, such as elevation or satellite
imagery. Vector formats are polygons that use points (called nodes) and lines. Vector formats are
useful for storing GIS data with firm borders, such as school districts or streets.
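The raster/vector distinction above can be sketched with a few hypothetical values:

```python
# Raster: a grid of cells, each holding one value (here, elevation in metres).
elevation_raster = [
    [120, 125, 131],
    [118, 122, 129],
    [115, 119, 126],
]

# Vector: geometry built from coordinate points (nodes).
school_point  = (34.05, -118.25)                  # a single x, y pair
street_line   = [(0, 0), (1, 0), (2, 1)]          # ordered nodes
district_poly = [(0, 0), (4, 0), (4, 3), (0, 3)]  # closed ring of nodes

print(elevation_raster[1][2])   # value of one cell: 129
print(len(district_poly))       # the district polygon has 4 nodes
```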
Spatial Relationships
GIS technology can be used to display spatial relationships and linear networks. Spatial
relationships may display topography, such as agricultural fields and streams. They may also
display land-use patterns, such as the location of parks and housing complexes.
Linear networks, sometimes called geometric networks, are often represented by roads, rivers,
and public utility grids in a GIS. A line on a map may indicate a road or highway. With GIS
layers, however, that road may indicate the boundary of a school district, public park, or other
demographic or land-use area. Using diverse data capture, the linear network of a river may be
mapped on a GIS to indicate the stream flow of different tributaries.
GIS must make the information from all the various maps and sources align, so they fit together
on the same scale. A scale is the relationship between the distance on a map and the actual
distance on Earth.
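Scale arithmetic is straightforward; a sketch using a hypothetical 1:24,000 map:

```python
# At a scale of 1:24,000, one unit on the map equals 24,000 of the same
# units on the ground.
def ground_distance(map_distance_cm, scale_denominator):
    """Return the real-world distance in kilometres."""
    cm_on_ground = map_distance_cm * scale_denominator
    return cm_on_ground / 100_000  # 100,000 cm per km

# 5 cm measured on a 1:24,000 map corresponds to 1.2 km on the ground:
print(ground_distance(5, 24_000))  # 1.2
```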
Often, GIS must manipulate data because different maps have different projections. A projection
is the method of transferring information from Earth’s curved surface to a flat piece of paper or
computer screen. Different types of projections accomplish this task in different ways, but all
result in some distortion. To transfer a curved, three-dimensional shape onto a flat surface
inevitably requires stretching some parts and squeezing others.
A world map can show either the correct sizes of countries or their correct shapes, but it can’t do
both. GIS takes data from maps that were made using different projections and combines them
so all the information can be displayed using one common projection.
GIS Maps
Once all of the desired data have been entered into a GIS system, they can be combined to produce
a wide variety of individual maps, depending on which data layers are included. One of the most
common uses of GIS technology involves comparing natural features with human activity.
For instance, GIS maps can display what manmade features are near certain natural features, such
as which homes and businesses are in areas prone to flooding.
GIS technology also allows users to “dig deep” into a specific area with many kinds of information. Maps of a single city or neighborhood can relate such information as average income, book sales, or voting patterns. Any GIS data layer can be added to or subtracted from the same map.
GIS maps can be used to show information about numbers and density. For example,
GIS can show how many doctors there are in a neighborhood compared with the area’s population.
With GIS technology, researchers can also look at change over time. They can use
satellite data to study topics such as the advance and retreat of ice cover in polar regions, and how
that coverage has changed through time. A police precinct might study changes in crime data to
help determine where to assign officers.
One important use of time-based GIS technology involves creating time-lapse
photography that shows processes occurring over large areas and long periods of time. For
example, data showing the movement of fluid in ocean or air currents help scientists better
understand how moisture and heat energy move around the globe.
GIS technology sometimes allows users to access further information about specific areas
on a map. A person can point to a spot on a digital map to find other information stored in the
GIS about that location. For example, a user might click on a school to find how many students
are enrolled, how many students there are per teacher, or what sports facilities the school has.
GIS systems are often used to produce three-dimensional images. This is useful, for
example, to geologists studying earthquake faults.
GIS technology makes updating maps much easier than updating maps created manually.
Updated data can simply be added to the existing GIS program. A new map can then
be printed or displayed on screen. This skips the traditional process of drawing a map, which can
be time-consuming and expensive.
GIS Jobs
People working in many different fields use GIS technology. GIS technology can be used for
scientific investigations, resource management, and development planning.
Many retail businesses use GIS to help them determine where to locate a new store. Marketing
companies use GIS to decide to whom to market those stores and restaurants, and where that
marketing should be.
Scientists use GIS to compare population statistics to resources such as drinking water. Biologists
use GIS to track animal migration patterns.
City, state, or federal officials use GIS to help plan their response in the case of a natural disaster
such as an earthquake or hurricane. GIS maps can show these officials what neighborhoods are
most in danger, where to locate emergency shelters, and what routes people should take to reach
safety.
Engineers use GIS technology to support the design, implementation, and management of
communication networks for the phones we use, as well as the infrastructure necessary for
Internet connectivity. Other engineers may use GIS to develop road networks and transportation
infrastructure.
There is no limit to the kind of information that can be analyzed using GIS technology. A geographic information system (GIS) allows multiple layers of information to be displayed on a single map.
GIS can be used to answer location-based questions such as “What is located here?” or “Where can a particular feature be found?” A GIS user can retrieve values from the map, such as the amount of forest area on a land-use map; this is done using the query builder tool. The next important feature of GIS is the capability to combine different layers to show new information. For example, you can combine elevation data, river data, land-use data, and more to describe the landscape of an area. From the map you can tell where the high ground is, or where the best place to build a house with a river view would be. GIS helps to find new information.
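The forest-area query described above can be sketched against a hypothetical attribute table:

```python
# A toy attribute query in the spirit of a GIS query builder
# (hypothetical layer: land-use features with an area attribute in hectares).
land_use = [
    {"id": 1, "class": "forest",      "area_ha": 420.5},
    {"id": 2, "class": "agriculture", "area_ha": 310.0},
    {"id": 3, "class": "forest",      "area_ha": 128.7},
    {"id": 4, "class": "urban",       "area_ha": 95.2},
]

# "How much forest area is on the land-use map?"
forest_area = sum(f["area_ha"] for f in land_use if f["class"] == "forest")
print(forest_area)
```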
How GIS Works:
Visualizing Data: The geographic data stored in the databases is displayed in the GIS software.
Combining Data: Layers are combined to form the desired maps.
The Query: Values in a layer are searched, or geographic queries are made.
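Combining layers can be sketched as a cell-by-cell overlay of two hypothetical rasters, here flagging cells that are both low-lying and near a river as flood-prone:

```python
# Two aligned raster layers with hypothetical values.
elevation  = [[12, 9], [4, 3]]        # metres above sea level
river_dist = [[800, 120], [60, 40]]   # metres to the nearest river

# Overlay: a cell is flood-prone if it is below 5 m AND within 100 m of a river.
flood_prone = [
    [elev < 5 and dist < 100 for elev, dist in zip(erow, drow)]
    for erow, drow in zip(elevation, river_dist)
]
print(flood_prone)  # [[False, False], [True, True]]
```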
Definition by others:
A geographic information system (GIS) lets us visualize, question, analyze, and interpret data to
understand relationships, patterns, and trends. (ESRI)
In the strictest sense, a GIS is a computer system capable of assembling, storing, manipulating,
and displaying geographically referenced information (that is data identified according to their
locations). (USGS)
Advantage of GIS:
Better decisions by government
Improved decision making with the help of layered information
Greater citizen engagement due to a better system
Helps to identify communities at risk or lacking infrastructure
Helps in analyzing crime patterns
Better management of natural resources
Better communication during emergency situations
Cost savings due to better decisions
Finding different kinds of trends within the community
Planning for demographic changes
1.6 History of GIS:
Modern GIS has seen a series of developments and has evolved along with the computer. Here are the key events in the development of the GIS system.
The first application of the concept was in 1832, when Charles Picquet created a map representing the cholera outbreak across the 48 districts of Paris. This map was an early version of a heat map, a technique that would later revolutionize several industries.
Year 1854 – John Snow applied a scientific, map-based method in 1854, plotting the cholera outbreak as points on a residential map of London.
Year 1962 – Dr. Roger Tomlinson created and developed the Canadian Geographic Information System (CGIS) to store, analyze, and manipulate data collected for the Canada Land Inventory (CLI). This software had overlay, measurement, and digitizing capabilities (converting scanned hardcopy maps to digital data). It was never released commercially, but Dr. Tomlinson is regarded as the father of GIS.
Dr. Roger Tomlinson (1933-2014)
Year 1980 – This period saw the rise of commercial GIS software from vendors such as M&S Computing, Environmental Systems Research Institute (ESRI), and Computer Aided Resource Information System (CARIS). These packages were similar to CGIS but with more functionality and user-friendliness. The most popular today are ESRI products such as ArcGIS and ArcView, which hold almost 80% of the global market.
1.7.1 Hardware:
Hardware is the computer on which GIS software runs. Nowadays there is a range of options: GIS may be desktop or server based. ArcGIS Server, for example, runs GIS software on a networked or cloud-based server. For the computer to perform well, all hardware components must have sufficient capacity. Some of the hardware components are the motherboard, hard drive, processor, graphics card, and printer. These components function together to run GIS software smoothly.
1. Motherboard: The board on which the major hardware parts are installed; the place where all components hook together.
2. Hard Drive: Also called the hard disk; the place where data is stored.
3. Processor: The major computational component of the computer, which performs calculations; also called the Central Processing Unit (CPU).
4. RAM: Random Access Memory, into which all running programs are loaded temporarily.
5. Printer: An output device used to print images, maps, or documents. Various types of printer are available on the market.
6. External Disk: Portable storage such as USB drives, DVDs, CDs, or external hard disks.
7. Monitor: A screen for displaying output. Various types are available nowadays: CRT (cathode ray tube), LCD (liquid crystal display), LED (light-emitting diode), and more.
1.7.2 Software:
GIS software provides the tools and functions to input and store spatial (geographic) data. It provides tools to perform geographic queries, run analysis models, and display geographic data in map form. GIS software uses a Relational Database Management System (RDBMS) to store the geographic data, and talks to the database to perform geographic queries. A few examples of GIS software: ArcGIS, ArcView 3.2, QGIS, SAGA GIS.
Software Components:
1. GIS Tools: Key tools to support browsing of GIS data.
2. RDBMS: A Relational Database Management System to store GIS data; the GIS software retrieves data from, and inserts data into, the RDBMS.
3. Query Tools: Tools that work with the database management system for querying, insertion, deletion, and other SQL (Structured Query Language) operations.
4. GUI: A Graphical User Interface that helps the user and the software interact.
5. Layout: A good layout window for designing maps.
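The interplay between the query tools and the RDBMS can be sketched with Python's built-in sqlite3 module and a hypothetical parcels table (a plain RDBMS here, not a spatially enabled one):

```python
import sqlite3

# GIS software stores attribute tables in an RDBMS; a query tool issues
# SQL like this under the hood.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parcels (id INTEGER, land_use TEXT, area_ha REAL)")
con.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "forest", 420.5), (2, "urban", 95.2), (3, "forest", 128.7)],
)

# "Select all forest parcels larger than 200 hectares."
rows = con.execute(
    "SELECT id, area_ha FROM parcels WHERE land_use = 'forest' AND area_ha > 200"
).fetchall()
print(rows)  # [(1, 420.5)]
```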
1.7.3 Data:
The most important and expensive component of a geographic information system is data, often described as the fuel for GIS. GIS data is a combination of graphic and tabular data. Graphic data can be vector or raster, and both types can be created in house using GIS software or purchased. The process of creating GIS data from analog (paper) sources is called digitization. Digitization involves registering a raster image using a few ground control points (GCPs), or known coordinates; this process is widely known as rubber sheeting or georeferencing. Polygons, lines, and points are then created by digitizing the raster image. The raster image itself can be registered with coordinates, which is widely known as rectifying the image. Registered images are mostly exported in TIFF format. As mentioned above, GIS data can be raster or vector.
GIS Data Types:
Raster: Raster images store information in a cell-based manner. Examples are aerial photos, satellite images, and Digital Elevation Models (DEMs). Raster images normally store continuous data.
Vector: Vector data is discrete. It stores information in x, y coordinate form. There are three types of vector data: points, lines, and areas.
1.7.4 People:
People are the users of the GIS system; they use all three components above to run it. Today's computers are fast and user friendly, making it easy to perform geographic queries and analysis and to display maps. Hardware and software have seen tremendous development, and computers are now affordable, so many people use GIS in their daily work, whether creating a simple map or performing advanced GIS analysis. People are the main component of a successful GIS.
1.7.5 Methods: For successful GIS operation, a well-designed plan and business operating rules are important. Methods vary between organizations. An organization should document its process plan for GIS operation; such a document addresses a number of questions about the GIS methods: the number of GIS experts required, the GIS software and hardware, the process for storing the data, the type of DBMS (database management system), and more. A well-designed plan will address all these questions.
The most well-known example of open source software is the Linux operating system, but there
are open source software products available for every conceivable purpose.
Open source software is distributed under a variety of licensing terms, but almost all have two
things in common: the software can be used without paying a license fee, and anyone can modify
the software to add capabilities not envisaged by its originators.
A standard is a technology specification whose details are made widely available, allowing many
companies to create products that will work interchangeably and be compatible with each other.
Any modern technology product relies on thousands of standards in its design — even the gasoline
you put in your car is blended to meet several highly-detailed specifications that the car’s
designers rely on.
For a standard to be considered an open standard, the specification and rights to implement it
must be freely available to anyone without signing non-disclosure agreements or paying royalties.
The best example of open standards at work is the Internet — virtually all of the technology
specifications it depends on are open, as is the process for defining new ones.
The common theme of “openness” in the above definitions is the ability of diverse parties to
create technology that interoperates. When evaluating your organization’s current and
anticipated software needs, consider a solution’s capability to interoperate as an important
criterion. To extend the value of your technology investment, select a software solution that is
based on open standards and APIs that facilitate interoperability and has the capability for direct
integration between various vendors’ products.
Difference Between Open Source and Proprietary Software
There’s no easy way to find out which is the better software development model for your
business, open-source or proprietary.
Open source attracts developers and programmers who are untroubled by the idea of giving software away rather than commercializing it, but it poses a threat to the commercial software industry, which feels most endangered by the notion of open-source software.
The difference between the two is fairly clear, because each model has its share of pros and cons. However, weighing the options between open source and proprietary to decide which is superior is a difficult task. As with any complex decision, the only certain answer is “it depends.” Clearly, each has an edge over the other in certain features and characteristics, which sets them apart.
The idea that one totally contradicts the other is not exactly true. This article explains the
difference between the two.
What is Open-Source Software?
It all started with Richard Stallman who developed the GNU project in 1983 which fueled the
free software movement which eventually led to the revolutionary open-source software
movement.
The movement catapulted the notion of open-source collaboration under which developers and
programmers voluntarily agreed to share their source code openly without any restrictions.
The community of people working with the software allows anyone to study and modify the open-source code for any purpose. The open-source movement broke down the barriers between developers/programmers and software vendors, encouraging everyone to collaborate openly. Finally, the label “open-source software” was made official at a strategy session in Palo Alto, California in 1998 to encourage worldwide acceptance of the new term, which is itself reminiscent of academic freedom.
The idea is to release the software under the open licenses category so that anyone could see,
modify, and distribute the source code as deemed necessary.
The term is a certification mark owned by the Open Source Initiative (OSI). Open source software refers to software that is developed and tested through open collaboration, meaning anyone with the required knowledge can access the source code, modify it, and distribute their own version of the updated code.
Any software under the open source license is intended to be shared openly among users and
redistributed by others as long as the distribution terms are compliant with the OSI’s open source
definition. Programmers with access to a program’s source code are allowed to manipulate parts
of code by adding or modifying features that would not have worked otherwise.
1.9 Types of GIS Data:
A geodatabase is a database that is in some way referenced to locations on the earth. Coupled with this spatial data is usually attribute data: additional information that can be tied to the spatial data.
What types of GIS Data are there?
GIS data can be separated into two categories: spatially referenced data, represented in vector and raster forms (including imagery), and attribute data, represented in tabular format. Within the spatially referenced group, GIS data can be further classified into two types: vector and raster. Most GIS software applications focus mainly on the use and manipulation of vector geodatabases, with added components for working with raster-based geodatabases.
Vector data
Vector data is split into three types: polygon, line (or arc) and point data. Polygons are used to
represent areas such as the boundary of a city (on a large scale map), lake, or forest. Polygon
features are two dimensional and therefore can be used to measure the area and perimeter of a
geographic feature. Polygon features are most commonly distinguished using either a thematic
mapping symbology (color schemes), patterns, or in the case of numeric gradation, a color
gradation scheme could be used.
In this view of a polygon-based dataset, the frequency of fire in an area is depicted using a graduated color symbology.
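Because polygon features are two-dimensional, area and perimeter follow directly from the node coordinates. A planar sketch using the shoelace formula:

```python
import math

def perimeter(ring):
    """Sum of edge lengths around a ring of (x, y) nodes."""
    pts = ring + [ring[0]]  # close the ring
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def area(ring):
    """Shoelace formula; assumes a simple (non-self-intersecting) ring."""
    pts = ring + [ring[0]]
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return abs(s) / 2

# A 4 x 4 square in planar units:
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(perimeter(square), area(square))  # 16.0 16.0
```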
Line (or arc) data is used to represent linear features. Common examples would be rivers, trails,
and streets. Line features only have one dimension and therefore can only be used to measure
length. Line features have a starting and ending point. Common examples would be road
centerlines and hydrology. Symbology most commonly used to distinguish arc features from one
another are line types (solid lines versus dashed lines) and combinations using colors and line
thicknesses. In the example below roads are distinguished from the stream network by designating
the roads as a solid black line and the hydrology a dashed blue line.
Point data is most commonly used to represent nonadjacent features and discrete data points. Points have zero dimensions; therefore you can measure neither length nor area with this dataset. Examples would be schools, points of interest, and, in the example below, bridge and culvert locations. Point features are also used to represent abstract points. For instance, point locations could represent city locations or place names.
GIS point data showing the location of bridges and culverts.
Both line and point feature data represent polygon data at a much smaller scale. They help reduce clutter by simplifying data locations. As a map is zoomed in, the point location of a school is more realistically represented by a series of building footprints showing the physical location of the campus. Line features in a street centerline file represent only the physical location of the street; if a higher degree of spatial resolution is needed, a street curb-width file would be used to show the width of the road as well as features such as medians and rights-of-way (or sidewalks).
Raster Data
Raster data (also known as grid data) represents the fourth type of feature: surfaces. Raster data
is cell-based and this data category also includes aerial and satellite imagery. There are two types
of raster data: continuous and discrete. An example of discrete raster data is population density.
Continuous data examples are temperature and elevation measurements. There are also three
types of raster datasets: thematic data, spectral data, and pictures (imagery).
This example of a thematic raster dataset is called a Digital Elevation Model (DEM). Each cell represents a 30 m pixel with an elevation value assigned to it. The area shown is the Topanga Watershed in California, and the dataset gives the viewer an understanding of the topography of the region.
Each cell contains one value representing the dominant value of that cell. Raster datasets are intrinsic to most spatial analysis: extracting slope and aspect from Digital Elevation Models is performed on raster datasets, and spatial hydrology modeling, such as extracting watersheds and flow lines, also uses a raster-based system. Spectral data comprises aerial or satellite imagery, which is often used to derive vegetation and geologic information by classifying the spectral signatures of each type of feature.
Raster data showing vegetation classification. The vegetation data was derived from NDVI
classification of a satellite image.
The effect of converting spatial data location information into a cell-based raster format is
called stairstepping. The name derives from exactly that image: the square cells along the
borders of different value types look like a staircase viewed from the side.
Unlike vector data, raster data is formed by each cell receiving the value of the feature that
dominates the cell. The stairstepping look comes from the transition of the cells from one value
to another. In the image above, the dark green cell represents chamise vegetation, meaning that
the dominant feature in that cell area was chamise vegetation. Other features such as developed
land, water or other vegetation types may also be present on the ground in that area. As the
feature in the cell becomes more dominantly urban, the cell is attributed the value for developed
land, hence the pink shading.
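The stairstepping effect is easy to reproduce. Below is a minimal sketch in plain Python with a hypothetical triangle polygon: each cell takes the value of the feature at its centre (a simplification of dominant-value assignment), and the sloping edge comes out as a staircase of cells.

```python
# Sketch: rasterizing a vector polygon onto a grid, producing the
# "stairstepping" effect described above. Each cell is assigned the value
# of the feature at its centre (a stand-in for dominant-value sampling).

def point_in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def rasterize(poly, width, height, cell_size=1.0):
    grid = []
    for row in range(height):
        cells = []
        for col in range(width):
            cx = (col + 0.5) * cell_size   # cell centre
            cy = (row + 0.5) * cell_size
            cells.append(1 if point_in_polygon(cx, cy, poly) else 0)
        grid.append(cells)
    return grid

# A triangle with a sloping edge: the diagonal becomes a staircase of cells.
triangle = [(0, 0), (8, 0), (0, 8)]
grid = rasterize(triangle, 8, 8)
for row in grid:
    print("".join("#" if v else "." for v in row))
```

Printing the grid shows the diagonal hypotenuse rendered as stepped cells, exactly the stairstepping described above.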
Computer-aided design (CAD) data includes spatial information about how objects, such as
buildings, cars, or aircraft, are constructed. Other important examples of computer-aided design
databases are integrated-circuit and electronic-device layouts.
CAD systems traditionally stored data in memory during editing or other processing and wrote
the data back to a file at the end of an editing session. The drawbacks of such a scheme include
the cost (programming complexity as well as time) of transforming data from one form to another,
and the need to read in an entire file even if only parts of it are required. For a large design,
such as that of an entire airplane, it may be impossible to hold the complete design in memory.
Designers of object-oriented databases were motivated in large part by the database requirements
of CAD systems. Object-oriented databases represent components of a design as objects, and the
connections between the objects indicate how the design is structured.
Geographic data includes road maps, land-usage maps, topographic elevation maps, political maps
showing boundaries, land-ownership maps, and so on. Geographic information systems are
special-purpose databases for storing geographic data. Geographic data differs from design data
in certain ways. Maps and satellite images are typical examples of geographic data. Maps may
provide not only location information but also attribute information associated with locations,
such as elevation, soil type, land type and annual rainfall.
1.10 Spatial vs Attributes data
GIS data is the key component of a GIS and has two general types: spatial and attribute data.
Spatial data provides the visual representation of a geographic space and is stored as raster
and vector types. Hence, this data is a combination of location data and value data used to
render a map, for example.
Attribute data is the detailed data used in combination with spatial data to create a GIS. The more
available and appropriate attribute data used with spatial data, the more complete a GIS is as a
management reporting and analysis tool.
Sources of Spatial & Attribute Data
Spatial data can be obtained from satellite images or scanned maps and similar resources.
This data can then be digitised into vector data or maintained as raster graphic data.
Essentially, any format of a geographical image with location or co-ordinate points can be
used as spatial data.
Attribute data can be obtained from a number of sources, or data can be captured
specifically for your application. Some popular sources of attribute data are town
planning and management departments, policing and fire departments, environmental
groups, and online media.
What is Attribute Data?
Attribute data are descriptions or measurements of geographic features in a map. The term refers
to detailed data that is combined with spatial data. Attribute data helps to obtain meaningful
information from a map. Every feature has characteristics that we can describe. For example,
consider a building: it has a year of construction, a number of floors, and so on. Those are
attributes. Attributes are facts we know about a feature but cannot see, such as the year it
was built. An attribute can also record the absence of a feature.
Difference Between Attribute Data and Spatial Data
Figure: GIS
Usually, a table helps to display attribute data. Each row represents a single feature. In a GIS,
clicking on the row will highlight the corresponding feature on the map.
What is Spatial Data?
Spatial data consists of points, lines, polygons or other geographic and geometric data primitives
that we can map by location. It is possible to maintain spatial data as vector data or raster data.
Each provides information connected to geographical locations. Vector data consist of sequential
points or vertices to define a linear segment. It has an x coordinate and a y coordinate.
Furthermore, raster data consists of a matrix of cells or pixels arranged into rows and columns.
Each cell contains a value representing information.
Usage: Attribute data describes the characteristics of a geographic feature, while spatial
data describes the absolute and relative location of a geographic feature. Hence, this is
another difference between attribute data and spatial data.
Conclusion: GIS helps to analyze resources such as water, urban areas, roads, coasts,
vegetation, etc. It also allows solving problems related to pollution, forestry, health,
agriculture and many other areas. The main difference between attribute data
and spatial data is that attribute data describes the characteristics of a geographic
feature, while spatial data describes the absolute and relative location of geographic
features.
1.11 Types of attributes:
There are two components to GIS data: spatial information (coordinate and projection information
for spatial features) and attribute data. Attribute data is information appended in tabular format
to spatial features. The spatial data is the where and attribute data can contain information about
the what, where, and why. Attribute data provides characteristics about spatial data.
Types of Attribute Data
Attribute data can be stored as one of five different field types in a table or database:
character, integer, floating point, date, and BLOB.
1. Character Data
The character property (or string) is for text based values such as the name of a street or
descriptive values such as the condition of a street. Character attribute data is stored as a series
of alphanumeric symbols.
Aside from descriptors, character fields can contain other attribute values such as categories and
ranks. For example, a character field may contain the categories for a street: avenue, boulevard,
lane, or highway. A character field could also contain the rank, which is a relative ordering of
features. For example, a ranking of the traffic load of the street with “1” being the street with the
highest traffic.
Character data can be sorted in ascending (A to Z) and descending (Z to A) order. Since numbers
are considered text in this field, those numbers will be sorted alphabetically, which means that
a number sequence of 1, 2, 9, 11, 13, 22 would be sorted in ascending order as
1, 11, 13, 2, 22, 9.
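This sorting behaviour can be demonstrated in a couple of lines of Python: numbers stored in a character (string) field sort alphabetically, while the same values converted to a numeric field sort numerically.

```python
# Demonstration: numbers stored as text sort alphabetically, not numerically.

values = ["1", "2", "9", "11", "13", "22"]

as_text = sorted(values)              # character-field (string) sort
as_numbers = sorted(values, key=int)  # after conversion to a numeric field

print(as_text)     # ['1', '11', '13', '2', '22', '9']
print(as_numbers)  # ['1', '2', '9', '11', '13', '22']
```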
Because character data is not numeric, calculations (sum, average, median, etc.) cannot be
performed on this type of field, even if the values stored in the field are numbers (to do that,
the field type would need to be converted to a numeric field). Character fields can, however,
be summarized to produce counts (e.g. the number of features that have been categorized as
"avenue").
2. Numeric Data
Integer and floating-point fields hold numerical values (see: the difference between floating-point
and integer values). Within the integer type, there is a further division between short and long
integer values. As would be expected, short integers store numeric values without fractional parts
over a shorter range than long integers. Floating-point attribute values store numeric values with
fractional parts; that is, floating-point values are for numbers with digits to the right of the
decimal point, as opposed to whole values.
Numeric values are sorted sequentially, either in ascending (1 to 10) or descending (10 to 1)
order.
Numerical value fields can have operations performed such as calculating the sum or average
value. Numerical field values can be a count (e.g. the total number of students at a school) or be
a ratio (e.g. the percentage of students that are girls at a school).
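As a small illustration of the operations valid on numeric fields, here is a sketch using a hypothetical attribute table of schools (the names and numbers are invented): a count field supports sums and averages, and two count fields can be combined into a ratio.

```python
# Sketch: numeric attribute fields support arithmetic operations.
# The school table below is hypothetical.

schools = [
    {"name": "North High", "students": 820, "girls": 410},
    {"name": "South High", "students": 660, "girls": 350},
    {"name": "East High",  "students": 520, "girls": 240},
]

total = sum(s["students"] for s in schools)    # sum of a count field
average = total / len(schools)                 # average of a count field
ratios = {s["name"]: s["girls"] / s["students"] for s in schools}  # a ratio

print(total)    # 2000
print(average)
print(ratios)
```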
3. Date/Time Data
Date fields contain date and time values.
4. BLOB Data
BLOB stands for binary large object and this attribute type is used for storing information such
images, multimedia, or bits of code in a field. This field stores object linking and embedding
(OLE) which are objects created in other applications such as images and multimedia and linked
from the BLOB field
1. Nominal Scale. The nominal scale simply names or categorizes data without implying any order;
examples include land-use classes, soil types, or place names.
2. Ordinal Scale. The ordinal scale contains things that you can place in order. For example,
hottest to coldest, lightest to heaviest, richest to poorest. Basically, if you can rank data by 1st,
2nd, 3rd place (and so on), then you have data that’s on an ordinal scale.
Ordinal scale: The ordinal scale classifies according to rank.
3. Interval Scale. An interval scale has ordered numbers with meaningful divisions. Temperature
is on the interval scale: a difference of 10 degrees between 90 and 100 means the same as 10
degrees between 150 and 160. Compare that to high school ranking (which is ordinal), where the
difference between 1st and 2nd might be .01 and between 10th and 11th .5. If you have
meaningful divisions, you have something on the interval scale.
Measurement scales
4. Ratio Scale. The ratio scale is exactly the same as the interval scale with one major difference:
zero is meaningful. For example, a height of zero is meaningful (it means you don't exist).
Compare that to a temperature of zero, which, while it exists, doesn't mean anything in particular
(although admittedly, on the Celsius scale it is the freezing point of water).
Weight is measured on the ratio scale.
UNIT II SPATIAL DATA MODELS 9
Database Structures – Relational, Object Oriented – ER diagram - spatial data models – Raster
Data Structures – Raster Data Compression - Vector Data Structures - Raster vs Vector Models-
TIN and GRID data models - OGC standards - Data Quality.
The database structure imposes certain constraints on the data values, which makes it more
reliable. For example, for the phone number, you cannot enter text, since that wouldn't make
sense.
While this example is quite simple, you can easily imagine what else could be stored in such a
database. For example, you could store the customer's mailing address, billing information,
history of past purchases, etc. For an organization with many thousands of customers, this quickly
becomes a large database. To use a large database effectively, you can use a database management
system (DBMS). A DBMS is specialized software to input, store, retrieve and manage all the
data.
Relational operators:
Retrieval from a relational database involves creating, perhaps temporarily, new relations which
are subsets or combinations of the permanently stored relations. There are several relational
algebra operators that can be used to search and manipulate relations in order to perform such
retrievals. Some of these operators are selection, projection, union and join. Other standard
operators include product, divide and intersection. From the user's point of view, the operators
are not named as such but are implemented by means of the standard Structured Query Language
(SQL) using a number of commands and key words. For example, the command
SELECT settlement-name, county-name
FROM Settlement
will create a new table which consists only of the settlement name and county fields of the
Settlement table.
The selection (or restrict) operation is concerned with retrieving a subset of the records of a table
on the basis of retrieval criteria expressed in terms of the contents of one or more of the fields in
each record. For example, to retrieve all settlements in the county of Mereshire with a population
greater than 20000, the SQL command would be
SELECT *
FROM Settlement
WHERE county-name = 'Mereshire' AND settlement-population > 20000
Note that the WHERE condition consists of a logical expression. This query could have been
combined with a projection operation by specifying field names after the SELECT command.
The join operator is more complicated than projection and selection in that its purpose is to
combine fields from two or more tables. The operator depends on the tables being related to each
other by means of a common field.
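A join can be tried out directly with Python's built-in sqlite3 module. The sketch below mirrors the Settlement example used above, but the table and field names (with underscores rather than hyphens, which SQLite does not accept in bare identifiers) and the data are hypothetical.

```python
# Sketch: a relational join on a common field using the sqlite3 module.
# Tables, fields, and data are hypothetical, modelled on the text's examples.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE County (county_id INTEGER PRIMARY KEY, county_name TEXT)")
cur.execute("""CREATE TABLE Settlement (
    settlement_name TEXT, settlement_population INTEGER,
    county_id INTEGER REFERENCES County(county_id))""")
cur.executemany("INSERT INTO County VALUES (?, ?)",
                [(1, "Mereshire"), (2, "Avonshire")])
cur.executemany("INSERT INTO Settlement VALUES (?, ?, ?)",
                [("Newtown", 25000, 1), ("Oldport", 12000, 1),
                 ("Kingsby", 41000, 2)])

# Join Settlement to County on the foreign key, combining selection
# (the WHERE clause) and projection (the named columns).
rows = cur.execute("""
    SELECT settlement_name, county_name
    FROM Settlement JOIN County ON Settlement.county_id = County.county_id
    WHERE county_name = 'Mereshire' AND settlement_population > 20000
""").fetchall()
print(rows)  # [('Newtown', 'Mereshire')]
```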
Important features of Relational Data Bases
Primary and Foreign keys
Relational joins
Normal forms
The Primary and Foreign Keys
The relational approach has important implications for the design of database tables. Since each
table or relation represents a set, it cannot have any rows whose entire contents are duplicated.
Secondly, as each row must be different from every other, it follows that a value in a single
column, or a combination of values in multiple columns, can be used to define a primary key for
the table, which allows each row to be uniquely identified. The uniqueness property allows the
primary key to serve as the sole row-level addressing mechanism in the relational database
model.
A field that stores the key of another table is called a foreign key. It is important to realize that
the primary key of a table and any foreign keys that it may store consist of logical data items
which may be attributes such as names or some allocated numerical identifier. They do not consist
of physical addresses in the database. They will, however, be used as the basis of indexing
mechanisms which the database management system uses to provide efficient query processing.
Relational joins
The mechanism for linking data in different tables is called a relational join. Values in a column
or columns in one table are matched to corresponding values in a column or columns in a second
table. Matching is frequently based on a primary key in one table linked to a column in the second,
which is termed a foreign key. An example of the join mechanism is shown below :
The first requirement of normal form is that tables do not contain repeating groups of data,
such as multiple values of a census variable for different years.
The second requirement of normal form is that every column, which is not part of the primary
key, must be fully dependent on the primary key. The third normal form requires that every non-
primary key column must be non-transitively dependent on the primary key.
Nevertheless, the fundamental working rule for most circumstances ensures that each attribute of
a table represents a fact about the primary key, the whole primary key, and nothing but the
primary key. While this is entirely valid from the design viewpoint, practical implementation
requirements may, on occasion, override theoretical considerations and lead to tables being
merged and denormalized, usually for performance reasons.
Advantages and disadvantages of relational systems
The Advantages can be summarized as follows:
Rigorous design methodology based on sound theoretical foundations
All the other data base structures can be reduced to a set of relational tables, so they are the
most general form of data representation
Ease of use and implementation compared to other types of system
Modifiability, which allows new tables and new rows of data within tables to be added
without difficulty
Flexibility in ad-hoc data retrieval because of the relational joins mechanism and powerful
query language facility.
The Disadvantages are as follows:
A greater requirement for processing resources with increasing numbers of users on a given
system than with the other types of data base.
On heavily loaded systems, queries involving multiple relational joins may give slower
response times than are desirable. This problem can largely be mitigated by effective use of
indexing and other optimization strategies, together with continued improvements in the
price/performance of computing hardware, from mainframes to PCs.
The DBMS provides a wide range of ready-made data manipulation tools, so programming effort
can be concentrated on algorithms for spatial analysis and user interface requirements. Though a
database approach has several advantages over a file-system approach, GIS system designers have
often preferred the latter for the storage of digital map coordinates. This has led to the
development of two different approaches to implementation, based on either a hybrid or an
integrated data model.
Object Oriented Databases
A recent trend in both software engineering and in database design is towards the use of object-
oriented techniques. For the purposes of geographical databases these techniques are of great
interest since they hold the promise to overcome significant shortcomings, from the point of view
of GIS, of the widely used relational database methods.
Typical queries to a GIS require spatial data processing operations which standard query
languages such as SQL cannot currently handle. Object-oriented techniques provide the tools for
building databases which, unlike relational databases, can model complex spatial objects. The
database representations of objects include, in addition to stored data, specialized procedures
for spatial searching and for executing queries which may require geometric and topological data
processing.
Objects in an object-oriented database are intended to correspond to classes of real-world
objects and are implemented by combining data, which describe the object's attributes, with the
procedures, or methods, which operate on them.
Accessing an object involves sending a message to it, which results in the addressed object using
its internal methods to respond to the message. A variety of types of message may be sent to an
individual object, depending upon its properties and the methods that it has implemented.
Examples of the types of message that might be sent to a polygon class of object would be to
return its coordinates, to return the result of a measurement, such as area or perimeter calculation,
or to display the polygon on a graphics device.
An individual object is an instantiation, or a particular example, of a class of objects, and as such
it is uniquely identified within the database with an object identifier. An object class may inherit
the properties, data attributes and methods of one or more other object classes. Thus having
defined typical object classes, new ones may be created which are combinations of or subclasses
of existing ones.
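The ideas above (an object combining attribute data with methods, responding to "messages", and a subclass inheriting from an existing class) can be sketched in Python. The class names, identifiers and coordinates are hypothetical; this is an illustration of the object model, not the internals of any particular object-oriented database.

```python
# Sketch: an object class combining data with methods, plus inheritance.
import math

class Geometry:
    def __init__(self, object_id):
        self.object_id = object_id   # unique identifier within the database

class Polygon(Geometry):             # inherits the data and methods of Geometry
    def __init__(self, object_id, vertices):
        super().__init__(object_id)
        self.vertices = vertices     # list of (x, y) coordinate pairs

    def area(self):
        """Respond to an 'area' message using the shoelace formula."""
        n = len(self.vertices)
        s = 0.0
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]
            s += x1 * y2 - x2 * y1
        return abs(s) / 2.0

    def perimeter(self):
        """Respond to a 'perimeter' message by summing edge lengths."""
        n = len(self.vertices)
        return sum(math.dist(self.vertices[i], self.vertices[(i + 1) % n])
                   for i in range(n))

# "Sending messages" to an instantiated object is just calling its methods.
square = Polygon("poly-001", [(0, 0), (4, 0), (4, 4), (0, 4)])
print(square.area())       # 16.0
print(square.perimeter())  # 16.0
```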
2.3 ER diagram
An entity relationship diagram (ERD) shows the relationships of entity sets stored in a database.
An entity in this context is an object, a component of data. An entity set is a collection of similar
entities. These entities can have attributes that define its properties. By defining the entities, their
attributes, and showing the relationships between them, an ER diagram illustrates the logical
structure of databases. ER diagrams are used to sketch out the design of a database.
There are two reasons to create a database diagram: you are either designing a new schema or you
need to document your existing structure. If you have an existing database you need to document,
you can create a database diagram using data directly from your database. You can export your
database structure as a CSV file, then have a program generate the ERD automatically. This will
be the most accurate portrait of your database and will require no drawing on your part. Here's
an example of a very basic database structure generated from data.
If you want to create a new plan, you can also edit the generated diagram and collaborate with
your team on what changes to make.
An ER diagram is a means of visualizing how the information a system produces is related. There
are five main components of an ERD:
Entities, which are represented by rectangles. An entity is an object or concept about
which data is stored.
Attributes, which are represented by ovals. A key attribute is the unique, distinguishing
characteristic of the entity. For example, an employee's social security number might be
the employee's key attribute.
A multivalued attribute can have more than one value. For example, an employee entity can
have more than one skill value.
ER Diagram Uses
When documenting a system or process, looking at the system in multiple ways increases the
understanding of that system. ERD diagrams are commonly used in conjunction with a data flow
diagram to display the contents of a data store. They help us to visualize how data is connected
in a general way, and are particularly useful for constructing a relational database.
Identify the entities. The first step in making an ERD is to identify all of the entities you
will use. An entity is nothing more than a rectangle with a description of something that
your system stores information about. This could be a customer, a manager, an invoice, a
schedule, etc. Draw a rectangle for each entity you can think of on your page. Keep
them spaced out a bit.
Identify relationships. Look at two entities: are they related? If so, draw a solid line
connecting the two entities.
Describe the relationship. How are the entities related? Draw an action diamond between
the two entities on the line you just added. In the diamond write a brief description of how
they are related.
Add attributes. Any key attributes of entities should be added using oval-shaped
symbols.
Complete the diagram. Continue to connect the entities with lines and add diamonds
to describe each relationship until all relationships have been described. Some of your
entities may have no relationships, while others may have multiple relationships. That is
okay.
• Spatial data structures describe the methods and formats for physical storage and processing of
geographic information in GIS.
Spatial data structures are the core of a GIS and fundamentally affect its performance and
capabilities. Thus an understanding of spatial data structures is important in the study of
geographic data management.
Data structures can be divided into two groups. Vector data structures represent geographic
objects or phenomenon as distinct geometries with specific characteristics and may also include
topology.
Raster data structures represent geographic objects or phenomenon as a grid over which a given
characteristic varies continuously. Generally, raster data structures are suitable for continuously
varying phenomena like temperature, while vector data structures are suitable for the
representation of conceptually distinct objects like land ownership parcels.
representing objects as geometries, but also referencing the geometries against cells in a
hierarchy. This approach allows geographically neighbouring geometries to be found more
efficiently.
A raster is an array of cells, where each cell has a value representing a specific portion of an
object or a feature.
A point may be represented by a single cell, a line by a sequence of neighbouring cells and a
polygon by a collection of contiguous cells.
All cells in a raster must be the same size; the cell size determines the resolution. The cells
can be any size, but they should be small enough to support the most detailed analysis required.
A cell can represent a square kilometer, a square meter, or even a square centimeter.
Cells are arranged in rows and columns, an arrangement that produces a Cartesian matrix. The
rows of the matrix are parallel to the x-axis of the Cartesian plane, and the columns to the y- axis.
Each cell has a unique row and column address.
Resolution affects data storage size: storage requirements increase with the square of the
linear image dimensions, so halving the cell size quadruples the number of cells.
800 × 545 pixels – 325 KB
400 × 272 pixels – 91 KB
200 × 136 pixels – 26 KB
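The scaling can be sketched with a simple calculation of raw (uncompressed) storage. One byte per cell is assumed here for illustration; real rasters may use more bytes per cell, and the file sizes quoted above reflect compression as well.

```python
# Sketch: how raw (uncompressed) raster storage scales with resolution.
# One byte per cell is an illustrative assumption.

def raw_size_bytes(cols, rows, bytes_per_cell=1):
    return cols * rows * bytes_per_cell

for cols, rows in [(800, 545), (400, 272), (200, 136)]:
    kb = raw_size_bytes(cols, rows) / 1024
    print(f"{cols} x {rows}: {kb:.1f} KB raw")

# Halving the cell count along each axis roughly quarters the storage.
```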
We can distinguish different ways of storing raster data, which vary in the geometric
organisation of the storage and consequently in storage size. The following types of geometric
elements are identified:
Lines
Stripes
Tiles
Areas (e.g. Quad-trees)
Hierarchy of resolution
Raster data are managed easily in computers; all commonly used programming languages support
array handling well. However, a raster stored in a raw state with no compression can be
extremely inefficient in terms of computer storage space. As already said, the way to improve
raster space efficiency is data compression.
Illustrations and short texts are used to describe different methods of raster data storage and raster
data compression techniques.
By convention, raster data is normally stored row by row from the top left corner.
Example: The Swiss Digital elevation model (DHM25-Matrixmodell in decimeters)
Geographical data tends to be "spatially autocorrelated", meaning that objects which are close to
each other tend to have similar attributes:
"All things are related, but nearby things are more related than distant things" (Tobler 1970)
Because of this principle, we expect neighboring pixels to have similar values. Therefore, instead
of repeating pixel values, we can code the raster as pairs of numbers - (run length, value).
Run-length coding is a widely used compression technique for raster data. The primary data
elements are pairs of values, or tuples, consisting of a pixel value and a repetition count which
specifies the number of pixels in the run. Data are built by reading successively row by row
through the raster, creating a new tuple every time the pixel value changes or the end of the row
is reached.
We can note in Codes III that if a run is not required to break at the end of each line, we can
compress the data even further.
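The scheme just described can be sketched in a few lines of Python, with a flag for whether runs break at row ends; the raster values are hypothetical.

```python
# Sketch: run-length coding of a raster read row by row from the top left.
# If runs may continue across row ends, the rows are concatenated first.

def run_length_encode(raster, break_at_row_end=True):
    rows = raster if break_at_row_end else [[v for row in raster for v in row]]
    tuples = []
    for row in rows:
        count, value = 1, row[0]
        for v in row[1:]:
            if v == value:
                count += 1          # extend the current run
            else:
                tuples.append((count, value))  # emit (run length, value)
                count, value = 1, v
        tuples.append((count, value))
    return tuples

raster = [
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
print(run_length_encode(raster))                          # [(3, 0), (1, 1), (4, 1)]
print(run_length_encode(raster, break_at_row_end=False))  # [(3, 0), (5, 1)]
```

Note how allowing the run to cross the row boundary merges two tuples into one, the extra saving mentioned above.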
Quadtree coding (lossless)
The quadtree compression technique is a very common compression method applied to raster
data. Quadtree coding stores the information by subdividing a square region into quadrants, each
of which may be further subdivided into squares until the contents of the cells have the same
values.
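The recursive subdivision can be sketched directly. The function below (a simplified illustration for a square raster whose side is a power of two, with a hypothetical grid) returns a leaf value for a uniform quadrant and a four-element list of sub-quadrants otherwise.

```python
# Sketch: quadtree coding of a square raster (side a power of two).
# A quadrant that is uniform becomes a leaf; otherwise it is split in four.

def quadtree(grid, row=0, col=0, size=None):
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first                                   # uniform quadrant: a leaf
    half = size // 2
    return [quadtree(grid, row, col, half),            # NW
            quadtree(grid, row, col + half, half),     # NE
            quadtree(grid, row + half, col, half),     # SW
            quadtree(grid, row + half, col + half, half)]  # SE

grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(quadtree(grid))  # [0, 1, 0, [0, 1, 1, 1]]
```

Three of the four quadrants compress to single leaves; only the mixed south-east quadrant is subdivided further.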
Figure: Quadtree subdivision examples (e.g. the position code of cell 10 is 3,2).
In the following figure we can see how an area is represented on a map, together with the
corresponding quadtree representation and more information on constructing and addressing
quadtrees.
LZ77 method (lossless compression)
LZ77 compression is a lossless compression method, meaning that the values in your raster are
not changed. Abraham Lempel and Jacob Ziv first introduced this compression method in 1977.
The theory behind it is relatively simple: when you find a match (a data value that has already
been seen in the input file), instead of writing the actual value, the position and length
(number of bytes) of the value is written to the output (the offset and length: where it is and
how long it is).
Some image-compression methods are often referred to as LZ (Lempel-Ziv); variants include LZW
(Lempel-Ziv-Welch). With this method, a previous analysis of the data is not required, which
makes the LZ77 method applicable to all raster data types.
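The core idea can be sketched as follows. This is a simplified illustration, not a production codec: it emits (offset, length, next-symbol) triples, with the window handling and output packing of real LZ77 implementations omitted.

```python
# Sketch: the core LZ77 idea. Emit (offset, length, next-symbol) triples,
# where offset/length point back into data already seen.

def lz77_encode(data, window=255):
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        start = max(0, i - window)
        for j in range(start, i):        # search the sliding window
            length = 0
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1              # matches may run into the lookahead
            if length > best_len:
                best_len, best_off = length, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    data = []
    for off, length, nxt in triples:
        for _ in range(length):
            data.append(data[-off])      # copy from the already-decoded output
        data.append(nxt)
    return "".join(data)

coded = lz77_encode("aacaacabcabaaac")
assert lz77_decode(coded) == "aacaacabcabaaac"   # lossless round trip
print(coded)
```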
JPEG compression (lossy)
JPEG encoding proceeds in several steps:
The representation of the colors in the image is converted from RGB to YCbCr, consisting
of one luma component (Y), representing brightness, and two chroma components (Cb and
Cr), representing color. This step is sometimes skipped.
The resolution of the chroma data is reduced, usually by a factor of 2. This reflects the
fact that the eye is less sensitive to fine color details than to fine brightness details.
The image is split into blocks of 8×8 pixels, and for each block, each of the Y, Cb, and Cr
data undergoes a discrete cosine transform (DCT). A DCT is similar to a Fourier transform
in the sense that it produces a kind of spatial frequency spectrum.
The amplitudes of the frequency components are quantized. Human vision is much more
sensitive to small variations in color or brightness over large areas than to the strength of
high-frequency brightness variations. Therefore, the magnitudes of the high-frequency
components are stored with a lower accuracy than the low-frequency components. The
quality setting of the encoder (for example 50 or 95 on a scale of 0–100 in the Independent
JPEG Group's library) affects to what extent the resolution of each frequency component
is reduced. If an excessively low quality setting is used, the high-frequency components
are discarded altogether.
The resulting data for all 8×8 blocks is further compressed with a lossless algorithm, a
variant of Huffman encoding.
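The DCT step can be illustrated with the one-dimensional (unnormalized) DCT-II formula, which JPEG applies in two-dimensional form to each 8×8 block. The sample values are hypothetical; the point is that a perfectly smooth block concentrates all its energy in the first (DC) coefficient, which is why smooth image areas compress so well.

```python
# Sketch: unnormalized 1-D DCT-II, X_k = sum_n x_n * cos(pi/N * (n + 0.5) * k).
import math

def dct_ii(x):
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

flat = [10.0] * 8                 # a perfectly smooth row of a block
coeffs = dct_ii(flat)
print(round(coeffs[0], 6))                       # DC coefficient: 80.0
print(max(abs(c) for c in coeffs[1:]) < 1e-9)    # all others vanish: True
```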
JPEG compression (left to right: a decreasing quality setting results in visible 8×8 blocks of
pixels)
Vector data structures explicitly store the geometries that represent geographic objects. They are
referred to as vector data structures because they represent geometries using a series of points and
lines (vectors). In vector data structures, geometries may have different dimensions.
• A zero-dimensional geometry is a point. It is usually represented with x and y coordinates.
• A one-dimensional geometry is a line, arc or string of line segments (sometimes called a
polyline) connecting point geometries, sometimes requiring additional information about arc
properties.
• A two-dimensional geometry is a polygon, defined by a sequence of one-dimensional
geometries with the same start and end point. Polygons may be simple or complex. Complex
polygons may be aggregations of more than one polygon and may have holes, possibly with island
geometries inside the holes.
• A three-dimensional geometry is a solid, defined by a collection of two-dimensional
geometries with a z coordinate (usually representing height relative to a reference point). The
solid may also contain holes or be an aggregation of multiple solids.
Figure : Geometries
Geometries may have attributes attached to them. For example, a land ownership parcel may be
represented by a two-dimensional geometry (polygon), with attributes for the name of the parcel
owner and the land-use controls that apply over it.
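The parcel example can be sketched as a simple feature record: a two-dimensional geometry stored as a closed ring of vertices, with attributes attached. All names and values below are hypothetical.

```python
# Sketch: a two-dimensional geometry with attached attributes, as in the
# land-parcel example above. Names and values are hypothetical.

parcel = {
    "geometry": {
        "type": "Polygon",
        # A closed ring: the last vertex repeats the first.
        "vertices": [(0, 0), (120, 0), (120, 80), (0, 80), (0, 0)],
    },
    "attributes": {
        "owner": "J. Smith",
        "land_use": "residential",
    },
}

ring = parcel["geometry"]["vertices"]
assert ring[0] == ring[-1], "a polygon ring must start and end at the same point"
print(parcel["attributes"]["owner"])  # J. Smith
```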
Figure : Topology
The main difference between vector and raster graphics is that raster graphics are composed of
pixels, while vector graphics are composed of paths. A raster graphic, such as a gif or jpeg, is an
array of pixels of various colors, which together form an image.
In GIS, vector and raster are two different ways of representing spatial data. However, the
distinction between vector and raster data types is not unique to GIS: here is an example from the
graphic design world which might be clearer.
Raster data is made up of pixels (or cells), and each pixel has an associated value. Simplifying
slightly, a digital photograph is an example of a raster dataset where each pixel value corresponds
to a particular colour. In GIS, the pixel values may represent elevation above sea level, or
chemical concentrations, or rainfall etc. The key point is that all of this data is represented as a
grid of (usually square) cells. The difference between a digital elevation model (DEM) in GIS
and a digital photograph is that the DEM includes additional information describing where the
edges of the image are located in the real world, together with how big each cell is on the ground.
This means that your GIS can position your raster images (DEM, hillshade, slope map etc.)
correctly relative to one another, and this allows you to build up your map.
Vector data consists of individual points, which (for 2D data) are stored as pairs of (x, y) co-
ordinates. The points may be joined in a particular order to create lines, or joined into closed rings
to create polygons, but all vector data fundamentally consists of lists of co-ordinates that define
vertices, together with rules to determine whether and how those vertices are joined.
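The idea that all vector data reduces to vertex lists plus joining rules can be sketched in a few lines of Python (the functions are illustrative, not from any GIS library):

```python
# Every vector geometry is a list of (x, y) vertices plus a joining rule.

def make_point(x, y):
    return {"type": "Point", "coords": [(x, y)]}      # single vertex, no joining

def make_line(coords):
    return {"type": "Line", "coords": list(coords)}   # vertices joined in order

def make_polygon(coords):
    ring = list(coords)
    if ring[0] != ring[-1]:
        ring.append(ring[0])                          # close the ring
    return {"type": "Polygon", "coords": ring}

# A parcel boundary: the closing rule duplicates the first vertex at the end.
boundary = make_polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
```

The only difference between the three geometry types here is the joining rule, which is exactly the point the paragraph above makes.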
Note that whereas raster data consists of an array of regularly spaced cells, the points in a vector
dataset need not be regularly spaced. In many cases, both vector and raster representations of the
same data are possible:
At this scale, there is very little difference between the vector representation and the "fine" (small
pixel size) raster representation. However, if you zoomed in closely, you'd see the polygon edges
of the fine raster would start to become pixelated, whereas the vector representation would remain
crisp. In the "coarse" raster the pixelation is already clearly visible, even at this scale.
Vector and raster datasets have different strengths and weaknesses. When performing GIS
analysis, it's important to think about the most appropriate data format for your needs. In
particular, careful use of raster algebra can often produce results much, much faster than the
equivalent vector workflow.
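The speed advantage of raster algebra comes from the fact that aligned layers can be combined cell-by-cell with array arithmetic. A minimal sketch, using NumPy arrays to stand in for two aligned raster layers (the suitability rule is invented for illustration):

```python
import numpy as np

# Two aligned 3 x 3 raster layers covering the same extent and cell size.
elevation = np.array([[120, 135, 150],
                      [110, 125, 140],
                      [100, 115, 130]])
rainfall  = np.array([[ 10,  12,  14],
                      [  9,  11,  13],
                      [  8,  10,  12]])

# Raster algebra: one vectorised expression evaluates every cell at once.
# Hypothetical rule: a cell is "suitable" where elevation > 115 and rainfall > 10.
suitability = (elevation > 115) & (rainfall > 10)
```

The equivalent vector workflow would require polygon overlays and intersection tests; here the overlay is a single element-wise expression over the grids.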
If we look at the anatomy of a TIN, it is composed of points, edges, and faces. A point represents
an input data value that is preserved, and defines a vertex of a triangle. An edge is the line
drawn between two points, which creates the outline of the triangles. A face is the area, or
surface, inside each of the triangles.
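The point/edge/face anatomy can be sketched with a toy two-triangle TIN (the data structure is illustrative, not from any GIS library):

```python
# A toy TIN: four elevation points and the two triangular faces joining them.
# Each point is (x, y, z); z is the preserved input elevation value.
points = [(0, 0, 100), (4, 0, 105), (4, 3, 120), (0, 3, 110)]

# Faces are triangles, stored as triples of point indices.
faces = [(0, 1, 2), (0, 2, 3)]

def tin_edges(faces):
    # Edges are derived from the faces: each triangle contributes three
    # edges, and shared edges are stored once (undirected).
    edges = set()
    for a, b, c in faces:
        for e in ((a, b), (b, c), (c, a)):
            edges.add(tuple(sorted(e)))
    return edges
```

Two triangles sharing one edge yield five distinct edges rather than six, which is how a TIN stores a continuous surface compactly.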
Triangulated irregular networks can be quite large and look quite complex. However, if you
remember that each point of a triangle represents, in this case, an elevation value, and that the
faces are the flat surfaces of the triangles, you can decipher some of the features in a TIN without
seeing it colorized. For instance, the very large triangles in the top left corner of this triangulated
irregular network represent a lake behind a dam. The very dense triangles flowing from the top
right of the triangulated irregular network to the bottom right represent a river.
When we apply colors to this triangulated irregular network based on the elevation values, it
becomes clearer what it is representing: the continuous spatial phenomenon known as elevation.
In this particular case, it shows a lake behind the dam, the surrounding terrain, and the river
leading away from the dam.
A data grid is an architecture or set of services that gives individuals or groups of users the ability
to access, modify and transfer extremely large amounts of geographically distributed data for
research purposes. Data grids make this possible through a host of middleware applications and
services that pull together data and resources from multiple administrative domains and then
present it to users upon request. The data in a data grid can be located at a single site or multiple
sites where each site can be its own administrative domain governed by a set of security
restrictions as to who may access the data. Likewise, multiple replicas of the data may be
distributed throughout the grid outside their original administrative domain, and the security
restrictions placed on the original data must be applied equally to the replicas. Specifically
developed data grid middleware handles the integration between
users and the data they request by controlling access while making it available as efficiently as
possible. The adjacent diagram depicts a high level view of a data grid.
Data grids have been designed with multiple topologies in mind to meet the needs of the scientific
community. Several topologies have been used in data grids, each with a specific setting in which
it is best utilized. Each of these topologies is explained below.
Federation topology is the choice for institutions that wish to share data from already existing
systems. It allows each institution to retain control over its own data. When an institution with
proper authorization requests data from another institution, it is up to the institution receiving the
request to determine whether the data will be supplied. The federation can be loosely integrated
between institutions, tightly integrated, or a combination of both.
Monadic topology has a central repository that all collected data is fed into. The central
repository then responds to all queries for data. There are no replicas in this topology as compared
to others. Data is only accessed from the central repository which could be by way of a web portal.
One project that uses this data grid topology is the Network for Earthquake Engineering
Simulation (NEES) in the United States. This works well when all access to the data is local or
within a single region with high speed connectivity.
Hierarchical topology lends itself to collaboration where there is a single source for the data and
it needs to be distributed to multiple locations around the world. One project that benefits from
this topology is CERN, which runs the Large Hadron Collider and generates enormous amounts
of data. This data is located at one source and needs to be distributed around the world to
organizations that are collaborating in the project.
2.10 OGC Standards
What is the OGC?
The Open Geospatial Consortium (OGC) is a not-for-profit organisation focused on developing
and defining open standards for the geospatial community to allow interoperability between
various software and data services. The OGC provides open standard specifications with the aim
of facilitating and encouraging the use of these standards when organisations develop their own
geospatial software or online geoportals offering data and software services. The collection of
geoportals and various other complementary services creates a Spatial Data Infrastructure (SDI).
Open Geospatial Consortium Standards: Introduction
The Open Geospatial Consortium (OGC) was founded in 1994 to make geographic information
an integral part of the world’s information infrastructure. OGC members – technology providers
and technology users – collaboratively develop open interface standards and associated encoding
standards, and also best practices, that enable developers to create information systems that can
easily exchange “geospatial” information and instructions with other information systems.
Requirements range from complex scheduling and control of Earth observation satellites to
displaying simple map images on the Web and encoding location in just a few bytes for geo-
tagging and messaging. A look at the OGC Domain Working Groups shows the wide scope of
current activity in the OGC.
The OGC Baseline and OGC Reference Model
The OGC Standards Baseline consists of the OGC standards for interfaces, encodings, profiles,
application schemas, and best practice documents. The OGC Reference Model (ORM) describes
these standards and the relationships between them and related ISO standards. The ORM provides
an overview of OGC standards and serves as a useful resource for defining architectures for
specific applications.
In developing a Web services application using OGC standards (and in learning about the
relationships between OGC standards) it helps to think of publish, find and bind as the key
functions for applications in a Web services environment.
Publish: Resource providers advertise their resources.
Find: End users and their applications can discover resources that they need at run-time.
Bind: End users and their applications can access and exercise resources at run-time.
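The bind step often amounts to issuing a standard request against a service endpoint. A minimal sketch of building an OGC WMS GetCapabilities request URL, with a placeholder server address (the `SERVICE`, `VERSION`, and `REQUEST` parameters are standard WMS request parameters):

```python
from urllib.parse import urlencode

def wms_get_capabilities_url(base_url):
    # Build the standard WMS GetCapabilities request, the usual first
    # step when a client binds to a map service and asks what it offers.
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetCapabilities",
    }
    return base_url + "?" + urlencode(params)

# The server address below is a placeholder, not a real endpoint.
url = wms_get_capabilities_url("https://example.org/wms")
```

The server's XML response advertises its layers, coordinate systems, and output formats, which the client then uses to issue GetMap requests.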
Most of the OGC standards developed in recent years are standards for the Web services
environment, and these standards are collectively referred to as OGC Web Services (OWS). The
figure below provides a general architectural schema for OGC Web Services. This schema
identifies the generic classes of services that participate in various geoprocessing and location
activities.
Acronyms in the figure are defined below. Some of these are “OGC standards” and others are
publicly available “Discussion Papers”, “Requests” and “Recommendation Papers”. (Note that
some in work candidate standards are not yet public, but are accessible by OGC members.)
Catalogue Service for the Web (CSW)
Filter Encoding (FE)
Geography Markup Language (GML)
KML Encoding Standard (KML)
Sensor Model Language (SensorML)
Styled Layer Descriptor (SLD)
Sensor Observation Service (SOS)
Web Coverage Service (WCS)
Web Feature Service (WFS)
Web Map Service (WMS)
Web Processing Service (WPS)
Sensor Planning Service (SPS)
Web Terrain Service (WTS)
Grid Coverage Service
Coordinate Transformation Service
Web Coverage Processing Service (WCPS)
Web Map Tile Service (WMTS)
Simple Features (SF)
Sensor Web Enablement (SWE)
XML for Image and Map Annotation (XIMA)
CityGML
GeosciML
GML in JPEG 2000
Observations and Measurements (O&M)
Symbology Encoding
Transducer Markup Language (TML)
2.11 Data Quality
Data quality is a perception or an assessment of data's fitness to serve its purpose in a given
context. The quality of data is determined by factors such as accuracy, completeness, reliability,
relevance and how up to date it is. As data has become more intricately linked with the operations
of organizations, the emphasis on data quality has gained greater attention.
Why data quality is important
Poor-quality data is often pegged as the source of inaccurate reporting and ill-conceived strategies
in a variety of companies, and some have attempted to quantify the damage done. Economic
damage due to data quality problems can range from added miscellaneous expenses when
packages are shipped to wrong addresses, all the way to steep regulatory compliance fines for
improper financial reporting.
An oft-cited estimate originating from IBM suggests the yearly cost of data quality issues in the
U.S. during 2016 alone was about $3.1 trillion. Lack of trust by business managers in data quality
is commonly cited among chief impediments to decision-making.
The demon of poor data quality was particularly common in the early days of corporate
computing, when most data was entered manually. Even as more automation took hold, data
quality issues rose in prominence. For a number of years, the image of deficient data quality was
represented in stories of meetings at which department heads sorted through differing spreadsheet
numbers that ostensibly described the same activity.
Determining data quality
Aspects, or dimensions, important to data quality include: accuracy, or correctness; completeness,
which determines if data is missing or unusable; conformity, or adherence to a standard format;
consistency, or lack of conflict with other data values; and duplication, or repeated records.
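Two of these dimensions, completeness and duplication, are easy to measure mechanically. A toy sketch over a list of attribute records (the records and field names are invented for illustration):

```python
# Invented sample records; None marks a missing attribute value.
records = [
    {"id": 1, "name": "Parcel A", "owner": "Smith"},
    {"id": 2, "name": "Parcel B", "owner": None},      # incomplete
    {"id": 3, "name": "Parcel C", "owner": "Jones"},
    {"id": 3, "name": "Parcel C", "owner": "Jones"},   # duplicate
]

def completeness(records, field):
    # Fraction of records with a non-missing value in the given field.
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def duplicates(records):
    # Count records identical to one already seen.
    seen, dups = set(), 0
    for r in records:
        key = tuple(sorted(r.items()))
        if key in seen:
            dups += 1
        seen.add(key)
    return dups
```

Here the owner field is 75% complete and one exact duplicate record exists; real data quality tools compute the same kinds of counts, just at scale and across many more dimensions.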
As a first step toward data quality, organizations typically perform data asset inventories in which
the relative value, uniqueness and validity of data can undergo baseline studies. Established
baseline ratings for known good data sets are then used for comparison against data in the
organization going forward.
Methodologies for such data quality projects include the Data Quality Assessment Framework
(DQAF), which was created by the International Monetary Fund (IMF) to provide a common
method for assessing data quality. The DQAF provides guidelines for measuring data dimensions
that include timeliness, in which actual times of data delivery are compared to anticipated data
delivery schedules.
UNIT III DATA INPUT AND TOPOLOGY 9
Scanner - Raster Data Input – Raster Data File Formats – Vector Data Input –Digitiser – Topology
- Adjacency, connectivity and containment – Topological Consistency rules – Attribute Data
linking – ODBC – GPS - Concept GPS based mapping.
3.1 SCANNER:
Scanners are used to convert analog maps or photographs into digital image data in raster
format. Digital image data are usually integer-based, with a one-byte gray scale (256 gray tones,
from 0 to 255) for a black-and-white image and a set of three gray scales of red (R), green (G)
and blue (B) for a color image.
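The encoding described above can be sketched directly: one byte per pixel for black-and-white, three bytes (R, G, B) per pixel for color. The channel-averaging conversion below is a simplification; real scanning software often weights the channels differently:

```python
# One byte per pixel gives 256 gray tones (0 = black, 255 = white);
# a color scan stores three such bytes per pixel: (R, G, B).

def to_gray(r, g, b):
    # Simple unweighted average of the three channels (illustrative only).
    return (r + g + b) // 3

black_pixel = 0                 # pure black in a black-and-white scan
color_pixel = (255, 200, 100)   # one RGB triple from a color scan
gray_value = to_gray(*color_pixel)
```

Each channel value, and the resulting gray tone, always stays within the one-byte range 0-255.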
The following four types of scanner are commonly used in GIS and remote sensing.
a. Mechanical Scanner
It is called a drum scanner since a map or an image placed on a drum is digitized
mechanically with rotation of the drum and shift of the sensor, as shown in Figure 3.4(a).
It is accurate but slow.
b. Video Camera
A video camera with a CRT (cathode ray tube) is often used to digitize a small part of a map or
film. This is not very accurate but cheap. (see Figure 3.4(b))
c. CCD Camera
An area CCD camera (digital still camera) can also be used instead of a video camera to acquire
digital image data (see Figure 3.4(c)). It is more stable and accurate than a video camera.
d. CCD Scanner
A flat-bed or roll-feed scanner with a linear CCD (charge coupled device) is now commonly
used to digitize analog maps in raster format, either in mono-tone or color mode. It is accurate
but expensive.
In its simplest form, a raster consists of a matrix of cells (or pixels) organized into rows and
columns (or a grid) where each cell contains a value representing information, such as
temperature. Rasters are digital aerial photographs, imagery from satellites, digital pictures, or
even scanned maps.
Data stored in a raster format represents real-world phenomena:
Thematic data (also known as discrete) represents features such as land-use or soils data.
Continuous data represents phenomena such as temperature, elevation, or spectral data
such as satellite images and aerial photographs.
Pictures include scanned maps or drawings and building photographs.
Thematic and continuous rasters may be displayed as data layers along with other geographic data
on your map but are often used as the source data for spatial analysis with the ArcGIS Spatial
Analyst extension. Picture rasters are often used as attributes in tables—they can be displayed
with your geographic data and are used to convey additional information about map features.
While the structure of raster data is simple, it is exceptionally useful for a wide range of
applications. Within a GIS, the uses of raster data fall under four main categories:
Rasters as basemaps
A common use of raster data in a GIS is as a background display for other feature layers.
For example, orthophotographs displayed underneath other layers provide the map user
with confidence that map layers are spatially aligned and represent real objects, as well as
additional information. Three main sources of raster basemaps are orthophotos from aerial
photography, satellite imagery, and scanned maps. Below is a raster used as a
basemap for road data.
Rasters as surface maps
Rasters are well suited for representing data that changes continuously across a landscape
(surface). They provide an effective method of storing the continuity as a surface. They
also provide a regularly spaced representation of surfaces. Elevation values measured
from the earth's surface are the most common application of surface maps, but other
values, such as rainfall, temperature, concentration, and population density, can also
define surfaces that can be spatially analyzed. The raster below displays elevation—using
green to show lower elevation and red, pink, and white cells to show higher elevations.
Rasters representing thematic data can be derived from analyzing other data. A common
analysis application is classifying a satellite image by land-cover categories. Basically,
this activity groups the values of multispectral data into classes (such as vegetation type)
and assigns a categorical value. Thematic maps can also result from geoprocessing
operations that combine data from various sources, such as vector, raster, and terrain data.
For example, you can process data through a geoprocessing model to create a
raster dataset that maps suitability for a specific activity. Below is an example of a
classified raster dataset showing land use.
Sometimes you don't have the choice of storing your data as a raster; for example, imagery is only
available as a raster. However, there are many other features (such as points) and measurements
(such as rainfall) that could be stored as either a raster or a feature (vector) data type.
The advantages of storing your data as a raster are as follows:
A simple data structure—A matrix of cells with values representing a coordinate and
sometimes linked to an attribute table
A powerful format for advanced spatial and statistical analysis
The ability to represent continuous surfaces and perform surface analysis
The ability to uniformly store points, lines, polygons, and surfaces
The ability to perform fast overlays with complex datasets
There are other considerations for storing your data as a raster that may convince you to use a
vector-based storage option. For example:
There can be spatial inaccuracies due to the limits imposed by the raster dataset cell
dimensions.
Raster datasets are potentially very large. Resolution increases as the size of the cell
decreases; however, normally cost also increases in both disk space and processing
speeds. For a given area, changing cells to one-half the current size requires as much as
four times the storage space, depending on the type of data and storage techniques used.
There is also a loss of precision that accompanies restructuring data to a regularly spaced
raster-cell boundary.
In raster datasets, each cell (which is also known as a pixel) has a value. The cell values represent
the phenomenon portrayed by the raster dataset such as a category, magnitude, height, or spectral
value. The category could be a land-use class such as grassland, forest, or road. A magnitude
might represent gravity, noise pollution, or percent rainfall. Height (distance) could represent
surface elevation above mean sea level, which can be used to derive slope, aspect, and watershed
properties. Spectral values are used in satellite imagery and aerial photography to represent light
reflectance and color.
Cell values can be either positive or negative, integer, or floating point. Integer values are best
used to represent categorical (discrete) data, and floating-point values to represent continuous
surfaces. Cells can also have a NoData value to represent the absence of data.
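A small sketch of the integer-versus-floating-point convention, with NaN standing in for NoData (the category codes and values are invented; many raster formats use a declared NoData value rather than NaN):

```python
import math

# Integer cells for categorical data: 1 = grassland, 2 = forest, 3 = road.
landuse = [[1, 1, 2],
           [1, 2, 3]]

# Floating-point cells for a continuous surface; NaN marks a NoData cell.
elevation = [[102.5, 103.1, float("nan")],
             [101.0, 101.8, 102.2]]

def mean_ignoring_nodata(grid):
    # Summary statistics must skip NoData cells, or the result is polluted.
    vals = [v for row in grid for v in row if not math.isnan(v)]
    return sum(vals) / len(vals)
```

Averaging land-use codes would be meaningless (they are categories, not magnitudes), which is exactly why integers are reserved for discrete data and floats for continuous surfaces.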
Rasters are stored as an ordered list of cell values, for example, 80, 74, 62, 45, 45, 34, and so on.
The area (or surface) represented by each cell consists of the same width and height and is an
equal portion of the entire surface represented by the raster. For example, a raster representing
elevation (that is, digital elevation model) may cover an area of 100 square kilometers. If there
were 100 cells in this raster, each cell would represent 1 square kilometer of equal width and
height (that is, 1 km x 1 km).
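The worked example above reduces to simple arithmetic:

```python
import math

# Reproducing the example from the text: a DEM covering 100 square
# kilometers, divided into 100 equal square cells.
area_km2 = 100.0
n_cells = 100

cell_area = area_km2 / n_cells     # area per cell: 1 km^2
cell_width = math.sqrt(cell_area)  # side length of a square cell: 1 km
```

Doubling the resolution (halving `cell_width`) would quadruple `n_cells` for the same area, which is the storage trade-off discussed below.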
The dimension of the cells can be as large or as small as needed to represent the surface conveyed
by the raster dataset and the features within the surface, such as a square kilometer, square foot,
or even square centimeter. The cell size determines how coarse or fine the patterns or features in
the raster will appear. The smaller the cell size, the smoother or more detailed the raster will be.
However, the greater the number of cells, the longer it will take to process, and it will increase
the demand for storage space. If a cell size is too large, information may be lost or subtle patterns
may be obscured. For example, if the cell size is larger than the width of a road, the road may not
exist within the raster dataset. In the diagram below, you can see how this simple polygon feature
will be represented by a raster dataset at various cell sizes.
The location of each cell is defined by the row or column where it is located within the raster
matrix. Essentially, the matrix is represented by a Cartesian coordinate system, in which the rows
of the matrix are parallel to the x-axis and the columns to the y-axis of the Cartesian plane. Row
and column values begin with 0. In the example below, if the raster is in a Universal Transverse
Mercator (UTM) projected coordinate system and has a cell size of 100, the cell location at 5,1
would be 300,500 East, 5,900,600 North.
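The row/column-to-coordinate mapping can be sketched as below. The origin coordinates are an assumption chosen to reproduce the worked example (the actual origin belongs to the figure referenced in the text), and "5,1" is read here as column 5, row 1:

```python
def cell_to_coords(row, col, x_origin=300_000, y_origin=5_900_700, cell=100):
    # ASSUMED top-left origin (300,000 E / 5,900,600+100 N) and a 100 m
    # cell size, matching the example in the text.
    easting = x_origin + col * cell    # columns advance along the x-axis
    northing = y_origin - row * cell   # rows count downward from the top
    return easting, northing

# Column 5, row 1 lands at 300,500 East, 5,900,600 North.
location = cell_to_coords(1, 5)
```

Whether the coordinate refers to a cell's corner or its center is a convention that varies between raster formats, so a real implementation must check the dataset's georeferencing metadata.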
Often you need to specify the extent of a raster. The extent is defined by the top, bottom, left, and
right coordinates of the rectangular area covered by a raster, as shown below.
The geodatabase is the native data model in ArcGIS for storing geographic information, including
raster datasets, mosaic datasets, and raster catalogs. However, there are many file formats you
can work with that are maintained outside a geodatabase. The following table gives a description
of the supported raster formats (raster datasets) and their extensions and identifies if they are read-
only or if they can also be written by ArcGIS.
Supported data Supports
Format Description Extensions Read / Write
types multiband
Laboratory (JPL). .dat. For
ArcGIS supports the example:
polarimetric AIRSAR mission_l.dat
data (POLSAR). (L-Band) and
mission_c.dat
(C-Band).
ARC Digitized Distributed on CD-
Raster ROM by the National
Graphics Geospatial-
(ADRG) Intelligence Agency
ADRG Legend (NGA). ADRG is Multiple files
geographically
referenced using the Data file—
equal arc-second extension Yes
raster chart/map *.img or
8-bit unsigned
(ARC) system in *.ovr Read-only (Always
integer
which the globe is three
ADRG divided into 18 Legend bands)
Overview latitudinal bands, or file—
zones. The data extension
consists of raster *.lgg
images and other
graphics generated by
scanning source
documents.
The ArcGIS Desktop Read-only 16-bit signed
No
Advanced ASCII integer
Single file—
Grid format is (Write—
ASCII Grid extension
an ArcGIS Desktop requires ArcGIS 32-bit floating
*.asc No
AdvancedGrid Spatial Analyst point
exchange file. extension)
Multiple files 1-,and 4-bit
Yes
This format provides a unsigned integer
method for reading Data file— 8-, and 16-bit
and displaying extension signed and Yes
decompressed, BIL, *.bil, *.bip, unsigned integer
Band
BIP, and BSQ image or *.bsq
interleaved by
data. By creating an
line (BIL),
ASCII description file Header file—
band
that describes the extension
interleaved by Read and write
layout of the image *.hdr
pixel (BIP),
data, black-and-white,
band 32-bit signed and
grayscale, pseudo Color map Yes
sequential unsigned integer
color, and multiband file—
(BSQ)
image data can be extension
displayed without *.clr
translation into a
proprietary format. Statistics
file—
76
Supported data Supports
Format Description Extensions Read / Write
types multiband
extension
*.stx
The Bathymetric
Yes
Bathymetric Attributed Grid is a Single file—
32-bit floating
Attributed Grid nonproprietary file extension Read-only
point (Always
(BAG) format for storing *.bag
two bands)
bathymetric data.
The Binary Terrain 16-, and 32-bit
No
format was created by signed integer
Single file—
the Virtual Terrain
extension
Project (VTP) to store
*.bt
elevation data in a Read-only
Binary Terrain
more flexible file (Write—
(BT) Projection 32-bit floating
format. The BT format developer only) No
file— point
is flexible in terms of
extension
file size and spatial
*.prj
reference
system.
BMP files are
Bitmap (BMP),
Windows bitmap Single file—
device- Yes
images. They are extension
independent
usually used to store *.bmp
bitmap (DIB) 8-bit unsigned (Limited
pictures or clip art that Read and write
format, or integer to one or
can be moved between World file—
Microsoft three
different applications extension
Windows bands)
on *.bpw
bitmap
Windows platforms.
This is a compressed
Multiple
raster format used in
files—
the distribution of 8-bit unsigned
BSB extensions Read-only Yes
raster nautical charts integer
*.bsb, *.cap,
by MapTech
and *.kap
and NOAA .
Optimized for writing
and reading large files
in a distributed
Directory—
processing and
extension
storage environment. 8-, 16-, and 32-bit
*.crf
In a CRF file, large unsigned/signed
Cloud raster
rasters are broken Read and write integer, 32-bit Yes
format (CRF) Bundle
down into smaller floating point, 64-
files—
bundles of tiles which bit complex
extension
allows multiple
*.bundle
processes to write
simultaneously to a
single raster.
Cloud A COG is a regular Single file— 8-, 16-, and 32-bit
Read and write Yes
Optimized GeoTIFF that has possible file unsigned/signed
77
Supported data Supports
Format Description Extensions Read / Write
types multiband
GeoTIFF been optimized for extensions integer, 32-bit
(COG) being hosted and *.tif, *.tiff, floating point, 64-
worked with on a and *.tff bit complex
HTTP file server.
Optimization depends
on the ability of a
COG to store and
organize raw pixel
data in addition to
utilizing HTTP GET
requests such that
only selected portions
of imagery is obtained
at a time.
Committee on
This format reads
Earth
CEOS SAR image
Observing
files—specifically
Sensors 8-bit unsigned
those radar images .raw Read-only Yes
(CEOS) integer
provided from
Synthetic
Radarsat and ERS
Aperture Radar
data products.
(SAR)
Distributed by the
NGA.
CADRG/ECRG is
geographically
referenced using the
ARC system in which
the globe is divided
into 18 latitudinal
Compressed bands, or zones. The File
ARC Digitized data consists of raster extension is
8-bit unsigned
Raster images and other based on Read-only No
integer
Graphics graphics generated by specific
(CADRG) scanning source product.
documents. CADRG
achieves a nominal
compression ratio of
55:1. ECRG uses
JPEG 2000
compression using a
compression ratio of
20:1
Panchromatic
File
(grayscale) images
Controlled extension is
that have been 8-bit unsigned
Image Base based on Read-only No
georeferenced and integer
(CIB) specific
corrected for
product.
distortion due to
78
Supported data Supports
Format Description Extensions Read / Write
types multiband
topographic relief
distributed by NGA.
Thus, they are similar
to digital orthophoto
quads and have
similar applications,
such as serving as a
base or backdrop for
other data or as a
simple map.
Multiple files
Main raster
image—
extension
*.img
Transmission
header file—
extension
*.thf
The DIMAP format is 8-, and 16-bit
Yes
an open format in the unsigned integer
public domain;
Digital Image however, its primary
Directory Read-only
Map purpose was for the 16-bit signed
Yes
distribution of data integer
from the SPOT
satellite. The format
79
Supported data Supports
Format Description Extensions Read / Write
types multiband
is composed of a
GeoTIFF file and a
metadata file.
Single file—
various file
extensions
A simple, regularly
*.dt0, *.dt1, Read
Digital Terrain spaced grid of
*.dt2. All
Elevation Data elevation points based 16-bit signed
possible file Write using the No
(DTED) Level on 1 degree latitude integer
extensions Raster to DTED
0, 1, and 2 and longitude extents.
are available tool
Created by NGA.
by default
(*.dt0, *.dt1,
*.dt2).
Earth
Yes
Resources This format is from
Single file—
Laboratory the ELAS remote 8-bit unsigned
extension Read-only (Always
Applications sensing system used integer
*.elas three
Software within NASA.
bands)
(ELAS)
Distributed by the
NGA.
CADRG/ECRG is
geographically
referenced using the
ARC system in which
the globe is divided
into 18 latitudinal
Enhanced bands, or zones. The File Yes
Compressed data consists of raster extension is
8-bit unsigned
ARC Raster images and other based on Read-only (Always
integer
Graphics graphics generated by specific three
(ECRG) scanning source product. bands)
documents. CADRG
achieves a nominal
compression ratio of
55:1. ECRG uses
JPEG 2000
compression using a
compression ratio of
20:1.
ECW is a propriatary
Enhanced
format. It is a wavelet- Single file—
Compressed 8-bit unsigned
based, lossy extension Read-only Yes
Wavelet integer
compression, similar *.ecw
(ECW)
to JPEG 2000.
Envisat
Multiple data 8-bit unsigned
Envisat (Environmental Read-only Yes
files integer
Satellite) is an Earth-
80
Supported data Supports
Format Description Extensions Read / Write
types multiband
observing satellite
operation by the
European Space
agency (ESA). This
format supports
Advanced Synthetic
Aperture Radar
(ASAR) Level 1 and
above products, and
some Medium
Resolution Imaging
Spectrometer
(MERIS) and
Advanced Along
Track Scanning
Radiometer (AATSR)
products.
When ENVI works Header file— 8-bit unsigned
Yes
with a raster dataset it extension integer
Read and write
creates a header file *.hdr
containing the
(.dat—via UI) 16-, and 32-bit
information the Multiple data
ENVI Header unsigned/signed
software requires. files—
(.bsq, .img, and integer, and 32-, Yes
This header file can be extension
.raw— and 64-bit floating
created for multiple *.raw, *.img,
developer only point
raster file formats. *.dat, *.bsq,
The Earth etc.
Observation Satellite
(EOSAT) FAST
format support
consists of the Single file—
8-bit unsigned
EOSAT FAST following: FAST- extension Read-only Yes
integer
L7A (Landsat TM) *.fst
and FAST Rev. C.
(IRS).
Multiple files
Header file—
A proprietary raster
extension
format from ER
*.ers
Mapper. Produced 8-, 16-, and 32-bit
ER Mapper using the ER Mapper unsigned/signed
Data file— Read-only Yes
ERS image processing integer, and 32-bit
usually same
software. floating point
as header file
without the
*.ers
extension but
could be any
81
Supported data Supports
Format Description Extensions Read / Write
types multiband
and is
defined in the
header file.
Multiple files
Data file—
Single-band thematic extension
1-, 2-, 4-, 8-, and
ERDAS 7.5 images produced by *.GIS
Read-only 16-bit unsigned No
GIS ERDAS 7.5 image
integer
processing software. Color map
file—
extension
*.trl
Single- or multiband
continuous images Single file—
ERDAS 7.5 8- and 16-bit
produced by the extension Read-only Yes
LAN unsigned integer
ERDAS 7.5 image *.lan
processing software.
Provides a method for
reading and
displaying files that
are not otherwise
supported by another
format but are
formatted in such a
way that the
arrangement of the
data can be described
by a relatively small Single file— 1-, 2-, 4-, 8-, and
ERDAS 7.5
number of extension Read-only 16-bit unsigned No
RAW
parameters. By *.raw integer
creating an ASCII file
that describes the
layout of the raster
data, it can be
displayed without
translation in a
proprietary format.
The format is defined
in the ERDAS
IMAGINE software.
ERDAS IMAGINE — Produced using IMAGINE image processing software created by ERDAS. IMAGINE files can store both continuous and discrete single-band and multiband data. Extensions: single file *.img; if the image is bigger than 2 GB, *.ige; world file *.igw. Read/Write: read and write. Data types: 1-, 2-, and 4-bit unsigned integer; 8- and 16-bit unsigned/signed integer; 32-bit unsigned/signed integer; 32-bit floating point; 64-bit double precision. Multiband: Yes.

Esri Grid — A proprietary Esri format that supports 32-bit integer and 32-bit floating-point raster grids. Grids are useful for representing geographic phenomena that vary continuously over space and for performing spatial modeling and analysis of flows, trends, and surfaces such as hydrology. Extensions: directory; color map file *.clr. Read/Write: read and write. Data types: 32-bit signed integer; 32-bit floating point. Multiband: No.

Esri Grid stack — Used to reference multiple Esri Grids as a multiband raster dataset. A stack is stored in a directory structure similar to a grid or coverage. Extensions: directory. Read/Write: read and write. Data types: 32-bit signed integer; 32-bit floating point. Multiband: Yes.

Esri Grid stack file — Used to reference multiple Esri Grids as a multiband raster dataset. A stack file is a simple text file that stores the path and name of each Esri Grid contained within it on a separate line. Extensions: single file, possible extension *.stk. Read/Write: read and write. Data types: 32-bit signed integer; 32-bit floating point. Multiband: Yes.

Extensible N-Dimensional Data Format (NDF) — Format used for storing data representing n-dimensional arrays of numbers, such as images. Uses container files (directories containing files and directories) to manage the data objects. Extensions: directory, extension *.sdf. Read/Write: read-only. Data types: 8-bit unsigned integer. Multiband: Yes.
Floating point file — A binary file of floating-point values that represent raster data. Extensions: single file *.flt. Read/Write: read-only (write only via the Raster To Float tool, or developer code). Data types: 32-bit floating point. Multiband: No.

GDAL Virtual Format (VRT) — A file format created by the Geospatial Data Abstraction Library (GDAL). It allows a virtual dataset to be derived from other datasets that GDAL can read. Extensions: single file *.vrt. Read/Write: read-only (write, developer only). Data types: 8-, 16-, and 32-bit unsigned integer and 64-bit complex integer. Multiband: Yes.

Geodatabase Raster — The geodatabase is the native data structure for ArcGIS and is the primary data format for representing and managing geographic information, including raster datasets and mosaic datasets. The geodatabase is a collection of various types of GIS datasets held in a file system folder. Extensions: raster datasets stored within a *.gdb folder. Read/Write: read and write. Data types: 1- and 4-bit unsigned integer; 8-bit unsigned/signed integer; 16-bit unsigned/signed integer; 32-bit unsigned/signed integer or floating point; 64-bit double precision. Multiband: Yes.

Golden Software Grid (.grd) — There are three types of Golden Software Grids that are supported: Golden Software ASCII GRID (GSAG), Golden Software Binary Grid (GSBG), and Golden Software Surfer 7 Binary Grid (GS7BG). Extensions: single file *.grd. Read/Write: read-only (write, developer only, for GSAG and GSBG). Data types: 32-bit floating point and 64-bit double precision. Multiband: No.
Graphic Interchange Format (GIF) — A bitmap image format generally used for small images. Extensions: single file *.gif; world file *.gfw. Read/Write: read and write. Data types: 8-bit unsigned integer. Multiband: No.

GRIB — The gridded binary format is used for the storage, transmission, and manipulation of archived meteorological data and forecast data. The World Meteorological Organization (WMO) is responsible for the design and maintenance of this format standard. Extensions: single file *.grb. Read/Write: read-only. Data types: 64-bit double precision. Multiband: Yes.

GRIB 2 — The next generation standard for GRIB. Subdataset support is not available. Extensions: single file *.grb2. Read/Write: read-only. Data types: 64-bit double precision. Multiband: No.

Grid eXchange File — An ASCII format, primarily used in Geosoft. GXF Revision 3 (GXF-3) is supported. Extensions: single file *.gxf. Read/Write: read-only. Data types: 32- and 64-bit floating point. Multiband: No.

Heightfield — A compressed heightfield format used to support terrain data as a raster. Extensions: single file *.hf2; gzip file *.hfz. Read/Write: read-only. Data types: 32-bit floating point. Multiband: No.

HGT — Raw SRTM height files containing elevation measured in meters above sea level, in a geographic projection (latitude and longitude array), with voids indicated using -32768. Extensions: single file *.hgt. Read/Write: read-only. Data types: 16-bit signed integer. Multiband: No.
Hierarchical Data Format (HDF) 4 — A self-defining file format used for storing arrays of multidimensional data. Subdataset support is not available. Extensions: single file *.h4 or *.hdf. Read/Write: read-only (write, developer only). Data types: 8- and 16-bit signed integer; 8- and 16-bit unsigned integer; 32-bit unsigned/signed integer, or floating point. Multiband: Yes.

Hierarchical Data Format (HDF) 5 — The next generation standard for HDF. Subdataset support is not available. Extensions: single file *.h5 or *.hdf5. Read/Write: read-only. Data types: 8- and 16-bit signed integer; 8- and 16-bit unsigned integer; 32-bit unsigned/signed integer, or floating point. Multiband: Yes.

High Resolution Elevation (HRE) — HRE data is intended for a wide variety of National Geospatial-Intelligence Agency (NGA) and National System for Geospatial Intelligence (NSG) partners and members, and customers external to the NSG, to access and exploit standardized data products. HRE data replaces the current non-standard High Resolution Terrain Elevation/Information (HRTE/HRTI) products and also replaces non-standard products referred to as DTED level 3 thru 6. Extensions: multiple files — raw image *.hr*; metadata *.xml; color map *.smp; georeference file *.ref. Read/Write: read-only. Data types: 16-bit signed integer and 32-bit floating point. Multiband: No.
ILWIS — ILWIS format for raster maps and map lists. Extensions: raster map *.mpr; maplist *.mpl. Read/Write: read-only. Data types: byte, 16- and 32-bit unsigned integer, and 64-bit floating point. Multiband: Yes.

Image Display and Analysis (IDA) — File format used by WinDisp 4. Extensions: single file *.img. Read/Write: read-only. Data types: 8-bit binary. Multiband: No.

Integrated Software for Imagers and Spectrometers (ISIS) — ISIS Cube format as created by the United States Geological Survey (USGS) for the mapping of planetary imagery. Versions 2 and 3 are supported. Extensions: single file *.cub. Read/Write: read-only. Data types: 8-bit unsigned integer and 32-bit floating point. Multiband: No.

Intergraph CIT — Intergraph's proprietary format for 16-bit imagery (CIT). Extensions: binary imagery *.cit. Read/Write: read-only. Data types: 1-bit. Multiband: No.

Intergraph COT — Intergraph's proprietary format for 8-bit unsigned imagery (COT). Extensions: grayscale imagery *.cot. Read/Write: read-only. Data types: 8-bit unsigned integer. Multiband: No.

Japanese Aerospace eXploration Agency (JAXA) PALSAR — This format was created by JAXA to store processed PALSAR data. Level 1.1 and Level 1.5 are supported. Extensions: single file *1.5GUD or *1.1 A. Read/Write: read-only. Data types: 16-bit signed integer. Multiband: Yes.
Joint Photographic Experts Group (JPEG) File Interchange Format (JFIF) — A standard compression technique for storing full-color and grayscale images. Support for JPEG compression is provided through the JFIF file format. Extensions: single file *.jpg, *.jpeg, *.jpc, or *.jpe; world file *.jgw. Read/Write: read and write. Data types: 8-bit unsigned integer. Multiband: Yes (limited to one or three bands).

JPEG 2000 — A compression technique especially for maintaining the quality of large imagery. Allows for a high compression ratio and fast access to large amounts of data at any scale. Extensions: single file *.jp2, *.j2c, *.j2k, or *.jpx. Read/Write: read and write. Data types: 8- and 16-bit unsigned integer. Multiband: Yes.

Magellan Mapsend BLX/XLB format — Magellan's BLX/XLB file format is primarily used for storing topographic data. The tile size for these files must be a multiple of 128 by 128 pixels. The projection for these files is WGS84. When the file is ordered using little-endian, the file extension is BLX. If big-endian is used, the file extension is XLB. Extensions: single file *.blx or *.xlb. Read/Write: read-only (write, developer only). Data types: 16-bit signed integer. Multiband: No.

MAP — PCRaster's raster format. Extensions: single file *.map. Read/Write: read-only (write, developer only). Data types: 8-bit unsigned integer, 32-bit signed integer, or 32-bit floating point. Multiband: No (single band).
Map service cache — A map cache created by ArcGIS Server can be viewed as a single raster dataset. You cannot build pyramids or calculate statistics, and it should not be used for any analysis or processing. Extensions: directory. Read/Write: read-only (in Desktop); write using ArcGIS Server. Data types: 8-bit unsigned integer. Multiband: Yes.

Meta Raster Format (MRF) — MRF is a technology developed by NASA that combines raster storage with cloud computing for tiling, indexing, and multiple-resolution support. It is a raster storage format, but also a tile cache format for web services and a dynamic tile cache for another raster. Extensions: multiple files — raw image *.mrf; metadata *.xml; index *.idx. Read/Write: read and write. Data types: 8-, 16-, and 32-bit unsigned/signed integer, 32-bit floating point, 64-bit complex. Multiband: Yes.

Multiresolution Seamless Image Database (MrSID) — A proprietary compression technique especially for maintaining the quality of large images. Allows for a high compression ratio and fast access to large amounts of data at any scale. The MrSID Encoder is developed and supported by LizardTech, Inc. Supports generations 2, 3, and 4. Extensions: single file *.sid; world file *.sdw. Read/Write: read-only. Data types: 8- and 16-bit unsigned integer. Multiband: Yes (generation 2 and generation 3 limited to 1 or 3 bands; generation 4 unlimited).

MrSID Lidar — MrSID generation 4 (MG4) format, used to support point cloud (lidar) data. Will be rendered as a raster. View files can be used to define how the point cloud data will be viewed. Extensions: single file *.sid; optional view file *.view. Read/Write: read-only. Data types: 64-bit double precision. Multiband: No.
National Imagery Transmission Format (NITF) 2.0 and NITF 2.1/NSIF 1.0 — A collection of standards and specifications that allow interoperability in the dissemination of imagery and its metadata among various computer systems. Developed by the NGA. Subdataset support is not available. Extensions: single file *.ntf. Read/Write: read-only. Data types: NITF 2.0: 1-, 8-, and 16-bit unsigned integer; NITF 2.1/NSIF 1.0: 1-, 2-, 4-, 8-, and 16-bit unsigned integer, and 1-, 2-, 4-, 8-, and 16-bit signed integer. Multiband: Yes.

National Land Archive Production System (NLAPS) — The NLAPS data format (NDF) is used by the USGS to distribute their Landsat MSS and TM data. Extensions: multiple files — main file (header) *.H1, *.H2, or *.HD; image data *.I1, *.I2, etc. Read/Write: read-only. Data types: 8-bit unsigned integer. Multiband: Yes.

NOAA Polar Orbiter Level 1b — Support for NOAA's Polar Orbiter Data (POD); specifically for the Advanced Very High Resolution Radiometer (AVHRR) Level 1b digital data. Extensions: multiple files — *.1b, *.sv, *.gc. Read/Write: read-only. Data types: 16-bit unsigned integer. Multiband: Yes.

PCI .aux — A simple ASCII file used and created by PCI to read raw binary raster data. Extensions: single file *.aux. Read/Write: read-only. Data types: 8-bit and 16-bit unsigned integer, 16-bit signed integer, and 32-bit. Multiband: Yes.
PCIDSK — PCI Geomatics raster dataset format. Extensions: single file *.pix. Read/Write: read-only (write, developer only). Data types: 8- and 16-bit unsigned integer; 16-bit signed integer; 32-bit floating point. Multiband: Yes.

Planetary Data System (PDS) — The Planetary Data System (PDS) is managed by NASA to archive and distribute data from its planetary missions. PDS version 3 is supported. Extensions: single file, possible extensions *.img and *.lbl. Read/Write: read-only. Data types: 16-bit signed integer. Multiband: No.

Portable Network Graphics (PNG) — Provides a well-compressed, lossless compression for raster files. It supports a large range of bit depths from monochrome to 64-bit color. Its features include indexed color images of up to 256 colors and effectively 100 percent lossless images of up to 16 bits per pixel. Note: ArcGIS is able to read an alpha band from an existing PNG; however, it will only write one- or three-band PNG files. Extensions: single file *.png. Read/Write: read and write. Data types: 1-, 2-, and 4-bit unsigned integer (multiband: No); 8-bit unsigned integer (multiband: Yes, limited to one or three bands only, no alpha channel); 16-bit unsigned integer (multiband: Yes, limited to one or three bands only, no alpha channel).
RADARSAT-2 — The RADARSAT-2 satellite produces imagery using the C-band SAR and X-band frequencies. Extensions: directory. Read/Write: read-only. Data types: 16-bit unsigned integer and 32-bit complex integer. Multiband: Yes.

Raster Product Format (RPF), RPF (CIB), RPF (CADRG), and RPF (ADRG) — RPF is the underlying format of CADRG and CIB. Extensions: single file, no standard file extension. Read/Write: read-only. Data types: 8-bit unsigned integer. Multiband: RPF: No; RPF (CIB): No; RPF (CADRG): Yes (always three bands); RPF (ADRG): No.

SAGA GIS Binary Grid — SAGA binary grid datasets are composed of an ASCII header (.sgrd) and a binary data (.sdat) file with a common basename. Select the .sdat file to access the dataset. Extensions: multiple files — *.sdat and *.sgrd. Read/Write: read-only. Data types: 8-, 16-, and 32-bit unsigned integer, 16- and 32-bit signed integer, 32- and 64-bit floating point. Multiband: Yes.

Sandia Synthetic Aperture Radar (GFF) — Sandia National Laboratories created a complex image format to accommodate the data from its synthetic aperture radar. Extensions: single file *.gff. Read/Write: read-only. Data types: 16- and 32-bit complex integer. Multiband: Unknown.

Shuttle Radar Topography Mission (SRTM) — The HGT format is used to store elevation data from the Shuttle Radar Topography Mission (SRTM). SRTM-3 and SRTM-1 v2 files can be displayed. Extensions: single file *.hgt. Read/Write: read-only (write, developer only). Data types: 32-bit signed integer. Multiband: No.
Spatial Data Transfer Standard (SDTS) digital elevation model (DEM) — The Spatial Data Transfer Standard (SDTS) was created by the USGS. The purpose of this format was to transfer digital geospatial data between various computer systems in a compatible format that would not lose any information. Extensions: multiple files — *.ddf. Read/Write: read-only. Data types: 16-bit signed integer or 32-bit floating point. Multiband: No.

Tagged Image File Format (TIFF) (GeoTIFF tags are supported) — Widespread use in the desktop publishing world. It serves as an interface to several scanners and graphic arts packages. TIFF supports black-and-white, grayscale, pseudo color, and true color images, all of which can be stored in a compressed or decompressed format. BigTIFF is supported. Extensions: single file, possible extensions *.tif, *.tiff, *.tff; world file *.tfw. Read/Write: read and write. Data types: 1-bit unsigned integer (multiband: No); 4-bit signed integer (No); 8-bit unsigned integer (Yes); 8-bit signed integer (No); 16-bit unsigned integer (Yes); 16-bit signed integer (No); 32-bit unsigned/signed integer, or floating point (No).

Terragen terrain — The Terragen Terrain file was created by Planetside Software. It stores elevation data. Extensions: single file, possible extensions *.ter, *.terrain. Read/Write: read-only (write, developer only). Data types: 16-bit signed integer. Multiband: No.

TerraSAR-X — The TerraSAR-X radar satellite produces earth observation data using the X-band SAR. Extensions: directory. Read/Write: read-only. Data types: 16-bit unsigned integer and 32-bit complex integer. Multiband: No.

Transformation Grids — National Oceanic and Atmospheric Administration's (NOAA's) files used for shifting a vertical datum. Extensions: single file *.gtx. Read/Write: read-only. Data types: 32-bit floating point. Multiband: No.

United States Geological Survey (USGS) digital elevation model (DEM) — This format consists of a raster grid of regularly spaced elevation values derived from the USGS topographic map series. In their native format, they are written as ANSI-standard ASCII characters in fixed-block format. Extensions: single file *.dem (need to change the .dat extension to .dem). Read/Write: read-only. Data types: 16-bit signed integer. Multiband: No.

USGS Digital Orthophoto Quadrangles (DOQ) — This is the new, labeled DOQ (DOQ2) format from the USGS. Extensions: single file, possible extensions *.doq, *.nes, *.nws, *.ses, *.sws. Read/Write: read-only. Data types: 8-bit unsigned integer. Multiband: Yes.

XPixMap (XPM) — Stores color images in a format consisting of an ASCII image and a C library. Extensions: single file *.xpm. Read/Write: read-only (write, developer only). Data types: 8-bit unsigned integer. Multiband: No.
Tablet digitizers with a free cursor connected to a personal computer are the most common device for digitizing spatial features with planimetric coordinates from analog maps. The analog map is placed on the surface of the digitizing tablet as shown in Figure 3.2. The size of the digitizer usually ranges from A3 to A0.
3.5 DIGITIZERS
Digitizers are the most common device for extracting spatial information from maps and
photographs
o the map, photo, or other document is placed on the flat surface of the digitizing
tablet
Hardware
the position of an indicator as it is moved over the surface of the digitizing tablet is
detected by the computer and interpreted as pairs of x,y coordinates
o the indicator may be a pen-like stylus or a cursor (a small flat plate the size of a
hockey puck with a cross-hair)
frequently, there are control buttons on the cursor which permit control of the system
without having to turn attention from the digitizing tablet to a computer terminal
digitizing tablets can be purchased in sizes from 25x25 cm to 200x150 cm, at
approximate costs from $500 to $5,000
early digitizers (ca. 1965) were backlit glass tables
o a magnetic field generated by the cursor was tracked mechanically by an arm
located behind the table
o the arm's motion was encoded, coordinates computed and sent to a host
processor
o some early low-cost systems had mechanically linked cursors - the free-cursor
digitizer was initially much more expensive
the first solid-state systems used a spark generated by the cursor and detected by linear
microphones
o problems with errors generated by ambient noise
contemporary tablets use a grid of wires embedded in the tablet to generate a magnetic
field which is detected by the cursor
o accuracies are typically better than 0.1 mm
o this is better than the accuracy with which the average operator can position the
cursor
o functions for transforming coordinates are sometimes built into the tablet and
used to process data before it is sent to the host
many problems arise since most maps were not drafted for the purpose of digitizing
o paper maps are unstable: each time the map is removed from the digitizing table,
the reference points must be re-entered when the map is affixed to the table again
o if the map has stretched or shrunk in the interim, the newly digitized points will
be slightly off in their location when compared to previously digitized points
o errors occur on these maps, and these errors are entered into the GIS database as
well
o the level of error in the GIS database is directly related to the error level of the
source maps
maps are meant to display information, and do not always accurately record locational
information
o for example, when a railroad, stream and road all go through a narrow mountain
pass, the pass may actually be depicted wider than its actual size to allow for the
three symbols to be drafted in the pass
discrepancies across map sheet boundaries can cause discrepancies in the total GIS
database
o e.g. roads or streams that do not meet exactly when two map sheets are placed next
to each other
user error causes overshoots, undershoots (gaps) and spikes at intersection of lines
Digitizing costs
a common rule of thumb in the industry is one digitized boundary per minute
o e.g. it would take 99/60 = 1.65 hours to digitize the boundaries of the 99 counties
of Iowa
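The coordinate-transformation step mentioned in the hardware notes above (functions built into the tablet that convert tablet coordinates to map coordinates) is usually a six-parameter affine transform. A minimal sketch, with hand-chosen parameters rather than values solved from control (tic) points:

```python
# Sketch of a six-parameter affine transform from digitizer tablet
# coordinates (tablet units) to map coordinates. In practice a..f are solved
# by least squares from registered control points; here they are illustrative.
def make_affine(a, b, c, d, e, f):
    """Return a function mapping tablet (x, y) to map (X, Y):
       X = a*x + b*y + c,  Y = d*x + e*y + f."""
    def transform(x, y):
        return (a * x + b * y + c, d * x + e * y + f)
    return transform

# Pure scaling and translation: 1 tablet unit = 2 map units, origin shifted.
to_map = make_affine(2.0, 0.0, 100.0, 0.0, 2.0, 50.0)
print(to_map(10.0, 20.0))  # (120.0, 90.0)
```

Rotation and skew of the map sheet on the tablet are absorbed by the b and d terms, which is why re-registering the tic points is required each time the map is re-affixed.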
3.6 TOPOLOGY
Topology expresses explicitly the spatial relationships between connecting or adjacent vector features (points, polylines and polygons) in a GIS, such as two lines meeting perfectly at a point and a directed line having an explicit left and right side.
Topological or topology-based data are useful for detecting and correcting digitizing errors in a geographic data set and are necessary for some GIS analyses.
Topologic data structures help ensure that information is not unnecessarily repeated. The database
stores one line only in order to represent a boundary (as opposed to two lines, one for each
polygon). The database tells us that the line is the “left side” of one polygon and the “right side”
of the adjacent polygon.
Topology is the study of those properties of geometric objects that remain invariant under certain
transformations such as bending or stretching.
Topology is often explained through graph theory. Topology has at least two main advantages.
Topological relationships are built from simple elements into complex elements: points (simplest
elements), arcs (sets of connected points), areas (sets of connected arcs), and routes (sets of
sections, which are arcs or portions of arcs).
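The "one line for two polygons" idea above can be sketched as a data structure: each arc records its end nodes and the polygon on its left and right side, so a shared boundary is stored once. The class and field names here are illustrative, not a specific GIS API:

```python
# Minimal sketch of the arc-based topological structure described above.
from dataclasses import dataclass

@dataclass
class Arc:
    from_node: str
    to_node: str
    left_poly: str   # polygon on the left of the arc's direction
    right_poly: str  # polygon on the right

# One arc represents the common boundary of polygons A and B.
shared = Arc("N1", "N2", "A", "B")

def neighbours(arc):
    """The two polygons adjacent across this arc."""
    return {arc.left_poly, arc.right_poly}

print(neighbours(shared))  # {'A', 'B'}
```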
Components of Topology
Topology has three basic components:
1. Connectivity (Arc – Node Topology):
Points along an arc that define its shape are called Vertices.
Endpoints of the arc are called Nodes.
Arcs join only at the Nodes.
2. Area Definition / Containment (Polygon – Arc Topology)
3. Contiguity (Left – Right Topology)
Topology in different GIS Formats
Coverage
A coverage is a topology-based vector data format. A coverage can be a point coverage, line coverage, or polygon coverage.
Figure: Diagram showing the coverage data structure for storing vector data.
Shapefile
A shapefile is a standard non-topological data format. Shapefiles were a first attempt at object-based storage of spatial features; they store very simple floating-point geometry. A shapefile is a digital vector storage format for storing geometric location and associated attribute information.
It maintains data in separate layers, but it does not support topology. It is an Esri format.
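The "very simple geometry" point above is visible in the file layout itself: Esri's shapefile technical description defines a fixed 100-byte header at the start of the main .shp file (a big-endian file code 9994 and file length, then a little-endian version 1000, shape type, and bounding box). A sketch that parses those fields from raw bytes:

```python
# Parse the fixed fields of a shapefile (.shp) main file header, per Esri's
# published shapefile technical description.
import struct

def parse_shp_header(header: bytes):
    file_code, = struct.unpack(">i", header[0:4])      # big-endian, always 9994
    file_length, = struct.unpack(">i", header[24:28])  # length in 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {"file_code": file_code, "file_length": file_length,
            "version": version, "shape_type": shape_type,
            "bbox": (xmin, ymin, xmax, ymax)}

# Build a synthetic header for a point shapefile (shape type 1) to test with.
hdr = struct.pack(">i", 9994) + b"\x00" * 20 + struct.pack(">i", 50)
hdr += struct.pack("<ii", 1000, 1) + struct.pack("<4d", 0.0, 0.0, 10.0, 20.0)
hdr += b"\x00" * (100 - len(hdr))  # pad to the full 100-byte header
print(parse_shp_header(hdr)["shape_type"])  # 1
```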
Geodatabase
A relational database is a collection of tables logically associated with each other by common key attribute fields.
A geodatabase can store geographic information because, besides storing a number or a string in an attribute field, tables in a geodatabase can also store geometric coordinates to define the shape and location of points, lines, or polygons.
A personal geodatabase is a file with the extension .mdb, which is the file extension used by Microsoft Access. A file geodatabase is a folder with a .gdb extension in which the files are stored.
Topological Error
One method of producing vector maps is the precise scanning of analog maps into raster formats and
then digitizing into vector forms (Grimshaw, 1994).
Topological errors with polygon features can include unclosed polygons, gaps between polygon
borders or overlapping polygon borders. A common topological error with polyline features is
that they do not meet perfectly at a point (node). This type of error is called an undershoot if a
gap exists between the lines, and an overshoot if a line ends beyond the line it should connect to.
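The undershoot/overshoot idea above is usually handled with a snap tolerance: if an endpoint falls within the tolerance of the point it should meet, the gap can be closed automatically; otherwise it is flagged as an error. A hedged sketch (the function names and thresholds are ours):

```python
# Sketch of undershoot detection: a polyline endpoint that falls short of the
# node it should touch leaves a gap; gaps within the snap tolerance can be
# closed automatically, larger ones are flagged as topological errors.
import math

def endpoint_gap(endpoint, target):
    """Distance between a line endpoint and the node it should meet."""
    return math.dist(endpoint, target)

def classify(endpoint, target, tolerance):
    gap = endpoint_gap(endpoint, target)
    if gap == 0:
        return "clean"
    return "snap" if gap <= tolerance else "undershoot/overshoot error"

print(classify((10.0, 10.0), (10.0, 10.3), tolerance=0.5))  # snap
```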
Containment is the property that defines one entity as being within another. For example,
if an isolated node (representing a household) is located inside a face (representing a
congressional district) in the MAF/TIGER database, you can count on it remaining inside that
face no matter how you transform the data. Topology is vitally important to the Census Bureau,
whose constitutional mandate is to accurately associate population counts and characteristics with
political districts and other geographic areas.
Connectedness refers to the property of two or more entities being connected. Recall the
visual representation of the geometric primitives in Figure 6.3. Topologically, node N14 is not
connected to any other nodes. Nodes N9 and N21 are connected because they are joined by edges
E10, E1, and E10. In other words, nodes can be considered connected if and only if they are reachable through a chain of edges whose intermediate nodes are also connected: to reach a destination node, there must be a path to it.
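The connectedness property above is exactly graph reachability and can be checked by traversal. A small sketch (the node and edge labels follow the N…/E… style of the text but are illustrative, not the actual Figure 6.3 data):

```python
# Two nodes are connected if a chain of edges joins them; an isolated node
# (like N14 in the text) is reachable from nothing. Checked with BFS.
from collections import deque

edges = {"E1": ("N9", "N12"), "E10": ("N12", "N21")}  # N14 has no edges

def connected(a, b, edges):
    """Breadth-first search over an undirected edge list."""
    adj = {}
    for u, v in edges.values():
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {a}, deque([a])
    while queue:
        n = queue.popleft()
        if n == b:
            return True
        for m in adj.get(n, ()):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return False

print(connected("N9", "N21", edges), connected("N9", "N14", edges))  # True False
```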
Connectedness is not immediately as intuitive as it may seem. A famous problem related to
topology is the Königsberg bridge puzzle (Figure 6.5).
Try This: Can you solve the Königsberg bridge problem?
The challenge of the puzzle is to find a route that crosses all seven bridges, while respecting the
following criteria:
At first, the two nodes in Figure 6.6 might look like they are adjacent. Zooming in or tilting the
plane of view reveals otherwise. This is because nodes, as points made from coordinate pairs, do
not have a length or width; they are size-less and shapeless. Without any size or dimensionality,
it is impossible for nodes to be adjacent. The only way for two nodes to ‘touch’ would be for them
to have the exact same coordinates – which then means that there aren’t really two nodes, just
one that has been duplicated.
This is exactly why features in the MAF/TIGER database are represented only once. As David
Galdi (2005) explains in his white paper “Spatial Data Storage and Topology in the Redesigned
MAF/TIGER System,” the “TI” in TIGER stands for “Topologically Integrated.” This means that
the various features represented in the MAF/TIGER database—such as streets, waterways,
boundaries, and landmarks (but not elevation!)—are not encoded on separate “layers.” Instead,
features are made up of a small set of geometric primitives — including 0-dimensional nodes
and vertices, 1-dimensional edges, and 2-dimensional faces —without redundancy. That means
that where a waterway coincides with a boundary, for instance, MAF/TIGER represents them
both with one set of edges, nodes and vertices. The attributes associated with the geometric
primitives allow database operators to retrieve feature sets efficiently with simple spatial queries.
To accommodate this efficient design and eliminate the need for visual or mental exercises in
order to determine topological states, the MAF/TIGER structure abides by very specific rules that
define the relations of entities in the database (Galdi 2005):
1. Every edge must be bounded by two nodes (start and end nodes).
2. Every edge has a left and right face.
3. Every face has a closed boundary consisting of an alternating sequence of nodes and edges.
4. There is an alternating closed sequence of edges and faces around every node.
5. Edges do not intersect each other, except at nodes.
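Rules like these can be validated mechanically, which is how such errors are "identified automatically". A toy sketch of checking rule 1 ("every edge must be bounded by two nodes"); the data layout is ours, not the actual MAF/TIGER schema:

```python
# Check a MAF/TIGER-style topological rule on a toy dataset: every edge must
# be bounded by two nodes that exist in the node set.
nodes = {"N1", "N2", "N3"}
edges = {
    "E1": ("N1", "N2"),
    "E2": ("N2", "N3"),
    "E3": ("N3", None),   # deliberately broken: missing end node
}

def edges_violating_rule_1(nodes, edges):
    """Edges whose start or end node is absent from the node set."""
    return sorted(name for name, (start, end) in edges.items()
                  if start not in nodes or end not in nodes)

print(edges_violating_rule_1(nodes, edges))  # ['E3']
```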
Compliance with these topological rules is an aspect of data quality called logical consistency. In
addition, the boundaries of geographic areas that are related hierarchically — such as blocks,
block groups, tracts, and counties (all defined in Chapter 3) — are represented with common,
non-redundant edges. Features that do not conform to the topological rules can be identified
automatically, and corrected by the Census geographers who edit the database. Given that the
MAF/TIGER database covers the entire U.S. and its territories, and includes many millions of
primitives, the ability to identify errors in the database efficiently is crucial.
So how does topology help the Census Bureau assure the accuracy of population data needed for
reapportionment and redistricting? To do so, the Bureau must aggregate counts and characteristics
to various geographic areas, including blocks, tracts, and voting districts. This involves a process
called “address matching” or “address geocoding” in which data collected by household is
assigned a topologically-correct geographic location. The following pages explain how that
works.
Polygon rules

Rule: Must Be Larger Than Cluster Tolerance
Description: Requires that a feature does not collapse during a validate process. This rule is mandatory for a topology and applies to all line and polygon feature classes. In instances where this rule is violated, the original geometry is left unchanged.
Potential fixes: Delete — removes polygon features that would collapse during the validate process based on the topology's cluster tolerance. This fix can be applied to one or more Must Be Larger Than Cluster Tolerance errors.
Example: Any polygon feature, such as the one in red, that would collapse when validating the topology is an error.
Rule: Must Not Overlap
Potential fixes (continued): Create Feature — creates a new polygon feature from the overlapping geometry to create a planar representation of the feature geometry. This fix can be applied to one or more selected Must Not Overlap errors.
Rule: Must Not Overlap With
Description: Requires that the interior of polygons in one feature class (or subtype) must not overlap with the interior of polygons in another feature class (or subtype). Polygons of the two feature classes can share edges or vertices or be completely disjointed. This rule is used when an area cannot belong to two separate feature classes. It is useful for combining two mutually exclusive systems of area classification, such as zoning and water body type, where areas defined within the zoning class cannot also be defined in the water body class and vice versa.
Potential fixes: Subtract — removes the overlapping portion of each feature that is causing the error and leaves a gap or void in its place. This fix can be applied to one or more selected Must Not Overlap With errors. Merge — adds the portion of overlap from one feature and subtracts it from the others that are violating the rule. You need to pick the feature that receives the portion of overlap using the Merge dialog box. This fix can be applied to one Must Not Overlap With error only.
Rule: Boundary Must Be Covered By
Description: Requires that boundaries of polygon features must be covered by lines in another feature class. This rule is used when area features need to have line features that mark the boundaries of the areas. This is usually when the areas have one set of attributes and their boundaries have other attributes. For example, parcels might be stored in the geodatabase along with their boundaries. Each parcel might be defined by one or more line features that store information about their length or the date surveyed, and every parcel should exactly match its boundaries.
Potential fixes: Create Feature — creates a new line feature from the boundary segments of the polygon feature generating the error. This fix can be applied to one or more selected Boundary Must Be Covered By errors.
Line rules

Rule: Must Be Larger Than Cluster Tolerance
Description: Requires that a feature does not collapse during a validate process. This rule is mandatory for a topology and applies to all line and polygon feature classes. In instances where this rule is violated, the original geometry is left unchanged.
Potential fixes: Delete — removes line features that would collapse during the validate process based on the topology's cluster tolerance. This fix can be applied to one or more Must Be Larger Than Cluster Tolerance errors.
Example: Any line feature, such as these lines in red, that would collapse when validating the topology is an error.
Rule: Must Not Overlap (line rule)
Potential fixes (continued): …use the Explode command on the Advanced Editing toolbar to create single-part features. This fix can be applied to one selected Must Not Overlap error only.
Rule: Must Not Have Pseudo Nodes
Description: Requires that a line connect to at least two other lines at each endpoint. Lines that connect to one other line (or to themselves) are said to have pseudo nodes. This rule is used where line features must form closed loops, such as when they define the boundaries of polygons, or when line features logically must connect to two other line features at each end, as with segments in a stream network, with exceptions being marked for the originating ends of first-order streams.
Potential fixes: Merge To Largest — merges the geometry of the shorter line into the geometry of the longest line; the attributes of the longest line feature will be retained. This fix can be applied to one or more Must Not Have Pseudo Nodes errors. Merge — adds the geometry of one line feature into the other line feature causing the error; you must pick the line feature into which to merge. This fix can be applied to one selected Must Not Have Pseudo Nodes error.
Point rules

Rule: Must Coincide With
Description: Requires that points in one feature class (or subtype) be coincident with points in another feature class (or subtype). This is useful for cases where points must be covered by other points, such as transformers must coincide with power poles in electric distribution networks and observation points must coincide with stations.
Potential fixes: Snap — moves a point feature in the first feature class or subtype to the nearest point in the second feature class or subtype that is located within a given distance. If no point feature is found within the tolerance specified, the point will not be snapped. The Snap fix can be applied to one or more Must Coincide With errors.
Example: Where a red point is not coincident with a blue point is an error.
122
Must Be Properly Inside
Rule description: Requires that points fall within area features. This is useful when the point features are related to polygons, such as wells and well pads, or address points and parcels.
Potential fixes:
Delete: The Delete fix removes point features that are not properly within polygon features. Note that you can use the Edit tool and move the point inside the polygon feature if you do not want to delete it. This fix can be applied to one or more Must Be Properly Inside errors.
Example: The squares are errors where there are points that are not inside the polygon.
3.10 ODBC
PostgreSQL
...
EXAMPLE
In this example we copy the DBF file of a SHAPE map into ODBC, then connect GRASS to the
ODBC DBMS. Usually the table will already be present in the DBMS.
MS-Windows
Linux
Configure the ODBC driver for the selected database (manually or with 'ODBCConfig'). ODBC
drivers are defined in /etc/odbcinst.ini. Here is an example:
[PostgreSQL]
Description = ODBC for PostgreSQL
Driver = /usr/lib/libodbcpsql.so
Setup = /usr/lib/libodbcpsqlS.so
FileUsage =1
Create a DSN (data source name). The DSN is used as the database name in db.* modules. The DSN
must be defined in $HOME/.odbc.ini (for this user only) or in /etc/odbc.ini (for all users)
[watch out for the database name, which appears twice, and also for the PostgreSQL protocol
version]. Omit blanks at the beginning of lines:
[grass6test]
Description = PostgreSQL
Driver = PostgreSQL
Trace = No
TraceFile =
Database = grass6test
Servername = localhost
UserName = neteler
Password =
Port = 5432
Protocol = 8.0
ReadOnly = No
RowVersioning = No
ShowSystemTables = No
ShowOidColumn = No
FakeOidIndex = No
ConnSettings =
Configuration of a DSN without a GUI is described at http://www.unixodbc.org/odbcinst.html,
but odbc.ini and .odbc.ini may be created by the 'ODBCConfig' tool. You can easily view your
DSN structure with 'DataManager'. Configuration with a GUI is described at
http://www.unixodbc.org/doc/UserManual/
psql -V
Using the ODBC driver
Now create a new database if not yet existing:
To store a table 'mytable.dbf' (here: in current directory) into PostgreSQL through ODBC, run:
Next link the map to the attribute table (now the ODBC table is used, not the dbf file):
Finally a test: Here we should see the table columns (if the ODBC connection works):
db.tables -p
db.columns table=mytable
Note that you can also connect MySQL, Oracle, etc. through ODBC to GRASS.
You can also check the vector map itself concerning a current link to a table:
v.db.connect -p mytable.shp
which should print the database connection through ODBC to the defined RDBMS.
3.11 GPS
GPS or Global Positioning System is a constellation of 27 satellites orbiting the earth at about
12000 miles. These satellites are continuously transmitting a signal and anyone with a GPS
receiver on earth can receive these transmissions at no charge. By measuring the travel time of
signals transmitted from each satellite, a GPS receiver can calculate its distance from the satellite.
Satellite positions are used by receivers as precise reference points to determine the location of
the GPS receiver. If a receiver can receive signals from at least 4 satellites, it can determine
latitude, longitude, altitude and time. If it can receive signals from 3 satellites, it can determine
latitude, longitude and time. The satellites are in orbits such that at any time anywhere on the
planet one should be able to receive signals from at least 4 satellites. The basic GPS service
provides commercial users with an accuracy of 100 meters, 95% of the time anywhere on the
earth. Since May of 2000, this has improved to about 10 to 15 meters due to the removal of
selective availability.
GPS technology offers several advantages: First and foremost, the service is free worldwide and
anyone with a receiver can receive the signals and locate a position. Second, the system supports
unlimited users simultaneously. Third, one of the great advantages of GPS is the fact that it
provides navigation capability.
Limitations of GPS
As with any technology, GPS also has some limitations. It is essential that the users are aware
of these limitations.
In brief, the following are the key features of GPS:
1 The basis of the system is using satellites as reference points for triangulation.
2 A GPS receiver measures distance using the travel time of radio signals.
3 To measure travel time, GPS needs very accurate timing, which is achieved with some
techniques.
4 Along with distance, one needs to know exactly where the satellites are in space.
5 Finally, one must correct for any delays the signal experiences as it travels through the
atmosphere.
The whole idea behind GPS is to use satellites in space as reference points for locations here
on earth. By very accurately measuring the distances from at least three satellites, we can
determine our position anywhere on earth.
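The idea of fixing a position from measured distances can be illustrated with a small 2-D trilateration sketch (a simplified analogue of the receiver's 3-D computation; the reference points and ranges below are hypothetical):

```python
def trilaterate(p1, r1, p2, r2, p3, r3):
    """2-D trilateration: position from distances to three known points.

    Subtracting pairs of circle equations (x-xi)^2 + (y-yi)^2 = ri^2
    cancels the quadratic terms, leaving two linear equations in x, y.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = r2**2 - r3**2 + x3**2 - x2**2 + y3**2 - y2**2
    det = a1 * b2 - a2 * b1          # non-zero if the points are not collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Three hypothetical reference points and the ranges measured to them
position = trilaterate((0.0, 0.0), 5.0,
                       (10.0, 0.0), 65 ** 0.5,
                       (0.0, 10.0), 45 ** 0.5)
```

In real GPS a fourth satellite is needed to also solve for the receiver clock offset.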
GPS Elements
GPS has three parts: the space segment, the user segment, and the control segment; Figure 1.2
illustrates these. The space segment consists of a constellation of 24 satellites, each in its
own orbit, about 11,000 nautical miles above the Earth. The user segment consists of
receivers, which can be held in the hand or mounted in a vehicle. The control segment consists of
ground stations (six of them, located around the world) that make sure the satellites are working
properly. More details on each of these elements can be found in any standard book or
online literature on GPS.
Figure 1.2: GPS segments
GPS Satellite Navigation System
GPS is funded and controlled by the U. S. Department of Defense (DOD). While there are many
thousands of civil users of GPS worldwide, the system was designed for and is operated by
the U. S. military. It provides specially coded satellite signals that can be processed in a GPS
receiver, enabling the receiver to compute position, velocity and time. Four GPS satellite signals
are used to compute positions in three dimensions and the time offset in the receiver clock.
GPS positioning techniques may be categorized as being predominantly based on code or carrier
measurements. Code techniques are generally simple and produce low accuracy, while carrier
techniques are more complex and produce higher accuracy. There exist a variety of positioning
methods for both code and carrier measurements. The suitability of each for a specific
application is dependent on the desired accuracy, logistical constraints and costs. Many
variables affect this accuracy, such as the baseline lengths, ionospheric conditions, magnitude
of selective availability, receiver types used, and processing strategies adopted.
The technique used to augment GPS is known as "differential". The basic idea is to locate
one or more reference GPS receivers at known locations in users' vicinities and calibrate
ranging errors as they occur. These errors are transmitted to the users in near real time. The
errors are highly correlated across tens of kilometers and across many minutes. Use of such
corrections can greatly improve the accuracy and integrity. To increase the accuracy of
positioning, Differential-GPS (D-GPS) was introduced. The idea is as follows: a reference
station is located at a known and accurately surveyed point. The GPS reference station
determines its GPS position using four or more satellites. Given that the position of the GPS
reference station is exactly known, the deviation of the measured position to the actual position
and more importantly the measured pseudo range to each of the individual satellites can be
calculated. The differences are either transmitted immediately by radio or used
afterwards for correction after carrying out the measurements.
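The differential correction idea can be sketched in a few lines (all range values here are hypothetical; a real D-GPS receiver applies per-satellite corrections before solving for position):

```python
# The reference station sits at a surveyed point, so the geometrically true
# range to each satellite is known; the difference from the measured
# pseudorange is the ranging error broadcast to nearby users.
measured_at_reference = {"sat1": 20_000_120.0, "sat2": 21_500_090.0}  # metres
true_at_reference     = {"sat1": 20_000_100.0, "sat2": 21_500_060.0}

corrections = {s: measured_at_reference[s] - true_at_reference[s]
               for s in measured_at_reference}       # per-satellite error

# A rover in the vicinity subtracts the same corrections from its own
# pseudoranges, since the errors are highly correlated over tens of km.
rover_measured = {"sat1": 20_350_125.0, "sat2": 21_100_095.0}
rover_corrected = {s: rover_measured[s] - corrections[s] for s in rover_measured}
```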
Due to the high accuracy, usability, ease and economy of operation in all weather offered by
GPS, it has found numerous applications in many fields, ranging from accuracy levels of
millimetres for high-precision geodesy to several metres for navigational positioning. Some of
the applications in the urban and transportation field are: i) establishment of ground control
points for imagery/map registration, ii) determination of a precise geoid using GPS data, iii)
survey control for topographical and cadastral surveys, iv) air, road, rail, and marine
navigation, v) intelligent traffic management systems, vi) vehicle tracking systems, etc.
UNIT IV DATA ANALYSIS 9
Vector Data Analysis tools - Data Analysis tools - Network Analysis - Digital Education models
- 3D data collection and utilisation.
Analysis is often considered to be "the heart" of GIS. Through analysis new information is gained.
As a GIS stores both attribute and spatial data, analysis can be conducted on both types of data –
however, it is the spatial analysis capability that sets GIS apart from database applications.
There are a great many GIS analyses that can be conducted. For convenience's sake we often group
the analyses into categories. For the purpose of this course I have decided to create a bit of a hierarchy.
This is done to reflect your lab experiences using GIS. Our first encounter with GIS analyses is with
vector data. Later we will (hopefully) move into analysis of raster data. The focus of this module
is vector data. The breakout of vector analysis is as follows:
Attribute
Attribute Query - Select by Attribute
Arithmetic Calculation
Statistical Summary
Reclassification
Relating Tables
Spatial Join
Spatial
Spatial Query - Select by Location
Spatial Calculation
Spatial Join
Overlay
Buffer
Dissolve
Network Analysis
ATTRIBUTE DATA OPERATIONS
There are many operations that can be conducted on the attribute database (the data tables). These
can be divided into 4 categories: query (or logical), arithmetic, statistical and reclass operations. I
included a short note about relating tables as it enhances our analytical capabilities. There is also the
spatial join function, which straddles attribute and spatial analysis.
Queries = Select by Attributes
Queries include both comparison (=, >, <, >=, <=, <>) and Boolean (AND, OR and NOT) operators
(for a simple but effective visual, check out the Boolean Machine). These operators are used to
perform queries.
Example 1: a simple comparison query,
Forest_Age >= 250 (years)
would query for a subset of forest polygons that could be considered ‘old growth’.
Note the query has 3 parts: field name, operator and value.
Example 2: in Pacific Northwest critical deer winter range has old growth Douglas-fir trees. The
query has two criteria and would look like,
Forest_Age >= 250 AND Fir% >= 50.
This second query operates on two attribute fields and is more specific (restrictive) than the
first – it would yield a smaller subset of polygons as both conditions would have to be met.
Example 3: if deer simply liked old growth and/or Douglas-fir (a fictitious example), then the query
would be
Forest_Age >= 250 OR Fir% >= 50.
This query is more inclusive as a stand can be either old OR composed of Douglas-fir.
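The three example queries can be mimicked on plain attribute records; in a GIS the same expressions run against the attribute table (the records, field names and values here are hypothetical):

```python
# Hypothetical forest polygons represented only by their attribute records
stands = [
    {"id": 1, "Forest_Age": 300, "Fir_pct": 60},
    {"id": 2, "Forest_Age": 120, "Fir_pct": 80},
    {"id": 3, "Forest_Age": 280, "Fir_pct": 20},
]

# Example 1: simple comparison query (field name, operator, value)
old_growth = [s for s in stands if s["Forest_Age"] >= 250]

# Example 2: AND query -- both conditions must be met (more restrictive)
winter_range = [s for s in stands if s["Forest_Age"] >= 250 and s["Fir_pct"] >= 50]

# Example 3: OR query -- either condition suffices (more inclusive)
either = [s for s in stands if s["Forest_Age"] >= 250 or s["Fir_pct"] >= 50]
```

Note how the AND query yields the smallest subset and the OR query the largest, exactly as described above.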
Arithmetic
Arithmetic operators perform simple mathematical functions on values in the attribute database;
operators include:
· +
· -
· /
· *
· ^n (raised to the power of n)
· √
· Sin
· Cos
· Tan
These operators can be utilized to calculate values to be placed in a new field:
· convert square metres (m2) to hectares (ha) [e.g. divide by 10,000] - results would be placed
in a new field in the table
· convert driving distance to driving time [e.g. divide by average driving speed] - results would
be placed in a new field in the table
· determine total volume (m3) [e.g. multiply area (ha) by inventory volume (m3/ha)] - again
results would be placed in a new field in the table
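The conversions above amount to simple field calculations; a minimal sketch with hypothetical values:

```python
# Convert square metres to hectares (new field: area_ha)
area_m2 = 125_000.0
area_ha = area_m2 / 10_000

# Convert driving distance to driving time (new field: driving_time_h)
distance_km, avg_speed_kmh = 90.0, 60.0
driving_time_h = distance_km / avg_speed_kmh

# Total volume = area (ha) * inventory volume (m3/ha) (new field: volume_m3)
volume_m3 = area_ha * 350.0
```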
Statistics
Statistical operations can also be performed on the attribute data. There are 2 options available when
you right-click on a field name: 'statistics' and 'summarize'. 'Statistics' provides a temporary pop-up
table with the typical parameters:
· count
· minimum
· maximum
· sum
· mean
· standard deviation
Plus the data are plotted in a histogram (frequency distribution).
'Summarize' creates an output data table. Statistics are based on unique values in a chosen field.
Selected fields from these operations are placed in a summary table. In the example below, the field
Group was chosen and the statistics count, sum and mean were calculated.
(e.g. Group B - 70; Group A - 300)
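A 'summarize' groups records by the unique values of the chosen field and computes statistics per group; a sketch with hypothetical records:

```python
from collections import defaultdict

# Hypothetical records: (Group value, numeric value to summarize)
records = [("A", 100), ("A", 200), ("B", 70)]

groups = defaultdict(list)
for grp, val in records:
    groups[grp].append(val)

# One summary row per unique Group value: count, sum, mean
summary = {g: {"count": len(v), "sum": sum(v), "mean": sum(v) / len(v)}
           for g, v in groups.items()}
```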
Reclassification
Reclassification is another operation that can be conducted on attribute data. Reclassification results
in a generalization (i.e. a simplification) of the original data set. For instance, raw property values
in a data set can be put in 3 classes: lower, middle, and upper class. We typically use the legend editor
(in ArcGIS the Symbology tab of the Layer Properties dialog box) to classify the data
- altering the legend is temporary and we can change the colouring at any time. (If we want a
"permanent reclassification", e.g. a new map, then we would use Merge / Dissolve - this is described
in section 3.5 below).
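Reclassification is just a mapping from raw values to a small set of classes; a sketch using hypothetical property-value thresholds:

```python
def reclass(value):
    """Generalize a raw property value into one of three classes
    (the thresholds are hypothetical)."""
    if value < 200_000:
        return "lower"
    elif value < 500_000:
        return "middle"
    return "upper"

classes = [reclass(v) for v in (150_000, 300_000, 900_000)]
```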
Table Relations
Relating tables to each other involves joining or linking records between two tables. This may not
be considered ‘an operation’, but it does allow us to relate outside source data to our themes to allow
the features in our themes to be analyzed based on ‘outside’ data. Refer to the database lecture notes.
Spatial Join
As with relating tables, a spatial join will relate records between two tables. But the records
are not joined based on a common attribute value (usually ID); instead records are joined based on
‘common location’ (as defined by the coordinates of the spatial features). This type of operation is
a combination of spatial and attribute; it is described in more detail in section 3.2 below.
Spatial Calculations
Simple spatial calculations determine areas, perimeters, and distances based on the coordinates
(in ArcView these are accessed through the ‘shape’ field as it contains the vertices); the calculations
utilize the coordinates that define the features, but the results are stored in the database table (so this
operation also straddles both attribute and geometric).
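Area and perimeter follow directly from the coordinates that define a feature; a sketch using the shoelace formula for a simple (non-self-intersecting) polygon:

```python
import math

def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon from its vertex coordinates."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to close the ring
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(vertices):
    """Sum of edge lengths, again closing the ring."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))

rect = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
```

The results would be stored back into the attribute table, as noted above.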
Spatial Join
As previously stated, this operation is a mix of spatial and attribute operations. The end result is a
join of two database tables, but the basis for the join is ‘coincident space’. As with a ‘regular join’
the relation has to be one-to-one or many-to-one between records in the ‘destination-to-source tables’.
As an example we could have two themes: Cities and Countries of the world. A spatial join could
be done for cities as the destination theme, as it yields a many-to-one relation (many cities to one
country). A spatial join would thus bring data from the Countries theme to the Cities theme.
A spatial join could not be done with Countries as the destination table as the relation would be
one-to-many.
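The cities-to-countries case can be sketched as a many-to-one join, assuming the point-in-polygon step has already matched each city to its containing country (all names and values hypothetical):

```python
# Attribute table of the source theme (Countries)
countries = {"CA": {"pop_millions": 38}, "FR": {"pop_millions": 68}}

# Result of the spatial test: each city point falls in exactly one country
# polygon, so the relation is many-to-one (many cities to one country).
city_in_country = {"Vancouver": "CA", "Montreal": "CA", "Paris": "FR"}

# The join brings the Countries fields across to each city record.
cities = {city: {"country": code, **countries[code]}
          for city, code in city_in_country.items()}
```

The reverse (Countries as destination) would be one-to-many and is therefore not allowed, as stated above.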
Overlays – bringing together two themes – the line work of the two themes are combined (lines are
broken, new nodes and links/arcs are recognized, topology is redone, note that sliver polygons may
have to be eliminated) and the fields from both theme databases are combined into one new database.
Three common types of overlays:
Union – is a complete merging of two themes where the new theme is composed of the entire map
area of both themes and all the fields from both theme data tables
Intersect – is a merging of two themes but only where they share space such that the ‘map area’
of the new theme is the area that was in common for both themes and the attribute database is
composed of all the fields from both theme data tables
Clip – is akin to pressing a cookie cutter onto a theme such that the new theme is a miniature
version of the first (a mini-me), the map area is defined by the overlay (cookie cutter) theme
and the database comes only from the input (cookie dough) theme.
Update - features from the 'update layer' descend upon the input theme and replace whatever was
underneath. An example would be updating a timber type map (input layer) with a cut block (update
layer) where the cut block shape supersedes the timber types it overlaps.
Erase - the polygons from the 'erase layer' descend upon the input theme and eliminate that area.
An example would be if land were expropriated from a woodlot owner to create a park. The 'park
area' would be erased from the woodlot area.
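On a toy grid where each theme maps cell ids to an attribute, the overlay family reduces to set operations on the cell ids (ids and attribute values are hypothetical; real overlays also rebuild the line work and topology):

```python
theme_a = {1: "fir", 2: "pine", 3: "fir"}        # input theme: cell -> type
theme_b = {2: "parkA", 3: "parkB", 4: "parkA"}   # overlay theme

# Union: entire area of both themes, fields from both tables
union = {c: (theme_a.get(c), theme_b.get(c))
         for c in theme_a.keys() | theme_b.keys()}

# Intersect: only shared space, fields from both tables
intersect = {c: (theme_a[c], theme_b[c])
             for c in theme_a.keys() & theme_b.keys()}

# Clip: area defined by the overlay (cookie cutter), attributes from the
# input (cookie dough) theme only
clip = {c: theme_a[c] for c in theme_a.keys() & theme_b.keys()}

# Erase: input area with the erase theme's footprint removed
erase = {c: theme_a[c] for c in theme_a.keys() - theme_b.keys()}
```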
Buffers
Buffering creates a new theme with new polygon features (geometric objects) based on a constant
measure from features in a source theme; buffers around points are circles, around lines are ‘corridors’
(snake-like with rounded ends) and around polygons are ‘donuts’; buffers can be created based on:
a single set width (i.e. all features by 50m)
multiple widths where more than one buffer is created around each feature (i.e. a 50 and a 150m
buffer created around each feature – gives a “bull’s eye” effect)
varying width based on an attribute field (i.e. width for each feature is stored in the data table,
buffer width depends on the value in this field)
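A point buffer can be approximated as a ring of vertices at the set width around the source feature (a minimal sketch; real GIS buffers also handle the 'corridor' and 'donut' cases for lines and polygons):

```python
import math

def point_buffer(x, y, radius, n=32):
    """Approximate the circular buffer around a point as an n-vertex polygon."""
    return [(x + radius * math.cos(2 * math.pi * i / n),
             y + radius * math.sin(2 * math.pi * i / n))
            for i in range(n)]

# Single set width: a 50 m buffer around a point at the origin
ring = point_buffer(0.0, 0.0, 50.0)

# Multiple widths around the same feature give the "bull's eye" effect
bullseye = [point_buffer(0.0, 0.0, w) for w in (50.0, 150.0)]
```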
Dissolve
With dissolve, also known as merge polygons, boundaries between adjacent polygons with the same
attribute value (i.e. class = poor) are ‘dissolved’ and the two (or more) polygons are merged into one
larger polygon; a new map layer (theme) results with the generalized data. This is the ‘spatial
equivalent’ to reclassification of attribute data.
Network Routing – as the name implies, this type of analysis assesses movement through a network.
Consider the difference between the shortest route and the fastest route. During the middle of the
night the ‘shortest route’ is likely the ‘fastest route’, however, during rush hour I would consider
traffic and use the ‘fastest route’ (which may have a longer distance). The network is modeled using
lines (arcs) and intersections (nodes). Arc-node topology provides information regarding
connectivity. The attribute database would provide additional information regarding impedance to
flow (or movement). Examples would include speed limit and traffic loads at different times of the
day. There is a ‘cost’ to making turns at intersections – i.e. you have to slow down rather than use
just two wheels to make the turn. One-way streets would provide for an absolute barrier. As well,
making a turn off of an overpass onto a highway below would be prohibited. Other routing examples
include most efficient route (for making several stops or deliveries) and location-allocation (where
school catchment areas can be determined based on road network and not just a straight-line distance).
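Shortest route versus fastest route comes down to running the same graph search with different impedance weights on the arcs. A Dijkstra sketch over a toy arc-node network (the node names and weights are hypothetical):

```python
import heapq

def best_route(graph, start, end):
    """Dijkstra over an arc-node network; graph maps node -> [(neighbour, impedance)].
    The impedance can be distance, travel time, turn cost, etc."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == end:
            break
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(pq, (nd, nbr))
    # Reconstruct the path by walking back through predecessors
    path, node = [], end
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return list(reversed(path)), dist[end]

# Same network, two impedance fields: distance (km) vs rush-hour time (min)
distance = {"A": [("B", 1.0), ("C", 2.0)], "B": [("D", 1.0)], "C": [("D", 0.5)]}
travel_time = {"A": [("B", 10.0), ("C", 2.0)], "B": [("D", 10.0)], "C": [("D", 1.0)]}
```

With distance weights the route via B wins; with rush-hour time weights the longer route via C is faster, mirroring the shortest- vs fastest-route distinction above.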
4.2 Data Analysis tools
Data analytics tools can help deliver that value and bring that data to life. A lot of hard work
goes into extracting and transforming data into a usable format, but once that's done, data analytics can
provide users with greater insights into their customers, business, and industry.
There are three broad categories of data analytics that offer different levels of insight:
Self-Service Analytics enable end users to structure their own analyses within the context of
IT-provided data and tools.
Embedded Analytics provide business intelligence within the confines of a traditional
business application, like an HR system, CRM, or ERP. These analytics provide context-
sensitive decision support within users’ normal workflow.
If you’re not using an analytics tool, you should be. Gartner predicts that by 2019, self-service
analytics and BI users will actually produce more analysis than data scientists. No matter what level
of insight you need, here are 15 of the best data analytics tools to get you started on your journey, in
no particular order.
Tableau
Tableau features robust functionality with fast speed to insight. With connectivity to many different
local and cloud-based data sources, Tableau’s intuitive interface combines data sourcing, preparation,
exploration, analysis, and presentation in a streamlined workflow.
Tableau’s flexibility makes it well-suited to the three types of analytics discussed above. Tableau
Server can easily house recurring reports. Power users will appreciate the integrated statistical and
geospatial functionality for advanced self-service. And finally, Tableau uses application integration
technologies like JavaScript APIs and single sign-on functionality to seamlessly embed Tableau
analytics into common business applications.
Looker
Looker strives to provide a unified data environment and centralized data governance with heavy
emphasis on reusable components for data-savvy users. Using an extract/load/transform (ELT)
approach, Looker gives users the ability to model and transform data as they need it.
Looker also features proprietary LookML language, which harnesses SQL in a visual and reusable
way. The reusability concept extends to Looker’s Blocks components, which are reusable utilities
for data connections, analysis, visualization, and distribution. Finally, Looker is designed to easily
integrate with popular collaboration and workflow tools such as Jira, Slack, and Segment.
Solver
BI360 offers modern, dynamic reporting with out-of-the-box integrations to many of the world’s most
popular on-premise and cloud-based ERP systems. This easy-to-use report writer offers Excel, web,
and mobile interfaces, and provides finance professionals with powerful financial and operational
reporting capabilities in a variety of layouts and presentation formats.
BI360 also offers integrated budgeting workflow and analytics, including industry-specific templates.
Once you connect data sources to the BI360 Suite, use these templates to access data, collaboratively
develop a budget, and display results on predefined dashboards.
Dataiku
Dataiku DSS combines much of the data analysis lifecycle into one tool. It enables analysts to source
and prep data, build predictive models, integrate with data mining tools, develop visualizations for
end users and set up ongoing data flows to keep visualizations fresh. DSS’ collaborative environment
enables different users to work together and share knowledge, all within the DSS platform.
With its focus on data science, DSS tends to serve deeply analytical use cases like churn analytics,
demand forecasting, fraud detection, spatial analytics, and lifetime value optimization.
KNIME
An open-source, enterprise class analytics platform, KNIME is designed with the data scientist in
mind. KNIME’s visual interface includes nodes for everything from extracting to presenting data,
with an emphasis on statistical models. KNIME integrates with several other data science tools
including R, Python, Hadoop, and H2O, as well as many structured and unstructured data types.
KNIME supports leading edge, data science use cases such as social media sentiment analysis,
medical claim outlier detection, market basket analysis, and text mining.
RapidMiner
RapidMiner emphasizes speed to insight for complex data science. Its visual interface includes pre-
built data connectivity, workflow, and machine learning components. With R and Python integration,
RapidMiner automates data prep, model selection, predictive modeling, and what-if gaming. This
platform also accelerates “behind-the-scenes” work with a combined development and collaboration
environment and integration with Hadoop and Spark big data platforms.
Finally, RapidMiner’s unique approach to self-service utilizes machine learning to glean insight from
its 250,000-strong developer community for predictive analytics development. Its context- sensitive
recommendations, automated parameter selection, and tuning accelerate predictive model
deployment.
Pentaho
Pentaho emphasizes IoT data collection and blending with other data sources like ERP and CRM
systems, as well as big data tools like Hadoop and NoSQL. Its built-in integration with IoT end points
and unique metadata injection functionality speeds data collection from multiple sources. Pentaho’s
visualization capabilities range from basic reports to complex predictive models.
Talend
Talend’s toolset is meant to accelerate data integration projects and speed time to value. An open-
source tool, Talend comes with wizards to connect to big data platforms like Hadoop and Spark. Its
integrated toolset and unique data fabric functionality enable self-service data preparation by business
users. By making data prep easier for users who understand the business context for the data, Talend
removes the IT bottleneck on clean and usable data, which reduces the time to merge data sources.
Domo
Domo focuses on speed to insight for less technical users. It features 500+ built-in data connectors
and a visual data prep interface to accelerate data sourcing and transformation. Its robust business
intelligence capabilities enable visualization and social commenting to facilitate collaboration. Domo
also boasts native mobile device support with the same analysis, annotation, and collaboration
experience as desktop.
Sisense
Sisense offers an end-to-end analytics platform with a strong governance component. It offers a visual
data sourcing and preparation environment, plus alerts that notify users when a given metric falls
outside a configurable threshold. Sisense deploys to on-premises, private-cloud, or Sisense- managed
environments, and enables governance at the user role, object, and data levels.
Qlik
Qlik emphasizes speed to insight by automating data discovery and relationships between multiple
data sources during data acquisition and preparation. Instead of the traditional query-based approach
to acquiring data, Qlik’s Associative Engine automatically profiles data from all inbound sources,
identifies linkages, and presents this combined data set to the user. Multiple, concurrent users can
quickly explore large and diverse data sets because of Qlik’s in-memory processing architecture,
which includes compressed binary indexing, logical inference, and dynamic calculation.
Qlik supports RESTful APIs as well as HTML5 and JavaScript. This support enables web, business
application, and mobile platform integration for enterprise-wide embedded analytics.
Microstrategy
Founded in 1989, Microstrategy is one of the older data analytics platforms, and has the robustness
that one would expect from such a mature toolset. Microstrategy connects to numerous enterprise
assets like ERPs and cloud data vendors, and integrates with multiple common user clients like
Android, iOS, and Windows. It also provides a variety of common services such as alerts, distribution,
and security, and enables many BI functions like data enrichment, visualization, and user
administration.
Microstrategy enhances data governance by using end-point telemetry to manage user access. By
gathering location, access, authentication, timestamp, and authorization data, this functionality can
help analyze utilization and strengthen security practices.
Thoughtspot
Thoughtspot features a search engine-like interface and AI to enable users to take a conversational
approach to data exploration and analytics. Its SpotIQ engine parses search requests such as “revenue
by country for 2014,” and produces a compelling visualization showing a bar chart ordered least to
greatest.
The Thoughtspot platform helps companies quickly deploy this unique approach to analytics with a
visual data sourcing and preparation pane, extensive in-memory processing, back-end cluster
management for big data environments, centralized row-level security, and built-in embeddable
components.
Birst
Birst focuses on solving one of the most vexing challenges in data analytics: establishing trust in data
from many different sources within the enterprise. Birst’s user data tier automatically sources, maps,
and integrates data sources and provides a unified view of the data to users.
Then, using Birst’s Adaptive User Experience, which breaks down the silo between data discovery
and dashboarding, users can access the unified data sources to develop analytics with no coding or IT
intervention. Finally, Birst enables distribution to multiple platforms and other analytics tools
like R and Tableau.
SSRS
SQL Server Reporting Services (SSRS) is a business intelligence and reporting tool that tightly
integrates with the Microsoft data management stack, SQL Server Management Services, and SQL
Server Integration Services. This toolset enables a smooth transition from database to business
intelligence environment. SSRS in particular offers a visual authoring environment, basic self- service
analytics, and the ability to output spreadsheet versions of reports and visualizations.
SSRS and the Microsoft data management stack are the workhorses of traditional BI. They are a
mature tool set that performs very well with recurring reports and user-entered parameters.
The well-known vendors above support multiple use cases across many industries. However, the
volume of data generated by traditional business activity, social media, and IoT technology continues
to explode every year, so data analytics options continue to evolve. With so many options, choosing
a vendor can be daunting.
The key to making an informed choice is to understand the unique analytics needs of your
organization and industry. Knowing where your needs fall on the analytics spectrum will help you
productively engage with vendors – and make the most of the analytics you produce.
The network analysis tool uses the javascript library vis.js to generate the interactive force
diagrams. For more information, go to: http://visjs.org/docs/network/.
This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx
Downloads and Licenses portal to install R and the packages used by the R Tool.
Connect inputs
N anchor: A Nodes data stream that contains a field named _name_ that uniquely identifies each
node in the network.
E anchor: An Edges data stream that contains fields named "from" and "to" identifying nodes
that are connected by an edge. Note that the "from" and "to" fields must use the same unique
node identifiers as the _name_ field described above.
Nodes
o By Statistic: Select a statistical measure to scale the nodes by. For a description of different
centrality measures, go to: https://en.wikipedia.org/wiki/Centrality.
Group Nodes:
Edges
Directed: Select the check box to indicate if the network is a directed network.
Layout
Outputs
D anchor: An Alteryx data stream with network centrality measures for each node.
Since the introduction of the World Wide Web in 1993, the young of the world have experienced
two models of digital education: that outside the school walls and that within. Outside, the
young and the digitally connected families of the world employed – unseen – the naturally
evolving laissez-faire model. Within the school, the young worked within the traditional, highly
structured model.
It is time the difference was understood, the global success and benefits of the laissez-faire
model recognised and lauded, and the serious shortcomings of the highly structured one addressed.
For much of the period the two models ran in parallel, with most schools showing little or no
interest in the out-of-school digital education.
Around 2010 – 2012 the scene began to change when a handful of digitally mature schools began
genuinely collaborating with their families in the 24/7/365 digital education of the children. Those
schools had reached the evolutionary stage where their teaching model and culture closely mirrored
that of the families. They revealed what was possible with collaboration.
That said, it took time for that collaboration to take hold more widely, and for the most part the
parallel models continue in operation today, with the difference between the in-school and
out-of-school teaching growing apace.
It is surely time for schools and government to question the retention of the parallel modes and to
ask if taxpayers are getting value for the millions upon millions spent solely on schools when the
digitally connected families receive no support. Might it be time to employ a more collaborative
approach where the schools complement and add value to the contribution of the families?
Without going into detail, it bears reflecting on the distinguishing features of the learning
environment and digital education model of both the digitally connected family and the school,
and asking what the best way forward is.
That of the families we know well. It has been built around the home’s warmth and support, and
the priority the parents attached to their children having a digital education that would improve
their education and life chances. The focus has always been on the child – the individual learner –
with the children from the outset being provided the current technology by their family and
empowered to use that technology largely unfettered.
Importantly, the family, as a small regulating unit with direct responsibility for a small number of
children, could readily trust each child, and monitor, guide and value their learning from birth
onwards, helping to ensure each child had use of the current technology and that the use was wise
and balanced.
The learning occurred within a freewheeling, dynamic, market-driven, naturally evolving
environment, anywhere, anytime, just in time and invariably in context. Those interested could
operate at the cutting edge and to the depth desired. Very early on the young's use of the digital was
normalised, with the learning occurring as a natural part of life, totally integrated, with no regard
for boundaries.
The time available to the digitally connected family was – and continues to be – at least four to five
times greater than that in the school. To many it seemed chaotic, but it was also naturally evolving.
Very quickly the family learning environment became collaborative, socially networked, global in
its outlook, highly enjoyable and creative where the young believed anything was possible. By the
latter 2000’s most families had created – largely unwittingly – their own increasingly integrated
and sophisticated digital ecosystem, operating in the main on the personal mobile devices that
connected all in the family to all manner of other ecosystems globally.
The general feature of the school digital learning environment has been invariably one of unilateral
control, where the ICT experts controlled every facet of the technology and its teaching. They
chose, configured and controlled the use of both the hardware and software, invariably opting for
one device, one operating system and a standard suite of applications.
The students were taught within class groups, using highly structured, sequential, teacher directed,
regularly assessed instructional programs. The school knew best. The clients – the parents and
students – were expected to acquiesce. There was little or no recognition of the out of school
learning or technology or desire to collaborate with the digitally connected families. The teaching
was insular, inward looking, highly site fixated.
In reflecting on schools' teaching with the digital between 1993 and 2016, there was an all-pervasive
sense of constancy and continuity, with no real rush to change. There was little sense that
the schools were readying the total student body to thrive within a rapidly evolving digitally
based world.
Significantly by 2016 only a relatively small proportion of schools globally were operating as
mature digital organisations, growing increasingly integrated, powerful higher order digitally
based ecosystems.
The reality was that while the learning environment of the digitally connected families evolved
naturally and at pace, that of most schools changed little, with most schools struggling to
accommodate rapid digital evolution and transformation.
With the advantage of hindsight, it is quite remarkable how hidden the laissez faire model has
remained for twenty plus years, bearing in mind it has been employed globally since the advent of
the WWW.
For years, it was seen simply as a different, largely chaotic approach used by the kids – with the
focus being on the technological breakthroughs and the changing practices rather than on the
underlying model of learning that was being employed.
It wasn't until the authors identified and documented the lead role of the digitally connected
families of the world that we appreciated all were using basically the same learning approach. The
pre-primary developments of the last few years affirmed the global application of the model. We
saw at play a natural model that was embraced by the diverse families of the world. All were using
the same model – a naturally evolving model in which the parents were 'letting things take their own
course' (OED).
The learning was highly individualized, with no controls other than the occasional parent nudge.
That said the learning was simultaneously highly collegial, with the young calling upon and
collaborating with their siblings, family members, peers and social networks when desired.
Interestingly, from early on the young often found themselves knowing more about the technology
in some areas than their elders – experiencing what Tapscott (1998) termed an 'inverted authority'
– and being able to assist them in using the technology. Each child was free to learn how to use
and apply those aspects of the desired technologies they wanted, and to draw upon any resources
or people as needed.
In the process the children worldwide – from as young as two – directed their own learning, opting
usually for a discovery based approach, where the learning occurred anytime, anywhere 24/7/365.
Most of the learning was just in time, done in context and was current, relevant, highly appealing
and intrinsically motivating. Invariably it was highly integrated, with no thought given to old
boundaries – like was it educational, entertainment, communication, social science or history.
In contrast the school digital teaching model has always been highly structured and focused on
what the school or education authority ‘experts’ believed to be appropriate. Throughout the period
the teaching has been unilaterally controlled, directed by the classroom teacher, with the students
disempowered, distrusted and obliged to do as told.
The teaching built upon linear, sequential instructional programs where the digital education was
invariably treated like all other subjects, shoehorned into an already crowded curriculum and
continually assessed. Some authorities made the ‘subject’ compulsory, others made it optional.
The focus – in keeping with the other 'subjects' in the curriculum – was academic. There was little
interest in providing the young with digital understanding for everyday life. The teaching took place
within a cyber-walled community, at times determined by the teaching program. Increasingly,
the course taught and assessed became dated and irrelevant.
In considering why the young and the digitally connected families of the world have embraced the
laissez faire model of digital education, aside from the young's innate curiosity and desire to learn,
we might do well to examine the model of digital learning we have used over the last twenty-plus
years and reflect on how closely it approximates that adopted by the young. Might they be
following that ancient practice of modelling the behaviour of their parents?
The way forward.
Near a quarter of a century on since the introduction of the WWW and an era of profound
technological and social change it is surely time for governments and educators globally to
Publicly recognise the remarkable success of the digitally connected families and the
laissez faire teaching model in the 24/7/365 digital education of both the children and the
wider family
Understand the digitally connected families are on trend to play an even greater lead role
Identify how best to support the family’s efforts without damaging the very successful
teaching model employed
Consider how best to enhance the educational contribution of all the digitally connected
families in the nation, including the educationally disadvantaged
Rethink the existing, somewhat questionable contribution of most schools and the
concept of schools as the sole provider of digital education for the young
Examine where scarce taxpayer monies can best be used to improve the digital education
in the networked world.
remain manual or semi-manual. It is a technique which has been well-studied and documented
(see Manuals of Photogrammetry, 2004; Henricsson and Baltsavias, 1997; Tao and Hu, 2001).
Active scanning techniques, such as laser and acoustic methods, have been an enormous
success in recent years because they can produce very dense and accurate 3D point clouds.
Applications that need terrain or seabed surfaces regularly make use of the 2.5D grids obtained
from airborne or acoustic point clouds. The integration of direct geo-referencing (using GPS
and inertial systems) into laser scanning technologies has given a further boost to 3D
modelling. Although extraction of height (depth) information is largely automated, complete
3D object reconstruction and textures (for visualisation) are often weak, and the amount of data
to be processed is huge (Maas and Vosselman, 1999; Wang and Schenk, 2000; Rottensteiner
et al 2005). Hybrid approaches overcome the disadvantages mentioned above by using
combinations of optical images, point cloud data and other data sources (e.g. existing maps or
GIS/CAD databases) (Tao, 2006). The combination of images, laser scanning point clouds and
existing GIS maps is considered to be the most successful approach to automatically creating
low resolution, photo-textured models. There are various promising studies and publications
focused on hybrid methods (Schwalbe et al, 2005; Pu and Vosselman, 2006) and even on
operational solutions. These approaches are generally more flexible, robust and successful but
require additional data sources, which may influence the quality of the model. In summary, 3D
data acquisition has become ubiquitous, fast and relatively cheap over the last decade.
However, the automation of 3D reconstruction remains a big challenge. There are various
approaches for 3D reconstruction from a diverse array of data sources, and each of them has
some limitations in producing fully automated, detailed models. However, as the cost of
sensors, platforms and processing hardware decreases, simultaneous and integrated 3D data
collection using multiple sensing technologies should allow for more effective and efficient
3D object reconstruction. Designing integrated sensor platforms, processing and integrating
sensors measurements and developing algorithms for 3D reconstruction are among topics
which should be addressed in the near future. Besides these, I expect several more general
issues to emerge:
specific requirements for detail and realism. Indeed, 3D reconstruction is closely related to the
application that uses the model, but such a chaotic creation of 3D models may become a major
bottleneck for mainstream use of 3D data in the very near future. Early attempts to specify
LOD are already being made by the CityGML team, but this work must be further tested and
refined (Döllner et al, 2006).
4. Change detection. Detection of changes is going to play a crucial role in the maintenance
and update of 3D models. Assuming that automated 3D acquisition mechanisms will be
available, the initial high costs of acquiring multiple data sources can be balanced and justified.
Changes can then be detected against existing data from previous periods or initial design
models (e.g. CAD). In both cases, robust and efficient 3D computational geometry algorithms
must be studied.
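The cell-by-cell comparison underlying such change detection can be sketched in a few lines; the toy grids, tolerance value and flagged-output format below are illustrative assumptions, not a production algorithm.

```python
# Toy raster change detection: compare two epochs of the same grid
# cell by cell and flag cells whose height difference exceeds a tolerance.
def detect_changes(epoch1, epoch2, tol=0.5):
    changed = []
    for r, (row1, row2) in enumerate(zip(epoch1, epoch2)):
        for c, (z1, z2) in enumerate(zip(row1, row2)):
            if abs(z2 - z1) > tol:
                changed.append((r, c, z2 - z1))  # (row, col, delta)
    return changed

before = [[10.0, 10.0], [10.0, 10.0]]
after  = [[10.0, 10.2], [13.0, 10.0]]   # a new structure appears at cell (1, 0)
print(detect_changes(before, after))    # [(1, 0, 3.0)]
```

Comparing against an initial design model (e.g. CAD) rather than an earlier survey follows the same pattern, with the design surface taking the place of the first epoch.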
Three-dimensional GIS data incorporates an extra dimension—a z-value—into its definition
(x,y,z). Z-values have units of measurement and allow the storage and display of more information
than traditional 2D GIS data (x,y). Even though z-values are most often real-world elevation
values—such as the height above sea level or geological depth—there is no rule that enforces this.
Z-values can be used to represent many things, such as chemical concentrations, the suitability of
a location, or even purely representative values for hierarchies. There are two basic types of 3D
GIS data: feature data and surface data.
3D feature data
Feature data represents discrete objects, and the 3D information for each object is stored in the
feature's geometry. Three-dimensional feature data can support potentially many different z-values
for each x,y location. For example, a vertical line has an upper vertex and a lower vertex, each
with the same 2D coordinate but with different z-values. Another example of 3D feature data
would be a 3D multipatch building, whose roof, interior floors, and foundation would all
contain different z-values for the same 2D coordinate. Other 3D feature data, such as an aircraft's
3D position or a walking trail up a mountain, would have only a single z-value for each x,y
location.
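The vertical-line example above—several z-values at one x,y location—can be sketched as follows; the coordinates and the helper function z_values_at are hypothetical, chosen only for illustration.

```python
# A 3D feature stores z in each vertex: (x, y, z).
# A vertical line: two vertices sharing the same 2D coordinate
# but carrying different z-values (e.g. base and top of a wall).
vertical_line = [(500000.0, 4100000.0, 0.0),    # lower vertex
                 (500000.0, 4100000.0, 25.0)]   # upper vertex

def z_values_at(feature, x, y):
    """Collect every z stored at a given x,y location of a feature."""
    return [vz for (vx, vy, vz) in feature if (vx, vy) == (x, y)]

print(z_values_at(vertical_line, 500000.0, 4100000.0))  # [0.0, 25.0]
```

A 2D feature, by contrast, would have no z component at all, and a surface (next section) would permit at most one.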
Surface data
Surface data represents height values over an area, and the 3D information for each location within
that area can be either stored as cell values or deduced from a triangulated network of 3D
faces. Surface data is sometimes referred to as 2.5D data because it supports only a single z-value
for each x,y location. For example, the height above sea level for the surface of the earth will only
return a single value.
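The single-z property of 2.5D surface data can be illustrated with a toy elevation grid; the values and the surface_z helper are invented for illustration.

```python
# A 2.5D surface: exactly one z per x,y cell, stored as a grid of cell values.
# Hypothetical 3x3 elevation grid (metres above sea level).
elevation = [
    [12.0, 14.5, 15.1],
    [11.8, 13.9, 16.0],
    [10.2, 12.7, 14.8],
]

def surface_z(grid, row, col):
    """A surface returns exactly one height for each location."""
    return grid[row][col]

print(surface_z(elevation, 1, 2))  # 16.0
```

A true 3D feature such as the vertical line cannot be stored this way, because a cell can hold only one value.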
Since 3D GIS data can be more difficult to create and maintain than 2D data, modeling your data
in three dimensions should only be done when the extra effort will add value to your work. While
some GIS features, such as aircraft locations or underground wells, naturally lend themselves to
being modeled in 3D, other data can be just as effective in 2D as in 3D. For
example, having a road network modeled in 3D might seem useful for investigating gradients, but
the additional effort to maintain z-values might outweigh the benefits.
These are some important considerations when deciding to model your data in 3D:
GIS data does not have to be modeled in 3D to be displayed inside a 3D view.
Height values from a surface can easily be added to 2D objects, when you need them,
through the use of geoprocessing tools.
If the source of your z-values is a surface, consider how often that underlying surface
changes. The more it changes, the less useful it will be to store z-values for features
generated against it.
If you decide to model some or all of your data in three dimensions, the most important decision
will be the units of the z-values. A solid understanding of what your z-values represent will be
critical when you start editing and maintaining them. A general rule to follow whenever possible
is that the z-units should match your x,y units. For example, if your data is in a (meter-based) UTM
zone, you should also model your z-values as meters. This will help you interact with the data in
an intuitive way, such as when you measure 3D distances or move objects in x, y, and z.
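Measuring 3D distances, as mentioned above, only gives intuitive results when z shares the x,y units. A minimal sketch with assumed UTM coordinates, everything in metres:

```python
import math

def distance_3d(a, b):
    """Euclidean 3D distance; meaningful only if x, y and z share units."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# UTM eastings/northings in metres, z also in metres: units are consistent.
p1 = (500000.0, 4100000.0, 100.0)
p2 = (500030.0, 4100040.0, 100.0)
print(distance_3d(p1, p2))  # 50.0 (a 30-40-50 right triangle in the plane)
```

If z were in feet while x,y were in metres, the same formula would silently return a meaningless number, which is exactly why the matching-units rule matters.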
UNIT V APPLICATIONS 9
GIS Applications - Natural Resource Management - Engineering - Navigation - Vehicle tracking and
fleet management - Marketing and Business applications - Case studies.
GIS is a convergence of technological fields and traditional disciplines. GIS has been called an
"enabling technology" because of the potential it offers to the wide variety of disciplines dealing
with spatial data. Many related fields of study provide techniques which make up GIS. These
related fields emphasise data collection, while GIS brings them together by emphasising
integration, modelling and analysis. Thus GIS often claims to be the science of spatial information.
Fig. 17.1 shows the technical and conceptual development of GIS. The list of contributing
disciplines can be classified according to (1) Heritage, (2) Data Collection, (3) Data Analysis and
(4) Data Reporting.
An important distinction between GIS applications is whether the geographic phenomena studied
are man-made or natural. Clearly, setting up a cadastral information system, or using GIS for urban
planning purposes involves a study of man-made things mostly: the parcels, roads, sidewalks, and
at larger scale, suburbs and transportation routes are all man-made; those entities often have clear-
cut boundaries. On the other hand, geomorphologists, ecologists and soil scientists often have
natural phenomena as their study objects. They may be looking at rock formations, plate tectonics,
distribution of natural vegetation or soil units. Often, these entities do not have clear-cut
boundaries, and there exist transition zones where one vegetation type, for instance, is gradually
replaced by another. (de et al, 2001).
The applications of GIS include mapping locations, quantities and densities, finding distances and
mapping and monitoring change. The function of an information system is to improve one's ability
to make decisions. An information system is a chain of operations, starting from planning the
observation and collection of data, through storage and analysis of the data, to the use of the derived
information in some decision-making process. A GIS is an information system that is designed to
work with data referenced to spatial or geographic coordinates. GIS is both a database system with
specific capabilities for spatially referenced data and a set of operations for working with such
data. There are three basic types of GIS applications, which might also represent stages of
development of a single GIS application.
The major application of GIS in natural resource management is in confronting environmental
issues such as floods, landslides, soil erosion, drought, earthquakes etc. GIS in natural resource
management also addresses the current problems of climate change, habitat loss, population growth,
pollution etc. The solution to these problems lies in the application of GIS in natural resource
management. Indeed, the introduction of GIS has solved many problems related to the natural
environment. GIS is a powerful tool that is used in the management of natural resources. Some
applications of GIS in major fields are discussed below:
Hazard and risk assessment
GIS in natural resource management is used in the reduction of natural hazards such as floods,
landslides, soil erosion, forest fires, earthquakes, drought etc. One cannot totally stop these natural
disasters but can minimize their impact through early planning, preparation and strategies. GIS in
natural resource management is used in analyzing, organizing, managing and monitoring
natural hazards. It provides spatial data on disasters that have taken place before or might occur,
so that risk can be addressed early. This is indicated through GIS-based maps.
Risk Assessment
Change detection
GIS in natural resource management provides information about land-area change between time
periods. Land changes are detected through satellite imagery or aerial photographs. This is
a useful application in land-change and deforestation assessment, urbanization, habitat fragmentation
etc. The information obtained from GIS in natural resource management helps to study a specific
area, and monitoring can be done in and around that area. It is a way of studying the variations
taking place in the landscape and managing the environment.
Change detection in land use
Natural resource inventory
Natural resource inventory is a statistical survey of the condition of natural resources. It provides
relevant information about environmental conditions and supports policy, including conservation
programs, through GIS in natural resource management. Maps in GIS provide information on the
location and current state of resources.
Forest Inventory
Environmental Monitoring
GIS in natural resource management provides graphical data that helps in monitoring the
environment. It determines qualitative and quantitative data about environmental issues
such as pollution, land degradation, soil erosion etc. GIS in natural resource management detects
these problems and predicts future hazards, and thus monitors all these environmental problems.
GIS application in Natural Resource Management:
GIS helps in the management of land by providing useful data for construction or agricultural
works. It helps select a suitable site before any change is made.
GIS in natural resource management conserves a wide range of biodiversity through the prior
information obtained from it. Many biological habitats are protected, and further planning for
the protection of flora and fauna is promoted.
GIS in natural resource management provides hydrological data for watershed management and
watershed analysis.
GIS in natural resource management now extends to mineral exploration in developed
countries such as the USA, Canada and Australia.
Further applications are briefly pointed as below:
Facility management
Topographic analysis
Network analysis
Transportation modeling
Engineering design
Demographic analysis
Geo process modeling
Therefore, GIS is a suitable technology for understanding natural resource management. It
is an effective technique for learning about the factors affecting the environment, including their
results and their management. The geospatial data obtained through GIS supports the sustainable
use of natural resources. Thus, GIS in natural resource management guides the proper and wise
management of resources for present and future generations, helping to manage natural resources
effectively and efficiently.
5.3 ENGINEERING
An advanced information system like GIS plays a vital role and serves as a complete platform in
every phase of the infrastructure life cycle. The advancement and availability of technology have
set new marks for professionals in the infrastructure development areas. Now more and more
professionals are seeking the help of technologically smart and improved information systems
like GIS for infrastructure development. Each and every phase of the infrastructure life cycle is
greatly affected and enhanced by the adoption of GIS.
Planning: In planning, GIS's major contribution is to provide an organized set of data
which can help professionals tackle complex scenarios relating to site selection,
environmental impact, study of ecosystems, managing risk in the use of natural
resources, sustainability issues, managing traffic congestion, routing of roads and pipelines,
etc.
Data Collection: Precise and accurate data is the core driving factor of any successful
project. GIS is equipped with almost all the tools and functions that enable users to
access the required data within a reasonable time.
Analysis: Analysis is one of the major and most influential phases of the infrastructure life
cycle. Analysis tells us about the validity or correctness of a design; in other words,
analysis is a method which supports our design. Some of the analyses that can be performed
with GIS are:
Water distribution analysis
Traffic management analysis
Soil analysis
Site feasibility analysis
Environment impact analysis
Volume or Area analysis of catchment
River or canals pattern analysis
Temperature and humidity analysis
Construction: This is the stage when all layout plans and paper designs come into
existence in the real world. GIS helps professionals understand the site conditions
that affect the schedule baseline and cost baseline. To keep the construction within budget
and schedule, GIS guides us in utilizing resources on site efficiently through:
Timely usage of construction equipment.
Working Hours
Effects of seasonal fluctuations.
Optimizing routes for dumpers and concrete trucks
Earth filling and cutting
Calculation of volumes and areas of constructed phase thereby helping in Estimation
and Valuation.
Operations: Operations are controlled by modeling site data and comparing it against the
baselines prepared in the planning phase. The site model may be in the form of raster images
or CAD drawings. These can help us keep track of the timely execution of activities.
GIS can help to make a record of work that has been completed and can give us visualization
in the form of thematic maps which will guide us about rate of operations, completed
operations and pending operations.
In short we can say that GIS will prove to be the foundation of next generation civil
engineering.
5.4 NAVIGATION
Web-based navigation maps encourage safe navigation in waterways. Ferry paths and shipping
routes are identified for better routing. ArcGIS supports safe navigation systems and provides
accurate topographic and hydrographic data. Recently, the DNR's Coastal Resources Division began
the task of locating, documenting, and cataloging these non-historic wrecks with GIS. This division
provides public information that makes citizens aware of these vessel locations through a web
map. The web map will be regularly updated to keep the boating public informed of these coastal
hazards and to minimize the risk of collision and injury.
5.5 VEHICLE TRACKING AND FLEET MANAGEMENT
The core of a Fleet Management Tracking system is a GNSS tracking system used in conjunction
with data transmission by means of the selected communications system, for instance
GSM or GPRS.
This combination of GNSS technology with GSM/GPRS wireless coverage can keep track of the
position of all resources, such as vehicles, personnel and assets, as well as incidents. This
information is sent to a server and can be visualized using a Geographic Information System (GIS),
where the location, stops, idling and distance covered by each item can be monitored. Many
systems keep the tracking data stored locally or centrally, from where it can be retrieved for further
analysis.
The GNSS unit is essential to identify the position of the vehicle. The tracking systems usually use
one of the following architectures, which always include a GNSS receiver:
Passive Tracking: The tracking system stores the vehicle's location, obtained through a GNSS
receiver, and other data, such as vehicle condition or container status. This data is stored and can
be collected and analyzed at the end of the trip.
Active Tracking: The tracking device obtains the vehicle's location through the GNSS
receiver and sends it through a wireless communication system to a control center at regular
intervals or when certain conditions are met.
Real-time, cellular network: The vehicle's location and speed are transmitted periodically
over a GSM cellular network. The controller accesses the information by logging on to the
vendor's website, which requires a monthly fee, or by receiving the data directly on a cell
phone, which requires a cell phone account. The positions of trucks or goods are updated every
few minutes, according to the system specification.
Real-time, satellite: The vehicle's data is transmitted through satellite to the vendor and the
controller accesses the data by logging on to the vendor's website. This method also requires a
monthly subscription fee.[5]
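The difference between the architectures above largely comes down to when a fix is transmitted. The active-tracking decision could be sketched as follows; the interval and movement thresholds are invented defaults, not values from any particular system.

```python
# Sketch of the active-tracking rule: transmit a position fix when a
# reporting interval has elapsed OR a trigger condition (here, distance
# moved since the last report) is met. Thresholds are illustrative.
def should_transmit(seconds_since_last, moved_metres,
                    interval=120, movement_trigger=500):
    return seconds_since_last >= interval or moved_metres >= movement_trigger

print(should_transmit(130, 40))    # True  (reporting interval elapsed)
print(should_transmit(60, 40))     # False (neither condition met)
print(should_transmit(60, 900))    # True  (movement trigger fired)
```

A passive tracker would instead log every fix locally and skip the transmit decision entirely, downloading the log at the end of the trip.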
A fleet management tracking system consists of the following components:
On Board Unit (OBU), which includes the GNSS receiver and other types of sensors to collect
the status of the vehicle and the cargo. This device also has the ability to connect to a
central tracking server. The vehicle's information can include latitude, longitude, altitude,
computed odometer, door open or closed, fuel amount, tire pressure, ignition status,
headlight status, engine temperature, as well as cargo information and other vehicle sensors.
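The OBU record described above might be modelled as a simple data structure; the field names and sample values here are assumptions for illustration, not a standard telemetry format.

```python
from dataclasses import dataclass

@dataclass
class ObuReport:
    """One telemetry record sent by the On Board Unit (illustrative fields)."""
    latitude: float
    longitude: float
    altitude_m: float
    odometer_km: float
    door_open: bool
    fuel_pct: float
    engine_temp_c: float

# A hypothetical report from a vehicle in Lisbon.
report = ObuReport(38.72, -9.14, 35.0, 120456.7,
                   door_open=False, fuel_pct=62.5, engine_temp_c=88.0)
print(report.fuel_pct)  # 62.5
```

In a real system such records would be serialized and sent over GSM/GPRS to the central tracking server, which stores them and publishes them to the GIS front end.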
Driver Console: most systems include a driver console where the driver can register shift
start/end, route used, stops, pickups, dropoffs and other labor- and business-related information
that cannot be acquired automatically. This console can also be used to deliver messages or
warnings to the drivers. Warnings can be issued if the adequate procedures or schedule are not
being followed.
Central tracking server, which has the capability to receive, store and publish the tracking
data to a user interface, which usually encompasses a Geographic Information System.
Application Characterization
The main benefits of Fleet Management and Vehicle Tracking Applications are:
Improved operational efficiency of the vehicle fleet - Fleet Management provides businesses
with operational data of the fleet allowing the optimization and planning of the resources,
improving response times, increasing the number of services and using the most suitable
routes.
Improved customer care - Knowing where each vehicle of the fleet is at a given time allows
companies to provide their customers with accurate information about the location and
expected arrival time of vehicles and/or the goods transported in them.
Reduction of theft risk - In case of theft, the vehicle is easily locatable which makes it
possible to act immediately in order to recover it.
Facilitated Fleet Maintenance - Fleet management systems usually provide tools to plan
vehicle maintenance based on the distance run, providing alarms for inspections and
maintenance activities.
Enforcement of Transport Regulations - The transport of persons or goods normally follow
specific regulations such as forbidden areas (e.g. some areas are not allowed for vehicles
transporting dangerous goods), velocity limits, labor regulations (e.g. maximum number of
consecutive hours a driver can work). Fleet management systems allow companies to
guarantee that these regulations are being followed by their drivers.
Some of the sectors that use fleet management are:
Public Services Fleets - Fleets providing public services (e.g. waste collection, road
maintenance, taxi fleets, etc.) use GNSS for the optimization of routes, the planning of services
and determining the closest responder.
Emergency and Assistance Fleets - Emergency and assistance fleets use GNSS to determine
which vehicle is best placed to respond to an assistance request.
Car Rental Companies - Car rental companies use GNSS to determine the closest available
vehicle for a client, to monitor mileage or area limits on rented vehicles and as an anti-theft
system.
Goods Transportation and Distribution - Freight transportation companies use GNSS to
monitor the transportation of goods, providing information to customers about their cargo and
determining the closest vehicle for unscheduled pickups.
Sales Force Management - Companies with a mobile sales force can use GNSS to determine
the closest representative in case of unscheduled visits and to monitor their representatives'
activity, mileage and work hours.
Hazardous Goods or Valuables Transportation - Hazardous goods or valuables
transportation companies use GNSS to monitor the transported goods in real time,
supporting alarms when the vehicle deviates from the scheduled route or violates
transportation regulations. Fleet management systems for these companies normally support
panic-button functionality that sends the position of the vehicle to the central tracking
server in case of emergency or theft.
Public Transportation - Public transportation operators are using GNSS to track the vehicle
fleet, to eventually reroute vehicles if needed and to provide information to the user.
The use of GNSS for Fleet Management and Vehicle Tracking in certain sectors has been driven
by transport regulations and policies. A specific example of the use of such systems is for
Livestock transportation in Europe which is detailed in the following section.
Tracking of Livestock Transportation
The application of satellite positioning for livestock traceability is becoming a general objective
in support of livestock transportation policies. Regulation in Europe requires an appropriate
navigation system allowing for the recording and provision of information equivalent to that
required in the journey log, as well as information concerning the opening and closing of the
loading doors. It also requires a temperature monitoring and recording system which alerts the
driver when the temperature in the animal compartment reaches the maximum of 30°C or the
minimum of 5°C.
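The 5–30 °C alerting rule above can be expressed as a small threshold check; the function name and return strings are illustrative, not part of any regulation or product.

```python
# Thresholds taken from the regulation described in the text: the driver
# must be alerted when the compartment temperature reaches 30 C or 5 C.
MAX_TEMP_C = 30.0
MIN_TEMP_C = 5.0

def temperature_alert(temp_c):
    if temp_c >= MAX_TEMP_C:
        return "ALERT: too hot"
    if temp_c <= MIN_TEMP_C:
        return "ALERT: too cold"
    return "OK"

print(temperature_alert(31.0))  # ALERT: too hot
print(temperature_alert(20.0))  # OK
```

In practice the OBU would evaluate such a check against each temperature-sensor reading and push any alert both to the driver console and to the central tracking server.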
In livestock transportation, GNSS makes it possible to:
Localize and continuously track and trace the vehicles transporting livestock, in order to
increase the efficiency of all activities related to livestock transportation.
Generate reports about sensors information such as temperature, loading doors information,
warning signals, etc. in order to improve the animal's welfare.
Optimal route calculation to specify the most suitable roads and hence, to ensure a fast and
safe delivery of the cargo.
Geofencing and alarming.
Recording of data for statistical and enforcement/governmental use.
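The temperature-monitoring requirement can be sketched as a simple threshold check on readings logged by the on-board unit. The 5°C and 30°C limits come from the regulation quoted above; the reading format is an assumption for illustration:

```python
# Limits from the European livestock transport regulation cited above
TEMP_MIN_C, TEMP_MAX_C = 5.0, 30.0

def temperature_alerts(readings):
    """readings: list of (timestamp, celsius) from the animal-compartment sensor.
    Returns the readings that must trigger a driver alert."""
    return [(t, c) for t, c in readings if c < TEMP_MIN_C or c > TEMP_MAX_C]

log = [("08:00", 12.5), ("09:00", 31.2), ("10:00", 4.1)]
print(temperature_alerts(log))  # alerts at 09:00 (too hot) and 10:00 (too cold)
```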
Application Examples
Fleet management devices range from simple units with no user interface to devices with
graphical human-machine interfaces; some also interface with the vehicle's on-board
diagnostics or with other vehicle systems such as temperature sensors, door-opening
sensors, etc. Although these systems are usually attached permanently to the vehicle, it is
possible (though not common) to use GNSS-enabled cell phones running specific fleet
management applications. The devices normally used for fleet management are usually
called Vehicle Trackers.
These systems can be sold as a product, where the on-board device is purchased by the customer
and the fleet is managed through an application or server bundled with it. The management
application can range from a simple program that shows the position of the vehicles and
generates reports from the data received from the on-board devices, to real-time servers that
can be customized with real-time alarms and that provide complex services such as routing,
planning and customized reporting. Alternatively, some providers offer these systems as a
service: the equipment is rented, the centralized services are hosted by the provider and, in
some cases, even the communication costs are handled by the provider, with a monthly fee
charged per monitored vehicle.
1. Banking: Being market-driven, banks need to provide customer-centric services built around
the planning of resources and marketing. GIS plays an important role in planning, organizing
and decision making.
2. Assets Management: GIS helps organizations locate and store information about their assets.
Operations and maintenance staff can also use it to deploy their enterprise and mobile workforce.
3. Dairy Industry: Geographic Information Systems are used in the distribution of products and
in monitoring production rates, the location of shops and their sales rates.
4. Tourism: Tourists can get all the information they need at a click: measuring distances, finding
hotels and restaurants, and even navigating to them. This information plays a vital role in
planning travel from one place to another.
5. Business: GIS is used for managing business information based on its location. GIS can keep
track of where customers are located, site businesses, target marketing campaigns, optimize sales
territories and model retail spending patterns.
6. Market Share: Examining branch locations, competitor locations and demographic
characteristics to identify areas worthy of expansion or determine market share in Maptitude.
7. ATMs: Filling in market and service gaps by understanding where customers,
facilities, and competitors are with address locating, database management and query tools.
8. World Bank Economic Statistics: Slicing and dicing raw financial data from the World Bank.
9. Mergers and Acquisitions: Profiling markets to find opportunities to gain and build where
the customers are.
10. Supply and Demand: Identifying under-served areas and analyzing your competitor's market.
11. Community Reinvestment Act (CRA): Fulfilling the obligations to loan in areas with
particular attention to low- and moderate-income households – using GIS to understand spatial
demographics.
12. Mobile Banking: Capturing the locations where mobile transactions occur and assisting
in mobile security infrastructure.
13. Internet of Things: Improving efficiency, accuracy and economic benefit through a
network of physical objects such as devices, vehicles, buildings and other items—embedded with
electronics, software, sensors, and network connectivity that enables these objects to collect and
exchange information with one another.
14. Market Share Analysis: Optimizing the locations of facilities so the allocated demand is
maximized in the presence of competitors using tools like location-allocation in ArcGIS.
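The location-allocation idea in item 14 can be sketched in a few lines: each customer is assigned to the closest facility, and the candidate site that captures the most demand wins. This is a simplified planar sketch with made-up coordinates and demands; tools like ArcGIS location-allocation solve the same problem over a road network:

```python
def captured_demand(site, competitors, customers):
    """Demand captured by `site` if every customer patronises the closest facility.
    Distances are squared planar distances for brevity."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return sum(demand for pos, demand in customers
               if all(d2(pos, site) < d2(pos, c) for c in competitors))

def best_site(candidates, competitors, customers):
    """Pick the candidate location that captures the most demand."""
    return max(candidates, key=lambda s: captured_demand(s, competitors, customers))

# Hypothetical demand points ((x, y), demand) and one existing competitor
customers = [((0, 0), 10), ((5, 5), 20), ((10, 0), 5)]
competitors = [(6, 6)]
print(best_site([(0, 0), (9, 1)], competitors, customers))
```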
15. Integrated Freight Network Model: Integrating highly detailed information about shipping
costs, transfer costs, traffic volumes and network interconnectivity properties in a GIS-based
platform. (Integrated Freight Network Model)
1. Agriculture
Background
Identification, description and mapping of Rangeland sites at a scale of 1:250,000; estimation of
the present and potential grazing productivity and load; and presentation of recommendations for
sustainable management for each site.
Solution
RMSI created a comprehensive Remote Sensing and GIS-based database using satellite data,
thematic maps, floral species inventory data and biomass estimation complemented with field
expertise. The study provided ready-to-use maps for managers and decision/ policy makers to
ensure Sustained and Secured Rangelands. The database also provided qualitative & quantitative
information related to the benefits derived from Rangelands in terms of ecosystem services such
as available biomass for livestock and other uses by humans.
Client Benefit
The comprehensive Remote Sensing and GIS database with multiple layers of information served
as a key reference for evaluation, management and monitoring of Rangelands in the arid regions
of Saudi Arabia. This in particular reflects the Kingdom’s commitment towards Climate
Change resilience by conserving Rangelands and their vegetation.
2. Forestry
Background
Realizing the need to increase the adaptive capacity of farming communities, the World Bank
commissioned RMSI to develop an application suite of a web and a mobile-phone application to
disseminate location-specific climate/ weather information, and related agro-advisories, which are
understandable to the farmers on a real-time basis. The agro-weather tool disseminates vital
weather-forecast-linked agro-advisories through SMS, IVRS, a mobile app and a website,
helping farmers better plan and manage weather risks and maximize productivity.
Solution
RMSI experts developed a web- and mobile-phone-based (i.e., IVRS, SMS, Android app)
agro-weather tool to disseminate weather forecast information and best-bet agronomic
management practices for farmers in Ethiopia and Kenya. The tool was developed for the main cultivated
crops in Ada’a district of Ethiopia (chickpea, lentil, teff, and wheat) and in Embu district of Kenya
(bean, maize, sorghum, tea, and coffee) on a pilot basis.
Client Benefit
The key benefit is the availability of location-specific agro-advisories to farmers to minimize crop
losses, and practice climate-smart agriculture.
CONTENT BEYOND SYLLABUS
GIS in the Cloud using ArcGIS Online:
The Esri cloud ecosystem allows you to access, create, edit, analyze and share maps, apps, and
geospatial data from anywhere in the world. While traditional GIS is installed on your desktop or
server, Cloud GIS makes use of the flexibility of the cloud environment for data capture,
visualization, analysis and sharing.
Cloud GIS
Cloud computing has revolutionized the way we work. Although GIS has been a late adopter
of cloud technology, its many advantages are compelling organizations to shift their
geospatial functions to the cloud. Cloud-based tools provide access to web-based geographic
information systems. Data rendered as maps help analyze and optimize operations in real
time. Apps in the cloud help manage isolated silos of GIS workflows and geodatabases.
Thus, Cloud GIS could be defined as a next generation on-demand GIS technology that uses a
virtualized platform or infrastructure in a scalable elastic environment.
How does Cloud GIS work?
The cloud computing environment offers three base service models – Software-as-a-Service
(SaaS); Platform-as-a-Service (PaaS); and Infrastructure-as-a-Service (IaaS).
Cloud GIS Service Models
In the geospatial environment, Cloud SaaS supports three further service models:
GIS-as-a-Service (GaaS)
Applications-as-a-Service (AaaS)
Imagery-as-a-Service (IaaS), where ready-to-use GIS datasets are available as Data-as-a-
Service (DaaS)
These are accessed as private, public, hybrid or community cloud services, depending upon the
organization’s need for security, collaboration and ownership.
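As an illustration of the Data-as-a-Service model, hosted feature layers on ArcGIS Online are exposed through REST query endpoints. A sketch of building such a request; the service URL and field name are hypothetical placeholders, not a real published layer:

```python
from urllib.parse import urlencode

# Hypothetical hosted feature layer (placeholder organization id and layer name)
BASE = ("https://services.arcgis.com/EXAMPLE/arcgis/rest/services/"
        "Rangelands/FeatureServer/0/query")

def query_url(where="1=1", out_fields="*", fmt="geojson"):
    """Build a REST query URL for a hosted feature layer (DaaS)."""
    return BASE + "?" + urlencode({"where": where, "outFields": out_fields, "f": fmt})

# BIOMASS_T_HA is an assumed attribute field for this example
print(query_url(where="BIOMASS_T_HA > 2"))
```

Fetching this URL (for a real layer) returns the matching features as GeoJSON, which is what lets web maps, mobile apps and scripts all consume the same cloud-hosted dataset.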
Key Benefits of GIS in the Cloud
Other Benefits of Cloud GIS
On demand service of online maps, geospatial data, imagery, computing or analysis
Large volumes of data handling, app management and geospatial analysis possible
Supports viewing, creating, monitoring, managing, analyzing and sharing maps and data
with other users
Facilitates inputs, validation and collaboration by a global mobile workforce in real time
Optimization with spatio-temporal principles enables effective geospatial
validation and analysis
Managed services prevent data and work loss from frequent outages, minimizing
financial risks, while increasing efficiency
Competitive advantage – shorter time to share and publish maps, with always-on,
always-available data/maps; and effective ROI
Choice of various deployment, service and business models to best suit organization
goals
Supports offerings of client-rich GIS software solutions as a software plus service model
– geocoding, mapping, routing, and more
Applications
Earth observation data
Citizen and social science
Road infrastructure projects
Mobile data collection and integration
Traffic management
E-commerce and geo-targeted advertising
Geo-referenced Weather Service
Crime analysis
Web mapping
Research
Public safety and emergency response
Case Studies
Education and Research in the GIS Cloud, using ArcGIS Online and Mango Map
This application was constructed using the map services and application templates on ArcGIS
Online.
Relationship between rates of obesity and diabetes and the percentage of people on a restricted
sugar diet (US).
Another user-contributed initiative makes use of the Mango Map interactive web map platform.
Deforestation in Cambodia 1976 – 2006
Transport Departments making use of the GIS Cloud – Maryland, Idaho, Utah
The Maryland Department of Transportation has deployed a four-level cloud-based model. The
first is a hybrid application (MUTTS) which coordinates and tracks responses to construction
work and excavations. The second uses a hybrid cloud configuration of an interactive mobile
application for travelling truck drivers, highway motorists and cyclists. The third focuses on
integration with the interagency mapping and GIS data portal MD iMap. The fourth deployment
is a private cloud integrating the enterprise GIS of MDTA, accessible to staff members only. The
whole system seamlessly blends Esri functionality with data and applications stored within
MDOT's firewall.
Security and Crime management in the Cloud
The US Department of Homeland Security uses ArcGIS cloud products to access and share critical
data for protection of US citizens – in aviation, border security, emergency response,
cybersecurity, and chemical facility inspection.
The Ogden Police Department operates a real-time crime center providing 24-hour support
using ArcGIS cloud services.
Cloud GIS Products and Vendors
ArcGIS Business Analyst Online – on-demand reports and maps for informed decisions.
ArcGIS Online – creation of interactive web maps and apps, shared with anyone, on any device.
Explorer for ArcGIS – access, author, share maps from any device.
ArcGIS Server on Amazon EC2 – deploy ArcGIS Server and use enterprise geodatabases on
Amazon EC2.
Vendors – GIS Cloud, Mango Map, MapBox, Map2Net, MapInfo Stratus, ThunderMaps, GIS
Direct, Spatial Vision, Interroute, Aerometric, and more.