Unit - 5.3 - Data Analysis (1)
3 – Data Analysis
• Spatial data analysis
• Non spatial data analysis
• Manipulation – spatial interpolation
• Data retrieval – Reclassification techniques
• Buffer analysis
• Vector and Topological Overlay analysis
• Raster overlay analysis
• Measurement – query
• Record modeling and expert system
Spatial data analysis
Data analysis involves operations on geographic data and their attributes to
obtain derived information, answer queries, generate statistics, etc. The broad
categories and the operations within them are as follows.
• Data retrieval involves the capability to easily select data for graphic or
attribute editing, updating, querying, analysis and/or display. The ability to
retrieve data is based on the unique structure of the DBMS and command
interfaces are commonly provided with the software.
• Reclassifying attributes is the technique in GIS and other database
software of creating a new categorical attribute in a dataset by classifying
features based on existing attributes or other criteria, such as location. The
uses of reclassification include quickly updating cells when new
information is available, compiling data for suitability analyses, and
eliminating unneeded information by reclassifying cells as No Data. The
goal is often to simplify the output data in order to aid the interpretation.
Reclassification can also be performed on multiple layers as part of overlay
operation (described later).
• It can also be used for generalization for map simplification. In the vector
model, the process is known as map dissolve involving elimination of
boundaries.
• The reclassification functions reclassify or change cell values
to alternative values using a variety of methods. You can
reclassify one value at a time or groups of values at once using
alternative fields; based on a criterion, such as specified
intervals (for example, group the values into 10 intervals); or
by area (for example, group the values into 10 groups
containing the same number of cells). The functions are
designed to let you easily change many values on an
input raster to desired, specified, or alternative values.
• This operation assigns new values to the existing values in a
map. The assignment may be a function of the initial values or of the size
and shape of the spatial configuration. Sometimes the data may
not be compatible with the user's needs or with further analysis;
for example, it may be at a different resolution than the user needs.
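The two grouping methods mentioned above (specified intervals, and groups holding the same number of cells) can be sketched in Python. This is a minimal illustration, not the algorithm of any particular GIS package; the function names and the sample cell values are hypothetical.

```python
def reclass_equal_interval(values, n_classes):
    """Assign each value a class 1..n_classes using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    classes = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value lands exactly on the upper edge, so clamp it
        # into the last class.
        classes.append(min(idx, n_classes - 1) + 1)
    return classes


def reclass_equal_count(values, n_classes):
    """Assign classes so each class holds about the same number of cells."""
    # Rank the cells by value; tied values are split by position.
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(values) + 1
    return classes


# Hypothetical cell values, flattened from a raster.
cell_values = [0, 5, 10, 15, 20, 40, 80, 100]
by_interval = reclass_equal_interval(cell_values, 4)
by_count = reclass_equal_count(cell_values, 4)
```

With skewed data such as this, equal intervals put most cells in the first class, while the equal-count method spreads them evenly across classes.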
Uses of reclassification
• Updating new information
• Compiling data for suitability analysis
• Eliminating unneeded information by reclassifying cells as No Data
1. Quantitative data – Quantitative data includes data that has
measurable values. In terms of GIS, some examples of raster
data that is quantitative could include the following:
precipitation, population density, temperature, etc. One of the
most classic examples of reclassifying attributes is regrouping
large amounts of data (say the numbers 1-100) to be
represented on the map by only 5 symbols (for example, the
number 1 on the map would refer to those cells with data
between 1 and 20).
2. Nominal data - Reclassification can also be used on nominal fields. For
example, a GIS user may wish to take five road types (interstate,
highways, main roads, collectors, and neighborhood roads) and reduce
them to two types by reclassifying interstates, highways, and main roads
as 'major streets' and collectors and neighborhood roads as 'local streets'.
In addition to simplifying attributes, the reclassify tool can be used to
assign values of sensitivity, priority or preference to a raster. The reclassify
tool can change nominal values (values that represent a class) to ratio or
interval data so that it can relate to other data on a common scale. In a
suitability analysis, data is usually reclassified to a scale of 1 to 10, giving
higher values to the more suitable areas.
3. Raster Data – Reclassifying attributes can be especially helpful in analyzing
raster attribute data, which is often interval or ratio data that is
continuous. This is easily done by assigning values to bins or ranges. For
example, a Digital Elevation Model that has unique elevation values for
each cell can be reclassified and symbolized to show specific elevation
ranges, e.g., 1,000-1,200 feet.
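The three cases above can be sketched as simple lookups in Python. The road classes and the 1-100 grouping follow the text; the elevation break values, the function names, and the sample grid are assumptions made only for illustration.

```python
import bisect

# Nominal data: five road types reduced to two classes (as in the text).
ROAD_CLASS = {
    "interstate": "major street",
    "highway": "major street",
    "main road": "major street",
    "collector": "local street",
    "neighborhood road": "local street",
}


def reclass_roads(road_types):
    """Map each road type to its simplified class."""
    return [ROAD_CLASS[t] for t in road_types]


def reclass_1_to_100(value):
    """Quantitative data: integers 1-100 grouped into 5 classes of 20."""
    if not 1 <= value <= 100:
        raise ValueError("value must be between 1 and 100")
    return (value - 1) // 20 + 1  # class 1 covers 1-20, class 5 covers 81-100


# Raster data: hypothetical elevation breaks in feet.
# Class 0 is below 1,000 ft, class 1 is 1,000-1,199 ft, and so on.
ELEVATION_BREAKS = [1000, 1200, 1400]


def reclass_elevation(grid):
    """Replace each elevation with the number of breaks at or below it."""
    return [[bisect.bisect_right(ELEVATION_BREAKS, v) for v in row]
            for row in grid]


dem = [[950, 1000], [1199, 1450]]  # hypothetical 2 x 2 DEM
dem_classes = reclass_elevation(dem)
```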
[Figure: Setback (inward buffer)]
Buffer Analysis is a basic GIS spatial operation. It automatically builds zones with a
certain width around point, line, or region geometric objects according to a specified
buffer distance. For example, in an environmental protection project, a zone can be
drawn to include areas within a certain distance of a polluted river to represent the
contamination area; a zone with a certain size can be drawn around an airport to define
a non-residential area for public health concerns.
• Point buffer analysis – Builds buffers for point objects. For example, buffers can be
built for two radio broadcasting stations to analyze the residential area that is
covered by the signal transmitted from each of the two stations, as well as the area
covered by signals from both stations.
• Point multi-buffer analysis – Builds multiple buffers for point objects. For example, zones
with different radii can be created around a pollution source to represent the
diffusion process of the pollutant.
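A point buffer can be treated as a distance test, which the following Python sketch uses for both examples above. It only checks whether locations fall inside a buffer instead of constructing buffer polygons, and it assumes planar coordinates; the station positions, radii, and function names are hypothetical.

```python
import math


def in_buffer(point, center, radius):
    """True if point lies within radius of center (planar coordinates)."""
    return math.dist(point, center) <= radius


def coverage(point, stations, radius):
    """Names of the stations whose buffer contains the point."""
    return [name for name, center in stations.items()
            if in_buffer(point, center, radius)]


def ring_index(point, source, radii):
    """Index of the innermost ring containing the point, or None if outside."""
    d = math.dist(point, source)
    for i, r in enumerate(sorted(radii)):
        if d <= r:
            return i
    return None


# Two hypothetical radio stations, each with a 5-unit signal radius.
stations = {"S1": (0, 0), "S2": (8, 0)}
both = coverage((4, 0), stations, 5)       # inside both buffers
only_s1 = coverage((-3, 0), stations, 5)   # inside the first buffer only

# Hypothetical multi-buffer around a pollution source at the origin.
zone = ring_index((0, 7), (0, 0), [5, 10, 15])
```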
Overlay Operations
[Figures: Clip, Split, Identity and Erase operations; table of input features, overlay features, operation and result for Identity, Intersect, Symmetrical difference, Union and Update]
Network Analysis
It is a type of line analysis that involves a set of interconnected lines.
Railways, highways, transportation routes, rivers, etc. are examples of
networks. Network analysis can be used for the following:
• Address Geocoding - It is the process of estimating the locations of
addresses in a GIS coordinate system. It requires a table of addresses and
a theme that contains attributes that can be matched to the table of
addresses.
• Imagine that the fire department receives a report of a fire in a building at
1000 West Main Street. To estimate the location, the GIS determines the arc by
matching its name, type and suffix. Once the arc is determined, the address
can be estimated using linear interpolation. The arc corresponding to West
Main Street is Arc 01. The address is an even number and lies on the
left side of the arc. The left side has addresses ranging from 100 to 1300
(a range of 1200). The length of the arc is 2000 meters. The address 1000
can be geocoded as (1000 - 100) / 1200 × 2000 m = 1500 m along the arc,
measured from the end where the address range begins.
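The linear interpolation for this example can be sketched in Python. The function names are hypothetical, and the arc is assumed to be a straight segment with made-up end coordinates, so the resulting point is illustrative only.

```python
def geocode_offset(address, range_start, range_end, arc_length):
    """Distance along the arc for an address, by linear interpolation."""
    low, high = sorted((range_start, range_end))
    if not low <= address <= high:
        raise ValueError("address is outside the arc's address range")
    fraction = (address - range_start) / (range_end - range_start)
    return fraction * arc_length


def point_on_segment(start, end, fraction):
    """Coordinates at the given fraction of the way from start to end."""
    (x0, y0), (x1, y1) = start, end
    return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))


# Values from the example: left side runs from 100 to 1300, arc is 2000 m.
offset = geocode_offset(1000, 100, 1300, 2000)

# Hypothetical straight arc from (0, 0) to (2000, 0), in meters.
location = point_on_segment((0, 0), (2000, 0), offset / 2000)
```

A real street arc is usually a polyline, so a production geocoder would walk the arc's vertices to place the point at the computed distance.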
• Optimal Routing - Optimal routing is the process of finding the best
route from one location to another. The most common path-finding
algorithm is Dijkstra's algorithm, which E. W. Dijkstra conceived in 1956
and published in 1959. It is a graph search algorithm that solves the
single-source shortest path problem.
We build two tables, one for the nodes that have been already processed and the other for the
adjacent nodes which are to be processed. We begin with Node A as follows:
Final Output
Node E is the destination, so we stop here. The quickest route to reach Node E takes
40 minutes and it is Node A → Node B → Node D → Node E.
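The procedure can be sketched in Python with a priority queue. The network below is hypothetical: its travel times were chosen only so that the result matches the route and the 40-minute total stated above, since the original network figure is not reproduced here. The function name is also an assumption.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total cost, list of nodes on the path)."""
    dist = {source: 0}   # best known travel time to each adjacent node
    prev = {}            # predecessor of each node on its best path
    processed = set()    # nodes whose shortest time is final
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in processed:
            continue  # stale queue entry
        processed.add(node)
        if node == target:
            break
        for neighbor, minutes in graph[node].items():
            candidate = d + minutes
            if neighbor not in dist or candidate < dist[neighbor]:
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical travel times in minutes between nodes.
NETWORK = {
    "A": {"B": 10, "C": 20},
    "B": {"A": 10, "C": 5, "D": 15},
    "C": {"A": 20, "B": 5, "D": 20, "E": 30},
    "D": {"B": 15, "C": 20, "E": 15},
    "E": {"C": 30, "D": 15},
}

minutes, route = shortest_path(NETWORK, "A", "E")
```

The `processed` set and the heap play the roles of the two tables: nodes already processed, and adjacent nodes still to be processed.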
• Finding closest facilities - Sometimes we need to find the point closest to a given location.
The point is called a facility and the given location is called an event location. Finding which
flat is nearest to the workplace, which fire station has the best response time to a
reported fire, and which houses are close to schools are examples of optimal
routing for closest facilities. In the illustration, imagine that the office (red star) is located at
some address and we want to know the closest place around the office where an employee can
live. The address of the office is geocoded to a street location and then the optimal path is
computed from each house to the office. House 4 is closest to the office because its travel time is
the least.
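A closest-facility search can be sketched in Python as one shortest-path computation from the event location followed by a minimum over the candidates. The street network, travel times, and function names are hypothetical, and travel times are assumed to be the same in both directions, so searching outward from the office gives the same answer as routing from each house to the office.

```python
import heapq


def travel_times(graph, source):
    """Shortest travel time from source to every reachable node (Dijkstra)."""
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for neighbor, minutes in graph[node].items():
            candidate = d + minutes
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def closest_facility(graph, event, facilities):
    """Facility with the least travel time to the event location."""
    times = travel_times(graph, event)
    reachable = {f: times[f] for f in facilities if f in times}
    if not reachable:
        return None
    best = min(reachable, key=reachable.get)
    return best, reachable[best]


# Hypothetical street network; J1 and J2 are junctions, times in minutes.
STREETS = {
    "Office": {"J1": 4, "J2": 6},
    "J1": {"Office": 4, "J2": 3, "House1": 9, "House2": 7},
    "J2": {"Office": 6, "J1": 3, "House3": 8, "House4": 2},
    "House1": {"J1": 9},
    "House2": {"J1": 7},
    "House3": {"J2": 8},
    "House4": {"J2": 2},
}

houses = ["House1", "House2", "House3", "House4"]
nearest = closest_facility(STREETS, "Office", houses)
```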
Functions – Description
UNION – Union is a fundamental overlay operation. Performs the spatial or geometric combination of two spatial data sets to generate a new integrated output. Union also performs attribute combination and carries forward the attributes of the two input datasets into the output. All spatial features are combined into the new output theme.
INTERSECT – Intersect is also an overlay operation where the geometry or spatial combination of two input themes is done on a selective basis based upon the commonality of the spatial features. Intersect also carries forward the attributes of the two input themes into the output theme.
CLIPPING – Clipping is a variant of the overlay operation where features in an input theme are either erased or preserved based upon the spatial extent of another theme. Thus theme features which fall inside the outermost bound of the clip theme are removed or preserved.
BUFFER – Generates a buffer region around points, lines or polygons. Useful for corridor generation analysis. The distance for buffering needs to be specified. For polygon features, buffering could be done either inwards or outwards.
AGGREGATION – Aggregates or merges polygon features based upon the commonality of a specified attribute value. Mainly a map aggregation function.
SLIVER – Merges polygon features based upon an attribute specification. This is mainly useful where sliver polygons generated during the union process are to be eliminated.
TRANSFORM – Provides a facility to transform spatial data from one coordinate system to another. The transformation is set based on the specification of common registration points in the spatial data.
APPEND – Allows for appending spatial data which are adjacent and need to be mosaicked.
3D ANALYSIS – Slope generation, aspect generation, viewshed, perspectives, etc. are some of the analysis functions.
NETWORK – Path determination, resource allocation, facility location, etc. are some of the modeling functions for network data.
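The geometric side of the overlay functions can be illustrated in Python by representing each theme as a set of unit grid cells, so that the overlay becomes a set operation. This is a deliberate simplification: real vector overlay works on polygon boundaries and also carries attributes into the output, which this sketch does not attempt. The themes and function names are hypothetical.

```python
def cells(x0, y0, x1, y1):
    """Set of unit grid cells covering the rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def overlay(input_theme, overlay_theme):
    """Geometry-only overlay results for two themes given as cell sets."""
    return {
        "union": input_theme | overlay_theme,       # everything in either
        "intersect": input_theme & overlay_theme,   # only the common area
        "clip": input_theme & overlay_theme,        # input kept inside clip theme
        "erase": input_theme - overlay_theme,       # input outside the other theme
        "symmetrical difference": input_theme ^ overlay_theme,
    }


# Two hypothetical 4 x 4 themes that overlap in a 2 x 2 corner.
input_theme = cells(0, 0, 4, 4)
overlay_theme = cells(2, 2, 6, 6)
areas = {name: len(result)
         for name, result in overlay(input_theme, overlay_theme).items()}
```

Clip and intersect give the same geometry here; in a GIS they differ in that intersect carries forward the attributes of both themes.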
Raster overlay analysis
• In raster overlay, each cell of each layer
references the same geographic location. That
makes it well suited to combining characteristics
for numerous layers into a single layer. Usually,
numeric values are assigned to each
characteristic, allowing you to mathematically
combine the layers and assign a new value to
each cell in the output layer.
• An example of raster overlay by addition is
shown in the figure. Two input rasters are added
together to create an output raster with the
values for each cell summed. This approach is
often used to rank attribute values by suitability
or risk, then add them to produce an overall rank
for each cell. The various layers can also be
assigned a relative importance to create a
weighted ranking (the ranks in each layer are
multiplied by that layer's weight value before
being summed with the other layers).
• Below is an example of raster overlay by addition for suitability modeling.
Three raster layers (steep slopes, soils, and vegetation) are ranked for
development suitability on a scale of 1 to 7. When the layers are added
(bottom), each cell is ranked on a scale of 3 to 21.
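Overlay by addition, with optional layer weights, can be sketched in Python using nested lists as rasters. The 2 x 2 rank values, the weights, and the function name are hypothetical; only the 1 to 7 ranking scale and the three layer themes come from the text.

```python
def add_layers(layers, weights=None):
    """Cell-by-cell weighted sum of rasters that share the same grid."""
    if weights is None:
        weights = [1] * len(layers)  # plain addition
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [sum(w * layer[r][c] for layer, w in zip(layers, weights))
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical suitability ranks on a scale of 1 to 7.
slope = [[1, 7], [4, 3]]
soils = [[1, 7], [2, 5]]
vegetation = [[1, 7], [6, 2]]

# Unweighted overlay: every output cell falls between 3 and 21.
total = add_layers([slope, soils, vegetation])

# Weighted overlay: slope counts twice as much as the other layers.
weighted = add_layers([slope, soils, vegetation], weights=[2, 1, 1])
```

The cell ranked 1 in every layer sums to 3 and the cell ranked 7 in every layer sums to 21, which are the two ends of the combined scale.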
Raster overlay tools
Raster overlay tools are located in several toolsets in the Spatial Analyst
toolbox. Spatial Analyst is an ArcGIS extension that is licensed separately. If
your site has a Spatial Analyst license and the Spatial Analyst extension has
been installed, you will have access to the Spatial Analyst toolbox in
ArcToolbox.