An Introduction To Geographical Information Systems: Training Manual
Emily Schmidt, Helina Tilahun,
Mekamu Kedir, and Hailu Shiferaw
December 2011
In IDW, the weight, λi, depends solely on the distance to the prediction location. However, with the
kriging method, the weights are based not only on the distance between the measured points and the
prediction location but also on the overall spatial arrangement of the measured points. To use the
spatial arrangement in the weights, the spatial autocorrelation must be quantified. Thus, in ordinary
kriging, the weight, λi, depends on a model fitted to the measured points, the distance to the
prediction location, and the spatial relationships among the measured values around the prediction
location. To do this, kriging goes through a two-step process:
1. It creates the variograms and covariance functions to estimate the statistical dependence
(called spatial autocorrelation) values that depend on the model of autocorrelation (fitting a
model).
2. It predicts the unknown values (making a prediction).
It is because of these two distinct tasks that it has been said that kriging uses the data twice: the first
time to estimate the spatial autocorrelation of the data and the second to make the predictions.
Variography
Fitting a model, or spatial modeling, is also known as structural analysis, or variography. In spatial
modeling of the structure of the measured points, you begin with a graph of the empirical
semivariogram, computed with the following equation for all pairs of locations separated by distance
h:
\[ \mathrm{Semivariogram}(\mathrm{distance}_h) = 0.5 \times \mathrm{average}\left[ (\mathrm{value}_i - \mathrm{value}_j)^2 \right] \]
The formula involves calculating the difference squared between the values of the paired locations.
The image below shows the pairing of one point (the red point) with all other measured locations. This
process continues for each measured point.
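The pairing process described above can be sketched in a few lines of Python. This is only an illustration of the formula, not the ArcGIS implementation: the function name, the station values, and the choice to group pairs into distance bins of a fixed `lag_width` are all assumptions made for the example.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, lag_width):
    """Return {lag_center: semivariance} for a list of (x, y, value) tuples.

    Pairs are grouped into distance bins of width `lag_width`; the binning
    rule is an assumption of this sketch, not something the manual specifies.
    """
    sums = defaultdict(float)   # sum of squared differences per bin
    counts = defaultdict(int)   # number of pairs per bin
    for i in range(len(points)):
        xi, yi, vi = points[i]
        # Pair point i with every later point so each pair is used once.
        for j in range(i + 1, len(points)):
            xj, yj, vj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            bin_index = int(h // lag_width)
            sums[bin_index] += (vi - vj) ** 2
            counts[bin_index] += 1
    # 0.5 * average of the squared differences in each bin
    return {
        (b + 0.5) * lag_width: 0.5 * sums[b] / counts[b]
        for b in sorted(counts)
    }


# Hypothetical rainfall values (mm) at four stations along a line.
stations = [(0, 0, 10.0), (1, 0, 12.0), (2, 0, 15.0), (3, 0, 19.0)]
gamma = empirical_semivariogram(stations, lag_width=1.0)
for lag, value in gamma.items():
    print(f"lag {lag:.1f}: semivariance {value:.3f}")
```

In this made-up data set the semivariance grows with distance, which is the pattern expected when nearby stations are more alike than distant ones.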
Semivariogram models
ArcGIS Spatial Analyst provides the following functions from which to choose for modeling the
empirical semivariogram:
The selected model influences the prediction of the unknown values, particularly when the shape of
the curve near the origin differs significantly. The steeper the curve near the origin, the more influence
the closest neighbors will have on the prediction. As a result, the output surface will be less smooth.
Each model is designed to fit different types of phenomena more accurately.
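To make the remark about the shape near the origin concrete, the sketch below writes out two widely used semivariogram functions, the spherical and the exponential. The parameter names (`nugget`, `partial_sill`, `range_`) and the factor of 3 in the exponential (the "practical range" convention) are assumptions of this example; ArcGIS may parameterize its models differently.

```python
import math


def spherical(h, nugget, partial_sill, range_):
    """Spherical model: rises and then levels off exactly at the range."""
    if h <= 0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, partial_sill, range_):
    """Exponential model: approaches the sill gradually (practical range)."""
    if h <= 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / range_))


# Compare the two curves close to the origin and at the range.
for h in (1.0, 5.0, 10.0):
    print(h, round(spherical(h, 0.0, 1.0, 10.0), 4),
          round(exponential(h, 0.0, 1.0, 10.0), 4))
```

With the same nugget, sill, and range, the exponential curve is steeper near the origin than the spherical one, so under this parameterization it gives the closest neighbors more influence.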
The diagrams below show two common models and identify how the functions differ:
Making a prediction
After you have uncovered the dependence or autocorrelation in your data (see Variography section
above) and have finished with the first use of the data—using the spatial information in the data to
compute distances and model the spatial autocorrelation—you can make a prediction using the fitted
model. Thereafter, the empirical semivariogram is set aside.
You can now use the data to make predictions. Like IDW interpolation, kriging forms weights from
surrounding measured values to predict unmeasured locations. As with IDW interpolation, the
measured values closest to the unmeasured locations have the most influence. However, the kriging
weights for the surrounding measured points are more sophisticated than those of IDW. IDW uses a
simple algorithm based on distance, but kriging weights come from a semivariogram that was
developed by looking at the spatial nature of the data. To create a continuous surface of the
phenomenon, predictions are made for each location, or cell center, in the study area based on the
semivariogram and the spatial arrangement of measured values that are nearby.
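The difference between the two kinds of weights can be illustrated with a small, self-contained sketch. It is not how ArcGIS computes its surfaces: the function names, the linear semivariogram `linear_gamma`, and the three sample locations are hypothetical, and the ordinary kriging system is solved with a plain Gaussian elimination to avoid external libraries.

```python
import math


def idw_weights(points, target, power=2.0):
    """Inverse-distance weights, normalized to sum to 1.

    The target must not coincide with a measured point (division by zero).
    """
    raw = [1.0 / math.dist(p, target) ** power for p in points]
    total = sum(raw)
    return [w / total for w in raw]


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def kriging_weights(points, target, gamma):
    """Ordinary kriging weights for a semivariogram function gamma(h)."""
    n = len(points)
    # Semivariances between measured points, plus the sum-to-one constraint.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each measured point and the prediction location.
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    return solve(a, b)[:n]   # drop the Lagrange multiplier


def linear_gamma(h):
    """Hypothetical fitted model: semivariance grows linearly with distance."""
    return h


# Two clustered points on the left, one isolated point on the right.
points = [(0.0, 0.0), (0.1, 0.0), (2.0, 0.0)]
target = (1.0, 0.0)
idw = idw_weights(points, target)
ok = kriging_weights(points, target, linear_gamma)
print("IDW weights:    ", [round(w, 3) for w in idw])
print("Kriging weights:", [round(w, 3) for w in ok])
```

IDW looks only at distance, so the two clustered points together receive about two thirds of the weight. The kriging weights also take the arrangement of the points into account: in this example the point at x = 0 is almost entirely screened by its neighbor at x = 0.1, and the isolated point on the right gains weight.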
Kriging methods
There are two kriging methods: ordinary and universal. Ordinary kriging is the most general and
widely used of the kriging methods and is the default. It assumes the constant mean is unknown. This
is a reasonable assumption unless there is a scientific reason to reject it.
Universal kriging assumes that there is an overriding trend in the data—for example, a prevailing
wind—and it can be modeled by a deterministic function, a polynomial. This polynomial is subtracted
from the original measured points, and the autocorrelation is modeled from the random errors. Once
the model is fit to the random errors and before making a prediction, the polynomial is added back to
the predictions to give meaningful results. Universal kriging should only be used when you know there
is a trend in your data and you can give a scientific justification to describe it.
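The subtract-then-add-back idea described above can be sketched as follows. The example assumes the simplest possible case, a first-order polynomial in a single coordinate fitted by least squares; the function names and rainfall values are hypothetical, and the kriging of the residuals themselves is left out.

```python
def fit_linear_trend(xs, zs):
    """Least-squares fit of z = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return intercept, slope


# Hypothetical measurements along the direction of the trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
zs = [10.5, 11.8, 14.1, 16.2, 17.9]

intercept, slope = fit_linear_trend(xs, zs)

# Step 1: subtract the polynomial; the residuals are the "random errors"
# from which the autocorrelation would be modeled.
residuals = [z - (intercept + slope * x) for x, z in zip(xs, zs)]


def add_trend_back(x, residual_prediction):
    """Step 2: add the polynomial back to a prediction made on the residuals."""
    return intercept + slope * x + residual_prediction


print("trend:", round(intercept, 3), "+", round(slope, 3), "* x")
print("residuals:", [round(r, 3) for r in residuals])
print("prediction at x=2.5 with zero residual:", round(add_trend_back(2.5, 0.0), 3))
```

The residuals of a least-squares fit sum to zero, which is a quick check that the trend has actually been removed before the semivariogram is fitted.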
After interpolation
8. You may get a warning message stating that your "Table Does Not Have Object-ID Field". The
Object-ID is a unique identifier that ArcGIS builds into all of its shapefiles. Press OK and ArcGIS
will create this field for you.
9. Now, can you see your "Rf_July" points? Where are they? Right click on the "Rf_July.csv
Events" layer, and from the menu choose "Zoom to layer". The "Rf_July" layer should now be
visible. If the "Regions" layer is not visible, please check the projection of the weather station
points (in this case follow STEPS 10 to 18; otherwise go to STEP 19).
11. Go to the main toolbar and select the "Zoom to full Extent" button.
12. You should now see that the "Regions" layer and the point layer do not overlap each other. If you
use the regular zoom tool and zoom repeatedly into these layers, you will realize that the misplaced
layer is in fact the "Rf_July.csv Events" layer. Because it is in a different projection, it is unable
to locate and resize itself correctly in relation to the "Regions" layer.
13. If we assume that this information was collected by GPS, then reverting to the default
coordinate system used by GPS will correct this issue.
14. So let's try our hypothesis! First, right click on the "Rf_July.csv Events" layer and select "Remove".
15. Now, right click on the "Rf_July.csv" layer. Left click on "Display XY Data".
16. As you can see, the coordinate system is unknown. Click on the "Edit" button. In the next
window, click the "Select" button and choose one of the following paths, according to the
projection of your boundary layer:
Geographic Coordinate Systems > World > WGS 1984
Projected Coordinate Systems > UTM > WGS 1984 > WGS 1984 UTM Zone 37N.prj
19. Your rainfall measurement points (weather station points) should now line up geographically
with your "Regions" layer.
20. Your "Rf_July.csv Events" layer is currently only a cosmetic layer. We know this because it has the
word "Events" following the name. It is not yet a shapefile.
21. To create a permanent shapefile from this cosmetic layer, right click on "Rf_July.csv Events",
scroll down to "Data" and select the "Export Data" option.
22. Leave all the initial options as default, but make sure to save the final file to your working
directory, calling the file "Rf_July_ETh.shp".
23. A pop-up window will ask you if you would like to "Add to map". Select OK and the new shapefile
should automatically be added to the data frame.
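The choice between the geographic and the projected coordinate system in step 16 depends on what the X and Y columns of the table actually contain. The sketch below shows two quick checks that can be made on the raw coordinates before choosing: whether they look like decimal degrees, and which 6-degree UTM zone a longitude falls in. The function names and sample coordinates are hypothetical, and the zone formula ignores the special zone exceptions around Norway and Svalbard.

```python
def looks_like_degrees(x, y):
    """True if (x, y) could plausibly be longitude/latitude in decimal degrees."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


def utm_zone(longitude_deg):
    """Standard 6-degree UTM zone number (1 to 60) for a longitude in degrees."""
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Zone 1 starts at 180 degrees West; longitude 180 is folded into zone 60.
    return min(int((longitude_deg + 180.0) // 6.0) + 1, 60)


# A hypothetical station near Addis Ababa, stored as longitude/latitude.
print(looks_like_degrees(38.74, 9.03))           # coordinates in degrees
print(utm_zone(38.74))                           # matching UTM zone number

# The same kind of station stored in metres (projected coordinates).
print(looks_like_degrees(472000.0, 998000.0))    # not degrees
```

Values in the hundreds of thousands are a strong hint that the table was recorded in a projected system such as UTM, while values within the degree ranges suggest the geographic WGS 1984 system.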
"Start" menu, point to "All Programs", point to "ArcGIS" and select "ArcMap")
2. You may receive a welcome screen. If so, select "A new empty map" and press OK. If
you do not receive this screen, ArcMap has selected a blank map by default.
3. You are now looking at the basic ArcMap screen with its various menus and tools. To begin with,
we will add rainfall data. From the "File" menu, select "Add Data".
From your working directory, add the "Regions" and "Rf_July_ETh" files.
4. From the standard toolbar, open the ArcToolbox window and double click Spatial Analyst Tools.
You will find a number of toolsets. When you double click the Interpolation toolset, you will find
IDW, Kriging, Natural Neighbor, Spline, … Trend.
Fixed: Uses a specified fixed distance within which all input points will be used for the
interpolation.
Distance: Specifies the distance as a radius within which input sample points will be used to
perform the interpolation.
16. Click the Environments tab located at the bottom of the kriging window (as shown in the figure above).
The Environments settings let you set the Cell Size, Current Workspace, Output Coordinate System,
Extent, Scratch Workspace, and more.
17. Click the General Settings tab and click the drop-down box under Extent. You will find the
following extent options:
Default: No extent set
Union of inputs: The maximum area covered by all inputs
Intersection of inputs: The minimum area common to all inputs
As specified below: You specify the left, right, bottom, and top values of X and Y
Same as display: The extent of the current ArcMap display will be used
Same as layer <layer>: The extent will be based on the extent of the specified layer
18. For this particular exercise, choose Same as layer "Regions" and click OK in the Environment
Settings window.
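The "Union of inputs" and "Intersection of inputs" options in step 17 can be pictured as simple operations on bounding boxes. The following sketch assumes each extent is a tuple of (left, bottom, right, top); the function names and the two example extents are hypothetical and are not taken from ArcGIS.

```python
def union_extent(extents):
    """Smallest box that covers every input extent."""
    return (
        min(e[0] for e in extents),   # left
        min(e[1] for e in extents),   # bottom
        max(e[2] for e in extents),   # right
        max(e[3] for e in extents),   # top
    )


def intersection_extent(extents):
    """Box common to all input extents, or None if they do not overlap."""
    left = max(e[0] for e in extents)
    bottom = max(e[1] for e in extents)
    right = min(e[2] for e in extents)
    top = min(e[3] for e in extents)
    if left >= right or bottom >= top:
        return None
    return (left, bottom, right, top)


# Two hypothetical layer extents as (left, bottom, right, top).
layer_a = (0.0, 0.0, 10.0, 10.0)
layer_b = (5.0, 5.0, 15.0, 12.0)
print(union_extent([layer_a, layer_b]))
print(intersection_extent([layer_a, layer_b]))
```

Choosing the extent of the "Regions" layer, as in step 18, makes the interpolated surface cover the whole boundary layer rather than only the box around the weather stations.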
Following the above steps, you are expected to create rainfall grids for the remaining months
(January-December), and also for temperature datasets.
Once you have 10-58 rainfall raster grids, follow the steps below to clip each grid by the
country boundary and create a layout for presentation.
Repeat the above step to clip the rest of the rainfall grid files.
9. Double-click on the layer name "Rf_clipped"; this will take you directly to the "Layer Properties" for
the layer. Select the tab "Symbology".
10. In the "Show" box (see graphic), click on the option "Classified", and next click on the "Classify"
tab. This will take you to the "Classification" drop-down menu (see graphic below).
15. Now, if you want to present more than one map per page, you
can insert a new data frame using the "Insert" menu and resize it
to fit the page setup and map orientation.
16. The classification scheme you set for the rainfall data can be
saved as a layer file and reused for similar data.
When you save a layer to disk, you save everything about
the layer, such as the symbolization and labeling. When you
import/add a layer file to another map, it will draw exactly as
it was saved.
One of the main features of a layer is that it can exist outside
your map as a file on disk. This makes it easy for others to
access the layers you've built.
Right click the rain98_clip layer in the Table of Contents,
click the Save As Layer File… option, type an appropriate name,
and save the layer in your working directory.
Overview
Zonal Statistics are a way of summarizing the information in a raster map layer using the boundaries
of zones in a second map layer. For example, we might have a raster grid depicting the rainfall and
have a second map layer depicting Woreda/Zone/Kebele boundaries. Zonal Statistics can be used to
calculate the average rainfall in each of the administrative boundaries (woreda, kebele or zone).
These boundaries can be stored in either raster or vector format - Zonal Statistics will work with either
format for the zones.
The following parameters can be calculated (only one parameter can be calculated per operation; to get
several parameters, it is necessary to save each resulting table in a separate file):
sum - sum of all values
min - minimum of all values
max - maximum of all values
count - number of values (number of pixels inside the zone, excluding pixels with the NODATA value)
area - area covered by pixels with a value other than NODATA
range - range of values (maximum minus minimum)
std - standard deviation
mean - average value (NODATA pixels are not used in the calculation)
median - median value (NODATA pixels are not used in the calculation)
majority - most frequently occurring value
minority - least frequently occurring value
variety - number of unique values
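The parameters listed above can be illustrated on a tiny pair of grids. This is a minimal sketch of the idea, not the ArcGIS tool: the function name, the `NODATA` marker value, and the 3 x 3 grids are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics
from collections import Counter, defaultdict

NODATA = -9999   # hypothetical marker for missing cells


def zonal_statistics(values, zones, nodata=NODATA):
    """Summarize a value grid by the zone IDs in a second grid of equal shape."""
    per_zone = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            # NODATA cells are left out of every statistic.
            if value != nodata and zone != nodata:
                per_zone[zone].append(value)
    result = {}
    for zone, cells in per_zone.items():
        freq = Counter(cells)
        result[zone] = {
            "count": len(cells),
            "sum": sum(cells),
            "min": min(cells),
            "max": max(cells),
            "range": max(cells) - min(cells),
            "mean": statistics.fmean(cells),
            "median": statistics.median(cells),
            "std": statistics.pstdev(cells),        # population form (assumed)
            "majority": freq.most_common(1)[0][0],  # ties: first value seen
            "variety": len(freq),
        }
    return result


# Hypothetical rainfall grid (mm) and a grid of administrative zone IDs.
rainfall = [[10, 12, NODATA],
            [14, 12, 20],
            [16, 18, 20]]
zone_ids = [[1, 1, 1],
            [1, 2, 2],
            [2, 2, 2]]

stats = zonal_statistics(rainfall, zone_ids)
for zone, summary in sorted(stats.items()):
    print(zone, summary["count"], summary["mean"], summary["majority"])
```

Zone 1 contains four cells, but only three enter the calculation because one of them is NODATA, which is why the count can be smaller than the number of pixels inside a zone.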
8. The data file name ea01011 stands for East Africa, 2001, January, first dekad (10-day period, January 1-10).
9. ArcToolbox > Spatial Analyst Tools > Map Algebra > Raster Calculator
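The file naming convention in step 8 can be decoded mechanically, which helps when many grids have to be sorted by month before they are combined. The sketch below is based only on the single example ea01011. The function name is hypothetical, and two points are assumptions: that a two-digit year always means 2000 plus that number, and that the third ten-day period runs to the end of the month.

```python
import calendar
import re


def parse_rfe_name(name):
    """Split a name like 'ea01011' into region, year, month, and 10-day period."""
    match = re.fullmatch(r"([a-z]{2})(\d{2})(\d{2})([123])", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    region, yy, mm, period = match.groups()
    year = 2000 + int(yy)        # ASSUMPTION: two-digit years are 20xx
    month = int(mm)
    period = int(period)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in file name: {name!r}")
    first_day = 10 * (period - 1) + 1
    if period < 3:
        last_day = 10 * period
    else:
        # ASSUMPTION: the third period covers day 21 to the end of the month.
        last_day = calendar.monthrange(year, month)[1]
    return {
        "region": region,
        "year": year,
        "month": month,
        "period": period,
        "first_day": first_day,
        "last_day": last_day,
    }


info = parse_rfe_name("ea01011")
print(info)
```

For ea01011 this gives region "ea", year 2001, month 1, and days 1 to 10, which matches the description in step 8.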
18. Now you can compare annual rainfall estimated from satellite data (RFE) with annual rainfall
measured at weather stations, either for a long-term mean or for a single year. To do this, sample
the RFE raster at the locations of the weather stations.
19. Select the mean annual rainfall raster from RFE as the input raster and the weather station
points as the input point features, specify the output table, and click OK.
20. Link the output table to the point features by creating a common ID (use RowId in the new table,
and calculate a new ID as FID+1 in the point feature table).
21. Compare the figures obtained from the RFE and the weather stations, and explain any
differences found at the same point.
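Steps 20 and 21 amount to matching two tables on an ID and then looking at the differences. A minimal sketch of that comparison is shown below; the function names and rainfall figures are hypothetical, and the mean difference (bias) and root mean square error are just two common ways to summarize the disagreement, not measures prescribed by this manual.

```python
import math


def compare_by_id(station_rain, rfe_rain):
    """Match station values (keyed by FID) to RFE samples (keyed by RowId).

    The link follows step 20: RowId in the sample table equals FID + 1.
    Returns rows of (FID, station value, RFE value, RFE minus station).
    """
    rows = []
    for fid, observed in sorted(station_rain.items()):
        row_id = fid + 1
        if row_id in rfe_rain:
            estimated = rfe_rain[row_id]
            rows.append((fid, observed, estimated, estimated - observed))
    return rows


def bias_and_rmse(rows):
    """Mean difference and root mean square difference over all matched rows."""
    diffs = [row[3] for row in rows]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Hypothetical annual rainfall totals in mm.
station_rain = {0: 1100.0, 1: 900.0, 2: 1250.0}   # keyed by FID
rfe_rain = {1: 1000.0, 2: 950.0, 3: 1150.0}       # keyed by RowId

rows = compare_by_id(station_rain, rfe_rain)
bias, rmse = bias_and_rmse(rows)
for row in rows:
    print(row)
print("bias:", bias, "rmse:", round(rmse, 1))
```

A negative bias in this made-up example would mean that the satellite estimate is lower than the station measurement on average; large differences at individual points are the ones worth explaining in step 21.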
13. When complete, open the attribute table of the "Zone_Cereal_Area" layer to make sure the join
was successful. As you will see, some of the fields will say <null>. This is okay for this specific
join because data were not collected for these specific zones.
14. Before continuing, save your map to your Lab12 folder as "Lab12_YOURNAME".
15. Double click on "Zone_Cereal_Area". The "Layer Properties" window should now pop up. If the
"Symbology" tab is not selected, then select that tab.
20. Reopen the "Layer Properties" dialog for the "Zone_Cereal_Area" layer. Return to the "Symbology"
tab. Under the Classification menu (top right-hand corner), click the "Classify" button.
21. In the "Classification Wizard" you will see a histogram illustrating the data distribution along the
number line.
22. In the "Method" drop-down list, you will see several classification alternatives to the "Natural
Breaks" system. You also have the opportunity to change the number of classes that you use.
23. Experiment with the different classification schemes, and look at how they alter the classification
breaks (blue lines) on the histogram data. By clicking OK on both wizards, you will see the effect
of your class scheme changes on the map itself.
24. Return to the "Classification Wizard" screen. First choose the "Quantiles" scheme; next change
the number of classes to 7.
25. In the "Break Values" box on the right-hand side of the wizard, set the
break values to the following numbers by simply typing over the
existing values.
26. When done, click OK. Now you are back at the "Symbology" window. The "Label" side of the
menu will reflect the changes that you make to the "Range" side, but you may also use text in
your labels.
27. There are a number of zones with "Null" values. (You should always account for these in your
mapping.) Because we have only done a table join and have not exported our data as a
new shapefile, these values remain "Null". We will show them by adding another layer.
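To see why the choice of scheme in steps 22 to 24 moves the class breaks, it helps to compute breaks by hand for a small list of values. The sketch below compares a quantile scheme with an equal-interval scheme and leaves out Null values, as discussed in step 27. The function names and data are hypothetical, and the exact rule ArcGIS uses to place quantile breaks may differ from the simple one used here.

```python
import math


def quantile_breaks(values, n_classes):
    """Upper break of each class so that classes hold about equal counts."""
    data = sorted(v for v in values if v is not None)   # Null values are skipped
    breaks = []
    for k in range(1, n_classes + 1):
        # Index of the last sorted value that falls in class k.
        idx = math.ceil(k * len(data) / n_classes) - 1
        breaks.append(data[max(idx, 0)])
    return breaks


def equal_interval_breaks(values, n_classes):
    """Upper break of each class when the value range is cut into equal steps."""
    data = [v for v in values if v is not None]
    low, high = min(data), max(data)
    step = (high - low) / n_classes
    return [low + step * k for k in range(1, n_classes + 1)]


# Hypothetical cereal areas with one Null value and one very large zone.
areas = [5, 1, None, 3, 100, 2, 7, 4, 6]
print("quantile:      ", quantile_breaks(areas, 4))
print("equal interval:", equal_interval_breaks(areas, 4))
```

With one extreme value in the data, the equal-interval scheme puts almost every zone in the lowest class, while the quantile scheme spreads the zones evenly across the classes. This is the kind of difference to look for on the histogram in step 23.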
31. For a softer, more subtle style, you will remove the
boundaries from between the individual zones. Click on the
word "Symbol" above the colored category symbols, and in
the pop-up menu, choose "Properties for all symbols". (see
right)
32. In the "Symbol Selector" dialog, change the outline color to "No color".
33. Click OK.
34. To distinguish the boundaries of the higher order administrative units, add the "Regions" layer
from the Lab01 folder, and symbolize it as hollow with an appropriate outline thickness.
Remember how to do this? (Hint: double click on the colored box symbol below the layer name in
the Table of Contents.)
window.
40. Now in all four data frames you should have a map portraying the total area covered by cereals in
2007, as shown in the figure below.
41. To show which year's cereal coverage data is shown in which map
window, go to the "Insert" menu and click on "Text", as shown alongside.
43. What we want to do is to integrate and show four years (2007, 2008, 2009, and 2011) of zonal
tabular data about the area covered by cereals. However, all four maps in the above figure show
the total area covered by cereals in 2007. To compare the change in area covered by cereals over
the years, you need to join the 2008, 2009, and 2011 zonal tabular data to the
"Zone_Cereal_Area" shapefile.
44. In the Lab12 folder you will find the "Zonal_Cereal_2008", "Zonal_Cereal_2009", and
"Zonal_Cereal_2011" Excel files. These files contain cereal coverage data for 2008, 2009, and
2011, respectively.
45. Following the steps you used to join the 2007 cereal area coverage data, save
each of these Excel files in .csv format and join each table with "Zone_Cereal_Area" in each of the
three data frames.
"Zonal_Cereal_2008.csv" should be joined with "Zone_Cereal_Area" beside 2008;
"Zonal_Cereal_2009.csv" should be joined with "Zone_Cereal_Area" beside 2009;
"Zonal_Cereal_2011.csv" should be joined with "Zone_Cereal_Area" beside 2011.
46. Using the "Insert" menu, add a legend and other appropriate cartographic elements to your map
(north arrow and simple scale bar).
47. Based on your expertise, did the maps show recognizable patterns? Did the zones you know to
have large cereal coverage stand out in these maps? What about zones you know to have a low
area covered by cereals? Are there zones for which you think the data need to be rechecked?
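When checking zones whose data may need a second look, it can help to reproduce the table joins from steps 43 to 45 outside ArcMap. The sketch below joins a .csv table to a list of zones on a shared key and leaves unmatched zones empty, which mirrors the <null> values seen after a join. The function name, the field names `ZONE` and `AREA_2008`, and the sample rows are all hypothetical; the real files use their own column names.

```python
import csv
import io


def left_join(zones, table_rows, key):
    """Attach the fields of table_rows to each zone with the same key value.

    Zones without a matching row keep None in the joined fields.
    """
    lookup = {row[key]: row for row in table_rows}
    fields = [f for f in table_rows[0] if f != key] if table_rows else []
    joined = []
    for zone in zones:
        match = lookup.get(zone[key])
        merged = dict(zone)
        for field in fields:
            merged[field] = match[field] if match else None
        joined.append(merged)
    return joined


# Hypothetical contents of a cereal area table saved as .csv.
csv_text = "ZONE,AREA_2008\nA,1200\nB,800\n"
table_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical attribute table of the zone layer; zone C has no data.
zones = [{"ZONE": "A"}, {"ZONE": "B"}, {"ZONE": "C"}]

joined = left_join(zones, table_rows, key="ZONE")
for row in joined:
    print(row)
```

Values read from a .csv file arrive as text, so they would need to be converted to numbers before being classified or compared between years.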