PYDS 3150713 Unit-4
DATA VISUALIZATION
OUTLINE:
Visualizing Information: Starting with a Graph; Defining the Plot; Drawing
Multiple Lines and Plots; Saving Your Work to Disk; Setting the Axis, Ticks,
Grids; Getting the Axes; Formatting the Axes; Adding Grids; Defining the Line
Appearance; Working with Line Style; Using Colors; Adding Markers; Using
Labels, Annotations, and Legends; Adding Labels; Annotating the Chart;
Creating a Legend.
Starting with a Graph
•A graph or chart is simply a visual representation of numeric
data. MatPlotLib makes a large number of graph and chart
types available to you. Of course, you can choose any of the
common graph and chart types, such as bar charts, line
graphs, or pie charts.
•You also have access to a huge number of statistical plot
types, such as boxplots, error bar charts, and histograms.
Defining the plot
Plots show graphically what you’ve defined numerically. To define a plot, you
need some values, the matplotlib.pyplot module, and an idea of what you want
to display, as shown in the following code.
In this case, the code tells the plt.plot() function to create a plot using x-axis
values from 1 up to, but not including, 11 and y-axis values as they appear in values. Calling
plt.show() displays the plot in a separate dialog box.
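A minimal sketch of such a plot is shown here. The contents of values are made up for illustration; any list of ten numbers works the same way.

```python
import matplotlib.pyplot as plt

# Sample y-axis data (made up for illustration): ten values.
values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]

# range(1, 11) supplies the x-axis values 1 through 10, one per data point.
lines = plt.plot(range(1, 11), values)

# Display the plot (inline in a notebook, or in a separate window).
plt.show()
```

plt.plot() returns a list of line objects, one for each line it draws, which is why the result is stored in lines.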
Drawing multiple lines and plots
You encounter many situations in which you must use multiple plot lines, such
as when comparing two sets of values. To create such plots using MatPlotLib,
you simply call plt.plot() multiple times — once for each plot line.
The line graphs are set to be of different colors so that you can tell them apart.
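As a sketch, assuming two made-up lists named values and values2, calling plt.plot() twice draws both lines on the same plot. MatPlotLib picks a different default color for each call.

```python
import matplotlib.pyplot as plt

# Two sample data sets (made up for illustration).
values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]

# One plt.plot() call per line; both lines land on the same axes.
first = plt.plot(range(1, 11), values)
second = plt.plot(range(1, 11), values2)

plt.show()
```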
Saving your work to disk
Jupyter Notebook makes it easy to include your graphs within the notebooks
you create, so that you can define reports that everyone can easily understand.
When you do need to save a copy of your work to disk for later reference or to
use it as part of a larger report, you save the graphic programmatically using the
plt.savefig() function.
In this case, the example provides two inputs. The first input is the
filename, which may optionally include a path for saving the file. The second input
is the file format; if you omit it, plt.savefig() infers the format from the
filename extension. The example saves the file in Portable Network
Graphics (PNG) format, but you have other options: Portable Document Format
(PDF), PostScript (PS), Encapsulated PostScript (EPS), and Scalable Vector
Graphics (SVG).
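A minimal sketch of saving a plot follows. The filename MySamplePlot.png and the data in values are assumptions chosen for illustration; the file is written to the current working directory.

```python
from pathlib import Path

import matplotlib.pyplot as plt

# Sample data (made up for illustration).
values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
plt.plot(range(1, 11), values)

# Hypothetical output file in the current working directory.
output_path = Path("MySamplePlot.png")

# First input: the filename (with optional path). Second input: the format.
plt.savefig(output_path, format="png")
```

Calling plt.savefig() before plt.show() is the safer order, because some environments clear the figure once it has been shown.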
Setting the Axis, Ticks, Grids
It’s hard to know what the data actually means unless you provide a unit of
measure or at least some means of performing comparisons. The use of axes,
ticks, and grids makes it possible to illustrate graphically the relative size of data
elements so that the viewer gains an appreciation of comparative measure.
You won’t use these features with every graphic, and you may employ the
features differently based on viewer needs, but it’s important to know that
these features exist and how you can use them to help document your data
within the graphic environment.
Getting the axes
In many cases, you can allow MatPlotLib to perform any required formatting for
you. However, sometimes you need to obtain access to the axes and format
them manually. The following code shows how to obtain access to the axes for a
plot:
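One way to do this, sketched here with made-up data, is to call plt.axes() before plotting and keep the object it returns in a variable. The name ax is only a convention.

```python
import matplotlib.pyplot as plt

# Sample data (made up for illustration).
values = [0, 5, 8, 9, 2, 0, 3, 10, 4, 7]

# plt.axes() adds an axes to the current figure and returns it.
ax = plt.axes()

# The plot is drawn on the axes that was just created.
lines = plt.plot(range(1, 11), values)

plt.show()
```

With the axes stored in ax, later code can call formatting methods on it directly.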
Formatting the axes
Simply displaying the axes won’t be enough in many cases. You want to change
the way MatPlotLib displays them. For example, you may not want the highest
value to reach the top of the graph. The following example shows just a
small number of tasks you can perform after you have access to the axes.
In this case, the set_xlim() and set_ylim() calls change the axes limits — the
length of each axis. The set_xticks() and set_yticks() calls change the ticks used
to display data.
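The following sketch applies these four calls to made-up data. The particular limits and tick positions are assumptions chosen so that the highest value (10) sits below the top of the graph.

```python
import matplotlib.pyplot as plt

# Sample data (made up for illustration).
values = [0, 5, 8, 9, 2, 0, 3, 10, 4, 7]

ax = plt.axes()

# Axis limits: leave room around the data on both axes.
ax.set_xlim([0, 11])
ax.set_ylim([-1, 11])

# Tick positions: every entry on x, every second value on y.
ax.set_xticks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ax.set_yticks([0, 2, 4, 6, 8, 10])

plt.plot(range(1, 11), values)
plt.show()
```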
Adding grids
Grid lines enable you to see the precise value of each element of a graph. You
can more quickly determine both the x and y coordinates, which allows you to
perform comparisons of individual points with greater ease.
All you really need to do is call the grid() function. As with many other
MatPlotLib functions, you can add parameters to create the grid precisely as you
want to see it. For example, you can choose whether to add the x grid lines, y
grid lines, or both.
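As a sketch with made-up data, the grid() call below turns on both sets of grid lines. Passing axis="x" or axis="y" instead would limit the grid to one direction.

```python
import matplotlib.pyplot as plt

# Sample data (made up for illustration).
values = [0, 5, 8, 9, 2, 0, 3, 10, 4, 7]

ax = plt.axes()
plt.plot(range(1, 11), values)

# Turn the grid on; axis may be "x", "y", or "both".
ax.grid(True, axis="both")

plt.show()
```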
Working with line styles
Line styles help differentiate graphs by drawing the lines in various ways. Using a
unique presentation for each line helps you distinguish each line so that you can
call it out (even when the printout is in shades of gray). You could also call out a
particular line graph by using a different line style for it (and using the same
style for the other lines).
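A short sketch, assuming two made-up data sets, shows how the linestyle argument of plt.plot() gives each line its own look. Common style strings are "-" (solid), "--" (dashed), "-." (dash-dot), and ":" (dotted).

```python
import matplotlib.pyplot as plt

# Two sample data sets (made up for illustration).
values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]

# Dashed line for the first data set, dotted line for the second.
dashed = plt.plot(range(1, 11), values, linestyle="--")
dotted = plt.plot(range(1, 11), values2, linestyle=":")

plt.show()
```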
Using colors
Color is another way in which to differentiate line graphs. Of course, this method
has certain problems. The most significant problem occurs when someone
makes a black-and-white copy of your colored graph — hiding the color
differences as shades of gray. Another problem is that someone with color
blindness may not be able to tell one line from the other.
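Setting colors explicitly can be sketched as follows, using made-up data and the color argument of plt.plot(). Single-letter codes such as "r" (red), "m" (magenta), "b" (blue), and "g" (green) are accepted.

```python
import matplotlib.pyplot as plt

# Two sample data sets (made up for illustration).
values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]

# Red for the first line, magenta for the second.
red_line = plt.plot(range(1, 11), values, color="r")
magenta_line = plt.plot(range(1, 11), values2, color="m")

plt.show()
```

Combining a distinct color with a distinct line style or marker keeps the lines distinguishable in a grayscale copy.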
Adding markers
Markers add a special symbol to each data point in a line graph. Unlike line style
and color, markers tend to be a little less susceptible to accessibility and printing
issues. Even when the specific marker isn’t clear, people can usually differentiate
one marker from the other.
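The sketch below adds a different marker to each of two made-up data sets through the marker argument of plt.plot(). The codes "^" (triangle) and "o" (circle) are two of the many marker symbols MatPlotLib supports.

```python
import matplotlib.pyplot as plt

# Two sample data sets (made up for illustration).
values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]

# Triangles mark the first line's points, circles the second's.
triangles = plt.plot(range(1, 11), values, marker="^")
circles = plt.plot(range(1, 11), values2, marker="o")

plt.show()
```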
Using Labels, Annotations, and Legends
Label: Provides positive identification of a particular data element or grouping.
The purpose is to make it easy for the viewer to know the name or kind of data
illustrated.
Annotation: Augments the information the viewer can immediately see about
the data with notes, sources, or other useful information. In contrast to a label,
the purpose of annotation is to help extend the viewer’s knowledge of the data
rather than simply identify it.
Legend: Presents a listing of the data groups within the graph and often provides
cues (such as line type or color) to make identification of the data group easier.
For example, all the red points may belong to group A, while all the blue points
may belong to group B.
Adding labels
Labels help people understand the significance of each axis of any graph you
create. Without labels, the values portrayed don’t have any significance. In
addition to a moniker, such as rainfall, you can also add units of measure, such
as inches or centimeters, so that your audience knows how to interpret the data
shown.
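A minimal sketch of axis labels follows. The rainfall figures and the label text are made up for illustration; plt.xlabel() and plt.ylabel() set the label for each axis.

```python
import matplotlib.pyplot as plt

# Hypothetical rainfall measurements for ten months.
rainfall = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
plt.plot(range(1, 11), rainfall)

# Name each axis and include the unit of measure where one applies.
plt.xlabel("Month")
plt.ylabel("Rainfall (inches)")

# Keep a handle to the current axes so the labels can be inspected.
ax = plt.gca()
plt.show()
```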
Annotating the chart
You use annotation to draw special attention to points of interest on a graph. For example, you
may want to point out that a specific data point is outside the usual range expected for a
particular dataset.
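As a sketch with made-up data, plt.annotate() can point an arrow at the highest value. The note text and the text position given in xytext are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt

# Sample data (made up for illustration); the peak of 10 is at x = 8.
values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
plt.plot(range(1, 11), values)

# xy is the point being annotated; xytext is where the note is written.
note = plt.annotate(
    "Highest value",
    xy=(8, 10),
    xytext=(3, 9.5),
    arrowprops=dict(arrowstyle="->"),
)

plt.show()
```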
Creating a legend
A legend documents the individual elements of a plot. Each line is presented in a
table that contains a label for it so that people can differentiate between each
line. For example, one line may represent sales in 2017 and another line may
represent sales in 2018, so you include an entry in the legend for each line that
is labeled 2017 and 2018.
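The 2017 and 2018 example can be sketched as follows. The sales figures are made up; each line receives its legend text through the label argument, and plt.legend() builds the legend from those labels.

```python
import matplotlib.pyplot as plt

# Hypothetical sales figures for ten periods in each year.
sales_2017 = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]
sales_2018 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]

# The label argument supplies the text shown in the legend.
plt.plot(range(1, 11), sales_2017, label="2017")
plt.plot(range(1, 11), sales_2018, label="2018")

# loc chooses the legend position inside the plot area.
legend = plt.legend(loc="upper left")

plt.show()
```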
Visualizing the Data
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
spread = 100 * np.random.rand(100)
center = np.ones(50) * 50
flier_high = 100 * np.random.rand(10) + 100
flier_low = -100 * np.random.rand(10)
data = np.concatenate((spread, center, flier_high, flier_low))
plt.boxplot(data, sym='gx', widths=.75, notch=True)
plt.show()
Seeing data patterns using scatterplots
Scatterplots show clusters of data rather than trends (as with line graphs) or discrete
values (as with bar charts). The purpose of a scatterplot is to help you see data patterns.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x1 = 5 * np.random.rand(40)
x2 = 5 * np.random.rand(40) + 25
x3 = 25 * np.random.rand(20)
x = np.concatenate((x1, x2, x3))
y1 = 5 * np.random.rand(40)
y2 = 5 * np.random.rand(40) + 25
y3 = 25 * np.random.rand(20)
y = np.concatenate((y1, y2, y3))
plt.scatter(x, y, s=[100], marker='^', c='m')
plt.show()
Depicting groups
Color is the third axis when working with a scatterplot. Using color lets you highlight groups so that others can see them with greater ease.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x1 = 5 * np.random.rand(40)
x2 = 5 * np.random.rand(40) + 25
x3 = 25 * np.random.rand(20)
x = np.concatenate((x1, x2, x3))
y1 = 5 * np.random.rand(40)
y2 = 5 * np.random.rand(40) + 25
y3 = 25 * np.random.rand(20)
y = np.concatenate((y1, y2, y3))
color_array = ['b'] * 40 + ['g'] * 40 + ['r'] * 20
plt.scatter(x, y, s=[50], marker='D', c=color_array)
plt.show()
Showing correlations
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x1 = 5 * np.random.rand(40)
x2 = 5 * np.random.rand(40) + 25
x3 = 25 * np.random.rand(20)
x = np.concatenate((x1, x2, x3))
y1 = 5 * np.random.rand(40)
y2 = 5 * np.random.rand(40) + 25
y3 = 25 * np.random.rand(20)
y = np.concatenate((y1, y2, y3))
color_array = ['b'] * 40 + ['g'] * 40 + ['r'] * 20
plt.scatter(x, y, s=[50], marker='D', c=color_array)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
plt.plot(x, p(x), 'm-')
plt.show()
Plotting Time Series
Nothing is truly static. When you view most data, you see an instant of time: a snapshot of
how the data appeared at one particular moment.
Only by viewing the data as it moves through time, seeing it as it changes, can you expect to
understand the underlying forces that shape it.
Representing time on axes
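One possible sketch of a time axis uses plain datetime.date values on the x-axis, which MatPlotLib can plot directly. The start date and the sales figures are made up for illustration, and fig.autofmt_xdate() is used only to slant the date labels so that they don't overlap.

```python
import datetime as dt

import matplotlib.pyplot as plt

# Five consecutive days starting from a hypothetical date.
start = dt.date(2018, 7, 29)
dates = [start + dt.timedelta(days=i) for i in range(5)]

# Hypothetical daily sales, one value per date.
sales = [25, 20, 22, 30, 27]

fig, ax = plt.subplots()
lines = ax.plot(dates, sales)

# Rotate and align the date labels on the x-axis.
fig.autofmt_xdate()

plt.show()
```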
Plotting Geographical Data
Knowing where data comes from or how it applies to a specific place can be
important. For example, if you want to know where food shortages have
occurred and plan how to deal with them, you need to match the data you have
to geographical locations. The same holds true for predicting where future sales
will occur. You may find that you need to use existing data to determine where
to put new stores. Otherwise, you could put a store in a location that won’t
receive much in the way of sales, and the effort will lose money rather than
make it. The following sections describe how to work with Basemap to interact
with geographical data.
Using Basemap to plot geographic data
Visualizing Graphs
Developing undirected graphs
Developing directed graphs
Thanks