
DVA Unit 3

The document provides an overview of data visualization, emphasizing its importance in analyzing large datasets through graphical representation. It discusses various Python libraries for data visualization, including Matplotlib, Seaborn, Bokeh, and Plotly, and explains different types of graphs such as bar graphs, line graphs, and pie charts, along with their advantages and disadvantages. Additionally, it covers the syntax for creating these visualizations and the significance of legends and plot customization.

Uploaded by

Keshav

BCA VI SEM

Unit III
Syllabus
Data Visualization
 In today’s world, a lot of data is being generated on a daily basis. Analyzing this data for trends and patterns can be difficult if the data is in its raw format.
 To overcome this, data visualization comes into play.
 Data visualization provides a good, organized pictorial representation of the data, which makes it easier to understand, observe and analyze. In this tutorial, we will discuss how to visualize data using Python.
 Data visualization is a field in data analysis that deals with visual representation of
data. It graphically plots data and is an effective way to communicate inferences from
data.
 Using data visualization, we can get a visual summary of our data. With pictures, maps
and graphs, the human mind has an easier time processing and understanding any
given data.
 Data visualization plays a significant role in the representation of both small and large
data sets, but it is especially useful when we have large data sets, in which it is
impossible to see all of our data, let alone process and understand it manually.
Data Visualization
 Python provides various libraries for visualizing data. Each library comes with different features and supports various types of graphs. In this tutorial, we will be discussing four such libraries.
 Matplotlib
 Seaborn
 Bokeh
 Plotly
Matplotlib vs Seaborn
Matplotlib | Seaborn
It is used for basic graph plotting like line charts, bar graphs, etc. | It is mainly used for statistical visualization and can perform complex visualizations with fewer commands.
It mainly works with datasets and arrays. | It works with entire datasets.
Matplotlib acts productively with data arrays and frames. It regards the axes and figures as objects. | Seaborn is considerably more organized and functional than Matplotlib and treats the entire dataset as a single unit.
Matplotlib is more customizable and pairs well with Pandas and NumPy for Exploratory Data Analysis. | Seaborn has more built-in themes and is mainly used for statistical analysis.
Bar chart
 A bar graph is a visual representation of data using rectangular bars. The bars
can be vertical or horizontal, and their lengths are proportional to the data they
represent. Bar graphs are also known as bar charts or bar diagrams.
 A bar chart is the most common data visualization for displaying the numerical values of categorical data, so that the categories can be compared with one another.
 The categories are represented by rectangular bars of the same width and with
heights (for vertical bar charts) or lengths (for horizontal bar charts)
proportional to the numerical values that they correspond to.
 A bar graph is a pictorial representation of grouped data using horizontal or vertical bars, where the length of each bar represents the value shown on the axis. Bar graphs are usually used to display ‘categorical data’, i.e. data that fits into categories.
Properties of Bar Graph
 All Bars have a common base.
 Each bar displayed has the same width.
 The distance between consecutive bars is the same.
 Every bar has a uniform width, so data points are compared by bar length alone.
 It can be either horizontal or vertical.
 Every bar graph has two axes, one for the categories and the other for the quantity of the data.
 A bar graph can also show the comparison of data over a particular period of time.
Parts of a Bar Graph
 The main parts of a bar graph include:
 Title: Describes the purpose or subject of the graph.
 X-axis (horizontal axis): Represents the categories or groups being compared.
 Y-axis (vertical axis): Displays the values or quantities corresponding to each
category.
 Bars: Vertical or horizontal rectangles representing the data values for each
category.
 Data labels: Numerical values attached to the bars to show the exact
measurement.
 Legend: Explains the meaning of different colours or patterns if multiple data
sets are presented.
 Scale: The units or intervals used on the axes to measure and represent the data
accurately.
Significance of a Bar Graph
 It is always easier and more comfortable to understand something visually than to look at a large table of numerical data.
 Bar graphs are extensively used in presentations and reports.
 They are prominently used because they summarize data and can display it as a frequency distribution.
Bar Graph Types
 The different types of bar graph are:
 Vertical Bar Graph
 Horizontal Bar Graph
 Grouped Bar Graph
Vertical Bar Graph
 A vertical bar graph is a data visualisation technique that represents data using vertical bars or columns. It is also known as a vertical bar chart.
 Vertical bar graphs are the most common bar graphs we come across. The bars of grouped data in vertical bar graphs lie vertically.
Horizontal Bar Graph
 Horizontal bar graphs are graphs whose rectangular bars lie horizontally. This means that the frequency of the data lies on the x-axis while the categories of the data lie on the y-axis.
Grouped Bar Graph
 Grouped bar graphs (also called clustered bar graphs) are bar charts in which multiple sets of data items are compared, with a single colour used to denote a specific series across all sets.
 The grouped bar graph is used to represent discrete values for more than one object sharing the same category.
 As with basic bar charts, both vertical and horizontal versions of grouped bar charts are available.
Bar Graph Advantages and Disadvantages
 The advantages of Bar Graph are :
 It represents the data in a graphical form, which is easier to understand.
 It helps in analysing the data at a glance.
 It displays each piece of information separately, which makes it easier to understand.
 It summarizes a large data set in visual form.
 It displays relative data or proportions.
 The disadvantages of Bar Graph are:
 It is easy to manipulate the data and present a misleading analysis.
 It requires additional explanation to describe the graph.
Creating a bar plot

syntax of the bar()
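The original slide for this heading appears to have been an image. As a minimal sketch, pyplot's bar() takes the category positions first and the bar heights second, i.e. bar(x, height, width=0.8, ...). The course names and student counts below are hypothetical illustration data, not values from the source.

```python
import matplotlib.pyplot as plt

# Hypothetical data: number of students enrolled in each course
courses = ["C", "C++", "Java", "Python"]
students = [20, 15, 30, 35]

fig, ax = plt.subplots()
# bar(x, height, width) draws one vertical bar per category
bars = ax.bar(courses, students, width=0.6, color="steelblue")
ax.set_title("Students enrolled per course")
ax.set_xlabel("Course")
ax.set_ylabel("Number of students")
```

Call plt.show() afterwards to display the figure. Replacing ax.bar(...) with ax.barh(courses, students) would draw the same data as a horizontal bar graph.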


Multiple bar plots
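One common way to draw multiple (grouped) bar plots is to call bar() once per series and shift each series sideways by a fraction of the bar width. The sketch below assumes two hypothetical test-score series; the student names and marks are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical marks of three students in two tests
students = ["Asha", "Ravi", "Meena"]
test1 = [65, 80, 72]
test2 = [70, 75, 90]

x = np.arange(len(students))  # one position per group
width = 0.35                  # width of each bar

fig, ax = plt.subplots()
# Shift each series by half a bar width so the two bars sit side by side
bars1 = ax.bar(x - width / 2, test1, width, label="Test 1")
bars2 = ax.bar(x + width / 2, test2, width, label="Test 2")
ax.set_xticks(x)
ax.set_xticklabels(students)
ax.set_ylabel("Marks")
ax.set_title("Marks by student and test")
ax.legend()
```

Each series keeps a single colour across all groups, which is what lets the legend identify it.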
What is Line Graph?
 A line graph is a way to represent two or more variables in the form of lines or curves, which helps to visualize and understand the data.
 It displays data that changes continuously over time.
 In a line graph, data points are represented by point markers and connected by line segments.
 Line Graph Definition
 A line graph or line chart is a graphical representation of data that displays the relationship between two or more variables over time. It is made by connecting data points with straight-line segments.
Parts of Line Graph
 Parts of the line graph include the following:
 Title: The title of the graph drawn.
 Axes: The line graph contains two axes, i.e. the X-axis and the Y-axis.
 Labels: The names given to the x-axis and y-axis.
 Line: The line segment used to connect two or more data points.
 Point: The marker drawn at each data value, where the segments meet.
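The parts listed above can be seen in a short Matplotlib sketch. It uses pyplot's plot() with a point marker; the monthly sales figures are hypothetical illustration data.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [120, 135, 128, 150, 170]

fig, ax = plt.subplots()
# marker="o" draws a point at each data value; the line joins consecutive points
(line,) = ax.plot(months, sales, marker="o", label="Sales")
ax.set_title("Monthly sales")    # Title
ax.set_xlabel("Month")           # Label of the X-axis
ax.set_ylabel("Units sold")      # Label of the Y-axis
ax.legend()
```

Call plt.show() afterwards to display the figure.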
Advantages and Disadvantages of Line Graph
 Advantages of Line Graph
 Some of the advantages of using line graph are listed below:
 It helps to visualize the data.
 It provides a clear overview of the data.
 It becomes easy to make predictions using a line graph.
 It helps to compare the data more easily.
 Disadvantages of Line Graph
 Some of the disadvantages of line graphs are listed below:
 It is ineffective for complex data.
 It is very difficult to represent non-linear relationships.
Uses of Line Graph
 There are various use cases of line graphs; some of them are as follows:
 It is used to visualize data, which makes raw data easy to understand.
 It is used to compare the data for different categories.
 It makes it easier to spot trends and make predictions.
Pie-charts
 A Pie Chart is a circular statistical plot that can display only one series of data.
 The whole area of the chart represents 100% of the given data, and the area of each slice represents that item's share of the total.
 Pie charts are commonly used in business presentations like sales, operations, survey results, resources, etc. as they provide a quick summary.
Pie Chart Advantages and Disadvantages
 Pie Chart Advantages
 A pie chart is very useful for representing data. Various advantages of the pie chart are:
 A pie chart is easily understood.
 It represents data visually as fractional parts of a whole.
 It provides an effective mode of communication to all types of audiences.
 It lets the audience compare the parts of the data at a glance.
 Pie Chart Disadvantages
 There are also some disadvantages of using pie charts; some of them are listed below:
 With too many data items, a pie chart becomes less effective.
 A single pie chart shows only one data set, so comparing multiple data sets needs a series of pie charts.
 Analyzing and assimilating the data in a pie chart can be difficult for readers, since slice angles and areas are hard to compare precisely.
Uses of Pie Chart
 Whenever a fraction or fractions are represented as a part of the whole, pie
charts are used. Pie charts are used to compare the data and to analyze
which data is bigger or smaller.
Hence, while dealing with discrete data, pie charts are preferred. Let’s take a
look at the uses of the pie chart:
 Pie charts are used to compare the profit and loss in businesses.
 In schools, the grades can be easily compared using a pie chart.
 The relative sizes of data can be compared using a pie chart.
 The marketing and sales data can be compared using a pie chart.
 The Matplotlib API has a pie() function in its pyplot module which creates a pie chart representing the data in an array.
 Syntax: matplotlib.pyplot.pie(data, explode=None, labels=None, colors=None, autopct=None, shadow=False)
 Parameters:
data represents the array of data values to be plotted; the fractional area of each slice is data/sum(data). In current Matplotlib versions the values are normalized by default (normalize=True), so the pie is always full. With normalize=False and sum(data)<1, the data values give the fractional areas directly, and the resulting pie has an empty wedge of size 1-sum(data).
labels is a sequence of strings which sets the label of each wedge.
colors is a sequence of colors used for the wedges.
autopct is a format string (or a function) used to label each wedge with its numerical value.
shadow is used to draw a shadow beneath the wedges.
 Explode
 To make one of the wedges stand out, use the explode parameter.
 The explode parameter, if specified and not None, must be an array with one value for each wedge. Each value is the fraction of the radius by which that wedge is offset from the center.
 Shadow
 Add a shadow to the pie chart by setting the shadow parameter to True.
 Colors
 You can set the color of each wedge with the colors parameter.
 The colors parameter, if specified, must be an array with one value for each wedge.
 Legend
 To add a list of explanations for each wedge, use the legend() function.
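The parameters described above can be combined in one call. The following is a minimal sketch with hypothetical brand names and values; only pie() and legend() arguments that exist in Matplotlib are used.

```python
import matplotlib.pyplot as plt

# Hypothetical share of vehicles by brand (values sum to 100)
labels = ["Audi", "BMW", "Ford", "Tesla"]
data = [23, 17, 35, 25]
explode = [0, 0, 0.1, 0]  # pull the third wedge out by 10% of the radius

fig, ax = plt.subplots()
# With autopct set, pie() also returns the percentage text objects
wedges, texts, autotexts = ax.pie(
    data,
    explode=explode,
    labels=labels,
    colors=["gold", "lightcoral", "lightskyblue", "yellowgreen"],
    autopct="%1.1f%%",
    shadow=True,
)
# The legend lists one entry per wedge, placed to the right of the pie
ax.legend(wedges, labels, title="Brand", loc="center left", bbox_to_anchor=(1, 0.5))
ax.set_title("Vehicles by brand")
```

Call plt.show() afterwards to display the figure.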
Scatter plots
 Scatter plots serve as a visual tool to explore and analyze the relationships between variables, using dots to depict the connection between them.
 The matplotlib library provides the scatter() method, specifically designed for creating scatter plots.
 These plots are instrumental in illustrating the interdependencies among variables and how changes in one variable can impact another.
 Syntax: matplotlib.pyplot.scatter(x_axis_data, y_axis_data, s=None, c=None, marker=None, cmap=None, vmin=None, vmax=None, alpha=None, linewidths=None, edgecolors=None)
 Except for x_axis_data and y_axis_data, all other parameters are optional, with their default values set to None.
 Parameters:
 x_axis_data: An array containing data for the x-axis.
 y_axis_data: An array containing data for the y-axis.
 s: Marker size, which can be a scalar or an array of size equal to the size of x or y.
 c: Color or sequence of colors for the markers.
 marker: Marker style.
 cmap: Colormap name.
 linewidths: Width of the marker border.
 edgecolors: Marker border color.
 alpha: Blending value, ranging between 0 (transparent) and 1 (opaque).
matplotlib.pyplot.scatter() in Python
 matplotlib.pyplot.scatter() can be used to create several kinds of plots, including:
 Basic Scatter Plot
 Scatter Plot With Multiple Datasets
 Bubble Chart Plot
 Customized Scatter Plot
Scatter Plot in Matplotlib
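A basic scatter plot needs only the x and y data. The sketch below uses hypothetical values; every other scatter() parameter is left at its default.

```python
import matplotlib.pyplot as plt

# Hypothetical paired observations
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]

fig, ax = plt.subplots()
points = ax.scatter(x, y)  # one dot per (x, y) pair
ax.set_title("Basic scatter plot")
ax.set_xlabel("X values")
ax.set_ylabel("Y values")
```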
Plot Multiple Datasets on a Scatterplot
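Several datasets can share one set of axes by calling scatter() once per dataset, usually with a different color or marker for each. The two datasets below are hypothetical.

```python
import matplotlib.pyplot as plt

# Two hypothetical datasets plotted on the same axes
x1, y1 = [1, 2, 3, 4, 5], [3, 5, 4, 7, 8]
x2, y2 = [1, 2, 3, 4, 5], [8, 6, 7, 3, 2]

fig, ax = plt.subplots()
ax.scatter(x1, y1, c="red", marker="o", label="Dataset 1")
ax.scatter(x2, y2, c="blue", marker="^", label="Dataset 2")
ax.set_xlabel("X values")
ax.set_ylabel("Y values")
ax.legend()  # uses the label given to each scatter() call
```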
Bubble Plots in Matplotlib
 This code generates a bubble chart using Matplotlib.
 It plots points with specified x and y coordinates, each represented by a
bubble with a size determined by the bubble_sizes list.
 The chart has customization for transparency, edge color, and linewidth.
 Finally, it displays the plot with a title and axis labels.
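The code this slide describes is not preserved in the text, so the following is a sketch of what it might look like. The name bubble_sizes comes from the description above; the coordinates and size values are assumptions.

```python
import matplotlib.pyplot as plt

# Hypothetical coordinates; bubble_sizes sets the marker area of each point
x_values = [1, 2, 3, 4, 5]
y_values = [2, 3, 5, 7, 11]
bubble_sizes = [30, 80, 150, 200, 300]

fig, ax = plt.subplots()
bubbles = ax.scatter(
    x_values, y_values,
    s=bubble_sizes,    # size of each bubble
    alpha=0.6,         # transparency
    edgecolors="b",    # border color
    linewidths=2,      # border width
)
ax.set_title("Bubble chart")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
```

Call plt.show() afterwards to display the figure.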
Customize a Matplotlib Scatterplot
 Using Matplotlib and NumPy, we can create a customized scatter plot.
 Random data is generated for the x and y coordinates, colors, and sizes.
 The scatter plot is then created with customized properties such as color, size, transparency, and colormap.
 The plot includes a title, axis labels, and a color intensity scale. Finally, the plot is displayed.
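The steps described above might be sketched as follows. The seed, the number of points and the "viridis" colormap are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random data for coordinates, colors and sizes (seed fixed for repeatability)
rng = np.random.default_rng(0)
x = rng.random(50)
y = rng.random(50)
colors = rng.random(50)
sizes = 500 * rng.random(50)

fig, ax = plt.subplots()
# c is mapped through the colormap; s sets the marker area
sc = ax.scatter(x, y, c=colors, s=sizes, alpha=0.7, cmap="viridis")
ax.set_title("Customized scatter plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
# The colorbar is the color intensity scale
cbar = fig.colorbar(sc, ax=ax, label="Color intensity")
```

Call plt.show() afterwards to display the figure.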
Multiple Plots
 Python provides a powerful library named Matplotlib that creates visual
representations in the form of plots and graphs.
 One of the many features of this library is the ability to draw multiple plots within a single figure, which is useful when comparing different datasets or visualizing relationships between multiple variables.
 With a single call of the subplots() method, we can create one or more subplots within a single figure.
 It provides control over the plots and allows us to customize their layout and appearance.
 In a call such as subplots(numOfrows, numOfcols), 'numOfrows' and 'numOfcols' specify the number of rows and columns of the grid respectively.
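As a small sketch of the idea, plt.subplots(2, 2) returns a figure and a 2x2 array of axes, each of which can hold a different plot. The curves and bar values are arbitrary illustration data.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# 2 rows and 2 columns of subplots in one figure
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title("Line: sin(x)")

axes[0, 1].plot(x, np.cos(x), color="orange")
axes[0, 1].set_title("Line: cos(x)")

axes[1, 0].bar(["A", "B", "C"], [3, 5, 2])
axes[1, 0].set_title("Bar")

axes[1, 1].scatter(x[::10], np.sin(x[::10]))
axes[1, 1].set_title("Scatter")

fig.tight_layout()  # avoid overlapping titles and tick labels
```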
Legends
 A legend is an area on the graph that describes each element that makes up the graph.
 A graph may be simple on its own, but if we include a title, labels for the X and Y axes, and a legend, it will be clearer.
 When we look at these names, we can easily determine what the graph represents and the type of data it shows.
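A minimal sketch of a labelled plot: each plot() call is given a label, and legend() collects those labels into the legend box. The two series are arbitrary illustration data.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

fig, ax = plt.subplots()
ax.plot(x, [1, 4, 9, 16], label="squares")
ax.plot(x, [1, 8, 27, 64], label="cubes")
ax.set_title("Squares and cubes")
ax.set_xlabel("n")
ax.set_ylabel("value")
# loc chooses where the legend box is drawn
legend = ax.legend(loc="upper left")
```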
Changing figure Size
 Plots are an effective way of visually representing and summarizing data. However, a plot that is not sized well can appear complicated. Matplotlib provides several ways to control the size of a plot.
 Change Plot Size in Matplotlib in Python
 There are various ways to set the size of a plot in Matplotlib:
 Using set_figheight() and set_figwidth()
 Using figsize
Change the Size of Figures using set_figheight() and set_figwidth()
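A short sketch of this approach: set_figheight() and set_figwidth() are methods of the Figure object and take a size in inches. The plotted values are arbitrary.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 6])

# Resize the existing figure (values are in inches)
fig.set_figheight(4)
fig.set_figwidth(10)
```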
Changing Plot Size in Matplotlib using figsize
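Note that figsize is a keyword argument rather than a function: it is passed as a (width, height) tuple in inches when the figure is created, for example to plt.figure() or plt.subplots(). A minimal sketch with arbitrary data:

```python
import matplotlib.pyplot as plt

# figsize=(width, height) in inches, set when the figure is created
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot([1, 2, 3], [2, 4, 6])
```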
Styling plots using Matplotlib Library
relplot() Function
 This function provides access to several axes-level functions that show the relationship between two variables, with semantic mappings of subsets.
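As a hedged illustration, the sketch below calls seaborn.relplot() on a small hand-built DataFrame; the column names and values are hypothetical. kind="scatter" selects the underlying scatter plot, and hue maps a third column to color.

```python
import pandas as pd
import seaborn as sns

# Hypothetical restaurant bills, built by hand so no download is needed
df = pd.DataFrame({
    "total_bill": [16.9, 10.3, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9],
    "tip": [1.0, 1.7, 3.5, 3.3, 3.6, 4.7, 2.0, 3.1],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner",
             "Dinner", "Dinner", "Lunch", "Dinner"],
})

# hue colors the points by the "time" column
g = sns.relplot(data=df, x="total_bill", y="tip", hue="time", kind="scatter")
```

Passing kind="line" instead would draw a line plot through the same interface.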
catplot()
 The seaborn.catplot() method is used to plot categorical plots.
 Using one of several visual representations, this function gives access to a number of axes-level functions that show the relationship between a numerical variable and one or more categorical variables.
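A minimal sketch of catplot() on hypothetical data: the kind argument chooses the visual representation (here a box plot), x is the categorical variable and y the numerical one.

```python
import pandas as pd
import seaborn as sns

# Hypothetical bills recorded on two days
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "total_bill": [20.6, 16.3, 25.9, 18.4, 24.1, 30.2, 17.8, 22.5],
})

# kind selects the axes-level function, e.g. "strip", "box", "violin", "bar"
g = sns.catplot(data=df, x="day", y="total_bill", kind="box")
```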
seaborn.displot()
 The seaborn.displot() method is a function that provides access to several approaches for visualizing univariate and bivariate distributions of data.
 Like other functions in the Seaborn library, it allows plotting subsets of data defined by semantic mapping across multiple subplots.
 A distribution plot shows the distribution and range of a collection of numeric values along a dimension.
Empirical Cumulative Distribution Graph
Introduction to Seaborn – Python
 Seaborn is an amazing visualization library for statistical graphics plotting
in Python.
 It provides beautiful default styles and color palettes to make statistical
plots more attractive.
 It is built on top of the Matplotlib library and is also closely integrated with the data structures from pandas.
 Seaborn aims to make visualization the central part of exploring and
understanding data.
 It provides dataset-oriented APIs so that we can switch between different
visual representations for the same variables for a better understanding of
the dataset.
Different categories of plot in Seaborn
 Plots are basically used for visualizing the relationship between variables. Those variables can be either completely numerical or a category like a group, class, or division. Seaborn divides plots into the categories below:
 Relational plots: This plot is used to understand the relation between two variables.
 Categorical plots: This plot deals with categorical variables and how they can be visualized.
 Distribution plots: This plot is used for examining univariate and bivariate distributions.
 Regression plots: The regression plots in Seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analyses.
 Matrix plots: A matrix plot displays matrix-like data as a color-encoded grid, for example with heatmap() or clustermap().
 Multi-plot grids: It is a useful approach to draw multiple instances of the same plot on different subsets of the dataset.
seaborn.lineplot()
 Draw a line plot with the possibility of several semantic groupings.
 The relationship between x and y can be shown for different subsets of the
data using the hue, size, and style parameters.
 These parameters control what visual semantics are used to identify the
different subsets.
lmplot()
 The seaborn.lmplot() method is used to plot data and draw regression model fits across a grid on which multiple plots can be drawn.
 This function combines FacetGrid and regplot().
 The purpose of this interface is to make fitting regression models across
conditional subsets of a dataset simple and convenient.
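As an illustration, the sketch below fits a separate regression line for each value of a hypothetical "group" column by passing it as col. The data are made up, and ci=None is used only to skip the confidence band.

```python
import pandas as pd
import seaborn as sns

# Hypothetical study hours and scores for two groups
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "score": [52, 58, 61, 70, 74, 40, 47, 49, 58, 60],
    "group": ["A"] * 5 + ["B"] * 5,
})

# col creates one subplot (with its own regression fit) per group
g = sns.lmplot(data=df, x="hours", y="score", col="group", ci=None)
```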
Count plot
 The countplot is used to show the number of occurrences (counts) of observations in each category of a categorical variable.
 It uses the concept of a bar chart for the visual depiction.
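A minimal sketch with hypothetical grade data: countplot() counts how often each category occurs and draws one bar per category.

```python
import pandas as pd
import seaborn as sns

# Hypothetical grades: A occurs 3 times, B twice, C once
df = pd.DataFrame({"grade": ["A", "B", "A", "C", "B", "A"]})

# The bar height is the number of rows in each category
ax = sns.countplot(data=df, x="grade")
```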
Seaborn - Color Palette
 Color plays a more important role than any other aspect of a visualization. When used effectively, color adds more value to the plot.
 A palette means a flat surface on which a painter arranges and mixes paints.
 Seaborn provides a function called color_palette(), which can be used to give colors to plots and add more aesthetic value to them.
Types of Palettes Available Through color_palette()
 Qualitative
 Sequential
 Diverging
Qualitative
 A qualitative palette is used when the variable is categorical in nature and the color assigned to each group needs to be distinct. Each possible value of the variable is assigned one color from a qualitative palette within a plot.
Sequential
 In sequential palettes, color moves sequentially from lighter to darker. When the variable assigned to be colored is numeric or has inherently ordered values, it can be depicted with a sequential palette.
Diverging
 When we work with mixed values, such as positive and negative (low and high) values, a diverging palette is best suited for visualization.
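The three palette types can each be requested by name from color_palette(). The palette names below ("Set2", "Blues", "RdBu") are examples chosen for this sketch; each call returns a list of RGB tuples.

```python
import seaborn as sns

# Qualitative: distinct colors for unordered categories
qualitative = sns.color_palette("Set2", 5)

# Sequential: light-to-dark shades for ordered or numeric values
sequential = sns.color_palette("Blues", 5)

# Diverging: two hues meeting at a neutral midpoint
diverging = sns.color_palette("RdBu", 5)
```

Calling sns.palplot(qualitative) would draw the colors of a palette as a row of swatches.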
