Data Visualisation
Data visualization is a field of data analysis that deals with the visual representation of data. It presents data graphically and is an effective way to communicate the insights drawn from it.
Data visualization gives us a visual summary of our data. The human mind processes and understands information more easily through pictures, maps and graphs than through raw numbers. Visualization is useful for both small and large data sets, but it is especially valuable for large data sets, where it is impossible to inspect every value, let alone process and understand the data manually.
Python offers several plotting libraries, such as Matplotlib and Seaborn, each with different features for creating informative, customized and appealing plots that present data in a simple and effective way.
Matplotlib is a Python library used for creating static, animated, and interactive visualizations. It
provides a variety of plotting functions to visualize data effectively, including line plots, scatter plots,
bar charts, histograms, and more. Matplotlib is often used in data analysis, scientific research, and
visual communication.
Key Features
Diverse Plotting Types:
Matplotlib supports a wide range of plots, including line plots, scatter plots, bar charts, histograms,
pie charts, box plots, area plots, and 3D plots.
Customization:
It allows for extensive customization of plots, including colors, labels, titles, legends, gridlines, and
styles.
Subplots:
Matplotlib enables the creation of multiple plots within a single figure using subplots.
Integration:
It integrates well with other Python libraries like NumPy and Pandas, making it easy to visualize
data from these sources.
Output:
Matplotlib can export plots in various formats, including PNG, JPG, PDF, and SVG.
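The subplots and export features listed above can be sketched together. This is a minimal example, assuming only Matplotlib and NumPy are installed; the sample arrays, titles and the temporary output directory are illustrative choices, not part of the original text. plt.subplots() returns a figure plus one Axes object per grid cell, and savefig() picks the output format from the file extension.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

# Illustrative sample data built with NumPy
x = np.arange(1, 6)
y_linear = 2 * x
y_square = x ** 2

# One figure containing a 1x2 grid of subplots (two Axes objects)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Left subplot: a customized line plot with markers, legend and grid
ax_left.plot(x, y_linear, color="tab:blue", marker="o", label="2x")
ax_left.set_title("Linear")
ax_left.legend()
ax_left.grid(True)

# Right subplot: a bar chart of the second series
ax_right.bar(x, y_square, color="tab:orange")
ax_right.set_title("Square")

fig.tight_layout()  # reduce overlap between the subplots

# Export the same figure twice; the extension selects PNG or SVG
out_dir = tempfile.mkdtemp()  # hypothetical output location for the demo
png_path = os.path.join(out_dir, "subplots.png")
svg_path = os.path.join(out_dir, "subplots.svg")
fig.savefig(png_path, dpi=100)
fig.savefig(svg_path)
```

In a real project you would pass your own file paths to savefig() instead of a temporary directory.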
Basic Usage
To use Matplotlib, it must first be installed. This can be done using pip:
pip install matplotlib
Once installed, it can be imported into a Python script:
import matplotlib.pyplot as plt
A simple line plot can be created using the plot() function:
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.plot(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Simple Line Plot")
plt.show()
Other types of plots can be created similarly, using functions like scatter(), bar(), and hist().
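As an illustration of the sentence above, a scatter plot of the same sample points used in the line plot might look like this. It is a sketch only; the marker size, color and title are arbitrary choices. Unlike plot(), scatter() draws each point as an unconnected marker.

```python
import matplotlib.pyplot as plt

# Same sample points as the simple line plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]

plt.figure()  # start a fresh figure
# s sets the marker area, c the marker color
points = plt.scatter(x, y, s=60, c="tab:red", marker="o")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Simple Scatter Plot")
plt.show()
```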
The following sections give a practical guide to using Matplotlib for data visualization.
Matplotlib offers a wide variety of plots, such as line charts, bar charts, scatter plots and histograms, which makes it versatile for different data analysis tasks. It is built on top of NumPy: it accepts NumPy arrays directly and works with them internally, which helps it handle large numeric datasets efficiently. It also gives fine-grained control over every element of a plot.
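The NumPy integration described above means arrays can be passed straight to plotting functions. The following is a minimal sketch; the sine curve and the 1,000-point resolution are illustrative assumptions. NumPy computes sin() over the whole array in one vectorized call, and plot() consumes the arrays without converting them to lists.

```python
import matplotlib.pyplot as plt
import numpy as np

# 1,000 evenly spaced x values between 0 and 2*pi
x = np.linspace(0, 2 * np.pi, 1000)
# Vectorized: sin() is applied to every element at once
y = np.sin(x)

plt.figure()
# plot() returns a list of Line2D objects, one per plotted line
lines = plt.plot(x, y, label="sin(x)")
plt.xlabel("x (radians)")
plt.ylabel("sin(x)")
plt.legend()
plt.show()
```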
Installing Matplotlib for Data Visualization
We will use pip to install Matplotlib. If you do not have pip installed, refer to the article Download and install pip Latest Version.
To install Matplotlib, type the following command in the terminal:
pip install matplotlib
If you are using Jupyter Notebook, you can install it within a notebook cell by using:
!pip install matplotlib
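After installing, a quick way to confirm that the package imports correctly is to print its version. This is a small sanity check, not a required step; the backend reported depends on your environment (for example a GUI backend on a desktop, or a non-interactive one such as "agg" on a server).

```python
import matplotlib

# Confirm the package can be imported and show which version is installed
print(matplotlib.__version__)

# Show which rendering backend Matplotlib selected
print(matplotlib.get_backend())
```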
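1. Line Chart
A line chart connects data points with straight line segments and is typically used to show how a value changes across an ordered variable such as time. It can be created using the plot() method.
Example:
import matplotlib.pyplot as plt
# Sample data (the same points as in the basic example)
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.plot(x, y)
plt.title("Line Chart")
plt.ylabel('Y-Axis')
plt.xlabel('X-Axis')
plt.show()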
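The customization options mentioned under Key Features (colors, legends, gridlines, styles) apply directly to line charts. A sketch with two hypothetical data series might look like this; the series values, colors and labels are illustrative only. Each plot() call adds one line, and legend() collects the label= of each line.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data: two series sharing the same x values
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 1, 3, 5]
y2 = [1, 3, 2, 5, 4]

plt.figure()
# Style arguments control color, line style and marker shape
plt.plot(x, y1, color="tab:blue", linestyle="-", marker="o", label="Series A")
plt.plot(x, y2, color="tab:green", linestyle="--", marker="s", label="Series B")

plt.title("Line Chart with Two Series")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.legend()    # builds the legend from each line's label
plt.grid(True)  # draw gridlines behind the data
ax = plt.gca()  # handle to the current Axes, useful for inspection
plt.show()
```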
2. Bar Chart
A bar chart represents categories of data with rectangular bars whose lengths or heights are proportional to the values they represent. Bars can be plotted vertically or horizontally. A bar chart is used to compare values across categories. It can be created using the bar() method.
The example below uses the tips dataset, a record of the tips given by customers in a restaurant over two and a half months in the early 1990s. It contains 7 columns: total_bill, tip, sex, smoker, day, time and size.
Example:
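import matplotlib.pyplot as plt
import pandas as pd
# Assumes tips.csv is in the current working directory
data = pd.read_csv('tips.csv')
# Sum the bills per day; plotting every row directly draws one overlapping bar per row
totals = data.groupby('day')['total_bill'].sum()
plt.bar(totals.index, totals.values)
plt.title("Tips Dataset")
plt.ylabel('Total Bill')
plt.xlabel('Day')
plt.show()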
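As noted above, bars can also be drawn horizontally, which is often easier to read when category names are long. The sketch below uses barh() with hypothetical categories and values chosen only for illustration. With barh(), categories go on the y-axis and each bar's length runs along the x-axis.

```python
import matplotlib.pyplot as plt

# Hypothetical categories and values, for illustration only
categories = ["A", "B", "C", "D"]
values = [12, 7, 15, 9]

plt.figure()
# barh() returns a BarContainer holding one Rectangle per bar
bars = plt.barh(categories, values, color="tab:purple")
plt.xlabel("Value")
plt.ylabel("Category")
plt.title("Horizontal Bar Chart")
plt.show()
```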
3. Histogram
A histogram shows the distribution of numerical data by grouping the values into ranges called bins. It is a type of bar plot where the X-axis represents the bin ranges and the Y-axis shows the frequency, that is, how many values fall into each bin. The hist() function computes and draws the histogram of x.
Example:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('tips.csv')
x = data['total_bill']
plt.hist(x)
plt.title("Tips Dataset")
plt.ylabel('Frequency')
plt.xlabel('Total Bill')
plt.show()
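By default, hist() splits the data range into 10 equal-width bins. The bins parameter changes this, and hist() also returns the counts and bin edges it computed, which is handy for checking the result. The following is a sketch on a hypothetical normally distributed sample (the seed, mean, spread and size are arbitrary), since it does not depend on tips.csv.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample: 500 values from a normal distribution
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=20, scale=8, size=500)

plt.figure()
# hist() returns (counts per bin, bin edges, drawn bar patches)
counts, edges, patches = plt.hist(values, bins=10, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram with 10 Bins")
plt.show()

# Every value falls into exactly one bin, so the counts add up to the sample size
print(counts.sum())
```

Note that 10 bins need 11 edges, because each bin is bounded by two consecutive edges.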
4. Pie Chart
A pie chart is a circular chart that displays a single series of data. The area of each slice is proportional to that part's share of the total. The slices of a pie are called wedges. A pie chart can be created using the pie() method.
Syntax:
matplotlib.pyplot.pie(x, explode=None, labels=None, colors=None, autopct=None, shadow=False, ...)
Example:
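import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('tips.csv')
# pie() needs a single numeric series: here, the total bill per day
totals = data.groupby('day')['total_bill'].sum()
plt.pie(totals, labels=totals.index, autopct='%1.1f%%')
plt.title("Tips Dataset: Total Bill by Day")
plt.show()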
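The explode and autopct parameters from the syntax above can be seen in a small sketch with hypothetical shares (the labels and sizes are illustrative only). explode offsets a wedge from the center by a fraction of the radius, and autopct formats each wedge's percentage label. When autopct is set, pie() returns three lists: the wedges, the label texts and the percentage texts.

```python
import matplotlib.pyplot as plt

# Hypothetical shares, for illustration only
labels = ["A", "B", "C", "D"]
sizes = [40, 30, 20, 10]
explode = [0.1, 0, 0, 0]  # pull the first wedge slightly out of the pie

plt.figure()
wedges, texts, autotexts = plt.pie(
    sizes, explode=explode, labels=labels, autopct="%1.1f%%", startangle=90
)
plt.axis("equal")  # keep the pie circular
plt.title("Pie Chart with Percentages")
plt.show()
```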