
Knowledge Institute of Technology: (An Autonomous Institution)

The document is a laboratory manual for the Exploratory Data Analysis Laboratory (CCS346) at the Knowledge Institute of Technology, outlining the vision, mission, and objectives of the Computer Science and Engineering department. It details the course objectives, a list of experiments, and expected outcomes, focusing on various data analysis and visualization techniques using tools like R, Python, Tableau, and Power BI. The manual includes step-by-step algorithms for each experiment to guide students in performing exploratory data analysis and visualization.

Uploaded by

SUBASREE G S

KNOWLEDGE INSTITUTE OF TECHNOLOGY

(An Autonomous Institution)

(Accredited by NAAC & NBA, Approved by AICTE, New Delhi and Affiliated to Anna University, Chennai)

KAKAPALAYAM (PO), SALEM – 637 504

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

LABORATORY MANUAL

FOR

CCS346 – EXPLORATORY DATA ANALYSIS LABORATORY

REGULATION 2021

VISION, MISSION, PEOs AND PSOs OF CSE DEPARTMENT

VISION
To create globally competent software professionals with social values to cater to the ever-changing industry requirements.
MISSION
M1 To provide appropriate infrastructure to impart need-based technical education
through effective teaching and research
M2 To involve the students in collaborative projects on emerging technologies to fulfill
the industrial requirements
M3 To render value-based education that enables students to take better engineering decisions
with social consciousness and to meet global standards
M4 To inculcate leadership skills in students and encourage them to become globally
competent professionals
Programme Educational Objectives (PEOs)
The graduates of Computer Science and Engineering will be able to
PEO1 Pursue Higher Education and Research or have a successful career in industries
associated with Computer Science and Engineering, or as Entrepreneurs
PEO2 Have the ability and attitude to adapt to emerging technological changes
PEO3 Acquire leadership skills to perform professional activities with social
consciousness
Programme Specific Outcomes (PSOs)
The graduates will be able to
PSO1 Analyze large volumes of data and make business decisions that improve efficiency
using different algorithms and tools
PSO2 Develop web and mobile applications for real-time scenarios
PSO3 Provide automation and smart solutions to society in various forms using the
Internet of Things

Course Code & Name: CCS346 & EXPLORATORY DATA ANALYSIS LABORATORY

REGULATION: R2021
YEAR/SEM: III/V

COURSE OBJECTIVES:
• To outline an overview of exploratory data analysis.
• To implement data visualization using Matplotlib.
• To perform univariate data exploration and analysis.
• To apply bivariate data exploration and analysis.
• To use data exploration and visualization techniques for multivariate and time series data.

LIST OF EXPERIMENTS:

1. Install the data Analysis and Visualization tool: R/ Python /Tableau Public/ Power BI.
2. Perform exploratory data analysis (EDA) with datasets like email data set. Export all your
emails as a dataset, import them inside a pandas data frame, visualize them and get different
insights from the data.
3. Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
4. Explore various variable and row filters in R for cleaning data. Apply various plot features in
R on sample data sets and visualize.
5. Perform Time Series Analysis and apply the various visualization techniques.
6. Perform Data Analysis and representation on a Map using various Map data sets with Mouse
Rollover effect, user interaction, etc.
7. Build cartographic visualization for multiple datasets involving various countries of the world;
states and districts in India etc.
8. Perform EDA on Wine Quality Data Set.
9. Use a case study on a data set and apply the various EDA and visualization techniques and
present an analysis report.

COURSE OUTCOMES
• Understand the fundamentals of exploratory data analysis.
• Implement the data visualization using Matplotlib.
• Perform univariate data exploration and analysis.
• Apply bivariate data exploration and analysis.
• Use Data exploration and visualization techniques for multivariate and time series data.


CONTENTS

Ex. No. | Date | Name of the Experiment | Page No. | Mark Awarded | Signature

1. Installation and Setup of Data Analysis and Visualization Tools: R, Python, Tableau, and Power BI.
2. Perform exploratory data analysis (EDA) with datasets like email data set. Export all your emails as a dataset, import them inside a pandas data frame, visualize them and get different insights from the data.
3. Working with Numpy arrays, Pandas dataframe, Basic plots using Matplotlib.
4. Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample data sets and visualize.
5. Perform Time Series Analysis and apply the various visualization techniques.
6. Perform data analysis and representation on a map using various map datasets with mouse rollover effect, user interaction, etc.
7. Build cartographic visualization of multiple datasets involving various countries of the world, states, and districts in India.
8. Perform EDA on Wine Quality Data Set.
9. Use a case study on a dataset and apply the various EDA and visualization techniques and present an analysis report.

Ex.no : 1 Installation and Setup of Data Analysis and Visualization Tools: R, Python, Tableau, and Power BI
Date :

Aim:
The aim of this experiment is to install and set up four essential data analysis and visualization tools,
namely R, Python, Tableau, and Power BI, to provide a robust environment for data analysis and
visualization tasks.
Algorithm:
Step 1: System Requirements Assessment:

- Before installation, ensure your system meets the minimum hardware and software requirements
for each tool. Check the official documentation of R, Python, Tableau, and Power BI for system
prerequisites.

Step 2: Installation of R:

- Download the R installation package from the official CRAN (Comprehensive R Archive Network)
website (https://cran.r-project.org/mirrors.html).

- Follow the installation wizard, choose your preferred options, and install R on your system.

Step 3: Installation of Python:

- Download the latest Python installer from the official Python website
(https://www.python.org/downloads/).

- During installation, make sure to check the option "Add Python to PATH" for easy command-line
access.

- After installation, use pip (Python package manager) to install essential data science libraries, such
as NumPy, Pandas, Matplotlib, and Jupyter Notebook, to enhance your data analysis capabilities.

Step 4: Installation of Tableau:

- Visit the official Tableau website (https://www.tableau.com/products/desktop/download) to
download the Tableau Desktop installer.

- Follow the installation instructions and provide your licensing information or use the free trial
period.

- Ensure Tableau Desktop is successfully activated.

Step 5: Installation of Power BI:

- Go to the Microsoft Power BI website (https://powerbi.microsoft.com/en-us/desktop/) to
download Power BI Desktop.

- Download and install the Power BI Desktop application.

- You will need a Microsoft account to access certain features. Sign in or create an account if
required.

Step 6: Configuration and Integration:

- Configure R and Python with data science IDEs such as RStudio or Jupyter Notebook to harness
their full potential for data analysis.

- For Tableau and Power BI, explore their integration capabilities with databases, cloud services,
and data sources you plan to use for analysis and visualization.

Step 7: Testing and Documentation:

- Verify that all the installed tools are working correctly by creating a simple data analysis and
visualization project.

- Document the installation process, any challenges faced, and how they were overcome. This
documentation will be helpful for future reference and troubleshooting.
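For the Python side of Step 3 and Step 7, a short script can confirm that the libraries are present and that plotting works. This is a minimal sketch, assuming the packages were installed with something like pip install numpy pandas matplotlib notebook ("notebook" being the PyPI name of Jupyter Notebook); it reads versions with the standard-library importlib.metadata (Python 3.8+) and saves a tiny plot. The package list, the file name smoke_test.png and the sample numbers are placeholders.

```python
# Post-installation check: report package versions, then run a tiny analysis + plot.
from importlib import metadata

# PyPI distribution names ("notebook" is the classic Jupyter Notebook package)
PACKAGES = ["numpy", "pandas", "matplotlib", "notebook"]


def installed_versions(packages):
    """Return {name: version string, or None if the package is missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def smoke_test(png_path="smoke_test.png"):
    """Summarise a tiny placeholder DataFrame and save a line plot to png_path."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display window needed
    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 1, 5, 3]})  # placeholder values
    fig, ax = plt.subplots()
    ax.plot(df["x"], df["y"], marker="o")
    ax.set_title("Installation smoke test")
    fig.savefig(png_path)
    plt.close(fig)
    return df.describe()


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12s} {version or 'NOT INSTALLED - run: pip install ' + name}")
    print(smoke_test())
```

If a package is reported as missing, install it with pip and run the script again; the printed summary table and the saved image show that pandas and Matplotlib are working together.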

Tableau Public:

Power BI:

Result:

Thus, the steps to install R, Python, Tableau Public and Power BI were completed successfully.

Ex.no : 2 Perform exploratory data analysis (EDA) with datasets like email data set.
Export all your emails as a dataset, import them inside a pandas data
frame, visualize them and get different insights from the data.
Date :

Aim :
Perform exploratory data analysis (EDA) with datasets like email data set. Export all your emails as
a dataset, import them inside a pandas data frame, visualize them and get different insights from
the data.
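If the emails are exported in mbox format (Gmail's Google Takeout export, for example, produces .mbox files), Python's standard-library mailbox module can turn them into a pandas DataFrame. A minimal sketch, assuming an mbox file; the file name emails.mbox and the column names sender, subject, timestamp and content are illustrative choices, not part of the original manual.

```python
import mailbox
from email.utils import parsedate_to_datetime
from pathlib import Path

import pandas as pd

COLUMNS = ["sender", "subject", "timestamp", "content"]


def first_plain_text(msg):
    """Return the first text/plain part of a message as a str ('' if none)."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            return payload.decode(part.get_content_charset() or "utf-8", errors="replace")
    return ""


def mbox_to_dataframe(path):
    """Read an mbox file into a DataFrame with one row per message."""
    rows = []
    for msg in mailbox.mbox(str(path), create=False):
        date_header = msg["Date"]
        rows.append({
            "sender": msg["From"],
            "subject": msg["Subject"],
            "timestamp": parsedate_to_datetime(date_header) if date_header else None,
            "content": first_plain_text(msg),
        })
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Convert mixed time zones to UTC so the column gets a single datetime dtype
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    return df


if __name__ == "__main__":
    path = Path("emails.mbox")  # hypothetical export file
    if path.exists():
        emails = mbox_to_dataframe(path)
        emails.info()
        print(emails.head())
    else:
        print(f"{path} not found: export your mailbox first")
```

Only the first text/plain part of each message is kept, which keeps the content column simple; HTML-only messages end up with an empty string.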
Algorithm
Step 1: Load the data and create a DataFrame.

Step 2: Convert the 'timestamp' column to datetime format.

Step 3: Check the data's structure and data types.

Step 4: Preview the first few rows of the DataFrame.

Step 5: Find the most active email senders.

Step 6: Visualize common keywords in email content with a word cloud.

Step 7: Analyze email frequency over time using a time series plot.

Step 8: Explore additional insights based on specific requirements.

Step 9: Display results and provide interpretations.

Step 10: Conclude the analysis.
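Steps 2 to 7 can be sketched as below, assuming a DataFrame with sender, timestamp and content columns. A word cloud needs the third-party wordcloud package; this sketch swaps in a plain word-frequency count with collections.Counter, whose result could later be passed to WordCloud.generate_from_frequencies. The stop-word list and the sample rows are hypothetical.

```python
import re
from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd

# A tiny stop-word list (an assumption); extend it for real mailboxes.
STOP_WORDS = {"the", "and", "for", "you", "are", "with", "this", "that", "from", "have", "your"}


def top_senders(df, n=5):
    """Step 5: the n most frequent values of the 'sender' column."""
    return df["sender"].value_counts().head(n)


def top_keywords(df, n=10):
    """Step 6 (text form): most common words of 3+ letters in 'content'."""
    counts = Counter()
    for text in df["content"].dropna():
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return counts.most_common(n)


def emails_per_period(df, freq="D"):
    """Step 7: number of emails per period (days with no mail count as 0)."""
    return df.set_index(pd.to_datetime(df["timestamp"])).resample(freq).size()


def plot_frequency(counts):
    fig, ax = plt.subplots()
    ax.plot(counts.index, counts.to_numpy(), marker="o")
    ax.set_title("Emails over time")
    ax.set_xlabel("Date")
    ax.set_ylabel("Number of emails")
    fig.autofmt_xdate()
    return fig


if __name__ == "__main__":
    # Hypothetical rows; in practice use the DataFrame built from your export.
    emails = pd.DataFrame({
        "sender": ["a@example.com", "b@example.com", "a@example.com"],
        "timestamp": ["2024-01-01 09:00", "2024-01-01 17:30", "2024-01-03 08:15"],
        "content": ["Meeting agenda for Monday", "Invoice attached", "Updated agenda for the meeting"],
    })
    emails["timestamp"] = pd.to_datetime(emails["timestamp"])  # Step 2
    emails.info()                                            # Step 3
    print(emails.head())                                     # Step 4
    print(top_senders(emails))
    print(top_keywords(emails))
    plot_frequency(emails_per_period(emails))
    plt.show()
```

For a large mailbox, a weekly or monthly frequency (for example freq="W" or freq="MS") usually gives a more readable time series than daily counts.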

Output

Result :
Thus, EDA on the email dataset has been performed successfully.

EX NO: 3 Working with Numpy arrays, Pandas dataframe, Basic plots using Matplotlib

DATE:

Aim

To work with arrays using the NumPy module, DataFrames using the pandas module and basic plots using the
Matplotlib module in Python programming.

Algorithm for Numpy

Step 1: Start the program

Step 2: Import the required packages

Step 3: Create 1-D and 2-D arrays using built-in methods.

Step 4: Generate arrays using zeros, ones, arange and linspace.

Step 5: Check the number of dimensions and the size of an array.

Step 6: Find the shape of an array, reshape it, and compute its transpose.

Step 7: Perform operations such as slicing, iterating over, and splitting arrays.

Step 8: Stop the program

Numpy arrays

(i) Creating numpy 1d and 2d arrays

Program

Output

(ii) Different ways of creating arrays

Program

Output
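A minimal sketch of subsections (i) and (ii), assuming NumPy is installed; the array values are arbitrary examples.

```python
import numpy as np

# (i) 1-D and 2-D arrays built from Python lists
a1 = np.array([1, 2, 3, 4])
a2 = np.array([[1, 2, 3],
               [4, 5, 6]])

# (ii) Built-in constructors
zeros = np.zeros((2, 3))        # 2 x 3 array filled with 0.0
ones = np.ones(4, dtype=int)    # four integer 1s
evens = np.arange(0, 10, 2)     # 0, 2, 4, 6, 8 (stop value 10 excluded)
grid = np.linspace(0, 1, 5)     # 5 evenly spaced values, 0 to 1 inclusive

for label, arr in [("1-D", a1), ("2-D", a2), ("zeros", zeros),
                   ("ones", ones), ("arange", evens), ("linspace", grid)]:
    print(label, arr, sep=":\n")
```

The difference between arange (step size given, stop excluded) and linspace (number of points given, stop included) is the usual source of off-by-one surprises.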

(iii) Numpy array Dimension, Shape, Size, Transpose and Reshaping

Program

Output
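A minimal sketch of subsection (iii); the 2 x 3 example array is an arbitrary choice.

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)   # [[1 2 3], [4 5 6]]
print("ndim:", a.ndim)              # number of axes
print("shape:", a.shape)            # (rows, columns)
print("size:", a.size)              # total number of elements
print("transpose:\n", a.T)          # rows become columns

b = a.reshape(3, 2)                 # same 6 elements in row-major order, new shape
flat = a.reshape(-1)                # -1 lets NumPy infer the length
print("reshaped:\n", b)
print("flattened:", flat)
```

reshape only works when the new shape has the same total size as the original (here 6 elements).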

(iv) Slicing and accessing arrays

Program

Output

(v) Iterating Arrays

Program

Output
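A minimal sketch of subsections (iv) and (v), slicing and iterating, on an arbitrary 2 x 4 array.

```python
import numpy as np

a = np.array([[10, 20, 30, 40],
              [50, 60, 70, 80]])

# (iv) Slicing: a[rows, columns], with start:stop:step in each position
print(a[0, 2])        # single element: row 0, column 2
print(a[1, 1:3])      # row 1, columns 1 and 2
print(a[:, 0])        # first column of every row
print(a[:, ::2])      # every second column

# (v) Iterating: a plain for-loop walks the first axis (rows)
for row in a:
    print(row)

# np.nditer visits every element, whatever the number of dimensions
for x in np.nditer(a):
    print(x, end=" ")
print()

# np.ndenumerate also yields the index of each element
for idx, x in np.ndenumerate(a):
    print(idx, x)
```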

(vi) Vstack, Hstack, split and flip functions on arrays

Program

Output

(vii) Operations on array

Program

Output
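A minimal sketch of subsections (vi) and (vii), stacking, splitting, flipping and element-wise operations, using two small example arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))              # stack as rows -> 2 x 3
h = np.hstack((a, b))              # join end to end -> 6 elements
parts = np.split(h, 3)             # three equal pieces of 2 elements
flipped = np.flip(v)               # reverse along every axis
flipped_rows = np.flip(v, axis=0)  # reverse the row order only
print(v, h, parts, flipped, flipped_rows, sep="\n")

# (vii) Element-wise arithmetic and aggregates
print(a + b, a * b, b - a, b / a)
print(a.sum(), v.sum(axis=0), v.max(), v.mean())
print(np.sqrt(a), a @ b)           # @ is the dot product for 1-D arrays
```

np.split needs the array length to be divisible by the number of pieces; np.array_split relaxes that requirement.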

Algorithm for Pandas:

Pandas DataFrame

(i) Creating Pandas DataFrame

Program

Output

(ii) Creating named index

Program

Output
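A minimal sketch of subsections (i) and (ii), creating a DataFrame with the default index and with a named index; the names, marks and departments are hypothetical sample values.

```python
import pandas as pd

# (i) From a dictionary of columns (hypothetical sample values)
data = {"name": ["Asha", "Ravi", "Meena"],
        "marks": [85, 72, 90],
        "dept": ["CSE", "ECE", "CSE"]}
df = pd.DataFrame(data)
print(df)                      # default index 0, 1, 2

# (ii) Named index: supply row labels instead of the default integers
df_named = pd.DataFrame(data, index=["s1", "s2", "s3"])
print(df_named)
print(df_named.loc["s2"])      # row looked up by its label
```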

(iii) Importing and adding into DataFrame

Program

Output

(iv) Information about dataset

Program

Output
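A minimal sketch of subsections (iii) and (iv). To stay self-contained it reads CSV text from memory with io.StringIO; with a real file the call would be pd.read_csv("your_file.csv"), where the file name is hypothetical. The rows are sample values.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (one mark deliberately missing)
csv_text = """name,marks,dept
Asha,85,CSE
Ravi,,ECE
Meena,90,CSE
"""
df = pd.read_csv(io.StringIO(csv_text))

# Add a row, then a derived column
new_row = pd.DataFrame([{"name": "Kiran", "marks": 64, "dept": "IT"}])
df = pd.concat([df, new_row], ignore_index=True)
df["passed"] = df["marks"] >= 50   # NaN compares as False

# (iv) Information about the dataset
df.info()                  # column dtypes and non-null counts
print(df.shape)            # (rows, columns)
print(df.describe())       # summary statistics of numeric columns
print(df.isnull().sum())   # missing values per column
```

pd.concat is used for adding rows because DataFrame.append was removed in pandas 2.0.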

(v) Basic operations on DataFrame

Program

Output

(vi) Slicing using loc and iloc

Program

Output

Program

Output
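A minimal sketch of subsections (v) and (vi), basic operations and slicing with loc and iloc, on a hypothetical four-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena", "Kiran"],
                   "marks": [85, 72, 90, 64],
                   "dept": ["CSE", "ECE", "CSE", "IT"]},
                  index=["s1", "s2", "s3", "s4"])

# (v) Basic operations
print(df.head(2))                                # first rows
print(df.sort_values("marks", ascending=False))  # sort by a column
print(df[df["marks"] > 70])                      # boolean row filter
print(df.groupby("dept")["marks"].mean())        # aggregate per group

# (vi) loc is label-based and INCLUDES the end label
print(df.loc["s1":"s3", ["name", "marks"]])
# iloc is position-based and EXCLUDES the end position
print(df.iloc[0:2, 0:2])
```

The inclusive/exclusive difference is why df.loc["s1":"s3"] returns three rows while df.iloc[0:2] returns two.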

Algorithm for Matplotlib:

Step 1: Start the program

Step 2: Import the required packages

Step 3: Visualize the data using simple line plot

Step 4: Create a bar plot with the bar method and customize its appearance, labels, and title.

Step 5: Generate a scatter plot with the scatter function and customize the size, color, and style of the markers,
as well as the labels and title.

Step 6: Visualize the data using a pie chart and a histogram with the required parameters.

Step 7: Add gridlines to the plots using built-in methods.

Step 8: Stop the program

Basic plots using Matplotlib

(i) Creating a line plot

Program

Output

(ii) Creating bar plot

Program

Output
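A minimal sketch of subsections (i) and (ii), a line plot and a bar plot; the month names and sales figures are hypothetical values used only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical sample values
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 90, 160]


def line_plot():
    fig, ax = plt.subplots()
    ax.plot(months, sales, color="tab:blue", marker="o", linestyle="--")
    ax.set_title("Monthly sales (line plot)")
    ax.set_xlabel("Month")
    ax.set_ylabel("Units sold")
    return fig, ax


def bar_plot():
    fig, ax = plt.subplots()
    ax.bar(months, sales, color="tab:orange", edgecolor="black")
    ax.set_title("Monthly sales (bar plot)")
    ax.set_xlabel("Month")
    ax.set_ylabel("Units sold")
    return fig, ax


if __name__ == "__main__":
    line_plot()
    bar_plot()
    plt.show()
```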

(iii) Creating Scatter plot

Program

Output

(iv) Creating Pie chart

Program

Output
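A minimal sketch of subsections (iii) and (iv). The scatter data are random numbers from a seeded generator, and the pie shares and labels are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)                # fixed seed for a reproducible plot
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)    # roughly linear relationship


def scatter_plot():
    fig, ax = plt.subplots()
    # s = marker size, c = colour, marker = shape, alpha = transparency
    ax.scatter(x, y, s=40, c="tab:green", marker="^", alpha=0.7)
    ax.set_title("Scatter plot")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return fig, ax


def pie_chart():
    shares = [40, 25, 20, 15]                 # hypothetical shares
    labels = ["Python", "R", "Tableau", "Power BI"]
    fig, ax = plt.subplots()
    ax.pie(shares, labels=labels, autopct="%1.1f%%", startangle=90)
    ax.set_title("Tool usage (pie chart)")
    ax.axis("equal")                          # draw the pie as a circle
    return fig, ax


if __name__ == "__main__":
    scatter_plot()
    pie_chart()
    plt.show()
```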

(v) Creating Histogram plot

Program

Output

(vi) Adding Gridlines to plot

Program

Output
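A minimal sketch of subsections (v) and (vi), a histogram and a plot with gridlines. The histogram uses synthetic normally distributed values, not real marks.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)   # fixed seed for reproducibility


def histogram():
    data = rng.normal(loc=60, scale=10, size=500)   # synthetic values
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(data, bins=10, color="tab:purple", edgecolor="black")
    ax.set_title("Histogram")
    ax.set_xlabel("Value")
    ax.set_ylabel("Frequency")
    return fig, ax, counts


def with_gridlines():
    x = np.linspace(0, 10, 100)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.grid(True, linestyle=":", linewidth=0.8, color="gray")  # dotted grey grid
    ax.set_title("Sine wave with gridlines")
    return fig, ax


if __name__ == "__main__":
    histogram()
    with_gridlines()
    plt.show()
```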

Result

Thus, working with arrays in the NumPy module, DataFrames in the pandas module and basic plots in the Matplotlib
module in Python programming has been explored successfully.

Ex No : 4 Explore various variable and row filters in R for cleaning
data. Apply various plot features in R on sample data sets
and visualize.
DATE :

Aim:
The aim of this experiment is to explore data cleaning techniques in R, including variable and row filters,
and to apply various plot features to sample datasets to visualize data effectively.
Algorithm:
Step 1: Load the required libraries: dplyr for filtering and cleaning data, and ggplot2 for data
visualization.

Step 2: Load one or more sample datasets for analysis. These datasets should have some missing values and
outliers for data cleaning demonstrations.

Step 3: Apply variable filters (e.g., removing unnecessary columns) to clean the dataset.

Step 4: Apply row filters (e.g., removing rows with missing values or outliers) to clean the dataset further.

Step 5: Perform EDA to gain insights into the data.

Step 6: Create summary statistics, histograms, and box plots to understand the distribution of variables.

Step 7: Use ggplot2 to create various types of plots, such as bar charts, scatter plots, and line plots, to visualize
the data.

Step 8: Display the created plots using R's built-in functionality.

Step 9: Optionally, save the plots as image files for further use or reporting.

Program:
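This experiment is carried out in R. For readers following the rest of the manual in Python, the same variable and row filters look like this in pandas; this is a swapped-in analogue, not the R program the experiment calls for. The comments name the dplyr/tidyr functions each step mirrors (select, drop_na, filter); the plotting steps would use ggplot2's geom_histogram() and geom_boxplot() in R. The column names, the sample rows and the 1.5 * IQR outlier rule are assumptions.

```python
import numpy as np
import pandas as pd


def select_columns(df, drop):
    """Variable filter, like dplyr::select(-cols): drop unneeded columns."""
    return df.drop(columns=drop)


def drop_missing(df):
    """Row filter, like tidyr::drop_na(): keep only complete rows."""
    return df.dropna()


def drop_outliers(df, col, k=1.5):
    """Row filter, like dplyr::filter(): keep rows within k * IQR of the quartiles."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[col].between(q1 - k * iqr, q3 + k * iqr)]


if __name__ == "__main__":
    # Hypothetical sample with one missing value and one obvious outlier
    raw = pd.DataFrame({
        "id": range(6),
        "group": ["a", "b", "a", "b", "a", "b"],
        "score": [10, 11, 12, 13, 100, np.nan],
    })
    clean = drop_outliers(drop_missing(select_columns(raw, ["id"])), "score")
    print(clean)
    print(clean.describe())
```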

Output:

Result:
Thus the process of data manipulation and visualization using R was executed successfully.

Ex No : 5 Perform Time Series Analysis and apply the various
visualization techniques
DATE :

Aim :

To write a program that performs time series analysis and applies various visualization techniques.

Algorithm :

Step 1: Import the required libraries:

1. Pandas for data handling.
2. NumPy for numerical processing.
3. Matplotlib for data visualization.
4. Statsmodels for time series decomposition.

Step 2: Set the random seed for reproducibility using np.random.seed().

Step 3: Create the date range using pd.date_range().

Step 4: Generate a random integer between 1 and 100 for each date using np.random.randint() and store
the dates and values in a DataFrame.

Step 5: Set the date column as the index of the DataFrame using df.set_index('date', inplace=True).

Step 6: Import seasonal_decompose from statsmodels.tsa.seasonal to split the series into its trend,
seasonal and residual components.

Step 7: Plot the DataFrame with Matplotlib using df.plot().

Step 8: Decompose the series with decomposition = seasonal_decompose(df['value'], model='additive')
and access the components as decomposition.trend, decomposition.seasonal and decomposition.resid.

Step 9: Finally, call plt.show() to display the visualizations.

Program :
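A minimal sketch of Steps 2 to 9. In the steps above, seasonal_decompose(df['value'], model='additive') returns an object whose trend, seasonal and resid attributes hold the three components; to make each part visible, this sketch instead computes a classical additive decomposition by hand with pandas (a centred moving average for the trend, per-position averages for the seasonal part). For an odd period this should match statsmodels' default output, though it is worth cross-checking against the library. The start date, the 100-day length, the seed of 42 and the weekly period of 7 are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_series(n_days=100, seed=42):
    """Steps 2-5: random daily values indexed by date."""
    np.random.seed(seed)  # reproducible random numbers
    df = pd.DataFrame({
        "date": pd.date_range(start="2023-01-01", periods=n_days, freq="D"),
        "value": np.random.randint(1, 100, size=n_days),  # integers 1..99
    })
    df.set_index("date", inplace=True)
    return df


def additive_decompose(series, period=7):
    """Classical additive decomposition: series = trend + seasonal + resid."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles an odd period")
    # Trend: centred moving average spanning one full period (NaN at both ends)
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: mean detrended value at each position in the cycle,
    # shifted so the effects sum to zero over one period
    position = np.arange(len(series)) % period
    effects = detrended.groupby(position).mean()
    effects = effects - effects.mean()
    seasonal = pd.Series(effects.loc[position].to_numpy(), index=series.index)
    resid = series - trend - seasonal
    return trend, seasonal, resid


def plot_components(series, trend, seasonal, resid):
    """Steps 7-9: observed series and its three components, one panel each."""
    fig, axes = plt.subplots(4, 1, sharex=True, figsize=(8, 8))
    panels = zip(axes, [series, trend, seasonal, resid],
                 ["Observed", "Trend", "Seasonal", "Residual"])
    for ax, data, title in panels:
        ax.plot(data.index, data.to_numpy())
        ax.set_title(title)
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    df = make_series()
    values = df["value"].astype(float)
    plot_components(values, *additive_decompose(values))
    plt.show()
```

Because the values are random, the seasonal panel shows no real weekly pattern; with real data the same code separates a genuine weekly cycle from the trend.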

Output:

Output:

RESULT :

Thus, the above time series program was written and executed successfully.

Ex No : 6 Perform data analysis and representation on a map using various map
datasets with mouse rollover effect, user interaction, etc.
Date :

Aim:

To perform data analysis and representation on a map using various map datasets with mouse
rollover effect, user interaction, etc.

Algorithm:

Step 1: Collect and prepare various map datasets, including geographical information (latitude,
longitude), and relevant data attributes.

Step 2: Choose a suitable map library or framework (e.g., Leaflet, Google Maps API) for displaying
the maps.

Step 3: Create a web-based application that renders the map and overlays it with markers,
polygons, or other visual elements based on the dataset.

Step 4: Implement a mouse rollover effect, where users can hover over map elements (markers,
regions) to view additional information related to the data point.

Step 5: Enable user interaction by allowing users to interact with the map, such as zooming in/out,
panning, and selecting specific data points for more details.

Step 6: Perform data analysis on the selected dataset, which may include generating statistics,
clustering, or creating heatmaps based on geographical attributes.

Step 7: Implement filters and visualization options, such as color-coding or varying marker sizes to
represent different data attributes.

Step 8: Design a user-friendly interface with clear controls, legends, and tooltips to help users
understand the data and its representation on the map.

Program:
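Step 2 names Leaflet as one option. A minimal sketch, assuming Leaflet 1.9.4 loaded from the unpkg CDN and OpenStreetMap tiles, in which Python only writes a standalone HTML page: bindTooltip gives the mouse-rollover label, bindPopup shows details on click, zoom and pan come built into the Leaflet map, and the marker radius scales with the value (Step 7). The city coordinates are approximate, and the values attached to them are placeholders, not real data.

```python
import json
from pathlib import Path
from string import Template

# HTML skeleton; $-placeholders are filled in by build_map_html()
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>$title</title>
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
  <style>#map { height: 500px; }</style>
</head>
<body>
<div id="map"></div>
<script>
  var points = $points;
  var map = L.map('map').setView([$lat, $lon], $zoom);  // zoom/pan built in
  L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
    attribution: '&copy; OpenStreetMap contributors'
  }).addTo(map);
  points.forEach(function (p) {
    L.circleMarker([p.lat, p.lon], {radius: p.radius})
      .bindTooltip(p.name + ': ' + p.value)                     // mouse rollover
      .bindPopup('<b>' + p.name + '</b><br>Value: ' + p.value)  // click
      .addTo(map);
  });
</script>
</body>
</html>
""")


def build_map_html(points, center, zoom=5, title="Interactive map"):
    """Return a standalone HTML page with one circle marker per point."""
    max_value = max(p["value"] for p in points)
    # Step 7: marker radius grows with the value (5 to 20 pixels)
    sized = [dict(p, radius=5 + 15 * p["value"] / max_value) for p in points]
    return PAGE.substitute(title=title, points=json.dumps(sized),
                           lat=center[0], lon=center[1], zoom=zoom)


if __name__ == "__main__":
    # Approximate city coordinates; the values are placeholders
    cities = [
        {"name": "Chennai", "lat": 13.08, "lon": 80.27, "value": 30},
        {"name": "Salem", "lat": 11.66, "lon": 78.15, "value": 10},
        {"name": "Delhi", "lat": 28.61, "lon": 77.21, "value": 50},
    ]
    Path("map.html").write_text(build_map_html(cities, center=(20.6, 78.9)), encoding="utf-8")
    print("Open map.html in a browser")
```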

Output:

Result:

Thus, a web-based interactive map application that lets users explore geographical data through
mouse rollover effects and other interactive features has been created.

Ex No : 7 Build cartographic visualisation of multiple datasets involving various
countries of the world; states, and districts in India
Date :

Aim:

To build cartographic visualisations of multiple datasets involving various countries of the world,
states and districts in India, etc.

Algorithm:

Step 1: Gather the multiple datasets related to countries, states, and districts, and ensure that each
dataset contains geographical information, such as latitude and longitude, to accurately plot the data
on maps.

Step 2: Choose a mapping library or tool suitable for the project, such as Leaflet, Mapbox, or Google
Maps.

Step 3: Develop a web-based application or interactive dashboard that renders the world map and
India's map. Overlay the map with relevant boundaries (country borders, state borders, district
boundaries).

Step 4: Merge the datasets with the map layers, linking data points to their geographical locations
(countries, states, districts).

Step 5: Implement data visualization techniques to represent the datasets visually on the maps. This
may include choropleth maps, bubble maps, or heatmaps, depending on the nature of the data.

Step 6: Apply color-coding to the map elements to represent different data attributes, making it easy
to distinguish and analyze the information.

Step 7: Enable user interaction by allowing users to zoom in/out, pan, and click on map elements to
access detailed information about the regions or data points.

Step 8: Include legends, labels, and tooltips to help users interpret the visualizations and understand
the meaning of different colors and symbols.

Program:
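Steps 4 and 6 (joining data to boundaries and colour-coding them) can be done before anything is drawn. A minimal sketch in plain Python, assuming a GeoJSON boundary layer whose features carry the region name in properties["name"] (the property key is an assumption) and a dict of values per region; it uses equal-interval classes, one of several possible classification schemes, and the five hex colours are an arbitrary choice. In Leaflet the result could then be drawn with L.geoJSON(data, {style: function (f) { return {fillColor: f.properties.fill}; }}).

```python
import json

# Light-to-dark colour ramp (an arbitrary 5-class choice)
COLORS = ["#ffffcc", "#a1dab4", "#41b6c4", "#2c7fb8", "#253494"]
NO_DATA = "#cccccc"


def equal_interval_class(value, vmin, vmax, n_classes):
    """Index 0..n_classes-1 of the equal-width bin containing value."""
    if vmax == vmin:
        return 0
    idx = int((value - vmin) / (vmax - vmin) * n_classes)
    return min(idx, n_classes - 1)  # the maximum value belongs to the top bin


def join_and_color(geojson, values, key="name"):
    """Copy values[region] into each feature and assign a fill colour (in place)."""
    vmin, vmax = min(values.values()), max(values.values())
    for feature in geojson["features"]:
        props = feature.setdefault("properties", {})
        value = values.get(props.get(key))
        props["value"] = value
        if value is None:
            props["fill"] = NO_DATA  # region on the map but missing from the data
        else:
            props["fill"] = COLORS[equal_interval_class(value, vmin, vmax, len(COLORS))]
    return geojson


if __name__ == "__main__":
    # Toy example: two features without geometry, just to show the join
    demo = {"type": "FeatureCollection", "features": [
        {"type": "Feature", "properties": {"name": "Region A"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Region B"}, "geometry": None},
    ]}
    print(json.dumps(join_and_color(demo, {"Region A": 1, "Region B": 9}), indent=2))
```

Region names must match exactly between the dataset and the boundary file; mismatched spellings show up as grey no-data regions, which makes them easy to spot.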

Output:

Result:

The result is an interactive web-based platform or application that provides cartographic
visualizations for multiple datasets related to countries worldwide, as well as states and districts in
India.

EX NO: 8 Perform EDA on Wine Quality Data Set
DATE:

Aim

To perform EDA on Wine quality dataset.

Algorithm

Step 1: Import necessary libraries such as pandas, numpy, matplotlib and seaborn.

Step 2: Read the CSV file ('winequality-red.csv') into a Pandas DataFrame and store it in a variable.

Step 3: Take the first column of the DataFrame and split it into multiple columns based on the delimiter ';'

Step 4: Change the data types of the columns in the DataFrame to their appropriate types

Step 5: Check the datatypes, missing values, and summary of the data.

Step 6: Create a histogram of the 'quality' column using Seaborn with labels and a title, and display the plot.

Step 7: Calculate the correlation matrix for the dataset and create a heatmap using Seaborn with
annotations and a title, and display the plot.

Step 8: Create a scatter plot of 'alcohol' vs. 'quality' using Seaborn with labels and a title, and display the plot.

Program :
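A minimal sketch of Steps 2 to 8, assuming a local copy of winequality-red.csv. Because that file is semicolon-separated, passing sep=";" to pd.read_csv splits the columns while reading, which replaces the manual split in Step 3. The sketch draws with Matplotlib alone; the Seaborn calls the algorithm refers to would be sns.histplot(df["quality"]), sns.heatmap(corr, annot=True) and sns.scatterplot(data=df, x="alcohol", y="quality").

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd


def load_wine(path):
    """Read the semicolon-separated wine file; sep=';' splits the columns directly."""
    df = pd.read_csv(path, sep=";")
    df.columns = [c.strip() for c in df.columns]
    return df


def summarize(df):
    """Step 5: data types, missing values per column and summary statistics."""
    print(df.dtypes)
    print(df.isnull().sum())
    print(df.describe())


def quality_histogram(df):
    """Step 6: how many wines received each quality score."""
    counts = df["quality"].value_counts().sort_index()
    fig, ax = plt.subplots()
    ax.bar(counts.index, counts.to_numpy(), edgecolor="black")
    ax.set_xlabel("Quality")
    ax.set_ylabel("Count")
    ax.set_title("Distribution of wine quality")
    return fig, counts


def correlation_heatmap(df):
    """Step 7: annotated heatmap of the correlation matrix."""
    corr = df.select_dtypes("number").corr()
    n = len(corr.columns)
    fig, ax = plt.subplots(figsize=(10, 8))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(n))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(n))
    ax.set_yticklabels(corr.columns)
    for i in range(n):
        for j in range(n):
            ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center", fontsize=7)
    fig.colorbar(im, ax=ax)
    ax.set_title("Correlation matrix")
    fig.tight_layout()
    return fig, corr


def alcohol_vs_quality(df):
    """Step 8: scatter plot of alcohol against quality."""
    fig, ax = plt.subplots()
    ax.scatter(df["alcohol"], df["quality"], alpha=0.4)
    ax.set_xlabel("Alcohol")
    ax.set_ylabel("Quality")
    ax.set_title("Alcohol vs. quality")
    return fig


if __name__ == "__main__":
    path = Path("winequality-red.csv")
    if path.exists():
        wine = load_wine(path)
        summarize(wine)
        quality_histogram(wine)
        correlation_heatmap(wine)
        alcohol_vs_quality(wine)
        plt.show()
    else:
        print(f"{path} not found")
```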

Output:

Output

Output

# Check the data types of each column

Output

Output

Output

Output

Result :

Thus EDA on Wine quality dataset has been performed successfully.

Ex No : 9 Use a case study on a dataset and apply the various EDA and visualization
techniques and present an analysis report
Date :

Aim:

To use a case study on a dataset, apply the various EDA and visualization techniques, and present
an analysis report.

Algorithm :

Step 1: Import the required libraries.

Step 2: Load the dataset.

Step 3: Display basic information about the dataset.

Step 4: Check for missing values.

Step 5: Visualise the distribution of scores using various plots.

Step 6: Check the correlation for variables.

Step 7: Visualise the correlation using heatmap.

Step 8: Give an overall analysis report.
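Step 8 can be partly automated by ranking how strongly each numeric column correlates with the score of interest. A minimal sketch: the case-study dataset itself is not included in this manual, so the file name student_scores.csv and the column name total (suggested by the Result's mention of total scores) are assumptions, as are the sample columns used in the usage note.

```python
from pathlib import Path

import pandas as pd


def analysis_report(df, target):
    """Return a short text report plus correlations of numeric columns with target."""
    numeric = df.select_dtypes("number")
    corr = numeric.corr()[target].drop(target).sort_values(ascending=False)
    lines = [
        f"Rows: {len(df)}, columns: {df.shape[1]}",
        f"Missing values: {int(df.isnull().sum().sum())}",
        f"Mean {target}: {numeric[target].mean():.2f}",
        f"Correlation of each numeric column with {target}:",
    ]
    lines += [f"  {col}: {r:+.2f}" for col, r in corr.items()]
    return "\n".join(lines), corr


if __name__ == "__main__":
    path = Path("student_scores.csv")  # hypothetical case-study file
    if path.exists():
        report, _ = analysis_report(pd.read_csv(path), target="total")
        print(report)
    else:
        print(f"{path} not found")
```

For example, with hypothetical columns study_hours, attendance and total, the report lists study_hours first if it has the strongest positive correlation with total. Correlation only flags candidate factors; the plots from Steps 5 and 7 help judge whether the relationships are real.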

Output :

Output :

Output :

Output :

Output :

Output :

Output :

Output :

Result :

Thus by applying various EDA and visualization techniques we analysed the student score data and
identified important factors affecting the total scores.

LAB EXERCISE EVALUATION METHODOLOGY/ PROCEDURE

