Knowledge Institute of Technology: (An Autonomous Institution)
(Accredited by NAAC & NBA, Approved by AICTE, New Delhi and Affiliated to Anna University, Chennai)
LABORATORY MANUAL
FOR
REGULATION 2021
VISION
To create globally competent software professionals with social values to cater to the
ever-changing industry requirements.
MISSION
M1 To provide appropriate infrastructure to impart need-based technical education
through effective teaching and research
M2 To involve the students in collaborative projects on emerging technologies to fulfill
the industrial requirements
M3 To render value-based education to students so that they make better engineering decisions
with social consciousness and meet global standards
M4 To inculcate leadership skills in students and encourage them to become a globally
competent professional
Programme Educational Objectives (PEOs)
The graduates of Computer Science and Engineering will be able to
PEO1 Pursue Higher Education and Research or have a successful career in industries
associated with Computer Science and Engineering, or as Entrepreneurs
PEO2 Ensure that graduates will have the ability and attitude to adapt to emerging
technological changes
PEO3 Acquire leadership skills to perform professional activities with social
consciousness
Programme Specific Outcomes (PSOs)
The graduates will be able to
PSO1 The students will be able to analyze large volumes of data and make business
decisions to improve efficiency with different algorithms and tools
PSO2 The students will have the capacity to develop web and mobile applications for real
time scenarios
PSO3 The students will be able to provide automation and smart solutions in various
forms to the society with Internet of Things
Course Code & Name: CCS346 & EXPLORATORY DATA ANALYSIS LABORATORY
REGULATION: R2021
YEAR/SEM: III/V
COURSE OBJECTIVES:
• To outline an overview of exploratory data analysis.
• To implement data visualization using Matplotlib.
• To perform univariate data exploration and analysis.
• To apply bivariate data exploration and analysis.
• To use data exploration and visualization techniques for multivariate and time series data.
LIST OF EXPERIMENTS:
1. Install the data analysis and visualization tools: R / Python / Tableau Public / Power BI.
2. Perform exploratory data analysis (EDA) with datasets like email data set. Export all your
emails as a dataset, import them inside a pandas data frame, visualize them and get different
insights from the data.
3. Working with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.
4. Explore various variable and row filters in R for cleaning data. Apply various plot features in
R on sample data sets and visualize.
5. Perform Time Series Analysis and apply the various visualization techniques.
6. Perform Data Analysis and representation on a Map using various Map data sets with Mouse
Rollover effect, user interaction, etc.
7. Build cartographic visualization for multiple datasets involving various countries of the world;
states and districts in India etc.
8. Perform EDA on Wine Quality Data Set.
9. Use a case study on a data set and apply the various EDA and visualization techniques and
present an analysis report.
COURSE OUTCOMES
• Understand the fundamentals of exploratory data analysis.
• Implement the data visualization using Matplotlib.
• Perform univariate data exploration and analysis.
• Apply bivariate data exploration and analysis.
• Use data exploration and visualization techniques for multivariate and time series data.
KNOWLEDGE INSTITUTE OF TECHNOLOGY
CONTENTS
Ex.no : 1 Installation and Setup of Data Analysis and Visualization Tools: R, Python, Tableau, and Power BI
Date :
Aim:
The aim of this project is to install and set up four essential data analysis and visualization tools,
namely R, Python, Tableau, and Power BI, to provide a robust environment for data analysis and
visualization tasks.
Algorithm:
Step 1: System Requirements Assessment:
- Before installation, ensure your system meets the minimum hardware and software requirements
for each tool. Check the official documentation of R, Python, Tableau, and Power BI for system
prerequisites.
Step 2: Installation of R:
- Download the R installation package from the official CRAN (Comprehensive R Archive Network)
website (https://cran.r-project.org/mirrors.html).
- Follow the installation wizard, choose your preferred options, and install R on your system.
Step 3: Installation of Python:
- Download the latest Python installer from the official Python website
(https://www.python.org/downloads/).
- During installation, make sure to check the option "Add Python to PATH" for easy command-line
access.
- After installation, use pip (the Python package manager) to install essential data science libraries,
such as NumPy, Pandas, Matplotlib, and Jupyter Notebook.
Step 4: Installation of Tableau:
- Follow the installation instructions and provide your licensing information or use the free trial
period.
Step 5: Installation of Power BI:
- Go to the Microsoft Power BI website (https://powerbi.microsoft.com/en-us/desktop/) to
download Power BI Desktop.
- You will need a Microsoft account to access certain features. Sign in or create an account if
required.
Step 6: Configuration and Integration:
- Configure R and Python with data science IDEs such as RStudio or Jupyter Notebook to harness
their full potential for data analysis.
- For Tableau and Power BI, explore their integration capabilities with the databases, cloud services,
and data sources you plan to use for analysis and visualization.
Step 7: Verification:
- Verify that all the installed tools are working correctly by creating a simple data analysis and
visualization project.
Step 8: Documentation:
- Document the installation process, any challenges faced, and how they were overcome. This
documentation will be helpful for future reference and troubleshooting.
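The Python part of the verification can be sketched with a short script. This is only an illustrative check, not part of the official procedure: the list of package names is an assumption based on the libraries mentioned above (with "notebook" assumed as the import name for Jupyter Notebook), and check_packages is a hypothetical helper. Missing packages can usually be installed with pip, for example: pip install numpy pandas matplotlib notebook.

```python
import importlib.metadata
import importlib.util
import sys

# Import names of the libraries mentioned in the steps above (an assumption;
# adjust the list to match your own setup).
REQUIRED = ["numpy", "pandas", "matplotlib", "notebook"]


def check_packages(names):
    """Map each name to its version string, 'unknown', or None if not importable."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not installed for this interpreter
            continue
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name.
            report[name] = "unknown"
    return report


print("Python", sys.version.split()[0])
for package, version in check_packages(REQUIRED).items():
    print(f"{package:12s}", version if version else "MISSING")
```

Any line that prints MISSING points to a library that still has to be installed for the interpreter that ran the script.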
Tableau Public:
Power BI:
Result:
Ex.no : 2 Perform exploratory data analysis (EDA) with datasets like email data set.
Export all your emails as a dataset, import them inside a pandas data
frame, visualize them and get different insights from the data.
Date :
Aim :
To perform exploratory data analysis (EDA) on an email dataset: export your emails as a dataset,
import them into a pandas DataFrame, visualize them, and draw different insights from
the data.
Algorithm
Step 1: Load the data and create a DataFrame.
Step 7: Analyze email frequency over time using a time series plot.
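Steps 1 and 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the three raw messages are hypothetical stand-ins for an exported mailbox, and emails_to_frame is a helper name introduced here, not part of the original program.

```python
import email
from email.utils import parsedate_to_datetime

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw messages standing in for an exported mailbox.
RAW_MESSAGES = [
    "From: alice@example.com\nSubject: Report\n"
    "Date: Mon, 01 Jan 2024 09:15:00 +0000\n\nFirst message",
    "From: bob@example.com\nSubject: Meeting\n"
    "Date: Mon, 01 Jan 2024 17:40:00 +0000\n\nSecond message",
    "From: alice@example.com\nSubject: Follow-up\n"
    "Date: Wed, 03 Jan 2024 08:05:00 +0000\n\nThird message",
]


def emails_to_frame(raw_messages):
    """Parse raw RFC 5322 messages into a DataFrame (Step 1)."""
    rows = []
    for raw in raw_messages:
        message = email.message_from_string(raw)
        rows.append({
            "sender": message["From"],
            "subject": message["Subject"],
            "date": parsedate_to_datetime(message["Date"]),
        })
    frame = pd.DataFrame(rows)
    frame["date"] = pd.to_datetime(frame["date"], utc=True)
    return frame


df = emails_to_frame(RAW_MESSAGES)

# Number of emails per sender, most frequent first.
top_senders = df["sender"].value_counts()

# Email frequency over time (Step 7): count the messages received each day.
per_day = df.set_index("date").resample("D").size()

fig, ax = plt.subplots()
per_day.plot(ax=ax, marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Emails received")
ax.set_title("Email frequency over time")
```

For a real export in mbox format, the standard-library mailbox.mbox class can be iterated to obtain the messages instead of the hard-coded list. Call plt.show() or fig.savefig(...) to view the plot.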
Output
Result :
Thus, EDA on the email dataset has been performed successfully.
EX NO: 3 Working with NumPy arrays, Pandas DataFrames, and basic plots using Matplotlib
DATE:
Aim
To work with arrays in the NumPy module, DataFrames in the pandas module, and basic plots in the
Matplotlib module in Python programming.
Step 3: Create the 1-d array, 2-d array by using built-in methods
Step 6: Compute the shape of an array and reshape an array and perform transpose of an array
Step 7: Do the required operations like slicing, iterating and splitting an array element.
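The array steps listed above can be sketched as follows. The array contents are arbitrary example values chosen for illustration.

```python
import numpy as np

# Create a 1-d and a 2-d array with built-in methods (Step 3).
a = np.arange(6)                       # values 0 to 5
b = np.array([[1, 2, 3], [4, 5, 6]])   # 2 rows, 3 columns

# Dimension, shape and size, then reshape and transpose (Step 6).
print(b.ndim, b.shape, b.size)
reshaped = a.reshape(2, 3)             # same data viewed as 2 x 3
transposed = b.T                       # rows and columns swapped

# Slicing, iterating and splitting (Step 7).
middle = a[1:4]                        # elements at index 1, 2 and 3
second_column = b[:, 1]                # every row, column index 1
for row in b:                          # iterating over a 2-d array yields rows
    print(row)
left, right = np.split(a, 2)           # two equal halves
```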
Numpy arrays
Program
Output
Program
Output
(iii) NumPy array Dimension, Shape, Size, Transpose and Reshaping
Program
Output
Program
Output
Program
Output
Program
Output
Program
Output
Pandas DataFrame
Program
Output
Program
Output
Program
Output
Program
Output
(v) Basic operations on DataFrame
Program
Output
Program
Output
Program
Output
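A minimal sketch of creating a DataFrame and applying basic operations to it is given below. The names and marks are hypothetical sample data introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Asha", "Bala", "Chitra", "Dinesh"],
    "dept": ["CSE", "ECE", "CSE", "ECE"],
    "marks": [82, 74, 91, 68],
})

print(df.head())        # first rows
print(df.describe())    # summary statistics of the numeric columns

# Row selection with a boolean condition.
high = df[df["marks"] > 75]

# Adding a derived column.
df["grade"] = df["marks"].apply(lambda m: "A" if m >= 80 else "B")

# Sorting and grouping.
ranked = df.sort_values("marks", ascending=False)
dept_mean = df.groupby("dept")["marks"].mean()
print(dept_mean)
```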
Step 4: Create a bar plot, use the bar method and customize the appearance, labels, and title.
Step 5: Generate a scatter plot by scatter function to plot your data points and customize the size, color, and style
of the markers, as well as labels and title.
Step 6: Visualize the data using the pie chart and histogram with required parameters
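The plotting steps above can be sketched in a single figure with four panels. The categories, counts and random points are made-up illustration data, and the colours and marker settings are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up illustration data.
categories = ["A", "B", "C", "D"]
counts = [12, 19, 7, 15]
rng = np.random.default_rng(seed=0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Bar plot with customized colour, labels and title (Step 4).
axes[0, 0].bar(categories, counts, color="steelblue")
axes[0, 0].set(title="Bar plot", xlabel="Category", ylabel="Count")

# Scatter plot with customized marker size, colour and style (Step 5).
axes[0, 1].scatter(x, y, s=20, c="darkorange", marker="o")
axes[0, 1].set(title="Scatter plot", xlabel="x", ylabel="y")

# Pie chart and histogram (Step 6).
axes[1, 0].pie(counts, labels=categories, autopct="%1.1f%%")
axes[1, 0].set_title("Pie chart")
axes[1, 1].hist(x, bins=15, edgecolor="black")
axes[1, 1].set(title="Histogram", xlabel="x", ylabel="Frequency")
axes[1, 1].grid(True, linestyle="--", alpha=0.5)  # gridlines

fig.tight_layout()
```

Call plt.show() to display the figure in an interactive session, or fig.savefig("plots.png") to write it to a file.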
Basic plots using Matplotlib
Program
Output
Program
Output
Program
Output
Program
Output
Program
Output
(vi) Adding Gridlines to plot
Program
Output
Result
Thus, working with arrays in the NumPy module, DataFrames in the pandas module, and basic plots in the
Matplotlib module in Python programming has been explored successfully.
Ex No : 4 Explore various variable and row filters in R for cleaning
data. Apply various plot features in R on sample data sets
and visualize.
DATE :
Aim:
The aim of this project is to explore data cleaning techniques in R, including variable and row filters,
and to apply various plot features to sample datasets to visualize data effectively.
Algorithm:
Step 1: Import the required libraries for data manipulation, such as dplyr for filtering and cleaning, and
ggplot2 for data visualization.
Step 2: Load one or more sample datasets for analysis. These datasets should have some missing values and
outliers for data cleaning demonstrations.
Step 3: Apply variable filters (e.g., removing unnecessary columns) to clean the dataset.
Step 4: Apply row filters (e.g., removing rows with missing values or outliers) to clean the dataset further.
Step 6: Create summary statistics, histograms, and box plots to understand the distribution of variables.
Step 7: Use ggplot2 to create various types of plots, such as bar charts, scatter plots, and line plots, to visualize
the data.
Step 9: Optionally, save the plots as image files for further use or reporting.
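The exercise itself is carried out in R with dplyr and ggplot2. For readers who follow the Python exercises in this manual, the variable and row filters of Steps 3 and 4 can be sketched with pandas as an analogue. This is not the R program: the column names and values are hypothetical, and the 1.5 x IQR rule is one common choice for flagging outliers, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value and one outlier.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "height_cm": [160.0, 172.0, np.nan, 168.0, 175.0, 400.0],
    "notes": ["a", "b", "c", "d", "e", "f"],   # column not needed for analysis
})

# Variable filter: drop an unnecessary column (Step 3).
cleaned = df.drop(columns=["notes"])

# Row filter: remove rows with missing values (Step 4).
cleaned = cleaned.dropna(subset=["height_cm"])

# Row filter: remove outliers outside 1.5 x IQR of the quartiles (Step 4).
q1, q3 = cleaned["height_cm"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = cleaned["height_cm"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = cleaned[in_range]

print(cleaned)
```

In R, the corresponding dplyr verbs are select() for variable filters and filter() for row filters.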
Program:
Output:
Result:
Thus the process of data manipulation and visualization using R was executed successfully.
Ex No : 5 Perform Time Series Analysis and apply the various
visualization techniques
DATE :
Aim :
Algorithm :
Step 4: Generate random values from 1 to 100 for each date using np.random.randint() and convert
them into a DataFrame.
Step 5: Set the date as the index of the DataFrame using df.set_index('date', inplace=True)
Step 6: Import seasonal_decompose from statsmodels to show different time series components such as
trend, seasonal, resid, etc.
decomposition.seasonal, decomposition.resid)
Program :
Output:
Output:
RESULT :
Thus, the above Time Series program was written and executed successfully.
Ex No : 6 Perform data analysis and representation on a map using various map
datasets with mouse rollover effect, user interaction, etc.
Date :
Aim:
To perform data analysis and representation on a map using various map datasets with mouse
rollover effect, user interaction, etc.
Algorithm:
Step 1: Collect and prepare various map datasets, including geographical information (latitude,
longitude), and relevant data attributes.
Step 2: Choose a suitable map library or framework (e.g., Leaflet, Google Maps API) for displaying
the maps.
Step 3: Create a web-based application that renders the map and overlays it with markers,
polygons, or other visual elements based on the dataset.
Step 4: Implement a mouse rollover effect, where users can hover over map elements (markers,
regions) to view additional information related to the data point.
Step 5: Enable user interaction by allowing users to interact with the map, such as zooming in/out,
panning, and selecting specific data points for more details.
Step 6: Perform data analysis on the selected dataset, which may include generating statistics,
clustering, or creating heatmaps based on geographical attributes.
Step 7: Implement filters and visualization options, such as color-coding or varying marker sizes to
represent different data attributes.
Step 8: Design a user-friendly interface with clear controls, legends, and tooltips to help users
understand the data and its representation on the map.
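One way to sketch Steps 2 to 5 and 7 is a small Python script that generates a Leaflet web page. This is an illustrative sketch, not the manual's program: the city coordinates are well-known values, the "value" attribute is a made-up placeholder, and build_map_page is a hypothetical helper name. The page loads Leaflet 1.9.4 from the unpkg CDN and OpenStreetMap tiles, so a network connection is assumed when it is opened.

```python
import json

# Well-known city coordinates; "value" is a made-up placeholder attribute.
POINTS = [
    {"name": "Chennai", "lat": 13.0827, "lon": 80.2707, "value": 30},
    {"name": "Mumbai", "lat": 19.0760, "lon": 72.8777, "value": 50},
    {"name": "New Delhi", "lat": 28.6139, "lon": 77.2090, "value": 40},
]

PAGE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
  <style>#map { height: 600px; }</style>
</head>
<body>
<div id="map"></div>
<script>
  const points = __POINTS__;
  // Zooming and panning are built into the Leaflet map object.
  const map = L.map("map").setView([21.0, 78.0], 5);
  L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png", {
    attribution: "&copy; OpenStreetMap contributors"
  }).addTo(map);
  points.forEach(p => {
    L.circleMarker([p.lat, p.lon], {radius: p.value / 4})  // size encodes value
      .bindTooltip(p.name + ": " + p.value)                // mouse rollover
      .bindPopup("<b>" + p.name + "</b><br>Value: " + p.value)  // click
      .addTo(map);
  });
</script>
</body>
</html>
"""


def build_map_page(points):
    """Embed the data points as JSON into the Leaflet page template."""
    return PAGE.replace("__POINTS__", json.dumps(points))


html = build_map_page(POINTS)
print(len(html), "characters of HTML generated")
```

Writing html to a file such as interactive_map.html and opening it in a browser shows the markers; hovering over a marker displays its tooltip and clicking it opens the popup.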
Program:
Output:
Result:
Thus, a web-based interactive map application that allows users to explore geographical data using
mouse rollover effects and other interactive features has been created.
Ex No : 7 Build cartographic visualisation of multiple datasets involving various
countries of the world; states, and districts in India
Date :
Aim:
To build cartographic visualisation of multiple datasets involving various countries of the world;
states, and districts in India etc.
Algorithm:
Step 1: Gather the multiple datasets related to countries, states, and districts, and ensure that each
dataset contains geographical information, such as latitude and longitude, to accurately plot the data
on maps.
Step 2: Choose a mapping library or tool suitable for the project, such as Leaflet, Mapbox, or Google
Maps.
Step 3: Develop a web-based application or interactive dashboard that renders the world map and
India's map. Overlay the map with relevant boundaries (country borders, state borders, district
boundaries).
Step 4: Merge the datasets with the map layers, linking data points to their geographical locations
(countries, states, districts).
Step 5: Implement data visualization techniques to represent the datasets visually on the maps. This
may include choropleth maps, bubble maps, or heatmaps, depending on the nature of the data.
Step 6: Apply color-coding to the map elements to represent different data attributes, making it easy
to distinguish and analyze the information.
Step 7: Enable user interaction by allowing users to zoom in/out, pan, and click on map elements to
access detailed information about the regions or data points.
Step 8: Include legends, labels, and tooltips to help users interpret the visualizations and understand
the meaning of different colors and symbols.
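The data side of Steps 4 and 6 (merging several datasets by region and assigning a colour class to each region) can be sketched with pandas. The ISO country codes are real, but the metric names and values are made-up placeholders, and the three-colour palette and grey "no data" colour are arbitrary choices for illustration.

```python
import pandas as pd

# Region table keyed by ISO 3166-1 alpha-3 code.
regions = pd.DataFrame({
    "iso_code": ["IND", "USA", "BRA", "JPN"],
    "name": ["India", "United States", "Brazil", "Japan"],
})

# Two hypothetical datasets with placeholder values; each misses one country.
indicator_a = pd.DataFrame({"iso_code": ["IND", "USA", "BRA"],
                            "metric_a": [10, 40, 25]})
indicator_b = pd.DataFrame({"iso_code": ["IND", "USA", "JPN"],
                            "metric_b": [1.5, 3.0, 2.0]})

# Link every dataset to its region; a left join keeps regions without data (Step 4).
merged = (
    regions
    .merge(indicator_a, on="iso_code", how="left")
    .merge(indicator_b, on="iso_code", how="left")
)

# Colour-code metric_a into three equal-width classes, light to dark (Step 6).
PALETTE = ["#deebf7", "#9ecae1", "#3182bd"]
NO_DATA = "#cccccc"
colour = pd.cut(merged["metric_a"], bins=3, labels=PALETTE)
merged["colour"] = colour.astype(object).fillna(NO_DATA)

print(merged)
```

The resulting table has one row per region with its colour, so it can be joined to boundary polygons (for example GeoJSON features) by the mapping library chosen in Step 2 to draw a choropleth map.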
Program:
Output:
Result:
EX NO: 8 Perform EDA on Wine Quality Data Set
DATE:
Aim
Algorithm
Step 1: Import necessary libraries such as pandas, numpy, matplotlib and seaborn.
Step 2: Read a CSV file ('winequality-red.csv') into a Pandas DataFrame and store it in the variable
Step 3: Take the first column of the DataFrame and split it into multiple columns based on the delimiter ';'
Step 4: Change the data types of the columns in the DataFrame to their appropriate types
Step 5: Check the datatypes, missing values, and summary of the data.
Step 6: Create a histogram of the 'quality' column using Seaborn with labels and a title, and display the plot.
Step 7: Calculate the correlation matrix for the dataset and create a heatmap using Seaborn with
annotations and a title, and display the plot.
Step 8: Create a scatter plot of 'alcohol' vs. 'quality' using Seaborn with labels and a title, and display the plot.
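The steps above can be sketched as follows. To keep the example self-contained, a few hypothetical rows in the same ';'-separated layout stand in for 'winequality-red.csv', with only a subset of columns; plot_overview is a helper name introduced here. The plotting libraries are imported inside the helper so that the data preparation part also runs where seaborn is not installed.

```python
import io

import pandas as pd

# Hypothetical rows standing in for 'winequality-red.csv' (subset of columns).
SAMPLE = """fixed acidity;volatile acidity;alcohol;quality
7.4;0.70;9.4;5
7.8;0.88;9.8;5
11.2;0.28;9.8;6
7.3;0.65;10.0;7
7.9;0.60;9.4;5
8.5;0.28;10.5;7
"""

# Step 2: with the default ',' separator everything lands in one column.
raw = pd.read_csv(io.StringIO(SAMPLE))

# Step 3: split that single column on ';' and restore the column names.
df = raw.iloc[:, 0].str.split(";", expand=True)
df.columns = raw.columns[0].split(";")

# Step 4: convert the text columns to numeric types.
df = df.apply(pd.to_numeric)

# Step 5: data types, missing values and summary statistics.
print(df.dtypes)
print(df.isnull().sum())
print(df.describe())

# Step 7 (data part): correlation matrix.
corr = df.corr()


def plot_overview(frame):
    """Steps 6 to 8: histogram, correlation heatmap and scatter plot."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    sns.histplot(data=frame, x="quality", discrete=True, ax=axes[0])
    axes[0].set(title="Distribution of quality", xlabel="Quality", ylabel="Count")
    sns.heatmap(frame.corr(), annot=True, cmap="coolwarm", ax=axes[1])
    axes[1].set_title("Correlation matrix")
    sns.scatterplot(data=frame, x="alcohol", y="quality", ax=axes[2])
    axes[2].set(title="Alcohol vs. quality", xlabel="Alcohol", ylabel="Quality")
    fig.tight_layout()
    return fig
```

Calling plot_overview(df) returns the figure. With the real file, passing sep=';' to pd.read_csv reads the columns directly and makes the manual split in Step 3 unnecessary.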
Program :
Output:
Output
Output
Output
Output
Output
Output
Result :
Ex No : 9 Use a case study on a dataset and apply the various EDA and visualization
techniques and present an analysis report
Date :
Aim:
To use a case study on a dataset, apply the various EDA and visualization techniques, and present
an analysis report.
Algorithm :
Output :
Output :
Output :
Output :
Output :
Output :
Output :
Output :
Result :
Thus, by applying various EDA and visualization techniques, we analysed the student score data and
identified important factors affecting the total scores.
LAB EXERCISE EVALUATION METHODOLOGY/ PROCEDURE