INDEX

The document outlines three Exploratory Data Analysis (EDA) projects on different datasets: Global Superstore Sales, COVID-19 Global Data, and YouTube Trending Videos. Each project includes steps such as data loading, cleaning, analysis, visualizations, and insights derived from the data. Key findings highlight sales trends, COVID-19 case distributions, and video engagement patterns across the datasets.

Uploaded by

saruhasan1103


INDEX

S.NO   TOPIC                                      SIGN

1.     EDA ON GLOBAL SUPERSTORE SALES DATASET

2.     EDA ON COVID-19 GLOBAL DATASET

3.     EDA ON YOUTUBE TRENDING VIDEOS DATASET

EX.No:1 EDA ON GLOBAL SUPERSTORE SALES DATASET

EXPLORATORY DATA ANALYSIS (EDA):

Exploratory Data Analysis (EDA) is the process of examining and understanding a
dataset before applying any modeling or predictive techniques. It involves summarizing the
dataset’s main characteristics using statistical measures and visualizations to uncover patterns,
spot anomalies, test hypotheses, and check assumptions. EDA typically includes cleaning the
data (handling missing values and duplicates), generating descriptive statistics (like mean,
median, and standard deviation), and using plots such as histograms, bar charts, and line graphs
to visualize trends and relationships. This step is crucial for gaining insights and making
informed decisions about the direction of further analysis or modeling.

DATA SOURCE:

Dataset link: https://www.kaggle.com/datasets/fatihilhan/global-superstore-dataset

STEP 1: LOAD THE DATASET

PROGRAM:

import pandas as pd

file_path = "/content/GLOBAL DATASTORE.csv"

df = pd.read_csv(file_path)

OUTPUT:
STEP 2: DATA CLEANING

• Check and remove missing values

• Remove duplicates

PROGRAM:

df.dropna(inplace=True)

df.drop_duplicates(inplace=True)
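
The step list says to check for missing values and duplicates before removing them, but the program above only removes. A minimal sketch of the checking part follows. It uses a small made-up table (the values and the `toy` name are assumptions, not rows from the dataset), so it runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sales table (hypothetical values, not from the dataset)
toy = pd.DataFrame({
    "Region": ["East", "West", "West", "South", "East"],
    "Sales": [120.0, 85.0, 85.0, np.nan, 300.0],
    "Profit": [20.0, 9.5, 9.5, 4.0, 55.0],
})

# Check first: missing values per column and number of fully duplicated rows
missing_per_column = toy.isna().sum()
duplicate_rows = int(toy.duplicated().sum())
print(missing_per_column)
print("Duplicate rows:", duplicate_rows)

# Then remove, building a new frame instead of modifying the original in place
cleaned = toy.dropna().drop_duplicates().reset_index(drop=True)
print("Shape after cleaning:", cleaned.shape)
```

Looking at the counts first shows how many rows `dropna()` and `drop_duplicates()` are about to discard, which matters when a single sparse column would otherwise wipe out most of the table.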

STEP 3: SUMMARY STATISTICS

PROGRAM:

sales_summary = df["Sales"].describe()[["mean", "50%", "std"]]

profit_summary = df["Profit"].describe()[["mean", "50%", "std"]]

print("Sales Summary:\n", sales_summary)

print("Profit Summary:\n", profit_summary)

OUTPUT:

Sales Summary:
mean 246.498440
50% 85.000000
std 487.567175
Name: Sales, dtype: float64
Profit Summary:
mean 28.610982
50% 9.240000
std 174.340972
Name: Profit, dtype: float64
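
In the output above, the "50%" row is the median. For Sales the mean (246.50) is well above the median (85.00), which points to a right-skewed distribution with a few very large orders. The same three statistics can also be requested by name with `agg`. The sketch below uses made-up numbers (the `toy` frame is an assumption) to show that "50%" from `describe()` and "median" from `agg` agree.

```python
import pandas as pd

# Hypothetical values chosen so that one large order pulls the mean upward
toy = pd.DataFrame({
    "Sales": [10, 20, 30, 40, 400],
    "Profit": [1, 2, 3, 4, -10],
})

# Mean, median and standard deviation for both columns in one table
summary = toy[["Sales", "Profit"]].agg(["mean", "median", "std"])
print(summary)

# The "50%" row of describe() is the same number as the median
median_from_describe = toy["Sales"].describe()["50%"]
print("Median via describe():", median_from_describe)
```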
STEP 4: ANALYSIS
Total Sales per Region
PROGRAM:
sales_per_region = df.groupby("Region")["Sales"].sum()
print(sales_per_region)
OUTPUT:
Region
Africa 783776
Canada 66932
Caribbean 324281
Central 2822399
Central Asia 752839
EMEA 806184
East 678834
North 1248192
North Asia 848349
Oceania 1100207
South 1600960
Southeast Asia 884438
West 725514
Name: Sales, dtype: int64
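
The insights section refers to a bar chart of sales by region, but no program for it appears in this record. One possible sketch is given here, using the region totals copied from the output above. The sorting step, colour, and figure size are assumptions for illustration, not the original program.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Region totals copied from the printed output above
sales_per_region = pd.Series({
    "Africa": 783776, "Canada": 66932, "Caribbean": 324281,
    "Central": 2822399, "Central Asia": 752839, "EMEA": 806184,
    "East": 678834, "North": 1248192, "North Asia": 848349,
    "Oceania": 1100207, "South": 1600960, "Southeast Asia": 884438,
    "West": 725514,
}, name="Sales")

# Sort so the tallest bar comes first and the ranking is easy to read
ranked = sales_per_region.sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(ranked.index, ranked.values, color="steelblue")
ax.set_title("Total Sales by Region")
ax.set_xlabel("Region")
ax.set_ylabel("Total Sales")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
plt.show()
```

With the real data, `ranked` can be built directly from `df.groupby("Region")["Sales"].sum().sort_values(ascending=False)`.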

Line Chart: Year-wise Sales Trend

PROGRAM:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
plt.figure(figsize=(10, 6))
yearly_sales = df.groupby('Year')['Sales'].sum()
sns.lineplot(x=yearly_sales.index, y=yearly_sales.values, marker='o',
color='orange')
plt.title("Year-wise Sales Trend")
plt.xlabel("Year")
plt.ylabel("Total Sales")
plt.tight_layout()
plt.show()
OUTPUT:

GOOGLE COLAB LINK:

https://colab.research.google.com/drive/12ok_SXN84wnqSQL9AzV4OA4e7kDohCWQ?usp=sharing
STEP 6: INSIGHTS
Bar Chart – Sales by Region:
• The Central region shows the highest total sales (2,822,399), followed by South (1,600,960) and North (1,248,192).
• Canada (66,932) and the Caribbean (324,281) lag far behind, indicating potential for growth or marketing focus.

Line Chart – Year-wise Sales Trend:

• Sales show a steady upward trend year over year.
• This suggests a growing business or improved operations/logistics over time.
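
The year-wise trend can be checked numerically as well as visually. The sketch below computes year-over-year growth with `pct_change`. The orders are made up, and "Order Date" is an assumed column name, used here only to show how a Year column could be derived if one were not already present.

```python
import pandas as pd

# Hypothetical orders; "Order Date" is an assumed column name
orders = pd.DataFrame({
    "Order Date": ["2011-03-01", "2011-09-15", "2012-02-10",
                   "2012-11-30", "2013-06-05"],
    "Sales": [100.0, 150.0, 200.0, 100.0, 360.0],
})

# Derive the year from the order date, then total the sales per year
orders["Year"] = pd.to_datetime(orders["Order Date"]).dt.year
yearly_sales = orders.groupby("Year")["Sales"].sum()

# Percentage change from the previous year (the first year has no predecessor)
yoy_growth = yearly_sales.pct_change() * 100
print(yoy_growth.round(1))

# True only if every year is at least as high as the one before it
is_steady_rise = bool(yearly_sales.is_monotonic_increasing)
print("Sales rise every year:", is_steady_rise)
```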
EX.No:2 EDA ON COVID-19 GLOBAL DATASET

INTRODUCTION:
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has had a profound global
impact since early 2020, affecting millions of lives and disrupting economies. To better
understand the spread, trends, and regional impact of the virus, data-driven approaches such as
Exploratory Data Analysis (EDA) are essential. By exploring confirmed cases, recoveries, and
deaths, this analysis aims to uncover insights into the progression of the pandemic, identify the
most affected states, and visualize daily trends in new infections.

Dataset link: https://www.kaggle.com/datasets/ COVID-19 in India

GOOGLE COLAB LINK:

https://colab.research.google.com/drive/1VEuFN6gRCyIMnIEkwccqlMqENFi11BRv?usp=sharing

STEP 1: LOAD AND INSPECT THE DATASET

PROGRAM:
import pandas as pd
df = pd.read_csv('path_to_covid_dataset.csv')
print(df.head())
OUTPUT:
PROGRAM:
print(df.columns)

df.info()  # info() prints its report itself and returns None, so no print() is needed

OUTPUT:

STEP 2: HANDLE MISSING DATA AND CONVERT DATES

PROGRAM:
df.fillna(0, inplace=True)
df['Date'] = pd.to_datetime(df['Date'])
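
Note that `df.fillna(0)` fills every column, so a missing state name would also become 0. A more cautious variant is sketched below on made-up rows: it fills only the count columns and turns unparseable dates into NaT so they can be counted. The `toy` frame and the `count_cols` list are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with one missing count per column and one bad date
toy = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02", "not a date"],
    "State/UnionTerritory": ["Kerala", "Kerala", None],
    "Confirmed": [10, np.nan, 30],
    "Cured": [1, 2, np.nan],
    "Deaths": [0, 0, 1],
})

# Fill only the numeric count columns; text columns keep their missing markers
count_cols = ["Confirmed", "Cured", "Deaths"]
toy[count_cols] = toy[count_cols].fillna(0)

# errors="coerce" converts values that cannot be parsed into NaT instead of raising
toy["Date"] = pd.to_datetime(toy["Date"], errors="coerce")
bad_dates = int(toy["Date"].isna().sum())

print(toy.dtypes)
print("Unparseable dates:", bad_dates)
```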

STEP 3: COMPUTE METRICS

a) Total confirmed, recovered, and death cases per state:
PROGRAM:
# Assumes the counts are cumulative, so each state's maximum is its latest total
statewise_total = df.groupby('State/UnionTerritory')[['Confirmed', 'Cured', 'Deaths']].max().reset_index()
print(statewise_total)

OUTPUT:
b) State with the highest number of confirmed cases:
PROGRAM:
top_state=statewise_total[statewise_total['Confirmed']==
statewise_total['Confirmed'].max()]
print("State with highest confirmed cases:\n", top_state)
OUTPUT:
State with highest confirmed cases:
State/UnionTerritory Confirmed Cured Deaths
27 Maharashtra 6363442 6159676 134201
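
Parts (a) and (b) take the maximum per state rather than the sum. That is the right choice only if the counts are cumulative (running totals), which is the assumption made here. The sketch below uses two made-up states to show the difference, and uses `idxmax` as a shorter way to find the top state.

```python
import pandas as pd

# Hypothetical cumulative counts for two states over three days
toy = pd.DataFrame({
    "State/UnionTerritory": ["A", "A", "A", "B", "B", "B"],
    "Confirmed": [5, 8, 12, 2, 3, 4],
})

by_state = toy.groupby("State/UnionTerritory")["Confirmed"]
totals = by_state.max()      # latest running total per state
overcount = by_state.sum()   # adds running totals together, so it overstates cases

# idxmax returns the index label (the state) holding the largest value
top_state = totals.idxmax()
print(totals)
print("Top state:", top_state)
```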

c) Daily trend of new cases:

PROGRAM:
daily_cases = df.groupby('Date')['Confirmed'].sum().diff().fillna(0)
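
The line above sums the cumulative counts of all states for each date and then takes the day-to-day difference. The sketch below walks through that on made-up numbers. The 7-day rolling average is an addition for smoothing and is not part of the original program.

```python
import pandas as pd

# Hypothetical cumulative counts for two states over three days
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-01-01", "2021-01-01",
        "2021-01-02", "2021-01-02",
        "2021-01-03", "2021-01-03",
    ]),
    "State/UnionTerritory": ["A", "B", "A", "B", "A", "B"],
    "Confirmed": [5, 2, 8, 3, 12, 3],
})

# National running total per date: 7, 11, 15
national_cumulative = toy.groupby("Date")["Confirmed"].sum()

# Difference between consecutive dates; the first date has no predecessor,
# so fillna(0) records it as 0 rather than as the first day's total
daily_new = national_cumulative.diff().fillna(0)

# Optional smoothing (an addition, not in the original program)
weekly_avg = daily_new.rolling(window=7, min_periods=1).mean()
print(daily_new)
```

If a state is missing on some date, the national total dips and the difference turns negative. `daily_new.clip(lower=0)` is one simple way to suppress such artefacts, at the cost of hiding genuine downward corrections.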

STEP 4: VISUALIZATIONS
a) Pie Chart: Top 5 States by Confirmed Cases
PROGRAM:
import matplotlib.pyplot as plt
top5_states = statewise_total.sort_values('Confirmed', ascending=False).head(5)
plt.figure(figsize=(8, 8))
plt.pie(top5_states['Confirmed'],labels=top5_states['State/UnionTerritory'],
autopct='%1.1f%%', startangle=140)
plt.title('Top 5 Indian States by Confirmed COVID-19 Cases')
plt.show()
OUTPUT:

b) Line Graph: Daily Trend of Confirmed Cases

PROGRAM:
plt.figure(figsize=(10, 6))
plt.plot(daily_cases.index, daily_cases.values, color='blue')
plt.title('Daily New Confirmed COVID-19 Cases in India')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.grid(True)
plt.show()
OUTPUT:

STEP 5: OBSERVATION
• The top affected states (e.g., Maharashtra, Kerala, Karnataka) account for the majority of confirmed cases.
• The trend graph shows multiple waves: sharp increases followed by declines.
• Lockdown periods and vaccination rollouts align with noticeable trend changes.
• Death and recovery rates vary by region and wave, highlighting healthcare disparities.
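
The observation about death and recovery rates is not backed by a program in this record. A minimal sketch of how such rates could be computed from `statewise_total` follows. It uses only the Maharashtra row printed in Step 3(b), and the two new column names are assumptions.

```python
import pandas as pd

# Single row copied from the Step 3(b) output above
statewise_total = pd.DataFrame({
    "State/UnionTerritory": ["Maharashtra"],
    "Confirmed": [6363442],
    "Cured": [6159676],
    "Deaths": [134201],
})

# Rates as a percentage of confirmed cases (column names are hypothetical)
statewise_total["recovery_rate_pct"] = (
    statewise_total["Cured"] / statewise_total["Confirmed"] * 100
).round(2)
statewise_total["death_rate_pct"] = (
    statewise_total["Deaths"] / statewise_total["Confirmed"] * 100
).round(2)

print(statewise_total)
```

Applied to the full table, sorting by `death_rate_pct` would show which states deviate most from the national figure.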
EX.No:3 EDA ON YOUTUBE TRENDING VIDEOS DATASET

INTRODUCTION:
YouTube has become a dominant platform for video sharing, content
creation, and audience engagement worldwide. The YouTube Trending Videos
Dataset provides a snapshot of videos that were trending in various regions over
time, offering valuable insights into user preferences, content popularity, and
engagement metrics.
This Exploratory Data Analysis (EDA) aims to uncover trends in video
categories, the frequency of trending videos across different channels, and
patterns in user interactions such as views, likes, and comments. By analyzing
this data, we can better understand what makes a video trend, which content types
perform best, and how users engage with trending content.
DATA SOURCE:

Dataset link: https://www.kaggle.com/datasets/anushabellam/Trending videos on Youtube

STEP 5: OBSERVATION
• Top Categories: Certain categories like music, entertainment, and news dominate the trending list.
• Channel Popularity: A few channels consistently produce trending content.
• Engagement Patterns: There's a strong positive correlation between views and likes.
• Outliers: Some videos have extremely high views but relatively low likes/comments, suggesting passive viewing.
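
The programs behind these observations are not included in this record. The sketch below shows one way the three kinds of checks could be carried out: category counts, the views-likes correlation, and a flag for high-view, low-like videos. The rows are made up, and the column names (`category`, `views`, `likes`) and the outlier rule are assumptions, so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical trending-video rows; column names are assumptions
videos = pd.DataFrame({
    "category": ["Music", "Music", "Entertainment", "News", "Music", "Entertainment"],
    "views": [1_000_000, 500_000, 800_000, 200_000, 5_000_000, 300_000],
    "likes": [90_000, 40_000, 60_000, 15_000, 20_000, 25_000],
})

# Top categories: how often each category appears in the trending list
category_counts = videos["category"].value_counts()

# Engagement pattern: Pearson correlation between views and likes
views_likes_corr = videos["views"].corr(videos["likes"])

# Outliers: above-median views but a like rate under half the median like rate
videos["like_rate"] = videos["likes"] / videos["views"]
low_engagement = videos[
    (videos["views"] > videos["views"].median())
    & (videos["like_rate"] < 0.5 * videos["like_rate"].median())
]

print(category_counts)
print("Views-likes correlation:", round(views_likes_corr, 3))
print(low_engagement)
```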

