INDEX
DATA SOURCE:
PROGRAM:
import pandas as pd
# file_path must point to the dataset CSV (its location is not given in this record)
df = pd.read_csv(file_path)
OUTPUT:
STEP 2: DATA CLEANING
Remove missing values and duplicates
PROGRAM:
# Drop rows that contain any missing value
df.dropna(inplace=True)
# Drop rows that exactly duplicate an earlier row
df.drop_duplicates(inplace=True)
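To check what the cleaning step does, it can help to count rows before and after. The sketch below runs the same two calls on a small made-up table; the toy values and the names toy, cleaned, rows_before, and rows_after are illustrative assumptions, not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one missing Sales value and one exact duplicate row.
toy = pd.DataFrame({
    "Region": ["East", "East", "West", "South"],
    "Sales": [100.0, 100.0, np.nan, 250.0],
})

rows_before = len(toy)

# Same cleaning as above, written without inplace=True so the original is kept.
cleaned = toy.dropna().drop_duplicates().reset_index(drop=True)
rows_after = len(cleaned)

print(f"Rows before: {rows_before}, rows after: {rows_after}")
print(cleaned)
```

Reporting the two row counts makes it easy to see how much data the cleaning step discarded.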
PROGRAM:
OUTPUT:
Sales Summary:
mean 246.498440
50% 85.000000
std 487.567175
Name: Sales, dtype: float64
Profit Summary:
mean 28.610982
50% 9.240000
std 174.340972
Name: Profit, dtype: float64
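The program for this step is not shown in the record. Output of the shape above (mean, 50%, and std for one column) could be produced by selecting those rows from describe(). The following is a minimal sketch under that assumption; the helper name summarize and the toy values are hypothetical.

```python
import pandas as pd

def summarize(series: pd.Series) -> pd.Series:
    """Return the mean, median (50%), and standard deviation of a numeric column."""
    # describe() also reports count, min, max, and quartiles; keep three rows only.
    return series.describe()[["mean", "50%", "std"]]

# Toy data (hypothetical) standing in for the cleaned DataFrame.
toy = pd.DataFrame({
    "Sales": [10.0, 20.0, 30.0, 40.0],
    "Profit": [1.0, 2.0, 3.0, 6.0],
})

print("Sales Summary:")
print(summarize(toy["Sales"]))
print("Profit Summary:")
print(summarize(toy["Profit"]))
```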
STEP 4: ANALYSIS
Total Sales per Region
PROGRAM:
sales_per_region = df.groupby("Region")["Sales"].sum()
print(sales_per_region)
OUTPUT:
Region
Africa 783776
Canada 66932
Caribbean 324281
Central 2822399
Central Asia 752839
EMEA 806184
East 678834
North 1248192
North Asia 848349
Oceania 1100207
South 1600960
Southeast Asia 884438
West 725514
Name: Sales, dtype: int64
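The regional totals are easier to compare when they are ranked and expressed as a share of overall sales. The sketch below shows one possible way to do this on made-up numbers; the toy data and the name share_pct are assumptions for illustration.

```python
import pandas as pd

# Toy data (hypothetical) with the same two columns used above.
toy = pd.DataFrame({
    "Region": ["Central", "South", "Central", "Canada"],
    "Sales": [300, 150, 200, 50],
})

# Total sales per region, largest first.
sales_per_region = (
    toy.groupby("Region")["Sales"].sum().sort_values(ascending=False)
)

# Each region's percentage share of total sales, rounded to one decimal.
share_pct = (sales_per_region / sales_per_region.sum() * 100).round(1)

print(sales_per_region)
print(share_pct)
```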
INTRODUCTION:
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has had a profound global
impact since early 2020, affecting millions of lives and disrupting economies. To better
understand the spread, trends, and regional impact of the virus, data-driven approaches such as
Exploratory Data Analysis (EDA) are essential. By exploring confirmed cases, recoveries, and
deaths, this analysis aims to uncover insights into the progression of the pandemic, identify the
most affected states, and visualize daily trends in new infections.
# df.info() prints its report directly and returns None, so print() is not needed
df.info()
OUTPUT:
b) State with the highest number of confirmed cases:
PROGRAM:
top_state=statewise_total[statewise_total['Confirmed']==
statewise_total['Confirmed'].max()]
print("State with highest confirmed cases:\n", top_state)
OUTPUT:
State with highest confirmed cases:
State/UnionTerritory Confirmed Cured Deaths
27 Maharashtra 6363442 6159676 134201
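The construction of statewise_total is not shown in this record. One plausible way to build it, assuming the dataset stores cumulative counts per state and date, is to take the maximum of each column per state. The sketch below follows that assumption on toy data; the Date column and all values are hypothetical. It also uses idxmax() as an alternative way to pick the top row.

```python
import pandas as pd

# Toy data (hypothetical): cumulative counts per state and date.
toy = pd.DataFrame({
    "Date": ["2021-08-10", "2021-08-11", "2021-08-10", "2021-08-11"],
    "State/UnionTerritory": ["Kerala", "Kerala", "Maharashtra", "Maharashtra"],
    "Confirmed": [90, 100, 180, 200],
    "Cured": [80, 85, 170, 190],
    "Deaths": [1, 2, 4, 5],
})

# Assumption: counts are cumulative, so the per-state maximum is the latest total.
statewise_total = (
    toy.groupby("State/UnionTerritory")[["Confirmed", "Cured", "Deaths"]]
    .max()
    .reset_index()
)

# idxmax() returns the index label of the first row with the largest value.
top_state = statewise_total.loc[[statewise_total["Confirmed"].idxmax()]]
print("State with highest confirmed cases:\n", top_state)
```

Note that idxmax() returns only the first row when several states tie, whereas the boolean comparison used above returns all tied rows.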
STEP 4: VISUALIZATIONS
a) Pie Chart: Top 5 States by Confirmed Cases
PROGRAM:
import matplotlib.pyplot as plt
top5_states = statewise_total.sort_values('Confirmed', ascending=False).head(5)
plt.figure(figsize=(8, 8))
plt.pie(top5_states['Confirmed'], labels=top5_states['State/UnionTerritory'],
        autopct='%1.1f%%', startangle=140)
plt.title('Top 5 Indian States by Confirmed COVID-19 Cases')
plt.show()
OUTPUT:
STEP 5: OBSERVATION
Top affected states (e.g., Maharashtra, Kerala, Karnataka) account for the
majority of confirmed cases.
The trend graph shows multiple waves, with sharp increases followed by
declines.
Lockdown periods and vaccination rollouts align with noticeable trend
changes.
Death and recovery rates vary by region and wave, suggesting
healthcare disparities.
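The waves mentioned in these observations are usually read from a daily new-cases series. Assuming the dataset reports cumulative confirmed counts per state and date, daily new cases can be approximated by summing across states and differencing. The sketch below illustrates the idea on toy data; the Date column name and all values are assumptions.

```python
import pandas as pd

# Toy data (hypothetical): cumulative confirmed counts for two states over 3 days.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-04-01", "2021-04-02", "2021-04-03"] * 2),
    "State/UnionTerritory": ["Kerala"] * 3 + ["Goa"] * 3,
    "Confirmed": [100, 130, 190, 10, 15, 15],
})

# National cumulative total per day.
national = toy.groupby("Date")["Confirmed"].sum().sort_index()

# Day-over-day difference; the first day has no previous value and is dropped.
daily_new = national.diff().dropna().astype(int)

print(daily_new)
```

Calling daily_new.plot() would then give a trend line on which successive waves appear as peaks.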
Ex.No:3 EDA ON YOUTUBE TRENDING VIDEOS DATASET
INTRODUCTION:
YouTube has become a dominant platform for video sharing, content
creation, and audience engagement worldwide. The YouTube Trending Videos
Dataset provides a snapshot of videos that were trending in various regions over
time, offering valuable insights into user preferences, content popularity, and
engagement metrics.
This Exploratory Data Analysis (EDA) aims to uncover trends in video
categories, the frequency of trending videos across different channels, and
patterns in user interactions such as views, likes, and comments. By analyzing
this data, we can better understand what makes a video trend, which content types
perform best, and how users engage with trending content.
DATA SOURCE:
STEP 5: OBSERVATION
Top Categories: Categories such as music, entertainment, and news
dominate the trending list.
Channel Popularity: A few channels consistently produce trending content.
Engagement Patterns: There's a strong positive correlation between views
and likes.
Outliers: Some videos have extremely high views but relatively low
likes/comments, suggesting passive viewing.
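The engagement and outlier observations can be checked numerically. The sketch below computes the Pearson correlation between views and likes, then flags videos whose views are in the top quartile but whose like-to-view ratio is in the bottom quartile. The column names title, views, and likes, the quartile thresholds, and the toy values are all assumptions for illustration and may differ from the actual dataset.

```python
import pandas as pd

# Toy data (hypothetical): video "e" has very high views but few likes.
toy = pd.DataFrame({
    "title": ["a", "b", "c", "d", "e"],
    "views": [1000, 2000, 3000, 4000, 100000],
    "likes": [100, 210, 290, 405, 500],
})

# Pearson correlation between views and likes (a value between -1 and 1).
corr = toy["views"].corr(toy["likes"])

# Likes per view as a simple engagement measure.
toy["like_ratio"] = toy["likes"] / toy["views"]

# Outliers: top-quartile views combined with bottom-quartile like ratio.
high_views = toy["views"] > toy["views"].quantile(0.75)
low_ratio = toy["like_ratio"] < toy["like_ratio"].quantile(0.25)
low_engagement = toy[high_views & low_ratio]

print(f"Correlation between views and likes: {corr:.2f}")
print(low_engagement[["title", "views", "likes", "like_ratio"]])
```

A single extreme video can pull the correlation noticeably, so it is worth inspecting the flagged rows before interpreting the coefficient.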