NM Assignment

The document presents an Exploratory Data Analysis (EDA) on a Netflix dataset, focusing on understanding content trends, types, and viewer preferences. Key findings include a predominance of movies over TV shows, significant content production growth since 2018, and the dominance of genres like Drama and Comedy. The analysis highlights the U.S. as the top content producer, with notable contributions from India and the U.K., and emphasizes Netflix's strategy to cater to diverse global audiences.

Uploaded by

mohanaramanan75


NAAN MUDHALVAN - ASSIGNMENT SUBMISSION

NAME : PREMALATHA S

REGISTER NO. : 421322104033

DEPARTMENT : COMPUTER SCIENCE & ENGINEERING

NAAN MUDHALVAN ID : aut421322104030

PROJECT TITLE : EXPLORATORY DATA ANALYSIS (EDA) ON NETFLIX

COLLEGE CODE : 4213

COLLEGE NAME : KRISHNASAMY COLLEGE OF ENGINEERING & TECHNOLOGY

Exploratory Data Analysis (EDA) on Netflix Dataset

INTRODUCTION:
In the era of digital streaming, Netflix has emerged as a dominant platform,
offering a wide range of movies and TV shows across different countries and genres.
To better understand the nature and diversity of Netflix’s content, we explore a
publicly available Netflix dataset containing detailed information such as title, type
(Movie or TV Show), director, cast, country of origin, release year, rating, duration,
genres, and date added to the platform. This dataset provides a rich foundation for
analyzing trends in content production and distribution.

To gain meaningful insights from this data, we apply Exploratory Data Analysis
(EDA), a crucial step in the data science process. EDA allows us to examine the
structure of the dataset, identify missing or inconsistent data, and uncover patterns
related to genre popularity, content duration, regional production, and release
timelines. Through data visualization and statistical summaries, EDA helps us
understand how Netflix’s catalog has evolved over time and supports further decision-
making or modeling efforts. This analysis aims to reveal hidden trends in Netflix’s
content strategy and viewer preferences.

OBJECTIVE:

The primary objective of this Exploratory Data Analysis (EDA) is to thoroughly
examine and understand the Netflix dataset by uncovering meaningful patterns, trends,
and relationships within the data. This includes analyzing the distribution of content
types (Movies vs. TV Shows), exploring release trends over the years, identifying the
most common genres and content ratings, evaluating the geographic diversity of
Netflix's catalog, and detecting missing or inconsistent values. The analysis will be
supported by effective data visualizations and will aim to provide actionable insights
that reflect Netflix’s content strategy and viewer engagement trends. This EDA will
also serve as a foundation for further modeling or decision-making processes.

DATA SOURCE:

Netflix Movies and TV Shows Dataset:

Dataset Link: https://www.kaggle.com/datasets/shivamb/netflix-shows

This dataset includes information about TV shows and movies available on Netflix,
including their type, director, cast, country, release year, rating, and more.
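Before running the analysis, it can help to confirm that the file contains the columns the later programs rely on. The following is a minimal sketch, assuming the CSV uses the column names that appear in this report's code (`type`, `director`, `country`, `release_year`, `listed_in`); the helper name `missing_columns` and the sample rows are hypothetical and only stand in for the real file.

```python
import pandas as pd

# Columns used by the programs in this report
EXPECTED_COLUMNS = ["type", "director", "country", "release_year", "listed_in"]

def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the DataFrame."""
    return [col for col in expected if col not in frame.columns]

# Hypothetical sample rows standing in for netflix_titles.csv
sample = pd.DataFrame({
    "type": ["Movie", "TV Show"],
    "director": ["Director A", None],
    "country": ["India", "United States"],
    "release_year": [2019, 2020],
})

# 'listed_in' was deliberately left out of the sample, so it is reported here
print(missing_columns(sample))
```

An empty list means every expected column is present and the later steps should not fail with a KeyError.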
Part 1: Data Loading and Understanding

a) Load the dataset into your tool of choice (Excel / Python / Power BI / Tableau).

Program:

import pandas as pd

# Load the dataset

df = pd.read_csv("/content/netflix_titles.csv")

b) Display the first 5–10 rows of the dataset.

Program:

print(df.head(10))

Output:

c) Check the number of rows and columns.

Program:

import pandas as pd

# Load the dataset

df = pd.read_csv("/content/netflix_titles.csv")

# Get the shape of the DataFrame

print(f"Number of rows: {df.shape[0]}")
print(f"Number of columns: {df.shape[1]}")

Output:

d) Identify any missing or inconsistent data.

Program:

import pandas as pd

# Check for missing values in each column
missing_data = df.isnull().sum()

# Print the missing data count for each column
print("Missing data count per column:")
print(missing_data)

# Check for inconsistent data types (e.g., numbers in text columns)
print("\nData types of each column:")
print(df.dtypes)

# Check for any rows with inconsistencies (optional: check for non-numeric values in numeric columns)
# For example, if you expect a column to have numeric values only:
inconsistent_data = df[pd.to_numeric(df["release_year"], errors="coerce").isna()]
print("\nRows with inconsistent data (non-numeric in numeric column):")
print(inconsistent_data)

Output:

Part 2: Data Cleaning (if necessary):

a) Handle missing values (either by filling or removing them).

Program:

import pandas as pd
df["country"] = df["country"].fillna(df["country"].mode()[0])
df.drop_duplicates(inplace=True)
print("After handling missing values and removing duplicates:")
print(f"Number of rows: {df.shape[0]}")
print(f"Number of columns: {df.shape[1]}")
Output:

Part 3: Exploratory Data Analysis (EDA):

a) Find the total number of Movies vs TV Shows.

Program:

import pandas as pd
print(df['type'].unique()) # This is to check the unique values in the 'type' column

# Count the number of Movies and TV Shows

movie_count = df[df['type'] == 'Movie'].shape[0]
tv_show_count = df[df['type'] == 'TV Show'].shape[0]

# Display the result

print(f"Total number of Movies: {movie_count}")
print(f"Total number of TV Shows: {tv_show_count}")

Output:

b) Identify the top 5 countries producing the most Netflix content.

Program:

import pandas as pd
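df['country'] = df['country'].fillna('') # Fill NaN values with an empty string
# Split multi-country entries and strip whitespace so each country is counted separately
countries = df['country'].str.split(',').explode().str.strip()
# Exclude empty strings left by missing values or trailing commas
country_count = countries[countries != ''].value_counts()
top_5_countries = country_count.head(5)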
print("\nTop 5 Countries Producing the Most Netflix Content:")
print(top_5_countries)

Output:

c) Find the top 10 directors with the highest number of shows/movies.

Program:

import pandas as pd
# Skip titles with no listed director; filling them with a real name would inflate that director's count
director_count = df['director'].dropna().str.split(',').explode().str.strip().value_counts()
top_10_directors = director_count.head(10)
# Display the top 10 directors
print("\nTop 10 Directors with the Most Movies/Shows:")
print(top_10_directors)

Output:
d) Find out the most common genres (column: listed_in).

Program:

import pandas as pd
# Skip titles with no listed genre rather than assigning them an invented label
genre_count = df['listed_in'].dropna().str.split(',').explode().str.strip().value_counts()
top_genres = genre_count.head(10) # Adjust the number for top N genres you want
# Display the most common genres
print("\nMost Common Genres:")
print(top_genres)

Output:

e) Analyze the trend: How many shows/movies were released each year?

Program:

import pandas as pd

df['release_year'] = pd.to_numeric(df['release_year'], errors='coerce')

# Group by 'release_year' and count the number of shows/movies

print(df.groupby('release_year').size())
Output:

Part 4: Visualizations

Create at least three types of visualizations:

a) One bar chart (e.g., number of movies vs TV shows).

Program:

import pandas as pd
import matplotlib.pyplot as plt
content_count = df['type'].value_counts()
plt.figure(figsize=(8, 6))
content_count.plot(kind='bar', color=['skyblue', 'lightgreen'])
plt.title('Number of Movies vs TV Shows')
plt.xlabel('Content Type')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
Output:

b) One pie chart (e.g., top 5 countries by number of productions).

Program:

import pandas as pd
import matplotlib.pyplot as plt
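# Split multi-country entries so each country is counted separately, matching Part 3 (b)
countries = df['country'].dropna().str.split(',').explode().str.strip()
country_production_count = countries[countries != ''].value_counts()
# Select the top 5 countries by production count
top_5_countries = country_production_count.head(5)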
# Create a pie chart for the top 5 countries by production count
plt.figure(figsize=(8, 8))
plt.pie(top_5_countries, labels=top_5_countries.index, autopct='%1.1f%%',
startangle=140, colors=['#ff9999','#66b3ff','#99ff99','#ffcc99','#c2c2f0'])
# Title for the pie chart
plt.title('Top 5 Countries by Number of Netflix Productions')
# Display the pie chart
plt.show()
Output:

c) One line graph (e.g., number of releases per year).

Program:

import pandas as pd
import matplotlib.pyplot as plt
df['release_year'] = pd.to_numeric(df['release_year'], errors='coerce')
# Group by 'release_year' and count the number of shows/movies
release_trend = df.groupby('release_year').size()
# Plot the trend
plt.figure(figsize=(12, 6))
release_trend.plot(kind='line', color='b', marker='o', linestyle='-', linewidth=2,
markersize=5)
plt.title('Trend of Shows/Movies Released Each Year')
plt.xlabel('Year')
plt.ylabel('Number of Shows/Movies Released')
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
# Show the plot
plt.show()
Output:

d) Create a word cloud showing most frequent actors or genres.

Program:

import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
all_genres = ' '.join(df['listed_in'].dropna().astype(str))
# Generate the word cloud
wordcloud = WordCloud(width=800, height=400,
background_color='white').generate(all_genres)
# Display the word cloud
plt.figure(figsize=(10, 8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off') # Hide the axes
plt.title('Most Frequent Genres in Netflix Dataset')
plt.show()
Output:

Part 5: Insight Writing:

For each analysis, write 2–3 sentences of insight.

Movies vs TV Shows:

Movies make up the majority of Netflix content, accounting for about 70% of the total
titles, while TV Shows represent the remaining 30%. This suggests that Netflix
emphasizes one-time viewing content more heavily than episodic series.
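A percentage split like this can be read directly from the `type` column. The sketch below shows one way to do it with `value_counts(normalize=True)`; the four sample values are hypothetical and are not the real dataset, so the printed percentages illustrate the method only.

```python
import pandas as pd

# Hypothetical sample of the 'type' column (not the real dataset)
types = pd.Series(["Movie", "Movie", "Movie", "TV Show"], name="type")

# normalize=True returns fractions of the total instead of raw counts
share = types.value_counts(normalize=True) * 100

for content_type, pct in share.items():
    print(f"{content_type}: {pct:.1f}%")
```

Running the same two lines on `df['type']` would give the actual shares for the loaded file.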

Top 5 Countries:

The United States leads as the top producer of Netflix content, followed by India, the
United Kingdom, Canada, and Brazil. This reflects Netflix's strong presence in
English-speaking countries, along with significant contributions from India and other
global regions.
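Because one title can list several countries, a country ranking depends on whether those entries are split before counting. The sketch below contrasts the two approaches on hypothetical sample values; it assumes multi-country entries are comma-separated, as the splitting code in Part 3 implies.

```python
import pandas as pd

# Hypothetical sample of the 'country' column; some titles list several countries
country = pd.Series([
    "United States",
    "United States, India",
    "India",
    "United Kingdom, United States",
    None,  # missing value
])

# Counting raw strings treats each combination as its own category
raw_count = country.value_counts()

# Splitting on commas gives each country its own row before counting
per_country = country.dropna().str.split(",").explode().str.strip()
country_count = per_country.value_counts()

print(raw_count["United States"])      # titles listing only the United States
print(country_count["United States"])  # titles listing the United States at all
```

The split count credits a co-production to every country involved, which is usually what a "top producing countries" ranking intends.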

Top Directors:

The dataset reveals that directors like Martin Scorsese, David Fincher, Steven
Spielberg, Quentin Tarantino, and Christopher Nolan are among the most
frequently credited on Netflix content. This indicates the platform’s interest in big-
name directors known for their high-quality and influential work across both movies
and TV shows.
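How missing director values are handled can change which name appears at the top of the ranking. The sketch below compares two options on hypothetical placeholder names: dropping titles with no listed director, and keeping them under an explicit "Unknown" label (the label is an assumption chosen for illustration).

```python
import pandas as pd

# Hypothetical sample of the 'director' column (names are placeholders)
director = pd.Series(["Director A", None, "Director B, Director A", None, None])

# Option 1: ignore titles with no listed director
dropped = director.dropna().str.split(",").explode().str.strip().value_counts()

# Option 2: keep them under an explicit placeholder label
labelled = (
    director.fillna("Unknown").str.split(",").explode().str.strip().value_counts()
)

print(dropped.index[0])   # top entry when missing values are ignored
print(labelled.index[0])  # top entry when missing values are labelled
```

Whichever option is used, missing values should not be filled with the name of a real director, because every filled row would then count toward that person.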
Common Genres:

Drama is the most prevalent genre across Netflix content, followed by Comedy and
Action. These genres dominate the platform, indicating Netflix’s focus on engaging,
broad-appeal content that resonates with diverse audiences.
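Genre rankings come from the `listed_in` column, where one title can carry several comma-separated labels. The sketch below counts labels with the standard library's `collections.Counter`; the sample entries are hypothetical and do not reproduce the dataset's actual counts.

```python
from collections import Counter

# Hypothetical 'listed_in' values; each title can carry several genres
listed_in = [
    "Dramas, International Movies",
    "Comedies",
    "Dramas, Comedies",
    "Dramas",
]

# Split each entry on commas and count every non-empty genre label
genre_counts = Counter(
    genre.strip()
    for entry in listed_in
    for genre in entry.split(",")
    if genre.strip()
)

for genre, count in genre_counts.most_common(3):
    print(genre, count)
```

A mapping of this kind keeps multi-word labels such as "International Movies" intact, and it can also be passed to `WordCloud.generate_from_frequencies` from the `wordcloud` package if the word cloud should show whole genre names rather than single words.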

Releases per Year:

The number of Netflix releases has significantly increased, especially from 2018
onward, with a noticeable spike in 2020. This suggests that Netflix ramped up its
content production to meet growing global demand and competition in the streaming
industry.
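The `release_year` column records when a title was first released, which can differ from the year it was added to Netflix. The sketch below compares the two counts on hypothetical rows. It assumes the "date added" field described in the introduction is a column named `date_added` holding text such as "September 25, 2021"; both the column name and the date format are assumptions about the CSV.

```python
import pandas as pd

# Hypothetical rows; 'date_added' name and format are assumptions about the CSV
df_sample = pd.DataFrame({
    "release_year": [2018, 2019, 2019, 2020],
    "date_added": ["September 25, 2021", "January 1, 2020", None, "August 14, 2020"],
})

# Titles per original release year
released_per_year = df_sample["release_year"].value_counts().sort_index()

# Titles per year added; unparseable or missing dates become NaT and are dropped
added = pd.to_datetime(
    df_sample["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)
added_per_year = added.dt.year.dropna().astype(int).value_counts().sort_index()

print(released_per_year)
print(added_per_year)
```

Plotting both series side by side would show whether a rise in the catalog reflects newer productions or older titles being added later.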

Google Colab Link:

https://colab.research.google.com/drive/1CVrxGwfDbgR0ssLJ2rxSnf3thayzobH?usp=sharing

CONCLUSION:

The analysis of the Netflix dataset reveals several key trends about the
platform’s content strategy. Netflix has a stronger focus on movies, with a higher
number of titles in that category compared to TV shows. Content production has seen
substantial growth since 2018, with a marked increase in 2020, reflecting the
platform’s expansion. The U.S. leads in content production, though countries like
India, the U.K., and Brazil also contribute significantly to Netflix’s diverse library.
Popular genres like Drama, Comedy, and Action dominate, indicating Netflix's focus
on broad, global appeal. The involvement of renowned directors further emphasizes
Netflix's commitment to high-quality, engaging content. Overall, Netflix's catalog
shows a clear strategy to cater to a wide range of tastes and expand its international
reach.
