Case Study Data Analytics
CASE STUDY
On
Netflix dataset
Submitted by
Submitted to
Mr. Ratnesh Dubey
SOET
Department of Computer Science and Applications
Next Steps:
1. Data Cleaning: Handle missing values, correct inconsistencies, and adjust
data types if needed.
Data Summary
• Data Types:
• Statistical Insights:
o Runtimes vary widely, with some entries having extremely low values
(e.g., 4 minutes).
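One way to check an observation like this is to look at summary statistics for the runtime column and list the outliers. The sketch below uses made-up runtimes (only the 4-minute value echoes the text above), and the 10-minute cutoff is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical runtimes in minutes; only the 4-minute value echoes the text.
runtimes = pd.Series([4, 28, 90, 97, 105, 132], name="runtime")

# describe() reports count, mean, std, min, quartiles, and max in one call.
summary = runtimes.describe()
print(summary[["min", "50%", "max"]])

# Entries below an assumed 10-minute cutoff are worth a manual look.
suspicious = runtimes[runtimes < 10]
print(suspicious.tolist())
```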
Plan:
1. Convert premiere to a datetime format.
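A minimal sketch of this conversion with pandas follows. The sample values and the "Month day, year" format string are assumptions; the format would need to match how the real premiere column is written.

```python
import pandas as pd

# Hypothetical values; the date format of the real column is an assumption.
df = pd.DataFrame(
    {"premiere": ["August 5, 2019", "December 11, 2020", "not a date"]}
)

# errors="coerce" turns unparseable strings into NaT instead of raising.
df["premiere"] = pd.to_datetime(
    df["premiere"], format="%B %d, %Y", errors="coerce"
)

# Once the column is datetime, parts such as the year are easy to extract.
df["year"] = df["premiere"].dt.year

print(df.dtypes)
print(df)
```

Rows that come back as NaT are worth inspecting before they are dropped, since they usually point to a second date format in the data.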
Next Steps:
1. Transform the dataset for EDA by categorizing short-form content.
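The categorization could look roughly like the sketch below. The 40-minute cutoff, the label names, and the content_type column are assumptions made for illustration; the case study does not state which threshold it used.

```python
import pandas as pd

SHORT_FORM_MAX_MINUTES = 40  # assumed cutoff, not taken from the case study

# Hypothetical rows with runtimes in minutes.
df = pd.DataFrame({"title": ["A", "B", "C"], "runtime": [4, 40, 95]})

# Label each title by comparing its runtime against the cutoff.
is_short = df["runtime"].le(SHORT_FORM_MAX_MINUTES)
df["content_type"] = is_short.map({True: "Short-form", False: "Feature-length"})

print(df["content_type"].value_counts())
```

Keeping the cutoff in a named constant makes it easy to rerun the EDA with a different definition of short-form content.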
Below is the complete code broken into sections for clarity. You can run it in a
Python environment (e.g., Jupyter Notebook, Google Colab).
1. Importing Libraries
2. Loading and Inspecting the Dataset
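As a sketch of this step, the snippet below reads a CSV into a DataFrame named netflix_data and prints a few standard inspection views. The inline CSV text stands in for the real file so the example runs on its own. Its rows are invented, and the genre and language column names are assumptions.

```python
import io

import pandas as pd

# Inline CSV standing in for the real file; rows are hypothetical.
# With the actual dataset, pass the file path to pd.read_csv instead.
csv_text = """title,genre,premiere,runtime,imdb_score,language
Film A,Drama,"August 5, 2019",90,6.5,English
Film B,Comedy,"December 11, 2020",4,5.8,Spanish
Film C,,"May 1, 2021",101,7.1,English
"""
netflix_data = pd.read_csv(io.StringIO(csv_text))

print(netflix_data.shape)         # (rows, columns)
print(netflix_data.head())        # first few rows
netflix_data.info()               # column dtypes and non-null counts
print(netflix_data.isna().sum())  # missing values per column
```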
3. Data Cleaning
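The cleaning goals named earlier (handling missing values, correcting inconsistencies, and adjusting data types) might be sketched as follows. The sample rows, the clean helper, and the choice to drop rows without a runtime or score are all assumptions for illustration, not the case study's actual code.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's columns may differ.
raw = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film C", "Film D"],
    "genre": [" Drama", "drama", None, None, "Comedy"],
    "runtime": ["90", "4", "101", "101", "n/a"],
    "imdb_score": [6.5, 5.8, 7.1, 7.1, 6.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning helper (hypothetical, not from the source)."""
    out = df.drop_duplicates().copy()
    # Correct inconsistencies: trim whitespace and unify capitalization.
    out["genre"] = out["genre"].str.strip().str.title()
    # Adjust data types: non-numeric runtimes become NaN instead of raising.
    out["runtime"] = pd.to_numeric(out["runtime"], errors="coerce")
    # Handle missing values: label unknown genres, then drop rows that
    # lack the numeric fields the later analysis relies on.
    out["genre"] = out["genre"].fillna("Unknown")
    out = out.dropna(subset=["runtime", "imdb_score"])
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Whether to drop or impute incomplete rows depends on how many there are. Dropping is the simpler choice when only a handful of rows are affected.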
4. Exploratory Data Analysis (EDA)
Key Insights to Analyze:
• Most Common Genres
• Most Common Languages
• Distribution of IMDb Scores
• Runtime Distribution
• Trend in Content Production by Year
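Each of these insights maps onto a short pandas expression. The sketch below runs them on a small invented DataFrame. The genre and language column names are assumptions, and premiere is taken to be a datetime column already.

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "genre": ["Drama", "Comedy", "Drama", "Documentary", "Drama"],
    "language": ["English", "Spanish", "English", "English", "Hindi"],
    "imdb_score": [6.5, 5.8, 7.1, 7.4, 6.0],
    "runtime": [90, 4, 101, 85, 120],
    "premiere": pd.to_datetime(
        ["2018-03-01", "2019-08-05", "2019-11-20", "2020-12-11", "2020-06-30"]
    ),
})

# Most common genres and languages (value_counts sorts descending).
top_genres = df["genre"].value_counts().head(10)
top_languages = df["language"].value_counts().head(10)

# Distribution summaries for scores and runtimes.
score_stats = df["imdb_score"].describe()
runtime_stats = df["runtime"].describe()

# Titles released per year, in chronological order.
titles_per_year = df["premiere"].dt.year.value_counts().sort_index()

print(top_genres, top_languages, score_stats, runtime_stats, titles_per_year, sep="\n\n")
```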
5. Visualizations
Bar Chart for Top 10 Genres
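A bar chart of this kind could be drawn as in the sketch below. The plot_top_genres helper, the genre column name, and the sample rows are assumptions for illustration; the original plotting code is not shown in this document.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_genres(df: pd.DataFrame, n: int = 10):
    """Draw a bar chart of the n most frequent genres (hypothetical helper)."""
    counts = df["genre"].value_counts().head(n)
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar(counts.index, counts.values)
    ax.set_title(f"Top {n} Genres")
    ax.set_xlabel("Genre")
    ax.set_ylabel("Number of Titles")
    ax.tick_params(axis="x", rotation=45)  # keep long genre names readable
    fig.tight_layout()
    return ax


# Hypothetical sample data standing in for the real dataset.
sample = pd.DataFrame(
    {"genre": ["Drama", "Drama", "Drama", "Comedy", "Comedy", "Documentary"]}
)
ax = plot_top_genres(sample)
```

In a notebook the figure renders at the end of the cell. In a plain script, a call to plt.show() after plot_top_genres displays it.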
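# Correlation between runtime and IMDb score
# (netflix_data is the cleaned DataFrame from the earlier sections)
import matplotlib.pyplot as plt
import seaborn as sns

correlation = netflix_data[['runtime', 'imdb_score']].corr()
print(correlation)

# Visualizing correlation as an annotated heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', fmt=".2f", vmin=-1, vmax=1)
plt.title('Correlation Heatmap: Runtime vs. IMDb Score')
plt.show()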