IMDB Movie Analysis Project Report
PROJECT DESCRIPTION :-
The IMDb Movie Analysis project aims to explore and analyze a comprehensive dataset of
movies available on the IMDb platform. This dataset contains essential information about
movies, including director names, movie titles, duration, genre, budget, gross earnings, IMDb
ratings, and more. Using Excel, data visualization, and statistical techniques, this project
seeks to extract insights and identify the trends that contribute to a movie's success.
In this project, I was required to provide a detailed report on the dataset that answers the
following questions:
A. Movie Genre Analysis: Analyze the distribution of movie genres and their impact on the
IMDB score.
● Task: Determine the most common genres of movies in the dataset. Then, for each genre,
calculate descriptive statistics (mean, median, mode, range, variance, standard deviation)
of the IMDB scores.
B. Movie Duration Analysis: Analyze the distribution of movie durations and their impact on the
IMDB score.
● Task: Analyze the distribution of movie durations and identify the relationship between
movie duration and IMDB score.
C. Language Analysis: Examine the distribution of movies based on their language.
● Task: Determine the most common languages used in movies and analyze their impact
on the IMDB score using descriptive statistics.
D. Director Analysis: Identify the top directors and their contribution to the success of movies.
● Task: Identify the top directors based on their average IMDB score and analyze their
contribution to the success of movies using percentile calculations.
E. Budget Analysis: Explore the relationship between movie budgets and their financial success.
● Task: Analyze the correlation between movie budgets and gross earnings, and identify
the movies with the highest profit margin.
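The descriptive statistics named in Task A can be illustrated with a minimal Python sketch. It is not the method used in this report, which was carried out in Excel. The sample rows are hypothetical, and the pipe-separated genre format is an assumption about how the dataset stores multiple genres per movie.

```python
from collections import defaultdict
import statistics

# Hypothetical sample rows: (genres, imdb_score). The "|" separator is an
# assumption about how multiple genres are stored in one cell.
rows = [
    ("Drama|Thriller", 7.0),
    ("Drama", 8.0),
    ("Comedy|Drama", 6.0),
    ("Comedy", 6.0),
    ("Action|Thriller", 5.0),
]

def genre_stats(rows):
    """Group IMDb scores by genre and compute descriptive statistics."""
    scores = defaultdict(list)
    for genres, score in rows:
        for genre in genres.split("|"):
            scores[genre].append(score)
    stats = {}
    for genre, values in scores.items():
        many = len(values) > 1  # sample variance needs at least two values
        stats[genre] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "mode": statistics.mode(values),
            "range": max(values) - min(values),
            "variance": statistics.variance(values) if many else 0.0,
            "stdev": statistics.stdev(values) if many else 0.0,
        }
    return stats

stats = genre_stats(rows)
# List the most common genres first.
for genre, s in sorted(stats.items(), key=lambda kv: -kv[1]["count"]):
    print(genre, s)
```

The sketch uses the sample variance and standard deviation (the counterparts of Excel's VAR.S and STDEV.S). Using the population versions instead would be an equally reasonable choice.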
MY APPROACH :-
I went through the dataset to understand all the given columns and observed that it has a
total of 28 columns and 5043 rows. The dataset contains unwanted columns, null values and
blank rows, so I decided to clean it thoroughly.
1) First, I deleted the columns that have no relation to this project and don't provide
any valuable insights. In the end, I was left with only 9 columns: director's name,
duration, movie title, genre, budget, gross, IMDb rating, language and country.
2) Then, I noticed that many rows contained blanks. To find them, I clicked on “Find &
Select”, then on “Go To Special”, and selected the “Blanks” option, which highlighted all
the blank cells. Then I pressed the shortcut “CTRL + -” and selected the “Entire row”
option. This deleted every row in the dataset that contained a blank.
3) Finally, I deleted the duplicate rows present in the dataset. This left me with a total of
9 columns and 3786 rows. The cleaned dataset is provided below.
https://docs.google.com/spreadsheets/d/1QZcrT5BZhKOTA9_pnpaorlPPRI7wW4BCzT_FyVd0YQY/edit?usp=sharing
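For readers who prefer a scripted equivalent, the three cleaning steps above can be sketched in Python. This only illustrates the same logic and is not the procedure used in this report, which was done by hand in Excel. The column names, the `make_row` helper and the sample rows are all hypothetical. Values are treated as strings, as they would be when read from a CSV export.

```python
# Column names are assumptions; the report lists the nine kept fields but
# not their exact header names.
KEEP = ["director_name", "duration", "movie_title", "genres", "budget",
        "gross", "imdb_score", "language", "country"]

def clean(rows, keep=KEEP):
    """Keep selected columns, drop rows with any blank cell, drop duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        kept = {col: str(row.get(col) or "").strip() for col in keep}
        if any(value == "" for value in kept.values()):
            continue  # mirrors Go To Special > Blanks, then delete entire row
        key = tuple(kept[col] for col in keep)
        if key in seen:
            continue  # mirrors removing duplicate rows
        seen.add(key)
        cleaned.append(kept)
    return cleaned

def make_row(title, gross):
    """Build a hypothetical raw row; 'color' stands in for a dropped column."""
    return {"color": "Color", "director_name": "Director X", "duration": "120",
            "movie_title": title, "genres": "Drama", "budget": "1000",
            "gross": gross, "imdb_score": "7.5", "language": "English",
            "country": "USA"}

raw = [
    make_row("Movie A", "5000"),
    make_row("Movie A", "5000"),  # exact duplicate
    make_row("Movie B", ""),      # blank gross
    make_row("Movie C", "2500"),
]
cleaned = clean(raw)
print(len(cleaned), [r["movie_title"] for r in cleaned])
```

Dropping columns before checking for blanks matters: a row is removed only if one of the nine kept fields is empty, which matches the order of the steps described above.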
TECH STACK :-
For this project, I used Microsoft Excel 365 to run the functions needed to answer the
above questions and to plot the graphs.
INSIGHTS :-
https://docs.google.com/spreadsheets/d/1X-ak_kajhbePb1_NzvtnA8kmErCSwUN9yEJqvIdsac0/edit?usp=sharing
I have observed that:
1) The most common movie genres in the dataset are Drama, Comedy, Thriller and
Action.
2) The average duration of a movie is 109 minutes. The trendline of duration vs.
IMDb score slopes upward with R^2 = 0.131, indicating a weak positive relationship.
3) The most common languages used in the movies are English, French, Spanish, Mandarin
and German. I also observed that the languages Telugu and Persian have the
highest average IMDb scores.
4) Tony Kaye, Charles Chaplin, Alfred Hitchcock, Ron Fricke, Damien Chazelle,
Majid Majidi, Sergio Leone, Christopher Nolan, SS Rajamouli and Richard Marquand
are the top 10 directors, each with an average IMDb score >= 8.4.
5) The top 5 movies by profit are Avatar, Jurassic World, Titanic, Star Wars: Episode
IV - A New Hope and E.T. The Extra-Terrestrial. The correlation between budget and
gross is positive.
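The correlation and profit figures in the list above can be reproduced in principle with a short Python sketch. The movies and numbers below are hypothetical, and profit is assumed to mean gross minus budget, which the report does not define explicitly. For a straight-line trendline with one predictor, R^2 equals the square of the Pearson correlation r, so the reported R^2 = 0.131 with an upward slope corresponds to r of roughly 0.36.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def top_by_profit(movies, n):
    """Return the n titles with the largest profit (gross minus budget)."""
    ranked = sorted(movies, key=lambda m: m[2] - m[1], reverse=True)
    return [title for title, _budget, _gross in ranked[:n]]

# Hypothetical sample: (title, budget, gross), not taken from the dataset.
movies = [
    ("Movie A", 100, 300),
    ("Movie B", 50, 60),
    ("Movie C", 200, 500),
    ("Movie D", 10, 5),
]
budgets = [m[1] for m in movies]
grosses = [m[2] for m in movies]

r = pearson_r(budgets, grosses)
r_squared = r ** 2  # equals the R^2 of a one-variable linear trendline
top2 = top_by_profit(movies, 2)
print(round(r, 3), round(r_squared, 3), top2)
```

The same `pearson_r` function could be applied to duration and IMDb score to check the trendline in insight 2.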
RESULTS :-
Through this project, I gained valuable experience in data analysis using statistical
knowledge and Excel's data visualization tools. I also learned to apply my data analysis
skills to solving real-life problems.