IMDB Movie Analysis Report
Project description:
IMDB movie analysis involves analyzing the IMDB movie dataset by applying various
techniques and calculations to selected features of the dataset. The analysis
should answer each of the questions and tasks given and fulfill the project
requirements.
Approach:
● The project begins with loading the dataset into the spreadsheet tool.
● After loading, the first step is to clean the dataset: handling null and
missing values, removing duplicates, and so on.
● Next, based on the questions asked and the problem statement, specific
criteria, calculations, and visualizations are used to build tables, pivot tables,
charts, etc.
● The answer to each question is stored on a separate sheet.
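For readers who prefer a script to a spreadsheet, the cleaning step can be sketched in Python with only the standard library. This is a minimal sketch, not the method used in this report (which was done in Google Sheets). It assumes a CSV file whose column names, apart from director_name and imdb_score, are made up here, and the sample rows are invented. It only drops rows with a blank required field and exact duplicate rows, whereas in this project some nulls were filled in by hand instead.

```python
import csv
import io

# Tiny made-up sample standing in for the IMDB CSV (illustrative only).
# Column names other than director_name and imdb_score are assumptions.
RAW = """movie_title,director_name,budget,gross,imdb_score
Movie A,Dir X,100,250,7.1
Movie A,Dir X,100,250,7.1
Movie B,,50,,6.0
Movie C,Dir Y,80,60,5.5
"""

REQUIRED = ("movie_title", "director_name", "budget", "gross", "imdb_score")


def clean_rows(text, required=REQUIRED):
    """Drop rows with an empty required field, then drop exact duplicates."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # Treat blank strings as nulls, like empty cells in a spreadsheet.
        if any(not (row.get(col) or "").strip() for col in required):
            continue
        # Use every column's value as the identity of the row.
        key = tuple(row[col] for col in reader.fieldnames)
        if key in seen:
            continue  # exact duplicate of a row already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

Calling clean_rows(RAW) keeps one copy of "Movie A" and keeps "Movie C"; "Movie B" is dropped because its director and gross are blank.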
Tech-stack used:
Google Sheets
INSIGHTS & RESULTS:
A. Cleaning the data: This is one of the most important steps to perform before
moving on to the analysis.
Results:
There were many null values. Some of the rows containing nulls were dropped, and
others were filled in manually after I researched the movies on IMDB. There were
also many duplicate rows, which were removed, and some columns that were not
important for the analysis were dropped during the cleaning process. These changes
are visualized below:
Results:
Many movies made a profit, and the movie with the highest profit was ‘Avatar’.
Some movies made losses, which can be seen in the second graph. The second graph
shows a clear outlier: ‘The Host’, with a profit of -$12,213,298,588 on a budget of
$12,215,500,000. A loss of this size points to a problem with the data rather than
a real result: either the movie was not released, it had problems at release, or
the data was recorded incorrectly (for example, a budget entered in a currency
other than US dollars).
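A minimal sketch of the profit step, assuming profit is defined as gross minus budget (the usual definition; the report does not spell out its formula). The first two rows are made up. The third reuses the budget and profit reported above for ‘The Host’, with gross back-computed as budget + profit = $2,201,412.

```python
# Illustrative rows: "Toy Hit" and "Toy Flop" are made up; the last row uses
# the budget and profit reported for 'The Host' (gross = budget + profit).
MOVIES = [
    {"title": "Toy Hit", "budget": 100_000_000, "gross": 300_000_000},
    {"title": "Toy Flop", "budget": 50_000_000, "gross": 20_000_000},
    {"title": "The Host", "budget": 12_215_500_000, "gross": 2_201_412},
]


def add_profit(movies):
    """Return copies of the rows with profit = gross - budget, sorted high to low."""
    with_profit = [dict(m, profit=m["gross"] - m["budget"]) for m in movies]
    return sorted(with_profit, key=lambda m: m["profit"], reverse=True)


def losses(ranked):
    """Keep only the movies whose profit is negative, in ranked order."""
    return [m for m in ranked if m["profit"] < 0]
```

The most negative profit ends up last in the ranked list, which is where a row like ‘The Host’ stands out from the rest.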
C. Top 250: Find the top 250 movies according to IMDB score in the IMDB dataset.
Also find the foreign movies.
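The top-250 and foreign-movie filters can be sketched as a sort followed by a filter. This is a minimal sketch under stated assumptions: the rows are made up, "language" is an assumed column name, and "foreign" is taken to mean "language is not English". The report does not state its exact rule.

```python
# Made-up rows; "language" is an assumed column name for the film's language.
MOVIES = [
    {"title": "Film 1", "imdb_score": 8.9, "language": "English"},
    {"title": "Film 2", "imdb_score": 9.1, "language": "Korean"},
    {"title": "Film 3", "imdb_score": 7.4, "language": "English"},
    {"title": "Film 4", "imdb_score": 8.1, "language": "French"},
]


def top_n(movies, n=250):
    """Return the n highest-scoring movies, best first (ties keep input order)."""
    return sorted(movies, key=lambda m: m["imdb_score"], reverse=True)[:n]


def foreign(movies):
    """Keep movies whose language is not English (the assumed 'foreign' rule)."""
    return [m for m in movies if m["language"] != "English"]
```

For example, top_n(MOVIES, n=3) returns Film 2, Film 1, and Film 4, and foreign() on that result keeps Film 2 and Film 4.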
Results:
Out of the IMDB top 250, the top 29 movies are as follows:
D. Best Directors: Group the data using the director_name column.
Find the top 10 directors with the highest mean imdb_score and store them in a
new column, top 10 director. In case of a tie in IMDB score between two
directors, sort them alphabetically.
Your task: Find the best directors.
Results:
In this analysis, I found that John Blanchard is the director with the highest
average IMDB score, 9.5, followed by Krzysztof Kieslowski with an average score
of 9.1.
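The grouping and tie-break rule from the task can be sketched in Python as follows. This is a minimal sketch that uses the report's column names director_name and imdb_score. The directors and scores are made up.

```python
from collections import defaultdict

# Made-up rows using the report's column names director_name and imdb_score.
MOVIES = [
    {"director_name": "Dir B", "imdb_score": 8.0},
    {"director_name": "Dir B", "imdb_score": 9.0},
    {"director_name": "Dir A", "imdb_score": 8.5},
    {"director_name": "Dir C", "imdb_score": 7.0},
]


def top_directors(movies, n=10):
    """Rank directors by mean imdb_score; ties are broken alphabetically."""
    scores = defaultdict(list)
    for m in movies:
        scores[m["director_name"]].append(m["imdb_score"])
    means = {d: sum(s) / len(s) for d, s in scores.items()}
    # Sort by descending mean, then ascending name for ties.
    ranked = sorted(means.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

Negating the mean inside the sort key lets a single ascending sort handle both the descending score order and the alphabetical tie-break. Here "Dir A" and "Dir B" both average 8.5, so "Dir A" comes first.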
E. Popular Genres: Perform this step using the knowledge gained while
performing the previous steps.
Your task: Find popular genres
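Counting genres can be sketched with a counter over each movie's genre list. This is a minimal sketch that assumes each movie's genres are stored as a single pipe-separated string such as "Action|Drama". That storage format is an assumption, not something stated in this report, and the rows are made up.

```python
from collections import Counter

# Made-up rows; assumes genres are stored as one pipe-separated string per movie.
MOVIES = [
    {"title": "Film 1", "genres": "Action|Adventure|Sci-Fi"},
    {"title": "Film 2", "genres": "Drama"},
    {"title": "Film 3", "genres": "Action|Drama"},
    {"title": "Film 4", "genres": "Comedy|Drama"},
]


def genre_counts(movies, top=10):
    """Split each genre string and count how many movies list each genre."""
    counts = Counter()
    for m in movies:
        counts.update(g.strip() for g in m["genres"].split("|") if g.strip())
    return counts.most_common(top)
```

With these rows, "Drama" appears in three movies and "Action" in two, so they head the list.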
Results: Here are the top 10 genres with the highest counts.
Results:
The audience-favorite and critic-favorite actor is Leonardo DiCaprio, with an average
audience vote of 914 and an average critic vote of 330. He also has more movies
than the other three actors.
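The per-actor comparison reduces to grouping rows by actor and averaging two vote columns. This is a minimal sketch. The column names actor, audience_votes, and critic_votes are hypothetical stand-ins for the dataset's actual actor and vote fields, and the rows are made up.

```python
from statistics import mean

# Made-up rows; "actor", "audience_votes" and "critic_votes" are hypothetical
# column names standing in for the dataset's actual actor and vote fields.
MOVIES = [
    {"actor": "Actor X", "audience_votes": 900, "critic_votes": 300},
    {"actor": "Actor X", "audience_votes": 700, "critic_votes": 200},
    {"actor": "Actor Y", "audience_votes": 500, "critic_votes": 400},
]


def actor_summary(movies):
    """Per actor: number of movies, mean audience votes, mean critic votes."""
    grouped = {}
    for m in movies:
        grouped.setdefault(m["actor"], []).append(m)
    return {
        actor: {
            "movies": len(rows),
            "avg_audience": mean(r["audience_votes"] for r in rows),
            "avg_critic": mean(r["critic_votes"] for r in rows),
        }
        for actor, rows in grouped.items()
    }
```

The audience favorite is then the actor with the largest avg_audience, for example max(summary, key=lambda a: summary[a]["avg_audience"]).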
Conclusion:
This concludes the report. I learned a lot about advanced Excel and how to
analyze datasets like this one effectively. I also learned a lot about the movies,
their profits, and much more.