
Final Project1 IMDB Movie Analysis PDF

This document outlines a final project analyzing an IMDB movie dataset. It includes: 1) Cleaning the raw data by removing missing values, duplicates, and inconsistencies. 2) Exploring the cleaned data through descriptive statistics and visualizations to identify trends, patterns, and relationships between variables like highest grossing movies. 3) Drawing conclusions about factors influencing movie performance and proposing solutions based on insights from the data analysis. The goals are to gain understanding of the movie industry through analyzing the IMDB movie metadata, including variables like budget, revenue, ratings, and completing tasks like identifying the most profitable films. Power BI and SQL will be used to manipulate and explore the data.

Uploaded by

meha agarwal

3/5/2023

Final Project-1

IMDB Movie Analysis

Submitted by

Vivek Vardhan Reddy Gurrala

Project Description

• The provided dataset contains various columns for different IMDB movies.
• The task is to frame a problem that you want to shed light on using the dataset.
• To frame the problem, start by asking "What?" and answering these questions:
o What do you see happening?
o What is your hypothesis for the cause of the problem?
o What is the impact of the problem on stakeholders?
o What is the impact of the problem not being solved?
• These questions will help you define a problem and find the right data to solve it.

This study source was downloaded by 100000859054411 from CourseHero.com on 04-30-2023 02:45:18 GMT -05:00
https://www.coursehero.com/file/195639799/Final-Project1-IMDB-Movie-Analysispdf/

• The goal of the project is to analyze a dataset of IMDB movies and draw insights from the data.

• The dataset includes various columns such as movie names, budgets, gross revenue, and IMDB
ratings.

• You will need to use a combination of Excel formulas, Power BI and SQL commands to clean and
manipulate the data.

• You will be asked to complete specific tasks, such as identifying the movie with the highest profit
or the top IMDB movies, as well as share your own insights by identifying any problems or trends
in the data.

• You may also be asked to use charts and visualizations to present your findings.

• The overall objective of the project is to gain a better understanding of the movie industry by
analyzing the data and drawing meaningful conclusions.

Project Approach
• Understand the problem: Read the project description carefully and make sure you understand the
problem you are trying to solve. Identify the main questions you want to answer, and any specific
tasks you need to complete.

• Get familiar with the dataset: Take some time to understand the structure of the dataset, the
different variables included, and any potential challenges or limitations. I used the Five Whys
approach developed by Sakichi Toyoda to understand the root cause of the problem.

• Clean the data: Identify any missing or invalid values, duplicates, and inconsistencies in
formatting. Remove or impute missing or invalid values, remove duplicates, and standardize
formatting to ensure consistency. After downloading the dataset, I cleaned it and developed a
hypothesis to work on.

• Explore the dataset: Use a combination of descriptive statistics and visualizations to explore the
data and identify any trends, patterns, or relationships between variables. Identify any outliers or
anomalies in the data that may require further investigation.

• Derive insights: Use the insights you have gained from exploring the dataset to draw conclusions
about the problem you defined. Identify any factors that may be contributing to the problem, and
propose potential solutions or areas for further research.


Tech Stack

IMDB Metadata
• Colour - Colour of the movie (B&W/Color)
• Director_Name - Name of the director
• Num_Critic_For_Reviews - Count of critic reviews
• Duration - Movie duration
• Director_Facebook_Likes - Facebook likes for director
• Actor_3_Facebook_Likes - Facebook likes for Actor 3
• Actor_2_Name - Actor 2 name
• Actor_1_Facebook_Likes - Facebook likes for Actor 1
• Gross - Gross revenue for the movie
• Genres - Genre of the movie
• Actor_1_Name - Actor 1 name
• Movie_Title - Title of the movie
• num_voted_users - User votes count
• cast_total_facebook_likes - Facebook likes for cast
• actor_3_name - Actor 3 name
• facenumber_in_poster - (Unidentified)
• Plot_Keywords - Keywords for the plot
• Movie_Imdb_Link - Movie link on IMDB portal
• num_user_for_reviews - User reviews count
• language - Language of the movie
• country - Country of the movie
• content_rating - Rating for the movie content
• budget - Movie budget
• title_year - Release year of the movie
• actor_2_facebook_likes - Facebook likes for Actor 2
• imdb_score - IMDB score for the movie (out of 10)
• aspect_ratio - Aspect ratio of the movie
• movie_facebook_likes - Facebook likes of the movie


A. Cleaning the data:


Cleaning the data: This is one of the most important steps to perform before moving forward with the
analysis. Use what you have learned so far to do this (dropping columns, removing null values, etc.).
Task: Clean the data

• Cleaned the data step by step, as required by the data analysis.
• Sorted and filtered the data on “Blanks”, then removed the unnecessary values.
• Created a new calculated column for further analysis.

Before cleaning: 5044 rows and 28 columns
After cleaning: 3858 rows and 23 columns
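
The cleaning steps can be sketched in Python as follows. This is only a minimal illustration, not the Excel workflow used in this project: the sample rows are invented toy values, and treating `gross` and `budget` as the required fields is an assumption.

```python
# Toy rows standing in for the raw dataset; all values are invented for illustration.
raw_rows = [
    {"movie_title": "Movie A", "gross": "100", "budget": "40"},
    {"movie_title": "Movie A", "gross": "100", "budget": "40"},   # exact duplicate
    {"movie_title": "Movie B", "gross": "", "budget": "25"},      # blank gross
    {"movie_title": " Movie C ", "gross": "70", "budget": "90"},  # stray whitespace
]

REQUIRED = ("gross", "budget")  # assumption: rows lacking these cannot be analyzed


def clean(rows):
    """Trim text, drop rows with blank required fields, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Standardize formatting: strip whitespace from every value.
        row = {key: value.strip() for key, value in row.items()}
        # Remove rows where a required field is blank.
        if any(row[field] == "" for field in REQUIRED):
            continue
        # Remove duplicates, keyed on the full normalized row.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


cleaned_rows = clean(raw_rows)
print(len(raw_rows), "rows before,", len(cleaned_rows), "rows after")
```

Comparing the row counts before and after, as the slide does, is a quick check that the filters removed roughly what was expected.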

B. Movies with highest profit:

Movies with highest profit: Create a new column called profit which contains the difference of the two
columns: gross and budget. Sort the data using the profit column as reference. Plot profit (y-axis) vs
budget (x-axis) and observe the outliers using the appropriate chart type.
Task: Find the movies with the highest profit

• The visualization charts were generated using the MS Power BI tool.
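
The profit calculation itself is simple arithmetic and could be sketched as below. The project did this with a calculated column and Power BI charts; the Python version here is only an illustration, and the titles and figures are invented.

```python
# Toy rows; titles and figures are invented for illustration only.
movies = [
    {"movie_title": "Movie A", "gross": 500, "budget": 200},
    {"movie_title": "Movie B", "gross": 150, "budget": 300},
    {"movie_title": "Movie C", "gross": 900, "budget": 250},
]

# New calculated column: profit = gross - budget.
for movie in movies:
    movie["profit"] = movie["gross"] - movie["budget"]

# Sort by profit, highest first, so the most profitable movies lead the list.
by_profit = sorted(movies, key=lambda m: m["profit"], reverse=True)

for movie in by_profit:
    print(movie["movie_title"], movie["budget"], movie["profit"])
```

The (budget, profit) pairs printed here are the points a scatter chart would plot when looking for outliers.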


C. Top 250 Movies:


Top 250: Create a new column IMDb_Top_250 and store the top 250 movies with the highest IMDb Rating (corresponding
to the column: imdb_score). Also make sure that for all of these movies, the num_voted_users is greater than 25,000. Also
add a Rank column containing the values 1 to 250 indicating the ranks of the corresponding films.
Extract all the movies in the IMDb_Top_250 column which are not in the English language and store them in a new
column named Top_Foreign_Lang_Film. You can use your own imagination also!
Task: Find IMDB Top 250

• MySQL script to create the new column and analyze the IMDb_Top_250 movies by rank.
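
The MySQL script is not reproduced here, so the following is only a hedged Python sketch of the same filter-sort-rank logic. The function name `top_movies` and the sample rows are hypothetical.

```python
MIN_VOTES = 25_000  # threshold from the task description
TOP_N = 250         # size of the ranked list

# Toy rows; titles, scores, and vote counts are invented.
movies = [
    {"movie_title": "Movie A", "imdb_score": 8.1, "num_voted_users": 90_000},
    {"movie_title": "Movie B", "imdb_score": 9.0, "num_voted_users": 12_000},  # too few votes
    {"movie_title": "Movie C", "imdb_score": 8.7, "num_voted_users": 40_000},
]


def top_movies(rows, min_votes=MIN_VOTES, top_n=TOP_N):
    """Keep movies above the vote threshold, sort by score, and attach a Rank."""
    eligible = [r for r in rows if r["num_voted_users"] > min_votes]
    ranked = sorted(eligible, key=lambda r: r["imdb_score"], reverse=True)[:top_n]
    # Rank runs from 1 up to at most top_n.
    return [dict(r, Rank=i) for i, r in enumerate(ranked, start=1)]


imdb_top_250 = top_movies(movies)
for movie in imdb_top_250:
    print(movie["Rank"], movie["movie_title"], movie["imdb_score"])
```

Filtering on votes before sorting matters: a high-scoring movie with few votes (Movie B here) never enters the ranking.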

• MySQL script to create the new column of top 250 foreign-language (non-English) films.
• A total of 10 non-English films appear in the IMDB top 250.
• This analysis uses the previously created imdb_top250 table as its reference.
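
As a rough illustration of the non-English filter, the sketch below works on a toy stand-in for the imdb_top250 table. The rows are invented, and normalizing the language text before comparing is an assumption, not a step confirmed by the original script.

```python
# Toy stand-in for the previously created top-250 table; rows are invented.
imdb_top250 = [
    {"Rank": 1, "movie_title": "Movie A", "language": "English"},
    {"Rank": 2, "movie_title": "Movie B", "language": "Japanese"},
    {"Rank": 3, "movie_title": "Movie C", "language": "english "},  # inconsistent formatting
]

# Keep only rows whose language is not English.
# Assumption: casing and stray spaces are normalized before the comparison.
top_foreign_lang_film = [
    row for row in imdb_top250
    if row["language"].strip().lower() != "english"
]

for row in top_foreign_lang_film:
    print(row["Rank"], row["movie_title"], row["language"])
```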


D. Best Directors:

Best Directors: Group the data using the director_name column. Find the top 10 directors for whom the
mean of imdb_score is the highest and store them in a new column top10director. In case of a tie in
IMDb score between two directors, sort them alphabetically.
Task: Find the best directors

• MySQL script for the top 10 directors, with the mean of imdb_score sorted from highest to lowest.
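
The grouping query could look roughly like the sketch below. It runs on SQLite through Python's built-in `sqlite3` module as a stand-in for MySQL, the table name `movies` and the sample rows are invented, and the actual script used in the project may differ.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (director_name TEXT, imdb_score REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("Director B", 8.0),  # invented rows for illustration
        ("Director B", 9.0),
        ("Director A", 8.5),
        ("Director C", 7.0),
    ],
)

# Mean score per director, highest first; ties are broken alphabetically.
top10director = conn.execute(
    """
    SELECT director_name, AVG(imdb_score) AS mean_score
    FROM movies
    GROUP BY director_name
    ORDER BY mean_score DESC, director_name ASC
    LIMIT 10
    """
).fetchall()

for name, mean_score in top10director:
    print(name, mean_score)
```

Directors A and B tie on a mean of 8.5 in this toy data, so the second sort key places Director A first.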

E. Popular Genres:
Popular Genres: Perform this step using the knowledge gained while performing previous steps.

Task: Find popular genres


• MySQL analysis of popularity by movie genre in the IMDB database with respective ratings.
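
One way to measure genre popularity is sketched below in Python. It assumes that the genres column holds several genres separated by "|" and that popularity means the number of movies per genre, with the mean imdb_score shown alongside. Both are assumptions, and the rows are invented.

```python
from collections import defaultdict
from statistics import mean

# Toy rows; genre strings and scores are invented for illustration.
movies = [
    {"genres": "Drama|Comedy", "imdb_score": 7.0},
    {"genres": "Drama", "imdb_score": 8.0},
    {"genres": "Action|Comedy", "imdb_score": 6.0},
]

# Collect the scores of every movie under each of its genres.
scores_by_genre = defaultdict(list)
for movie in movies:
    for genre in movie["genres"].split("|"):
        scores_by_genre[genre].append(movie["imdb_score"])

# (genre, movie count, mean score), most frequent genre first, then by name.
genre_summary = sorted(
    ((genre, len(scores), mean(scores)) for genre, scores in scores_by_genre.items()),
    key=lambda item: (-item[1], item[0]),
)

for genre, count, avg_score in genre_summary:
    print(genre, count, avg_score)
```

A multi-genre movie counts once under each of its genres, so the counts add up to more than the number of movies.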


F. Charts:
Charts: Create three new columns namely, Meryl_Streep, Leo_Caprio, and Brad_Pitt which contain the movies in which
the actors: 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt' are the lead actors. Use only the actor_1_name column for
extraction. Also, make sure that you use the names 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt' for the said
extraction.

Append the rows of all these columns and store them in a new column named Combined.

Group the combined column using the actor_1_name column.

Find the mean of num_critic_for_reviews and num_user_for_reviews and identify the actors who have the
highest mean.

Observe the change in the number of voted users over the decades using a bar chart. Create a column called decade which
represents the decade to which every movie belongs. For example, the title_year values 1923 and 1925 should be stored as
1920s. Sort the data based on the decade column, group it by decade, and find the sum of users voted in each decade.
Store this in a new data frame called df_by_decade.

Task: Find the critic-favorite and audience-favorite actors

• The objective was to list the movies in which Meryl Streep, Leonardo DiCaprio, and Brad Pitt were
the lead actors.

• The table was created in Excel and formatted accordingly.

• I used a helper column along with the VLOOKUP() function to list the movies by these lead actors.


• The MySQL analysis results for the 3 popular actors 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt' are sorted
into a critic-favorite and audience-favorite report.

• The table contains 3 columns: actor_name, mean of user_reviews, and mean of critic_review.
• This analysis showed that the Critic Favorite actor was Albert Finney with 750 critic reviews.
• The Audience Favorite actor was Heather Donahue with 3400 user reviews.
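
The three-actor comparison described in the task could be sketched as follows. It uses SQLite through Python's `sqlite3` module as a stand-in for MySQL. The table name `movies` is hypothetical, and every review count below is invented and does not reflect the dataset.

```python
import sqlite3

LEAD_ACTORS = ("Meryl Streep", "Leonardo DiCaprio", "Brad Pitt")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies ("
    "actor_1_name TEXT, num_critic_for_reviews INTEGER, num_user_for_reviews INTEGER)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Meryl Streep", 100, 300),       # all counts are invented
        ("Meryl Streep", 200, 500),
        ("Leonardo DiCaprio", 400, 900),
        ("Brad Pitt", 250, 700),
        ("Someone Else", 999, 9999),      # excluded by the lead-actor filter
    ],
)

# Use only actor_1_name for extraction, then average both review counts per actor.
placeholders = ", ".join("?" for _ in LEAD_ACTORS)
combined = conn.execute(
    f"""
    SELECT actor_1_name,
           AVG(num_critic_for_reviews) AS mean_critic_reviews,
           AVG(num_user_for_reviews)   AS mean_user_reviews
    FROM movies
    WHERE actor_1_name IN ({placeholders})
    GROUP BY actor_1_name
    """,
    LEAD_ACTORS,
).fetchall()

# The actor with the highest mean in each column is the favorite for that group.
critic_favorite = max(combined, key=lambda row: row[1])[0]
audience_favorite = max(combined, key=lambda row: row[2])[0]
print(critic_favorite, audience_favorite)
```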

• The 2000s had the highest number of user votes of any decade in the dataset.
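
The decade grouping from the task description can be sketched in plain Python as below. The rows are invented, the helper name `decade_label` is hypothetical, and `df_by_decade` is a sorted list of pairs here rather than a true data frame.

```python
from collections import defaultdict

# Toy rows; years and vote counts are invented for illustration.
movies = [
    {"title_year": 1923, "num_voted_users": 10},
    {"title_year": 1925, "num_voted_users": 5},
    {"title_year": 2004, "num_voted_users": 300},
    {"title_year": 2009, "num_voted_users": 200},
]


def decade_label(year):
    """Map a release year to its decade label, e.g. 1923 -> '1920s'."""
    return f"{year // 10 * 10}s"


# Sum the user votes within each decade.
votes_per_decade = defaultdict(int)
for movie in movies:
    votes_per_decade[decade_label(movie["title_year"])] += movie["num_voted_users"]

# Sorted by decade, ready to feed a bar chart.
df_by_decade = sorted(votes_per_decade.items())
print(df_by_decade)
```

Integer division by 10 drops the last digit of the year, which is what places 1923 and 1925 in the same 1920s bucket.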


Conclusion
• According to the data, the budgeting problem turned out to be a problem with the user experience team not working
effectively. A detailed report on the given IMDB data records is required.

• According to the analysis, "Avatar" and "Jurassic Park" show high potential profit gains. High-budget movies,
such as those in the fantasy genre, are well suited to maximizing profit, so it may be advisable to consider
making similar films in the future.

• Movies in dramatic genres are rated higher than comedy films, and multi-genre films are more successful by
comparison.

• It appears that the movie "The Shawshank Redemption" has the highest IMDB score among the top 250 films in the
database, all of which have a minimum of 25,000 voted users.

• Through this analysis, I found that many different metrics are used to rate the popularity of a movie.

• This project broadened my analytical skills and my technical approach to data analytics. It also gave me
an edge in cinematic terminology and in analyzing movie critics.

THANK YOU…

Drive Link: https://drive.google.com/file/d/1tNssCwu2tV9r8W5enCTrdSvd1aM5zPx/view?usp=sharing
