Final Project1 IMDB Movie Analysis PDF
Final Project-1
Submitted by
Project Description
• The provided dataset contains various columns for different IMDB movies.
• The task is to frame a problem that you want to shed light on using the dataset.
• To frame the problem, start by asking "What?" and answering these questions:
o What do you see happening?
o What is your hypothesis for the cause of the problem?
o What is the impact of the problem on stakeholders?
o What is the impact of the problem not being solved?
• These questions will help you define a problem and find the right data to solve it.
This study source was downloaded by 100000859054411 from CourseHero.com on 04-30-2023 02:45:18 GMT -05:00
https://www.coursehero.com/file/195639799/Final-Project1-IMDB-Movie-Analysispdf/
3/5/2023
• The goal of the project is to analyze a dataset of IMDB movies and draw insights from the data.
• The dataset includes various columns such as movie names, budgets, gross revenue, and IMDB
ratings.
• You will need to use a combination of Excel formulas, Power BI and SQL commands to clean and
manipulate the data.
• You will be asked to complete specific tasks, such as identifying the movie with the highest profit
or the top IMDB movies, as well as share your own insights by identifying any problems or trends
in the data.
• You may also be asked to use charts and visualizations to present your findings.
• The overall objective of the project is to gain a better understanding of the movie industry by
analyzing the data and drawing meaningful conclusions.
Project Approach
• Understanding the data: Read the project description carefully and make sure you understand the
problem you are trying to solve. Identify the main questions you want to answer, and any specific
tasks you need to complete.
• Getting familiar with the dataset: Take some time to understand the structure of the dataset, the
different variables included, and any potential challenges or limitations. I used the Five Whys
approach developed by Sakichi Toyoda to understand the root cause of the problem.
• Cleaning the data: Identify any missing or invalid values, duplicates, and inconsistencies in
formatting. Remove or impute missing or invalid values, remove duplicates, and standardize
formatting to ensure consistency. After downloading the dataset, I cleaned it and developed a
hypothesis to work on.
• Exploring the dataset: Use a combination of descriptive statistics and visualizations to explore the
data and identify any trends, patterns, or relationships between variables. Identify any outliers or
anomalies in the data that may require further investigation.
• Deriving insights: Use the insights gained from exploring the dataset to draw conclusions
about the problem you defined. Identify any factors that may be contributing to the problem, and
propose potential solutions or areas for further research.
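The cleaning step described above could be sketched in Python with pandas roughly as follows. This is only an illustration under assumptions: the report itself names Excel, Power BI and SQL as its tools, the helper name clean_movies is hypothetical, the sample rows are made up, and the choice to drop (rather than impute) rows with a missing gross or budget is one possible option.

```python
import pandas as pd


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy titles, drop duplicates and incomplete rows."""
    cleaned = df.copy()
    # Standardize formatting first, so titles that differ only by
    # stray whitespace are recognised as duplicates.
    cleaned["movie_title"] = cleaned["movie_title"].str.strip()
    cleaned = cleaned.drop_duplicates()
    # Assumption: rows without gross or budget are dropped, not imputed.
    cleaned = cleaned.dropna(subset=["gross", "budget"])
    return cleaned.reset_index(drop=True)


# Made-up sample rows, only to show the behaviour.
raw = pd.DataFrame({
    "movie_title": ["Movie A ", "Movie A", "Movie B", "Movie C"],
    "gross": [100.0, 100.0, None, 50.0],
    "budget": [40.0, 40.0, 10.0, 60.0],
})
movies = clean_movies(raw)
print(movies)
```

Calling movies.describe() afterwards gives the descriptive statistics mentioned in the exploration step.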
Tech Stack
IMDB Metadata
• Colour - Colour of the movie (B&W/Color)
• Director_Name - Name of the director
• Num_Critic_For_Reviews - Count of critic reviews
• Duration - Movie duration
• Director_Facebook_Likes - Facebook likes for director
• Actor_3_Facebook_Likes - Facebook likes for Actor 3
• Actor_2_Name - Actor 2 name
• Actor_1_Facebook_Likes - Facebook likes for Actor 1
• Gross - Gross revenue for the movie
• Genres - Genre of the movie
• Actor_1_Name - Actor 1 name
• Movie_Title - Title of the movie
• num_voted_users - User votes count
• cast_total_facebook_likes - Facebook likes for cast
• actor_3_name - Actor 3 name
• facenumber_in_poster - (Unidentified)
• Plot_Keywords - Keywords for the plot
• Movie_Imdb_Link - Movie link on IMDB portal
• num_user_for_reviews - User reviews count
• language - Language of the movie
• country - Country of the movie
• content_rating - Rating for the movie content
• budget - Movie budget
• title_year - Release year of the movie
• actor_2_facebook_likes - Facebook likes for Actor 2
• imdb_score - IMDB score for the movie (out of 10)
• aspect_ratio - Aspect ratio of the movie
• movie_facebook_likes - Facebook likes of the movie
Before Cleaning: 5044 rows and 28 columns
After Cleaning: 3858 rows and 23 columns
Movies with highest profit: Create a new column called profit that contains the difference between the
gross and budget columns (gross minus budget). Sort the data in descending order of profit. Plot profit
(y-axis) against budget (x-axis) using an appropriate chart type and observe the outliers.
Task: Find the movies with the highest profit.
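As a minimal sketch of this task, assuming the data is loaded into a pandas DataFrame with the gross, budget and movie_title columns (the figures below are made up for illustration and are not taken from the dataset):

```python
import pandas as pd

# Made-up figures for illustration; not taken from the real dataset.
movies = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B", "Movie C"],
    "gross": [500.0, 120.0, 90.0],
    "budget": [200.0, 100.0, 150.0],
})

# profit = gross - budget, as described in the task.
movies["profit"] = movies["gross"] - movies["budget"]

# Sort so that the most profitable movies come first.
by_profit = movies.sort_values("profit", ascending=False).reset_index(drop=True)
print(by_profit[["movie_title", "profit"]])
```

If matplotlib is installed, by_profit.plot.scatter(x="budget", y="profit") is one way to draw the profit versus budget chart and spot outliers.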
Top 250: Create a new column IMDb_Top_250 and store the top 250 movies with the highest IMDb Rating (corresponding
to the column: imdb_score). Also make sure that for all of these movies, the num_voted_users is greater than 25,000. Also
add a Rank column containing the values 1 to 250 indicating the ranks of the corresponding films.
Extract all the movies in the IMDb_Top_250 column that are not in the English language and store them in a new
column named Top_Foreign_Lang_Film. You may also add your own variations.
Task: Find IMDB Top 250
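The Top 250 steps can be sketched with pandas as below. The function name top_rated and the sample rows are hypothetical, and the results are kept as separate DataFrames rather than as columns, which is a simplification of the wording above.

```python
import pandas as pd


def top_rated(df: pd.DataFrame, n: int = 250, min_votes: int = 25_000) -> pd.DataFrame:
    """Hypothetical helper: top-n movies by imdb_score with enough votes."""
    # Keep only movies with more than min_votes user votes.
    popular = df[df["num_voted_users"] > min_votes]
    top = (
        popular.sort_values("imdb_score", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )
    # Rank runs from 1 to the number of rows kept (250 on the full data).
    top.insert(0, "Rank", list(range(1, len(top) + 1)))
    return top


# Made-up sample rows; n=3 stands in for 250 on this tiny table.
sample = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "imdb_score": [9.1, 8.7, 9.5, 8.9],
    "num_voted_users": [30_000, 40_000, 1_000, 26_000],
    "language": ["English", "French", "English", "Japanese"],
})
imdb_top = top_rated(sample, n=3)

# Non-English films among the top-rated ones.
top_foreign_lang_film = imdb_top[imdb_top["language"] != "English"]
print(imdb_top)
print(top_foreign_lang_film)
```

Movie C has the highest score in the sample but is excluded because it does not pass the vote threshold, which is the effect the num_voted_users condition is meant to have.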
D. Best Directors:
E. Popular Genres:
Popular Genres: Perform this step using the knowledge gained while performing previous steps.
OR
• MySQL analysis of popularity by movie genre in the IMDB database with respective ratings.
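The report ran this step in MySQL; an equivalent idea in pandas might look like the sketch below. It assumes that the genres column holds several genres separated by "|" (for example "Drama|Comedy"), which is not stated in this report, and it measures popularity by movie count and mean imdb_score, which is only one possible definition. The sample rows are made up.

```python
import pandas as pd

# Made-up sample rows; the "|" separator is an assumption.
sample = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Drama|Comedy", "Drama", "Action|Comedy"],
    "imdb_score": [8.0, 7.0, 6.0],
})

# One row per (movie, genre) pair.
per_genre = sample.assign(genre=sample["genres"].str.split("|")).explode("genre")

# Number of movies and mean rating for each genre.
genre_stats = (
    per_genre.groupby("genre")["imdb_score"]
    .agg(movie_count="count", mean_score="mean")
    .sort_values(["movie_count", "mean_score"], ascending=False)
)
print(genre_stats)
```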
F. Charts:
Charts: Create three new columns namely, Meryl_Streep, Leo_Caprio, and Brad_Pitt which contain the movies in which
the actors: 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt' are the lead actors. Use only the actor_1_name column for
extraction. Also, make sure that you use the names 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt' for the said
extraction.
Append the rows of all these columns and store them in a new column named Combined.
Find the mean of num_critic_for_reviews and num_user_for_reviews for each actor and identify the actor with the
highest mean.
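A minimal pandas sketch of the actor comparison follows, using only actor_1_name for the extraction as required. The sample rows are made up, and a single filtered DataFrame named combined stands in for the separate per-actor columns and their appended result.

```python
import pandas as pd

LEAD_ACTORS = ["Meryl Streep", "Leonardo DiCaprio", "Brad Pitt"]

# Made-up sample rows for illustration.
sample = pd.DataFrame({
    "actor_1_name": ["Meryl Streep", "Brad Pitt", "Leonardo DiCaprio",
                     "Brad Pitt", "Someone Else"],
    "num_critic_for_reviews": [100, 200, 350, 400, 50],
    "num_user_for_reviews": [10, 20, 25, 40, 5],
})

# Rows where one of the three actors is the lead (actor_1_name only).
combined = sample[sample["actor_1_name"].isin(LEAD_ACTORS)]

# Mean critic and user review counts per actor.
actor_means = combined.groupby("actor_1_name")[
    ["num_critic_for_reviews", "num_user_for_reviews"]
].mean()

critic_favorite = actor_means["num_critic_for_reviews"].idxmax()
audience_favorite = actor_means["num_user_for_reviews"].idxmax()
print(actor_means)
print(critic_favorite, audience_favorite)
```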
Observe the change in the number of voted users over the decades using a bar chart. Create a column called decade that
represents the decade to which each movie belongs. For example, the title_year values 1923 and 1925 should be stored as
1920s. Sort the data by the decade column, group it by decade, and find the sum of users voted in each decade.
Store this in a new data frame called df_by_decade.
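The decade grouping can be sketched in pandas as follows. The sample rows are made up, and the sketch assumes title_year has no missing values (rows with a missing year would need to be dropped before the integer conversion).

```python
import pandas as pd

# Made-up sample rows for illustration.
sample = pd.DataFrame({
    "title_year": [1923, 1925, 2001, 2009, 2010],
    "num_voted_users": [100, 200, 1000, 2000, 500],
})

# 1923 -> 1920 -> "1920s". Assumes no missing years.
decade_start = (sample["title_year"] // 10) * 10
sample["decade"] = decade_start.astype(int).astype(str) + "s"

# Sort by decade, group by decade, and sum the user votes.
df_by_decade = (
    sample.sort_values("decade")
    .groupby("decade", as_index=False)["num_voted_users"]
    .sum()
)
print(df_by_decade)
```

If matplotlib is installed, df_by_decade.plot.bar(x="decade", y="num_voted_users") is one way to produce the bar chart.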
• MySQL analysis results for the three popular actors 'Meryl Streep', 'Leonardo DiCaprio', and 'Brad Pitt', sorted to
report the critic favorite and the audience favorite.
• The 2000s had the highest number of user votes of any decade in the data.
Conclusion
• As per the data, the budgeting problem turned out to be a problem with the user experience team not working
effectively. A detailed report is required for the given IMDB data record.
• According to the analysis, "Avatar" and "Jurassic Park" have high potential profit gains. High-budget movies in
genres such as fantasy appear well suited to maximizing profit, so it may be advisable to consider making
similar films in future.
• Movies in drama genres are rated higher than comedy films, and multi-genre films are more successful by
comparison.
• "The Shawshank Redemption" has the highest IMDB score among movies with a minimum of 25,000 voted users
and ranks among the top 250 films in the database.
• Through this analysis, I found that the popularity of a movie can be rated using many different metrics.
• This project broadened my analytical skills and my technical approach to data analytics. It also gave me
an edge in cinematic terminology and in analyzing movie critics.
THANK YOU…