Movie Recommendation Project Report
Project Report
On
MOVIE RECOMMENDATION SYSTEM
Submitted
In partial fulfillment
for the award of the Degree of
Bachelor of Technology
In Department of Computer Science & Engineering
Submitted To: Supervisor Name, Designation
Submitted By: Tushar Jain (K18327), CSE
CERTIFICATE
I hereby declare that the work presented in this B.Tech Major Project Part I entitled “Movie
Recommendation System”, in partial fulfillment of the requirements for the award of the
Bachelor of Technology in Computer Science & Engineering and submitted to the Department of
Computer Science & Engineering of Career Point University, Kota, Rajasthan, is an original
piece of my own work and has not been submitted partially or fully anywhere else. Authorized
content and copyrighted material used in this report have been properly cited, and permission
has been obtained from the competent authority. The matter presented in this project work has
not been submitted by me for the award of any other degree elsewhere.
Tushar Jain
(k18327)
This is to certify that the above statement made by the candidate is correct to the best of my
knowledge.
ACKNOWLEDGEMENT
I am highly grateful to Mr. Rohit Maheshwara, Head of the School of Computer
Science & Engineering of Career Point University, Kota, for his kind support of the
project work. I would also like to thank all my friends and all those who helped
me, directly or indirectly, in carrying out this work; without them, completing this
project would not have been possible.
Yours Sincerely
Tushar Jain (k16133)
ABSTRACT
In this project, a movie recommendation system is built on the TMDB dataset. We use a
content-based filtering method to recommend movies that are similar to the selected
movie. There is already a large body of content available on movie recommendation systems.
Showing movie recommendations is essential so that the user need not waste a lot of time
searching for content that he or she might like. Thus, the movie recommendation system
plays a vital role in delivering personalized movie recommendations to the user.
After an extensive search online and a review of many research papers, we found that
recommendations made with content-based filtering typically use a single text-to-vector
conversion technique and a single technique to measure the similarity between the vectors. In
this work, we use multiple text-to-vector conversion techniques and combine the
results of the multiple algorithms to produce the final recommendation list. It can be thought
of as a hybrid approach that uses only the content-based filtering technique.
INDEX
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
List of Figures
List of Tables
Contents
1. Introduction
1.1 Relevance of the Project
1.2 Problem Statement
1.3 Objective of the Project
1.4 Scope of the Project
1.5 Methodology for Movie Recommendation
2. Purpose
3. Literature Survey
3.1 Movie Recommendation System by K-Means Clustering and K-Nearest Neighbor
3.2 Movie Recommendation System Using Collaborative Filtering
4. Requirement
4.1 Hardware Requirements
4.2 Software Specification
4.3 Software Requirements
5. Analysis and Design
5.1 System Architecture of Proposed System
5.2 Project Flow
6. Implementation
6.1 Cosine Similarity
6.2 CountVectorizer
7. Datasets
8. Result and Analysis
9. Conclusion
10. Reference
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
INTRODUCTION
Movies are part and parcel of life. There are many types of movies: some for
entertainment, some for educational purposes, animated movies for children,
horror movies, and action films. Movies can be easily differentiated by genre,
such as comedy, thriller, animation, or action. They can also be distinguished
by release year, language, director, and so on. When watching movies online, there
is a large number of movies to search through to find the ones we like most.
Movie recommendation systems help us find our preferred movies among all of
these different types and hence reduce the time spent searching for our
favorite movies. A movie recommendation system should therefore be very
reliable and should recommend the movies that match our preferences exactly
or most closely.
Recommendation systems have several benefits, the most important being customer
satisfaction and revenue. A movie recommendation system is a very powerful and
important system. However, due to the problems associated with a purely collaborative
approach, movie recommendation systems also suffer from poor recommendation quality
and scalability issues.
To reduce data overload, recommendation systems are used as information
filtering tools on social networking sites. Hence, there is huge scope for exploration in this
field to improve the scalability, accuracy, and quality of movie recommendation systems.
We first preprocess the dataset and combine the relevant features into a
single feature. Next, we convert the text of that feature into vectors.
We then compute the similarity between the vectors. Finally, we obtain the
recommendations as per the system architecture described below.
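The four steps above can be sketched in a few lines of Python. The sketch is illustrative only: it assumes the combined feature is stored in a column named 'tags' and the titles in a column named 'title', it uses scikit-learn's CountVectorizer and cosine_similarity as one possible choice of vectorizer and similarity measure, and it runs on three made-up movies rather than the TMDB data. The helper names build_similarity and recommend are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_similarity(movies: pd.DataFrame, feature: str = "tags"):
    """Vectorize the combined text feature and return a movie-by-movie similarity matrix."""
    vectorizer = CountVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform(movies[feature])
    return cosine_similarity(vectors)


def recommend(title: str, movies: pd.DataFrame, similarity, top_n: int = 5):
    """Return the titles of the top_n movies most similar to `title`."""
    titles = movies["title"].reset_index(drop=True)
    matches = titles[titles == title]
    if matches.empty:
        return []  # unknown title: nothing to recommend
    idx = matches.index[0]
    # Rank every other movie by its similarity score to the selected one.
    ranked = sorted(
        (i for i in range(len(titles)) if i != idx),
        key=lambda i: similarity[idx][i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:top_n]]


# Toy data standing in for the preprocessed dataset (assumption, not TMDB data).
toy = pd.DataFrame({
    "title": ["Space War", "Galaxy Battle", "Love in Paris"],
    "tags": [
        "space alien war action",
        "space alien battle action",
        "romance paris love drama",
    ],
})
sim = build_similarity(toy)
print(recommend("Space War", toy, sim, top_n=1))
```

With the toy data, "Galaxy Battle" shares three of its four tag words with "Space War" and is therefore ranked first, while "Love in Paris" shares none.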
Agile Methodology:
1. Collection of Data Sets: Collect all required datasets from the Kaggle website. This
project requires the TMDB 5000 Movie Dataset.
2. Data Analysis: Verify that the collected datasets are correct and analyze
the data in the CSV files, i.e. check whether all the column fields are present in
the datasets.
a. Improvement in this project: At a later stage, we can implement different
algorithms and methods for better recommendations.
CHAPTER 2
PURPOSE
The need for movie recommendations has increased with the advent of the internet. People began
to expect more from the new technology, and they wanted to be able to find films easily and
quickly. Now, anyone with a search engine and an internet connection can find anything they
want, including films. With the increasing number of films being released, the number of films
that one can watch and the number of films that one can discover have also increased. It can be
quite a challenge for a person to find films that interest them and to decide which to watch.
For example, you may want to watch a certain type of film but have no idea where to start.
Movies, in general, have become a lot more competitive now than what they used to be a couple
of years ago. More and more filmmakers are trying to find ways to stand out from the crowd, and
it has become much more difficult to find something that interests you. With so many films
competing for your attention, how can you possibly find something new and interesting?
The popularity of movie recommendation systems has increased exponentially in recent years.
People now expect their apps to provide them with recommendations regarding movies.
However, most of these systems are quite limited in their scope of recommendations. They
recommend only a few films, and they don’t help you discover new things. You can build a
movie recommendation system that can help you discover new films as well as help people find
the kinds of films they want to watch. It can analyze the preferences of the users, and then
recommend a specific genre for the user or a list of films that fit a certain theme.
With the increase in the number of films being published, discovering new films has become a
challenge for many people. Finding those films that one loves and those that are entertaining has
become difficult. A movie recommendation system can help you discover new films and find
those films that you love. For filmmakers who want their app to help people discover new
films, a movie recommendation system can be an ideal solution. It can recommend specific
genres or help people find films that fit a specific theme. Now, when you are going through these
articles, you may be wondering why you need a movie recommendation system. But trust me,
building one is quite interesting and fun. Moreover, it can be quite a lucrative business as
well. So, let’s get
CHAPTER 3
LITERATURE SURVEY
Over the years, many recommendation systems have been developed using either collaborative,
content based or hybrid filtering methods. These systems have been implemented using various
big data and machine learning algorithms.
3.1 Movie Recommendation System by K-Means Clustering and K-Nearest Neighbor
A recommendation system collects data about the user’s preferences for items such as
movies, either implicitly or explicitly. Implicit acquisition uses the user’s behaviour
while watching movies. Explicit acquisition, on the other hand, uses the user’s previous
ratings or history. Another supporting technique used in the development of
recommendation systems is clustering. Clustering is the process of grouping a set of
objects so that objects in the same cluster are more similar to each other than to those
in other clusters. K-Means clustering together with K-Nearest Neighbour is applied to the
MovieLens dataset in order to obtain the best-optimized result. In the existing
technique the data is scattered, which results in a high number of clusters, while in the
proposed technique the data is gathered, which results in a low number of clusters. The
proposed scheme thereby optimizes the movie recommendation process. The proposed
recommender system predicts the user’s preference for a movie on the basis of different
parameters. It works on the concept that people who have common preferences or choices
will influence each other’s opinions. This approach optimizes the process and yields a
lower RMSE.
3.2 Movie Recommendation System Using Collaborative Filtering
CHAPTER 4
REQUIREMENTS
This chapter covers both the hardware and software requirements of the project, with a
detailed explanation of the specifications.
4.1 Hardware Requirements
1. A PC with Windows/Linux OS
2. Processor with a speed of 2.40 GHz to 2.50 GHz
3. Minimum of 8 GB RAM
the package management system conda. The Anaconda distribution includes data-
science packages suitable for Windows, Linux and macOS.
For the computation and analysis we need certain Python libraries that are used
to perform analytics. Packages such as scikit-learn, NumPy, pandas, Matplotlib,
and the Flask framework are needed.
CHAPTER 5
The recommender system stores previous user data like clicks, ratings, and likes to create
a user profile. The more a customer engages, the more accurate future recommendations
are.
5.2 Project Flow
First, load the datasets required to build the model. The datasets required in this
project are tmdb_5000_credits.csv and tmdb_5000_movies.csv; both are available on
Kaggle.com. Three models are created using a content-based approach and then
integrated into a website built with Streamlit, a Python library for creating web
apps. Finally, the website is deployed to a Heroku server.
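As a rough illustration of the loading step, the two tables can be read with pandas and joined into one. The sketch below assumes the join is made on the shared 'title' column, which is a choice made here for illustration; the helper name merge_tmdb is hypothetical, and small stand-in frames replace the CSV files so that the example runs without them.

```python
import pandas as pd


def merge_tmdb(movies: pd.DataFrame, credits: pd.DataFrame) -> pd.DataFrame:
    """Join the movies and credits tables on the shared 'title' column (assumed key)."""
    return movies.merge(credits, on="title")


# In the project the frames would come from the two CSV files, e.g.
#   movies = pd.read_csv("tmdb_5000_movies.csv")
#   credits = pd.read_csv("tmdb_5000_credits.csv")
# Stand-in frames (made-up rows) are used here instead.
movies = pd.DataFrame({"title": ["A", "B"], "overview": ["first", "second"]})
credits = pd.DataFrame({"title": ["A", "B"], "cast": ["[]", "[]"], "crew": ["[]", "[]"]})

merged = merge_tmdb(movies, credits)
print(merged.columns.tolist())
```

The merged frame carries the descriptive columns of the movies file together with the cast and crew columns of the credits file, one row per title.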
CHAPTER 6
IMPLEMENTATION
The proposed system makes use of different algorithms and methods for the implementation
of the content-based approach.
Formula:
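The standard definition of the cosine similarity between two vectors A and B is cos(θ) = (A · B) / (‖A‖ ‖B‖), that is, the dot product divided by the product of the vector lengths. A minimal pure-Python sketch of that definition follows. The function name and the decision to return 0.0 for an all-zero vector are assumptions made for illustration, not part of the project code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors (assumption)
    return dot / (norm_a * norm_b)


# Identical count vectors point the same way; disjoint ones are orthogonal.
print(cosine_similarity([1, 1, 0], [1, 1, 0]))
print(cosine_similarity([1, 0], [0, 1]))
```

For word-count vectors all entries are non-negative, so the score lies between 0 (no words in common) and 1 (same direction).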
6.2 CountVectorizer:
CountVectorizer is a tool provided by the scikit-learn library in Python. It is used to
transform a given text into a vector on the basis of the frequency (count) of each word
that occurs in the text. This is helpful when we have multiple such texts and we
wish to convert each text into a vector of word counts (for use in further text analysis).
class sklearn.feature_extraction.text.CountVectorizer(*,
input='content', encoding='utf-8', decode_error='strict',
strip_accents=None, lowercase=True, preprocessor=None,
tokenizer=None, stop_words=None, token_pattern='(?u)\b\w\w+\b',
ngram_range=(1, 1), analyzer='word', max_df=1.0, min_df=1,
max_features=None, vocabulary=None, binary=False,
dtype=<class 'numpy.int64'>)
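A small usage sketch may make the behaviour concrete. The two sample sentences are made up, and the default parameters from the signature above are used unchanged, so no stop words are removed and single-character tokens are ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    "action hero saves city",
    "hero saves the world",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)

# Vocabulary terms in column order, followed by one row of counts per text.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(terms)
print(counts.toarray())
```

Each column corresponds to one vocabulary term and each row to one input text, so the two rows can be compared directly with a similarity measure.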
Experimental requirements:
Code: Website(Streamlit):
CHAPTER 7
DATASET
The ‘TMDB 5000 Movie Dataset’ is used for movie recommendation purposes in this
research work. This dataset is available on kaggle.com. The dataset consists of two
CSV files: ‘tmdb_5000_movies.csv’ and ‘tmdb_5000_credits.csv’.
● ‘status’: It indicates the status of the movie, for example whether the movie has been
released or not.
● ‘tagline’: It consists of the tagline of the movie.
● ‘title’: It consists of the title of the movie.
● ‘vote_average’: It indicates the average of the votes.
● ‘vote_count’: It indicates the vote count.
● ‘title’: It indicates the title of the movie.
● ‘cast’: It consists of the cast of the movie. Cast implies the actors and actresses who
appear in the movie.
● ‘crew’: It consists of those people who are concerned with the production of the movie.
The Exploratory Data Analysis (EDA) has been inspired by Heeral Dedhia’s blog on
medium.com.
Movies with the genre Drama are the most numerous, compared with Family movies and
Horror movies. A movie may have multiple genres.
Fig:7.3 Actor with highest appearance
The above figure shows the actors with the most appearances, in decreasing order.
The above figure shows the directors with the most appearances, in decreasing order.
Fig:7.5 Runtime versus Number of movies
As the runtime increases, the number of movies increases up to a certain point. Beyond that
point, the number of movies decreases as the runtime increases. There are some exceptions.
There are many lower-budget movies with a runtime in the range of 70 to 150 minutes.
Fig:7.7 Revenue versus Budget
It can be seen from the above figure that low budget movies have low revenue in general.
Table:7.3 Director, Keywords, Cast and Genres of a movie are combined into a
single feature titled as ‘tags’
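One way to build such a ‘tags’ feature is sketched below. It assumes that the crew, keywords, cast and genres columns hold JSON-encoded lists of objects with a 'name' field (and a 'job' field for crew members). Keeping only the first three cast members and removing the spaces inside names are illustrative choices, and the helper names are hypothetical.

```python
import json


def names(json_text, limit=None):
    """Extract the 'name' field from a JSON-encoded list of objects."""
    items = [entry["name"] for entry in json.loads(json_text)]
    return items[:limit] if limit else items


def directors(crew_json):
    """Return the names of crew members whose job is 'Director'."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]


def build_tags(row):
    """Combine director, keywords, cast and genres into one lowercase string."""
    parts = (
        directors(row["crew"])
        + names(row["keywords"])
        + names(row["cast"], limit=3)  # top three cast members (assumption)
        + names(row["genres"])
    )
    # Remove spaces inside names so that a full name stays a single token.
    return " ".join(p.replace(" ", "").lower() for p in parts)


# Made-up row in the assumed format, not taken from the dataset.
row = {
    "crew": '[{"job": "Director", "name": "Jane Doe"}, {"job": "Editor", "name": "X Y"}]',
    "keywords": '[{"id": 1, "name": "space war"}]',
    "cast": '[{"name": "Actor One"}, {"name": "Actor Two"}]',
    "genres": '[{"id": 28, "name": "Action"}]',
}
print(build_tags(row))  # janedoe spacewar actorone actortwo action
```

With a pandas DataFrame, the same function could be applied row by row, for example with DataFrame.apply(build_tags, axis=1), to fill the ‘tags’ column.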