Movie Recommendation Project Report

This document describes a project report on building a movie recommendation system. It discusses the relevance of recommendation systems, outlines the problem statement and objectives which are improving accuracy, quality and scalability. It also briefly describes the methodology that will be used for movie recommendations.

Uploaded by

Tushar Jain

A

Project Report
On
MOVIE RECOMMENDATION SYSTEM
Submitted
In partial fulfillment
for the award of the Degree of
Bachelor of Technology
In Department of Computer Science & Engineering

July - Dec 2022

Submitted To Submitted By
Supervisor Name Tushar Jain (K18327)
Designation (CSE)

Department of Computer Science and Engineering


Career Point University, Kota-325003
Rajasthan (India)

CERTIFICATE

I hereby declare that the work presented in this B.Tech Major Project Part I entitled “Movie
Recommendation System”, in partial fulfillment of the requirements for the award of the
Bachelor of Technology in Computer Science & Engineering and submitted to the Department of
Computer Science & Engineering of Career Point University, Kota, Rajasthan, is an original
piece of my own work and has not been submitted partially or fully anywhere else. Authorized
content and copyrighted material used in this report have been properly cited, and permission
has been obtained from the competent authority. The matter presented in this project work has
not been submitted by me for the award of any other degree elsewhere.

Tushar Jain
(k18327)

This is to certify that the above statement made by the candidate is correct to the best of my
knowledge.

Date: ………………………. Supervisor


Mr. Rohit Maheshwara
Professor (CSE)
Mr. Ashik Hussain
Head of Department
Computer Science & Engineering

ACKNOWLEDGEMENT

I would like to express my heartfelt gratitude to my guide, Professor Mr. Rohit
Maheshwara, School of Computer Science & Engineering of Career Point
University, Kota, for his valuable time and guidance that made the project work a
success. He has inspired me with a spirit of devotion, precision, and unbiased
observation, which is a cornerstone of technical study.

I am highly grateful to Mr. Rohit Maheshwara, Head of the School of Computer
Science & Engineering of Career Point University, Kota, for his kind support for the
project work. I would also like to thank all my friends and all those who have helped
me in carrying out this work, directly or indirectly, without whom the completion of this
project work would not have been possible.

Yours Sincerely
Tushar Jain (k16133)

ABSTRACT

In this project, a movie recommendation system is built on the TMDB dataset. We use a
content-based filtering method to recommend other movies that are similar to the selected
movie. A great deal of material on movie recommendation systems is already available.
Showing movie recommendations is essential so that the user need not waste a lot of time
searching for content that he/she might like. Thus, the movie recommendation system
plays a vital role in providing personalized movie recommendations to the user.
After searching the internet and referring to many research papers, we found that
recommendations made using Content-based Filtering typically use a single text-to-vector
conversion technique and a single technique to find the similarity between the vectors. In this
research work, we have used multiple text-to-vector conversion techniques and combined the
results of the multiple algorithms to get the final recommendation list. You can think of it as a
hybrid approach using the Content-based Filtering technique only.

INDEX

CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
List of Figures
List of Tables

Contents

1. Introduction
1.1 Relevance of the Project
1.2 Problem Statement
1.3 Objective of the Project
1.4 Scope of the Project
1.5 Methodology for Movie Recommendation
2. Purpose
3. Literature Survey
3.1 Movie Recommendation System by K-Means Clustering and K-Nearest Neighbor
3.2 Movie Recommendation System Using Collaborative Filtering
4. Requirement
4.1 Hardware Requirements
4.2 Software Specification
4.3 Software Requirements
5. Analysis and Design
5.1 System Architecture of Proposed System
5.2 Project Flow
6. Implementation
6.1 Cosine Similarity
6.2 CountVectorizer
7. Datasets
8. Result and Analysis
9. Conclusion
10. Reference

LIST OF FIGURES

Fig: 6.1 Website Code Screenshot

LIST OF TABLES

CHAPTER 1

INTRODUCTION

1.1 Relevance of the Project

A recommendation system, or recommendation engine, is an information-filtering
model that tries to predict the preferences of a user and provides
suggestions based on these preferences. These systems have become increasingly
popular and are widely used in areas such as movies, music, books,
videos, clothing, restaurants, food, places and other utilities. They collect
information about a user's preferences and behavior, and then use this information to
improve their suggestions in the future.

Movies are part and parcel of life. There are different types of movies: some are for
entertainment, some are for educational purposes, some are animated movies for
children, and some are horror movies or action films. Movies can be easily
differentiated by genre, such as comedy, thriller, animation or action. Other
ways to distinguish among movies are by release year, language, director,
etc. When watching movies online, there are a large number of movies to search through
to find the ones we like most. Movie recommendation systems help us find our preferred
movies among all of these different types and hence reduce the trouble of
spending a lot of time searching for our favorite movies. The movie
recommendation system should therefore be very reliable and should recommend
movies that are the same as, or most closely matched with, our
preferences.

A large number of companies are making use of recommendation systems to increase
user interaction and enrich a user's shopping experience. Recommendation systems
have several benefits, the most important being customer satisfaction and revenue.
A movie recommendation system is a very powerful and important system. However, due to
the problems associated with the pure collaborative approach, movie recommendation
systems also suffer from poor recommendation quality and scalability issues.

1.2 Problem Statement

The goal of the project is to recommend a movie to the user: providing related
content out of a collection of relevant and irrelevant items to
users of online service providers.

1.3 Objective of the Project

● Improving the accuracy of the recommendation system.
● Improving the quality of the movie recommendation system.
● Improving scalability.
● Enhancing the user experience.

1.4 Scope of the Project

The objective of this project is to provide accurate movie recommendations to
users. The goal of the project is to improve the accuracy, quality and scalability of the
movie recommendation system compared with the pure approaches. This is done
using a hybrid approach that combines content-based filtering and collaborative filtering.
To reduce data overload, recommendation systems are used as an information
filtering tool in social networking sites. Hence, there is huge scope for exploration in this
field to improve the scalability, accuracy and quality of movie recommendation systems.

1.5 Methodology for Movie Recommendation

We first preprocess the dataset and combine the relevant features into a
single feature. Next, we convert the text of that feature into vectors.
Then, we compute the similarity between the vectors. Finally, we get the
recommendations as per the system architecture described in Section 5.1.
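
The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the catalogue MOVIES, the helper names to_vector, cosine and recommend, and the toy tag strings are assumptions made for the example, not the project's actual code, which works on the TMDB 5000 CSV files.

```python
import math
import re
from collections import Counter

# Hypothetical toy catalogue: title -> combined feature text.
# The real project builds this text from the TMDB 5000 CSV files.
MOVIES = {
    "Space Quest": "science fiction space adventure alien",
    "Alien Dawn": "science fiction alien horror space",
    "Love in Paris": "romance comedy paris love",
    "Paris Nights": "romance drama paris love",
}


def to_vector(text):
    """Convert text to a sparse bag-of-words count vector (word -> count)."""
    return Counter(re.findall(r"\w\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies, top_n=2):
    """Return the top_n titles most similar to `title`, best match first."""
    vectors = {name: to_vector(text) for name, text in movies.items()}
    target = vectors[title]
    scores = [
        (cosine(target, vec), name)
        for name, vec in vectors.items()
        if name != title
    ]
    # Sort by descending similarity, then by title to break ties.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scores[:top_n]]


print(recommend("Space Quest", MOVIES, top_n=1))  # prints ['Alien Dawn']
```

Sorting ties by title keeps the output deterministic, which makes it easier to check that the same query always returns the same list.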

Agile Methodology:

1. Collection of Data Sets: Collect all required data sets from the Kaggle website. This
project requires the TMDB 5000 movie dataset.

2. Data Analysis: Make sure that the collected data sets are correct by analyzing
the data in the CSV files, i.e. checking whether all the column fields are present in
the data sets.

3. Algorithm: In our project we have used cosine similarity and CountVectorizer.

4. Training and Testing the Model: Once the implementation of the algorithm is
completed, we have to train the model to get the result. We tested it several
times, and the model recommends different sets of movies to different users.

5. Improvement in this Project: At a later stage we can implement different
algorithms and methods for better recommendations.

CHAPTER 2
PURPOSE

The need for movie recommendations has increased with the advent of the internet. People began
to expect more from the new technology, and they wanted to be able to find films easily and
quickly. Now, anyone with a search engine and an internet connection can find anything they
want, including films. With the increasing number of films being published, the number of films
that one can watch, and the number of films that one can discover, has also increased. It can be
quite a challenge for a person to find films that interest them and make a decision about them.
For example, you want to watch a certain type of film, but you have no idea where to start from.
Movies, in general, have become a lot more competitive now than what they used to be a couple
of years ago. More and more filmmakers are trying to find ways to stand out from the crowd, and
it has become much more difficult to find something that interests you. With so many films
competing for your attention, how can you possibly find something new and interesting?

The popularity of movie recommendation systems has increased exponentially in recent years.
People now expect their apps to provide them with recommendations regarding movies.
However, most of these systems are quite limited in their scope of recommendations. They
recommend only a few films, and they don’t help you discover new things. You can build a
movie recommendation system that can help you discover new films as well as help people find
the kinds of films they want to watch. It can analyze the preferences of the users, and then
recommend a specific genre for the user or a list of films that fit a certain theme.

With the increase in the number of films being published, discovering new films has become a
challenge for many people. Finding films that one loves and that are entertaining has
become difficult. A movie recommendation system can help you discover new films and find
those films that you love. For those filmmakers who want their app to help people discover new
films, a movie recommendation system can be an ideal solution. It can recommend specific
genres or help people find films that fit a specific theme. Now, when you are going through these
articles, you may be wondering why you need a movie recommendation system. But trust me,
building one is quite interesting and fun. Moreover, it can be quite a lucrative business for you as
well. So, let’s get

CHAPTER 3

LITERATURE SURVEY

Over the years, many recommendation systems have been developed using either collaborative,
content based or hybrid filtering methods. These systems have been implemented using various
big data and machine learning algorithms.

3.1 Movie Recommendation System by K-Means Clustering and K-Nearest Neighbor

A recommendation system collects data about the user’s preferences, either implicitly or
explicitly, on different items such as movies. Implicit acquisition in a
movie recommendation system uses the user’s behaviour while watching movies.
Explicit acquisition, on the other hand, uses the user’s previous ratings or history.
Another supporting technique used in the development of recommendation systems
is clustering. Clustering is the process of grouping a set of objects in such a way that
objects in the same cluster are more similar to each other than to those in other clusters.
K-Means Clustering along with K-Nearest Neighbour is implemented on the MovieLens
dataset in order to obtain the best-optimized result. In the existing technique, the data is
scattered, which results in a high number of clusters, while in the proposed technique the
data is gathered, which results in a low number of clusters. The process of recommending
a movie is optimized in the proposed scheme. The proposed recommender system predicts
the user’s preference for a movie on the basis of different parameters. The recommender
system works on the concept that people have common preferences or choices, and that
these users will influence each other’s opinions. The proposed scheme optimizes the
recommendation process and achieves a lower RMSE (root-mean-square error).
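
Since the surveyed work is compared by RMSE, a minimal sketch of that metric may help. The function name rmse and the sample ratings are assumptions made for illustration; they are not taken from the surveyed paper.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted vs. true ratings on a 1-5 scale.
print(round(rmse([4, 3, 5], [5, 3, 4]), 4))
```

A lower value means the predicted ratings are closer to the ratings users actually gave, and a value of 0 means every prediction was exact.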

3.2 Movie Recommendation System Using Collaborative Filtering

By Ching-Seh (Mike) Wu, Deepti Garg, Unnathi Bhandary

Collaborative filtering systems analyze the user's behavior and preferences and predict
what they would like based on similarity with other users. There are two kinds of
collaborative filtering systems: user-based recommenders and item-based recommenders.

1. User-based filtering: User-based preferences are very common in the field of
designing personalized systems. This approach is based on the user's likings. The
process starts with users giving ratings (1-5) to some movies. These ratings can be
implicit or explicit. Explicit ratings are when the user explicitly rates the item on
some scale or indicates a thumbs-up/thumbs-down for the item. Explicit ratings
are often hard to gather, as not every user is interested in providing
feedback. In these scenarios, we gather implicit ratings based on the user's behaviour.
For instance, if a user buys a product more than once, it indicates a positive
preference. In the context of movie systems, we can infer that if a user watches the
entire movie, he/she has some liking for it. Note that there are no clear rules for
determining implicit ratings. Next, for each user, we find a defined
number of nearest neighbours. We calculate the correlation between users' ratings
using the Pearson correlation coefficient. The assumption that two users whose ratings
are highly correlated must enjoy similar items and products is
used to recommend items to users.
2. Item-based filtering: Unlike the user-based filtering method, the item-based method
focuses on the similarity between the items users like instead of the users themselves.
The most similar items are computed ahead of time. Then, for recommendation,
the items that are most similar to the target item are recommended to the user.
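
The user-based procedure described in item 1 can be sketched as follows. This is a minimal illustration, not the surveyed authors' implementation: the RATINGS table, the user names, and the min_rating threshold are all assumptions made up for the example.

```python
import math

# Hypothetical explicit ratings on a 1-5 scale: user -> {movie: rating}.
RATINGS = {
    "alice": {"A": 5, "B": 4, "C": 1, "D": 2},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 1, "E": 5},
    "carol": {"A": 1, "B": 2, "C": 5, "D": 4, "E": 1},
}


def pearson(r1, r2):
    """Pearson correlation over the movies that both users have rated."""
    common = sorted(r1.keys() & r2.keys())
    if len(common) < 2:
        return 0.0
    x = [r1[m] for m in common]
    y = [r2[m] for m in common]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def nearest_neighbours(user, ratings, k=1):
    """Return the k users whose ratings correlate most with `user`."""
    scores = [
        (pearson(ratings[user], other), name)
        for name, other in ratings.items()
        if name != user
    ]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scores[:k]]


def recommend(user, ratings, k=1, min_rating=4):
    """Suggest unseen movies that the nearest neighbours rated highly."""
    seen = ratings[user].keys()
    picks = set()
    for neighbour in nearest_neighbours(user, ratings, k):
        for movie, rating in ratings[neighbour].items():
            if movie not in seen and rating >= min_rating:
                picks.add(movie)
    return sorted(picks)


print(nearest_neighbours("alice", RATINGS))  # prints ['bob']
print(recommend("alice", RATINGS))           # prints ['E']
```

Here alice and bob rate the shared movies in a similar way, so bob becomes alice's nearest neighbour and his highly rated movie that alice has not seen is suggested to her.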

CHAPTER 4

REQUIREMENTS

This chapter involves both the hardware and software requirements needed for the project and
detailed explanation of the specifications.

4.1 Hardware Requirements

1. A PC with Windows/Linux OS
2. Processor with 2.40 GHz to 2.50 GHz speed
3. Minimum of 8 GB RAM

4.2 Software Specification

1. Text Editor (VS Code/Jupyter Notebook)
2. Anaconda distribution package
3. Python libraries

4.3. Software Requirements

4.3.1 Anaconda distribution package

Anaconda is a free and open-source distribution of the Python and R programming
languages for scientific computing (data science, machine learning applications,
large-scale data processing, predictive analytics, etc.) that aims to simplify
package management and deployment. Package versions are managed by
the package management system conda. The Anaconda distribution includes
data-science packages suitable for Windows, Linux and macOS.

4.3.2 Python libraries

For the computation and analysis we need certain Python libraries which are used
to perform analytics. Packages such as scikit-learn (sklearn), NumPy, pandas,
Matplotlib, the Flask framework, etc. are needed.

SKlearn: It features various classification, regression and clustering algorithms,
including support vector machines, random forests, gradient boosting, k-means
and DBSCAN, and is designed to interoperate with the Python numerical and
scientific libraries NumPy and SciPy.

NumPy: NumPy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object and tools for working with these
arrays. It is the fundamental package for scientific computing with Python.

Pandas: Pandas is one of the most widely used Python libraries in data science. It
is an open-source library built on top of NumPy that provides high-performance,
easy-to-use data structures and data analysis tools for manipulating numerical data
and time series. Unlike the NumPy library, which provides objects for
multi-dimensional arrays, Pandas provides an in-memory 2D table object called a
DataFrame. It is popular mainly because it makes importing and analyzing data
much easier.

Streamlit: Streamlit is an open-source app framework for the Python language. It
helps us create web apps for data science and machine learning in little
time. It is compatible with major Python libraries such as scikit-learn, Keras,
PyTorch, LaTeX, NumPy, pandas, Matplotlib, etc. The library can be installed
with the command: pip install streamlit

CHAPTER 5

SYSTEM ANALYSIS AND DESIGN

5.1 System Architecture of Proposed System:

Fig:5.1 Architecture of Content-based approach

Content-based filtering in recommender systems leverages machine learning algorithms
to predict and recommend new but similar items to the user. Recommending products
based on their characteristics is only possible if there is a clear set of features for the
product and a list of the user’s choices.

The recommender system stores previous user data like clicks, ratings, and likes to create
a user profile. The more a customer engages, the more accurate future recommendations
are.

5.2 Project Flow

Fig: 5.2 Project Flow

Initially, load the data sets that are required to build the model. The data sets required
in this project are tmdb_5000_credits.csv and tmdb_5000_movies.csv; both are
available on Kaggle.com. Three models are created using a content-based
approach and then imported into a website built with Streamlit, a Python library for
creating web apps. Finally, the website is deployed to a Heroku server.

CHAPTER 6

IMPLEMENTATION

The proposed system makes use of different algorithms and methods for the implementation
of the content-based approach.

6.1 Cosine Similarity:


Cosine similarity is a measure of similarity between two non-zero vectors of an inner
product space that measures the cosine of the angle between them.

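Formula:

\[
\text{similarity}(A, B) = \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}
\]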
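
As a worked illustration of the definition above, the following sketch computes cosine similarity for small dense vectors using only the standard library. The function name and the sample vectors are assumptions for the example; scikit-learn offers an equivalent function, cosine_similarity, in sklearn.metrics.pairwise.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Vectors sharing one of two non-zero components: similarity close to 0.5.
print(cosine_similarity([1, 1, 0], [1, 0, 1]))
# Parallel vectors: similarity close to 1.0 regardless of magnitude.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
# Orthogonal vectors: similarity 0.0.
print(cosine_similarity([1, 0], [0, 1]))
```

Because the measure depends only on the angle between the vectors, a long description and a short description with the same word proportions score as highly similar.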

6.2 CountVectorizer:
CountVectorizer is a tool provided by the scikit-learn library in Python. It is used to
transform a given text into a vector on the basis of the frequency (count) of each word
that occurs in the text. This is helpful when we have multiple such texts and
wish to convert each text into a vector of word counts (for use in further text analysis).

class sklearn.feature_extraction.text.CountVectorizer(*,
input='content', encoding='utf-8', decode_error='strict',
strip_accents=None, lowercase=True, preprocessor=None,
tokenizer=None, stop_words=None, token_pattern='(?u)\b\w\w+\b',
ngram_range=(1, 1), analyzer='word', max_df=1.0, min_df=1,
max_features=None, vocabulary=None, binary=False,
dtype=<class 'numpy.int64'>)
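
To make the idea behind CountVectorizer concrete, here is a from-scratch sketch of bag-of-words counting with the standard library. It is not scikit-learn's implementation: the function fit_transform and the sample texts are assumptions for the example, and only two of the defaults in the signature above are mirrored (lowercase=True and the token_pattern that keeps words of two or more characters).

```python
import re
from collections import Counter

# Same pattern as the token_pattern default shown above: words of 2+ characters.
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def fit_transform(texts):
    """Build a vocabulary and a document-term count matrix from raw texts."""
    tokenised = [TOKEN.findall(text.lower()) for text in texts]
    # Vocabulary in alphabetical order, one column per distinct word.
    vocabulary = sorted({tok for toks in tokenised for tok in toks})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    matrix = []
    for toks in tokenised:
        row = [0] * len(vocabulary)
        for tok, count in Counter(toks).items():
            row[index[tok]] = count
        matrix.append(row)
    return vocabulary, matrix


texts = ["Action hero saves the city", "A hero, a city, a hero"]
vocabulary, matrix = fit_transform(texts)
print(vocabulary)  # prints ['action', 'city', 'hero', 'saves', 'the']
print(matrix)      # prints [[1, 1, 1, 1, 1], [0, 1, 2, 0, 0]]
```

Each row is one text and each column is one vocabulary word, so the rows can be passed directly to a similarity measure such as cosine similarity. Note that the single-letter word "a" is dropped by the token pattern.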

Experimental requirements:

Code: Website(Streamlit):

Fig: 6.1 Website Code Screenshot

CHAPTER 7

DATASET

The ‘TMDB 5000 Movie Dataset’ is taken into consideration for movie recommendation
purposes in this research work. This dataset is available on kaggle.com. The dataset is composed
of 2 CSV files - ‘tmdb_5000_movies.csv’ and ‘tmdb_5000_credits.csv’

The ‘tmdb_5000_movies.csv’ dataset consists of the following attributes:

● ‘budget’: It indicates the budget of the movie.
● ‘genres’: It indicates the genres of the movie like Action, Documentary, etc.
● ‘homepage’: It indicates the homepage of the movie. It is basically a website link.
● ‘id’: It indicates the movie ID.
● ‘keywords’: It indicates the keywords of the movie. Apart from the title of the
movie, keywords give quick information about the movie.
● ‘original_language’: It indicates whether the movie was originally created in
English or another language.
● ‘original_title’: It is the movie title.
● ‘overview’: It is a short description of the movie.
● ‘popularity’: It is a metric which indicates popularity.
● ‘production_companies’: It consists of the names of the companies which
produced the movie.
● ‘production_countries’: It consists of the names of the countries in which the
movie production took place.
● ‘release_date’: It consists of the release date of the movie. The format used is
yyyy-mm-dd where ‘yyyy’ indicates the year of release, ‘mm’ indicates the month of
release, and ‘dd’ indicates the day of release.
● ‘revenue’: It indicates the revenue earned by the movie.
● ‘runtime’: It indicates the runtime of a movie. Runtime basically means the length
of the movie.
● ‘spoken_languages’: It consists of the languages spoken in the movie.
● ‘status’: It indicates the status of the movie. For example, a movie can be released
or not released.
● ‘tagline’: It consists of the tagline of the movie.
● ‘title’: It consists of the title of the movie.
● ‘vote_average’: It indicates the average of the votes.
● ‘vote_count’: It indicates the vote count.

Table: 7.1 Statistical data about ‘tmdb_5000_movies.csv’ dataset

Fig:7.1 Glimpse of the ‘tmdb_5000_movies.csv’ dataset

The ‘tmdb_5000_credits.csv’ dataset consists of the following attributes:

● ‘movie_id’: It indicates the movie ID.
● ‘title’: It indicates the title of the movie.
● ‘cast’: It consists of the cast of the movie. Cast implies the actors and actresses who
appear in the movie.
● ‘crew’: It consists of those people who are concerned with the production of the movie.

Table:7.2 Statistical data about ‘tmdb_5000_credits.csv’ dataset

The Exploratory Data Analysis (EDA) has been inspired by Heeral Dedhia’s blog on
medium.com.

Fig:7.2 Top Genres

Drama is the most common genre, outnumbering genres such as Family and
Horror. A movie might have multiple genres.

Fig:7.3 Actor with highest appearance

The above figure shows the actors with the most appearances, in decreasing order.

Fig:7.4 Directors with highest movies

The above figure shows the directors with the most movies, in decreasing order.

Fig:7.5 Runtime versus Number of movies

As the runtime increases, the number of movies increases up to a certain point; beyond that
point, the number of movies decreases as the runtime increases. There are some exceptions.

Fig:7.6 Runtime versus Budget

There are a lot of lower-budget movies with runtimes in the range of 70 to 150 minutes.

Fig:7.7 Revenue versus Budget

It can be seen from the above figure that low budget movies have low revenue in general.

Table:7.3 Director, Keywords, Cast and Genres of a movie are combined into a
single feature titled as ‘tags’

The ‘tags’ attribute needs to be further processed by using some algorithms.
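
One way the ‘tags’ feature could be assembled is sketched below. It assumes, as in the Kaggle TMDB 5000 files, that the ‘genres’, ‘keywords’, ‘cast’ and ‘crew’ columns hold JSON-encoded lists of objects with a ‘name’ field (and a ‘job’ field for crew). The sample row, the helper names, the three-actor limit and the space-squashing step are assumptions made for the example, not the report's actual code.

```python
import json

# Hypothetical row mimicking the JSON-encoded columns of the merged dataset.
row = {
    "title": "Example Movie",
    "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
    "keywords": '[{"id": 1, "name": "space war"}]',
    "cast": '[{"name": "Jane Doe", "order": 0}, {"name": "John Roe", "order": 1}]',
    "crew": '[{"job": "Director", "name": "Ann Lee"}, {"job": "Editor", "name": "Bob Ray"}]',
}


def names(json_text, limit=None):
    """Extract the 'name' field from a JSON list, optionally truncated."""
    items = [entry["name"] for entry in json.loads(json_text)]
    return items if limit is None else items[:limit]


def directors(crew_json):
    """Keep only the crew members whose job is 'Director'."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]


def squash(name):
    """Remove spaces so a multi-word name becomes a single token."""
    return name.replace(" ", "").lower()


def build_tags(row, cast_limit=3):
    """Combine director, keywords, cast and genres into one 'tags' string."""
    parts = (
        directors(row["crew"])
        + names(row["keywords"])
        + names(row["cast"], cast_limit)
        + names(row["genres"])
    )
    return " ".join(squash(part) for part in parts)


print(build_tags(row))
# prints: annlee spacewar janedoe johnroe action sciencefiction
```

Squashing names into single tokens is a design choice for this sketch: it stops a vectorizer from treating two different people who share a first name as partially the same word.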

CHAPTER 8

RESULTS AND DISCUSSION
