

MOVIE RECOMMENDATION SYSTEM


A Project Report submitted in partial fulfillment of the requirements for the award of
the Degree of

B. Voc. (B.Sc.) INFORMATION TECHNOLOGY

By
Abhay Singh Soun
SSJUUV2247510001
2022-25

Under the esteemed guidance of


Mr. Himanshu Punetha
Designation

DEPARTMENT OF VOCATIONAL STUDIES

(Affiliated to Soban Singh Jeena University, Almora)

PITHORAGARH, 262501, UTTARAKHAND

MANAS COLLEGE OF SCIENCE TECHNOLOGY AND MANAGEMENT
(Affiliated to Soban Singh Jeena University, Almora)
PITHORAGARH – UTTARAKHAND - 262501
B. Voc. (B.Sc.) INFORMATION TECHNOLOGY
CERTIFICATE

This is to certify that the project entitled "Movie Recommendation System" is the bona fide
work of Abhay Singh Soun, bearing Enrollment No. SSJUUV2247510001, submitted in
partial fulfillment of the requirements for the award of the degree of B. Voc. (B.Sc.)
INFORMATION TECHNOLOGY from Manas College of Science Technology and
Management, Pithoragarh.

Guide External Examiner

Date:

DECLARATION
I hereby declare that the project entitled "Movie Recommendation System", done at Manas
College of Science Technology and Management, has not been in any case duplicated to submit
to any other university for the award of any degree. To the best of my knowledge, no one other
than me has submitted it to any other university.
The project is done in partial fulfillment of the requirements for the award of the degree of B. Voc.
(B.Sc.) INFORMATION TECHNOLOGY, to be submitted as a final semester project as part of
our curriculum.

Abhay Singh Soun


Name and Signature of the Student

PROFORMA FOR THE APPROVAL OF PROJECT PROPOSAL

ENROLLMENT NO. SSJUUV2247510001 Roll No. 2247510001

1. Name of the Student: ABHAY SINGH SOUN

2. Title of the Project: MOVIE RECOMMENDATION SYSTEM

3. Guide of the Project: Mr. Himanshu Punetha

4. Teaching experience of the Guide: 5+ Years

5. Is this your first submission? Yes

Signature of the student Signature of the Guide
DATE: DATE:

ABSTRACT

The rapid growth of digital streaming platforms has led to an overwhelming volume of content,
making it challenging for users to find movies that match their tastes. A Movie
Recommendation System serves as an essential tool to enhance user experience by filtering and
suggesting movies aligned with individual preferences. This project presents the development
and implementation of a Movie Recommendation System that combines collaborative filtering,
content-based filtering, and hybrid techniques to deliver accurate and personalized movie
suggestions. This Movie Recommendation System can be seamlessly integrated into streaming
platforms, helping users discover relevant content more efficiently and enhancing platform
engagement and retention. By providing tailored recommendations, this system not only
improves user satisfaction but also promotes a more diverse viewing experience, exposing users
to a broader range of movies. Future work includes exploring deep learning methods, such as
neural collaborative filtering and recurrent neural networks (RNNs), to further improve
recommendation accuracy. This project illustrates the potential of recommendation systems in
transforming digital content discovery and user engagement in the entertainment industry.
ACKNOWLEDGEMENT

First and foremost, I would like to express my gratitude to Mr. Himanshu Punetha, my
project mentor, for all of his help and support during this project. His knowledge and
insights have greatly influenced the course and result of this effort.

I am thankful to the professors for guiding me through this project and continuously
encouraging me. It would not have been possible to complete this project without their support.

I am also thankful to all the faculty members of the Department of Information Technology, Manas
College of Science Technology and Management, Pithoragarh, for helping me during the project.

I am grateful to my team, family and friends for their unending support, without which the
completion of this project would not have been possible.

In closing, I would like to thank all of the writers and scholars whose contributions have served
as a strong basis and source material for this project.
TABLE OF CONTENTS

INTRODUCTION
Introduction
Problem Statement
Objective
Methodology
Organisation

SYSTEM DESIGN
Dataset
Visualizing the no. of users Voted
Visualizing the no. of Votes by User
Algorithms Used
Hardware and Software Requirements
Concepts Requirements

PERFORMANCE ANALYSIS
Comparisons and Results

CONCLUSIONS
Conclusion
Chapter 1
INTRODUCTION
A recommendation system is essentially a filtering system that predicts a user's choices and then
suggests the most relevant results based on the user's previous likings. Recommendation systems
have found a wide variety of applications over the years and are now used on many online
platforms. The content of these platforms varies: movies of different genres such as action,
thriller or romance, products on an e-commerce website, posts on a social media platform, or
profiles on a professional networking site such as LinkedIn.

For example, when we use Instagram, we see on our feed the stories of the people we follow.
Instagram monitors our interactions with various people and our past activities, and then
suggests related stories from other accounts that have done the same kind of activity previously
or currently. Over time, a recommender system also keeps improving based on the activities of
groups of users, such as the items they have scrolled through or tried. For example, when we buy
a laptop or a mobile phone on Flipkart, it suggests a mobile cover or tempered glass for the
phone, or a USB Type-C or Type-A adaptor for the laptop.
With such enhancements in recommender systems, users get good recommendations most of the
time, and the systems keep improving as we move forward in the 21st century, producing almost
accurate suggestions.

If the recommendations of a music app, any other music platform or an educational platform clash
with what users want, users simply stop using the app. Companies therefore have to focus on their
recommendation system, which is more complex than it seems. Every user has different preferences
and choices depending on their activities and sometimes their mood; in the case of music, for
example, while playing, travelling, running, or after a fight in a relationship.
PROBLEM STATEMENT
Recommender systems are tools that take users' ratings and then recommend movies from a large
set of data on the basis of matching user interests, classifying them into different categories.
The sole purpose of the recommendation system is to search for content that fits a person's
interests on an individual basis. To do so, it takes into account different factors to create
lists of content that are specific to different categories of individuals/users.

The AI-based algorithms used by recommender systems create a list of possible candidates and then
customize it so that, in the end, it contains the items matching the interests/choices of each
category of individuals. The results are based on the activities that users have performed
previously, such as what their profile looks like and what they have viewed in the Chrome browser,
the Opera browser and other browsers, including their browsing history, in order to consider
demographic traits or the likelihood that they would like a movie based on its genre. A set of
predictive models is constructed from the (big) data available, the movies are then drawn from a
list of 2000 movies, and a few selected movies are recommended using different algorithms,
different methods and different similarity measures.

OBJECTIVE

A movie recommendation system provides a mechanism for classifying users with the same
interests. It searches for content that would be interesting to different sets of users, creates
different kinds of lists, and provides interesting recommendations to individuals based on the
content they love. The main objective of the recommender system is to use approaches such as
demographic filtering, content based filtering and collaborative filtering to find the set of
movies that each user, or a specific set of users, will like.
The movies that have a high probability of being liked by the general set of users will be
displayed to the user by the recommender. In another technique, we will try to identify users'
different interests using the information collected through their different activities, and in
collaborative filtering we will consider all those users who have the same type of interests to
get the final set of movies to be recommended to each user individually.
We will therefore use different categories of recommender filtering techniques, compare and
contrast the results obtained with the different methods, and try to improve the results as the
movie dataset grows larger and larger, beyond the computational bound of the system, which is
generally a limitation on large datasets.
METHODOLOGY

The various types of recommender system can be classified as below.

1. Demographic Filtering: This technique recommends movies on the basis of popularity within a
demographic group (for example, gender-specific users). The system simply recommends the same
movies to users with similar demographic features. It does not distinguish between individual
users, so this approach is very simple to apply. The idea is that movies which are very popular
and accepted by a large number of people have the highest probability of being liked by a user.
To apply demographic filtering:

. Create a metric to rate the movies.

. Find the metric score of every movie.

. Sort the scores and then recommend the movies which are best rated

\[ \text{Weighted Rating (WR)} = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C \]
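The weighted rating metric can be sketched in Python as follows. This is only an illustration: it assumes the usual reading of the symbols in this IMDB-style formula (v is the number of votes for a movie, m is the minimum number of votes required to qualify, R is the average rating of the movie, and C is the mean rating across all movies). The helper names, titles and numbers are hypothetical and are not taken from the project dataset.

```python
def weighted_rating(v, R, m, C):
    """Weighted rating: WR = v/(v+m) * R + m/(v+m) * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


def top_movies(movies, m, n=3):
    """Score every movie that has at least m votes and return the n best titles."""
    # C is assumed to be the mean rating over the whole catalogue.
    C = sum(movie["rating"] for movie in movies) / len(movies)
    qualified = [movie for movie in movies if movie["votes"] >= m]
    scored = [
        (weighted_rating(movie["votes"], movie["rating"], m, C), movie["title"])
        for movie in qualified
    ]
    scored.sort(reverse=True)  # best score first
    return [title for _, title in scored[:n]]


# Hypothetical toy catalogue used only for illustration.
movies = [
    {"title": "A", "votes": 1000, "rating": 8.0},
    {"title": "B", "votes": 50, "rating": 9.5},
    {"title": "C", "votes": 400, "rating": 7.0},
    {"title": "D", "votes": 5, "rating": 10.0},
]
ranking = top_movies(movies, m=40)
```

Movie "D" has a perfect rating but too few votes to qualify, which is the situation the minimum-votes threshold m is meant to handle.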
2. Content Based Filtering system: In the content based filtering method we compare
different items with the user's interest profile. The user profile holds the content that best
matches the user, in the form of features. The user's previous actions or feedback are taken
into account, and the method generally also takes into account the description of the content
that has been edited by users with different choices. Consider an example where a person wants
to buy a favourite item 'M', but the item has been sold out, and as a result he buys item 'N'
on someone's recommendation, because 'N' has the same type of features as the first item. This
is content based filtering, as demonstrated below.

Fig.-Content Based Filtering Method

The numeric quantity used to calculate the similarity between two movies is the cosine
similarity. The score obtained through cosine similarity is very fast to calculate.

The steps involved in getting the movie recommendations are as below:

. Given the title, find the index of that movie

. Calculate the cosine similarity scores for all the movies

. Sort the list of scores in descending order, so that the highest similarity comes first

. Get the first 10 elements of the list, excluding the first one as it is the movie itself

. Return the titles of the top elements
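The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: each movie is represented by a hypothetical genre vector (action, romance, sci-fi), and the titles and the `recommend` helper are made up for the example.

```python
import math

# Hypothetical toy catalogue: title -> genre vector (action, romance, sci-fi).
FEATURES = {
    "Movie A": [1, 0, 1],
    "Movie B": [1, 0, 0],
    "Movie C": [0, 1, 0],
    "Movie D": [1, 1, 1],
}


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, features=FEATURES, top_n=10):
    titles = list(features)
    index = titles.index(title)                 # step 1: index of the given title
    target = features[titles[index]]
    scores = [(cosine_similarity(target, features[t]), t) for t in titles]  # step 2
    scores.sort(key=lambda pair: pair[0], reverse=True)  # step 3: highest score first
    # steps 4 and 5: drop the query movie itself and return the top titles
    return [t for _, t in scores if t != title][:top_n]
```

Here the query movie is excluded by title rather than by position, which gives the same result when its similarity to itself (1.0) is the highest score.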

Repeating the above steps for a given title, we find the top movies based on these distances,
which gives the best possible recommendations. The cosine similarity is the cosine of the angle
between two non-zero vectors of an inner product space; it is defined as the dot product of the
two vectors divided by the product of their Euclidean magnitudes. In most cases cosine
similarity is used to get preferred recommendations for users.
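Written out for two feature vectors A and B, the definition of cosine similarity reads as follows. The small numerical example uses hypothetical three-component vectors chosen only to show the arithmetic.

```latex
\[
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^{2}} \, \sqrt{\sum_{i} B_i^{2}}}
\]

% Worked example with A = (1, 1, 0) and B = (1, 0, 1):
\[
A \cdot B = 1, \qquad
\lVert A \rVert = \lVert B \rVert = \sqrt{2}, \qquad
\cos\theta = \frac{1}{\sqrt{2}\,\sqrt{2}} = 0.5
\quad (\theta = 60^{\circ})
\]
```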

3. Collaborative based Filtering: Content based filtering suffers from various limitations:
it is only capable of suggesting movies close to one type of user preference, and it is unable
to provide recommendations across genres. A collaborative filtering based system, however,
provides much more flexibility by finding the similarity between users and the likes of users
having similar interests. For measuring the similarity between users we can use cosine
similarity or Pearson's correlation. In the example matrix below, every row corresponds to a
user and every column to a movie; the entries are the ratings that each user has given to each
movie, and one of the users is the target user.
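As a rough sketch of the user-to-user comparison described above, the following Python code computes Pearson's correlation between two users over the movies they have both rated and picks the user most similar to a target user. The ratings, user ids and helper names are hypothetical and serve only to illustrate the idea.

```python
import math

# Hypothetical utility matrix: user -> {movie: rating}; a missing key means "not rated".
RATINGS = {
    "u1": {"Matrix": 5, "Titanic": 1, "Inception": 4},
    "u2": {"Matrix": 4, "Titanic": 2, "Inception": 5},
    "u3": {"Matrix": 1, "Titanic": 5, "Inception": 2},
}


def pearson(ratings_a, ratings_b):
    """Pearson's correlation over the movies rated by both users."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    a = [ratings_a[m] for m in common]
    b = [ratings_b[m] for m in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who gives constant ratings has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def most_similar_user(target, ratings=RATINGS):
    """Return the other user whose ratings correlate best with the target user."""
    others = [user for user in ratings if user != target]
    return max(others, key=lambda user: pearson(ratings[target], ratings[user]))
```

In this toy matrix, u1 and u2 rate the movies in a similar pattern while u3 rates them in the opposite pattern, so u2 is chosen as the neighbour of u1.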
User based collaborative filtering is simple, but it also has drawbacks. The biggest challenge
is that the choices of users vary with time, so precomputing the matrix can often lead to lower
performance. We can therefore use item based collaborative filtering, which recommends items
based on their similarity with the items that the target user has rated; the same similarity
coefficients, such as Pearson's correlation or cosine similarity, can be used. Item based
collaborative filtering is more static in nature. In the example below, only one user has rated
both Matrix and Titanic, so the similarity between them is 1. There may be cases where we have
millions of users and the similarity between two different movies is very high simply because
they have the same rank for the only user who has rated them both.
In collaborative filtering we try to find users who have the same interests and similar likes.
In this case we do not use the features of an item to recommend it; instead we classify users
into clusters of similar types and then order each cluster according to the preferences of the
user. We can also use the cosine distance here, which captures users with similar interests:
the greater the cosine, the smaller the angle between two users. Here we simply use the utility
matrix, and we can assign the value zero to the sparse (missing) entries to make the
calculations easy. Item based collaborative filtering is preferred in general because it takes
into account the movies instead of the number of users, which makes the classification of movies
and users much easier. Hence user based collaborative filtering is not preferred, because it
takes only the users into account and ignores the sparse values, which creates issues for the
performance of the recommender system.
4. Hybrid Based Filtering: It is a mixture of the content based and collaborative based
filtering methods. We take the user id and the title of a movie as input, and the output is
the similar movies sorted on the basis of the expected ratings by that particular user. Expected
ratings are calculated internally: ideas from content and collaborative filtering are used to
build an engine that suggests movies to the particular user and then estimates the ratings.

In the comparisons section below we will see how movies are determined through the hybrid
filtering technique, where we have used both the content based method and the collaborative
based filtering method. The hybrid filtering method performs well in most cases and scenarios
where it is otherwise difficult to achieve accurate recommendations for the user.
Hybrid Filtering Method
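The hybrid idea described above (user id and title in, similar movies sorted by expected rating out) can be sketched as follows. This is an assumption-laden illustration: the content side uses hypothetical genre vectors, and the expected rating is estimated with a simple similarity-weighted average of other users' ratings, which stands in for whatever collaborative model the project actually uses. All names and numbers are made up.

```python
import math

# Content side: hypothetical genre vectors (action, romance, sci-fi).
FEATURES = {
    "Movie A": [1, 0, 1],
    "Movie B": [1, 0, 0],
    "Movie C": [0, 1, 0],
    "Movie D": [1, 1, 1],
}

# Collaborative side: hypothetical ratings, user id -> {movie: rating}.
RATINGS = {
    1: {"Movie A": 5, "Movie B": 4},
    2: {"Movie A": 5, "Movie B": 5, "Movie C": 1, "Movie D": 2},
    3: {"Movie A": 1, "Movie B": 2, "Movie C": 5, "Movie D": 4},
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_similarity(u, v):
    """Cosine similarity of two users over the movies both have rated."""
    common = sorted(set(RATINGS[u]) & set(RATINGS[v]))
    return cosine([RATINGS[u][m] for m in common], [RATINGS[v][m] for m in common])


def expected_rating(user, movie):
    """Similarity-weighted average of the other users' ratings for the movie."""
    num = den = 0.0
    for other, rated in RATINGS.items():
        if other != user and movie in rated:
            weight = user_similarity(user, other)
            num += weight * rated[movie]
            den += weight
    return num / den if den else 0.0


def hybrid(user, title, n_candidates=3):
    """Content step picks candidates; collaborative step orders them."""
    target = FEATURES[title]
    others = [t for t in FEATURES if t != title]
    candidates = sorted(
        others, key=lambda t: cosine(target, FEATURES[t]), reverse=True
    )[:n_candidates]
    return sorted(candidates, key=lambda t: expected_rating(user, t), reverse=True)
```

The content step alone would rank "Movie D" first for "Movie A", but for user 1 the estimated ratings move "Movie B" to the top, which shows how the two sources of information are combined.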
SYSTEM DESIGN

Dataset
1) For Content and Collaborative Based Filtering:
• The dataset was obtained from Kaggle. It is used as a standard dataset for movie
recommendation systems.

• We used the 'MovieLens' movie dataset (from Kaggle) for the project.

• Movies and ratings are taken into account.

• Total of 9743 movies

• Total of 100147 ratings

• MovieLens users were chosen at random.

• A unique id is assigned to each user and movie

2) For Hybrid filtering method:

• Consists of 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by
270,000 users
• Ratings are on a 1-5 scale and are taken from the official GroupLens source.
Visualizing the no. of users Voted

Fig.-Visualizing the no. of Users Voted

Visualizing the no. of Votes by User

Fig.-Visualizing the no. of Votes by User
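The two quantities visualized above (how many users voted for each movie, and how many votes each user cast) can be computed as in the sketch below. The rating rows are a hypothetical sample in (userId, movieId, rating) form, and the `filter_by_threshold` helper is illustrative; it shows how bounds on both counts could be applied before collaborative filtering.

```python
from collections import Counter

# Hypothetical sample of (userId, movieId, rating) rows.
ratings = [
    (1, 10, 4.0), (1, 20, 3.5), (1, 30, 5.0),
    (2, 10, 2.0), (2, 20, 4.5),
    (3, 10, 5.0),
]

# movieId -> number of users who voted for that movie
users_voted = Counter(movie_id for _, movie_id, _ in ratings)

# userId -> number of votes cast by that user
votes_by_user = Counter(user_id for user_id, _, _ in ratings)


def filter_by_threshold(counts, minimum):
    """Keep only the ids whose count reaches the given lower bound."""
    return sorted(key for key, count in counts.items() if count >= minimum)


popular_movies = filter_by_threshold(users_voted, 2)
active_users = filter_by_threshold(votes_by_user, 2)
```

The two counters can then be passed to a plotting library such as matplotlib to produce scatter plots like the ones referred to in the captions.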

Algorithms Used
K-Means Algorithm:

The K-means clustering algorithm groups data points that have matching features into clusters.
The degree of closeness defines the similarity, that is, how two points are related to each
other. In this algorithm we select initial centroids and then repeat the process until the
optimum centroids are found. It determines the best values for the K centre points by an
iterative process, assigning each data point to the nearest of the K centres. The number of
clusters found from the data is denoted by 'K'. It is a simple unsupervised ML algorithm that
categorizes data points into subgroups even when very little information about the data is
available.

K-Mean algorithm
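The iterative process described above can be sketched in plain Python as below. This is a minimal version of the standard K-means loop with the initial centroids supplied by the caller (an assumption made to keep the example deterministic); the points are hypothetical two-dimensional data.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two visibly separated groups of points, K = 2.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
labels, centres = kmeans(points, centroids=[(0, 0), (10, 10)])
```

In practice a library implementation such as scikit-learn's `KMeans` would normally be used; the loop above only illustrates what the algorithm does at each iteration.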

Hardware and Software Requirements


● 4.2 GB RAM

● MS Windows 7 and above

Software Requirements

● Jupyter Notebook

● WAMP Server
● Visual Studio Code

● Sublime Text

● MySQL

CONCEPTS REQUIREMENTS

• Machine Learning Algorithms

• Data Pre-processing Functions and tools

• scikit-learn

• seaborn

• Knowledge of K-Means clustering

• NumPy (a Python library)

• pandas

• matplotlib

• Cleaning of data

• 64-bit processors are required

PERFORMANCE ANALYSIS

Comparisons and Results


1. Demographic Filtering:
Demographic filtering sorts the movies by their metric scores and recommends the ones that
are rated best. Generalized recommendations are given to every user on the basis of
popularity, that is, the movies which are generally liked by the average audience.
Demographic Filtering Output

2. Content Based Filtering:

The user profile holds the content that best matches the user, in the form of features. The
user's previous actions or feedback are taken into account, and the method generally also
takes into account the description of the content that has been edited by users with
different choices.
Content Based Filtering Output

3. Collaborative based Filtering: For collaborative filtering we have used item based
collaborative filtering, taking three different metrics and comparing the results. A brief
comparison of the three metrics is shown with the movies recommended by each of them, based
on the bounds set on the number of users who rated a movie and the number of ratings given
by a user.
Metric = "Cosine" -> Cosine similarity, or the cosine kernel, computes similarity as the
normalized dot product of X and Y: K(X, Y) = <X, Y> / (||X||*||Y||). On L2-normalized
data, this function is equivalent to linear_kernel.
Collaborative Based Filtering Method(Metric =Cosine) Output

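Metric = "Cityblock" -> The cityblock (Manhattan) distance is the sum of the absolute
differences between the coordinates of two vectors. In scikit-learn, the function
distance_metrics returns the valid pairwise distance metrics; it exists to allow for a
description of the mapping for each of the valid strings.
The mapping for cityblock is as below:

'cityblock' = metrics.pairwise.manhattan_distances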
Collaborative Based Filtering Method(Metric =Cityblock) Output

Metric = "Minkowski" -> It is a metric intended for real-valued vector spaces. We can
calculate the Minkowski distance only in a normed vector space, which means a space
where distances can be represented as a vector that has a length, and the lengths cannot
be negative.
Collaborative Based Filtering Method(Metric =Minkowski) Output
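To make the three metrics concrete, the sketch below computes each of them for a pair of small vectors in plain Python. The vectors are hypothetical rating columns chosen for easy arithmetic; scikit-learn is not needed for this illustration.

```python
import math


def cosine_similarity(x, y):
    """Normalized dot product: <x, y> / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def cityblock(x, y):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p = 1 is cityblock, p = 2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


# Hypothetical rating vectors of two movies.
x = [1, 2]
y = [4, 6]

cos_xy = cosine_similarity(x, y)   # similarity: larger means more alike
city_xy = cityblock(x, y)          # distance: smaller means more alike
mink_xy = minkowski(x, y, p=2)     # distance: smaller means more alike
```

Note that cosine is a similarity while cityblock and Minkowski are distances, so the ranking direction is reversed when they are used to pick the nearest movies.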
4. Hybrid Based filtering: It is a mixture of the content based and collaborative based
filtering methods. We take the user id and the title of a movie as input, and the output
is the similar movies sorted on the basis of the expected ratings by that particular user.
Expected ratings are calculated internally: ideas from content and collaborative filtering
are used to build an engine that suggests movies to the particular user and then estimates
the ratings.

Hybrid Based Filtering Output


Comparative output

Here we can see that the hybrid filtering technique does well in overcoming the issues
faced by the content based filtering technique and the collaborative based filtering
method. From the root mean square error we can conclude that the value for the hybrid
filtering method is lower, so performance is higher in the hybrid case. The collaborative
filtering technique does well only from the quality perspective; when both the qualitative
and the quantitative achievement of the result matter, we prefer the hybrid filtering
technique, where the flaws of both are addressed. The content based filtering technique
outperforms the collaborative one only in terms of similarity, while the collaborative
filtering technique can recommend one item on the basis of another item of similar
interest. The overall flaws can be removed by hybrid filtering, in which two or more
recommendation techniques are combined to gain better performance with fewer drawbacks.
In general, in hybrid filtering the collaborative filtering technique is combined with some
other type of filtering technique to avoid the ramp-up problem, and it thus overcomes the
major drawbacks that arise if we use content based or collaborative filtering alone.
The hybrid filtering recommender allows the user to select his own choices from given data
containing attributes or user-specific values, and then recommends the best movie based on
the similarities, calculating the accumulated weight and then applying the algorithm, which
in our case is the K-means algorithm.
In the process of getting different results from different algorithms and techniques, the
hybrid approach proves to be better than the content and collaborative filtering techniques
used individually, since it overcomes the drawbacks of a single algorithm and improves the
performance of the overall recommender system. Moreover, other techniques such as
classification and clustering can be used to get better recommendations, which would
increase the accuracy of the recommender system. Better performance can thus be achieved
by the hybrid filtering technique, which is why it is preferred over the other two
techniques.
System Framework
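The root mean square error used for the comparison above can be computed as in the following sketch. The rating lists are hypothetical placeholders and are not results from this project; a lower value means the predicted ratings are closer to the actual ones.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings, for illustration only.
actual = [4.0, 3.0, 5.0]
predictions_model_1 = [3.5, 3.0, 4.5]
predictions_model_2 = [2.0, 4.0, 3.0]

error_1 = rmse(actual, predictions_model_1)
error_2 = rmse(actual, predictions_model_2)
better = "model 1" if error_1 < error_2 else "model 2"
```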

CONCLUSIONS

To combine content based and collaborative based filtering, we adopted the hybrid
approach, which improves the overall performance of the system and recommends movies to
users according to their choices in a much better way than the other two recommendation
systems. The lower the mean error, the higher the accuracy of the recommender system, and
the system can then be used for future recommendations in a better way as well. We also
have some computational bounds or limitations on running the recommender system on a
large dataset, but we have done enough to distinguish between the various recommender
systems, which finally puts the hybrid system of recommendation on top of them all.
Hence we can conclude that hybrid based filtering makes the system more efficient and
enhances the precision of the overall system. It is a mixture of both the content based
and collaborative based filtering methods, where even if one method fails, the other takes
over and maintains the overall accuracy of the system, increasing the performance all
around.
