ABHAY P
By
Abhay Singh Soun
SSJUUV2247510001
2022-25
This is to certify that the project entitled "Movie Recommendation System" is the bona fide
work of Abhay Singh Soun, bearing Enrollment No. SSJUUV2247510001, submitted in
partial fulfillment of the requirements for the award of the degree of B. Voc. (B.Sc.)
INFORMATION TECHNOLOGY from Manas College of Science Technology and
Management, Pithoragarh.
Date:
DECLARATION
I declare that the project "Movie Recommendation System", submitted to Manas College of
Science Technology and Management, is my own work and has not in any case been duplicated
or submitted to any other university for the award of any degree. To the best of my knowledge,
no one other than me has submitted it to any other university.
The project is done in partial fulfillment of the requirements for the award of a degree of B. Voc.
(B.Sc.) INFORMATION TECHNOLOGY to be submitted as a final semester project as part of
our curriculum.
ABSTRACT
The rapid growth of digital streaming platforms has led to an overwhelming volume of content,
making it challenging for users to find movies that match their tastes. A Movie
Recommendation System serves as an essential tool to enhance user experience by filtering and
suggesting movies aligned with individual preferences. This project presents the development
and implementation of a Movie Recommendation System that combines collaborative filtering,
content-based filtering, and hybrid techniques to deliver accurate and personalized movie
suggestions. This Movie Recommendation System can be seamlessly integrated into streaming
platforms, helping users discover relevant content more efficiently and enhancing platform
engagement and retention. By providing tailored recommendations, this system not only
improves user satisfaction but also promotes a more diverse viewing experience, exposing users
to a broader range of movies. Future work includes exploring deep learning methods, such as
neural collaborative filtering and recurrent neural networks (RNNs), to further improve
recommendation accuracy. This project illustrates the potential of recommendation systems in
transforming digital content discovery and user engagement in the entertainment industry.
ACKNOWLEDGEMENT
First and foremost, I would like to express my gratitude to Mr. Himanshu Punetha, my
project mentor, for all his help and support during this project. His knowledge and
insights have greatly influenced the course and result of this effort.
I am thankful to the professors for guiding me through this project and continuously
encouraging me. It would not have been possible to complete this project without their support.
I am also thankful to all the faculty members of the Department of Information Technology, Manas
College of Science Technology and Management, Pithoragarh, for helping me during the project.
I am grateful to my team, family, and friends for their unending support, without which
the completion of this project would not have been possible.
In closing, I would like to thank all the writers and scholars whose contributions have served
as a strong basis and source material for this project.
TABLE OF CONTENTS
INTRODUCTION
Introduction
Problem Statement
Objective
Methodology
Organisation
SYSTEM DESIGN
Dataset
Algorithms Used
Concepts Requirements
PERFORMANCE ANALYSIS
Comparisons and Results
CONCLUSIONS
Conclusion
Chapter 1
INTRODUCTION
A recommendation system is a filtering system that predicts a user's choices and then suggests
the most relevant results based on that user's previous likes. Recommendation systems have
found a wide variety of applications over the years and are now used on many online platforms.
The content on these platforms varies: movies of different genres such as action, thriller, or
romance, products on an e-commerce website, or connections on social media and professional
networking sites such as LinkedIn.
For example, when we use Instagram, the feed shows stories from the people we follow.
Instagram monitors our interactions with different people and our past activity, and then
suggests related stories from other accounts that have shown similar activity, previously or
currently.
Over time, recommender systems also keep improving based on the activity of groups of users,
such as the items they have scrolled through or tried. For example, when we buy a laptop or a
mobile phone on Flipkart, it suggests a mobile cover or tempered glass for the phone, or a USB
Type-C or Type-A adaptor for the laptop.
With such enhancements in recommender systems, users get good recommendations consistently.
The systems keep improving as we move forward in the 21st century, and their suggestions are
now almost accurate.
When the recommendations of an app clash with a user's taste, whether it is a music platform
or an educational app, the user simply stops using the app. Companies therefore have to focus
on their recommendation system, which is more complex than it seems. Every user has different
preferences and choices depending on their activities and sometimes on their mood. For music,
for example, preferences differ while playing, travelling, running, or after a fight in a
relationship.
PROBLEM STATEMENT
Recommender systems are tools that take users' ratings and then recommend movies from a large
data set on the basis of matching interests, classifying them into different categories. The
sole purpose of the system is to search for content that fits a person's interests on an
individual, personal basis. To do so, it takes into account different factors and creates
lists of content specific to different categories of users.
The AI-based algorithms used by recommender systems build a list of possible options and then
customize it to the interests and choices of each category of users. The results are based on
users' previous activity, such as what their profile looks like and what they have browsed in
Chrome, Opera, or other browsers, including their browsing history. Demographic traits and the
likelihood that a user will like a movie based on its genre are also considered. A set of
predictive models is constructed from the available (big) data, the movies are predicted from
a list of 2,000 movies, and a few selected movies are recommended using different algorithms,
methods, and similarity measures.
OBJECTIVE
A movie recommendation system provides a mechanism for classifying users with the same
interests, searching for content that would interest different sets of users, and then
creating lists that provide interesting recommendations to the individual based on the
content they love. The main objective of the recommender system is to use approaches such as
demographic filtering, content-based filtering, and collaborative filtering to find the set
of movies that each user, or a specific set of users, is likely to enjoy.
In one technique, the movies that have a high probability of being liked by the general set of
users are displayed to the user by the recommender. In another technique, we try to identify
the interests of different users from the information collected through their activities, and
then, in collaborative filtering, bring together the users who share the same type of
interests to obtain the final set of movies to recommend to each user individually.
We therefore use different categories of recommender filtering techniques and compare and
contrast the results obtained by the different methods. We also try to improve the results as
the movie dataset grows beyond the computational bounds of the system, which is generally a
limitation with large datasets.
METHODOLOGY
• Sorting the scores and then recommending the movies that are best rated for the users.
The numeric quantity used to measure the similarity between two movies is cosine similarity.
We calculate this score for each pair of movies; it is very fast to compute.
• Getting the first 10 elements of the sorted list, excluding the first one, as it is the
movie itself.
• Getting the top elements.
Repeating the above steps, we find the top movies based on these distances, which gives the
best possible recommendations. The movies that have a high probability of being liked by the
general set of users are displayed to the user by the recommender. In another technique,
collaborative filtering brings together the users who share the same type of interests, using
the information collected through their activities, to obtain the final set of movies to
recommend to each user individually. Cosine similarity is the cosine of the angle between two
non-zero vectors of an inner product space. It is defined as the dot product of the two
vectors divided by the product of their Euclidean magnitudes. In most cases cosine similarity
is used to obtain the preferred recommendations for users.
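The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the movie titles, the three-element genre vectors, and the function names `cosine_similarity` and `recommend` are all assumptions made for the example. Instead of dropping the first element of the sorted list, the sketch excludes the query movie by title, which gives the same result and is safe when another movie has an identical score.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # cosine similarity is undefined for zero vectors
    return dot / (norm_a * norm_b)


def recommend(title, features, top_n=10):
    """Return the top_n movies most similar to `title`, best first."""
    target = features[title]
    scores = [
        (other, cosine_similarity(target, vector))
        for other, vector in features.items()
        if other != title  # skip the movie itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical feature vectors: [action, romance, sci-fi]
movies = {
    "Movie A": [1, 0, 1],
    "Movie B": [1, 0, 1],
    "Movie C": [0, 1, 0],
    "Movie D": [1, 1, 0],
}

for name, score in recommend("Movie A", movies, top_n=2):
    print(f"{name}: {score:.3f}")
```

In a real system the vectors would come from movie metadata (for example genres or keywords) rather than being written by hand.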
3. Collaborative Filtering: Content-based filtering suffers from several limitations. It can
only suggest movies that match a single type of user preference and is unable to provide
recommendations across genres. A collaborative filtering system provides much more
flexibility by finding the similarity between a user and the likes of other users who have
similar interests. The similarity between users is measured with cosine similarity or
Pearson's correlation. In a user-movie rating matrix, every row corresponds to a user and
every column to a movie. Each cell holds the rating that the user has given to that movie,
and one of the rows belongs to the target user.
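As an illustration of measuring similarity between users, the sketch below computes Pearson's correlation between the target user's row and every other row of a small rating matrix. The matrix values and the helper name `pearson` are hypothetical, and using `None` for "not rated" is a choice made for this example so that only movies rated by both users enter the calculation.

```python
import math


def pearson(u, v):
    """Pearson correlation over the movies that both users have rated."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0  # not enough common ratings to correlate
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rating row has no defined correlation
    return cov / math.sqrt(var_a * var_b)


# Hypothetical rating matrix: one row per user, one column per movie.
ratings = {
    "user_1": [5, 4, None, 1],  # target user
    "user_2": [4, 5, 2, 1],
    "user_3": [1, 2, 5, None],
}

target = "user_1"
neighbours = sorted(
    ((other, pearson(ratings[target], row))
     for other, row in ratings.items() if other != target),
    key=lambda pair: pair[1],
    reverse=True,
)
print(neighbours)
```

The users at the front of `neighbours` are the ones whose ratings would carry the most weight when recommending movies to the target user.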
User-based collaborative filtering is simple, but it has drawbacks. The biggest challenge is
that the choices of users vary with time, so precomputing the matrix often leads to lower
performance. We can therefore use item-based collaborative filtering, which considers the
similarity between items rather than between users and finds the items most similar to those
the target user has rated. The same similarity coefficients, such as Pearson's correlation or
cosine similarity, can be used. Item-based collaborative filtering is more static in nature.
It also has a weakness: if only one user has rated both Matrix and Titanic, the similarity
between the two movies rests on that one user alone. There may be millions of users, and the
similarity between these two different movies still comes out very high, simply because the
one user who rated them both gave them the same rating.
In collaborative filtering we try to find the users who have the same interests and similar
likes. In this case we do not use the features of an item to recommend it. Instead, we
classify users into clusters of similar types and then order each cluster according to the
preferences of the user. We can also use the cosine distance here, which takes into account
the users with similar interests: the greater the cosine, the smaller the angle between the
two users. Here we simply use the utility matrix, and we can assign the value zero to its
sparse (missing) entries, which makes the calculations easy. Item-based collaborative
filtering is preferred in general because it takes into account the movies instead of the
number of users, which makes the classification of movies and users much easier. User-based
collaborative filtering is not preferred because it takes only the users into account and
ignores the sparse values, which creates issues for the performance of the recommender system.
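A minimal sketch of item-based prediction on a zero-filled utility matrix follows. The matrix values and the function names `cosine` and `predict` are assumptions for illustration. The prediction rule used here, a similarity-weighted average of the ratings the user has already given, is one common choice and is not necessarily the exact rule used in the project.

```python
import math


def cosine(a, b):
    """Cosine similarity between two rating columns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(utility, user, item):
    """Estimate utility[user][item] from the items this user has rated."""
    columns = list(zip(*utility))  # one tuple of ratings per movie
    weighted_sum = 0.0
    weight_total = 0.0
    for other, rating in enumerate(utility[user]):
        if other == item or rating == 0:
            continue  # skip the target movie and unrated (zero) entries
        similarity = cosine(columns[item], columns[other])
        weighted_sum += similarity * rating
        weight_total += abs(similarity)
    return weighted_sum / weight_total if weight_total else 0.0


# Hypothetical utility matrix: rows are users, columns are movies,
# and 0 marks a movie the user has not rated.
utility = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [0, 1, 4],
]

# Estimated rating of the third user (index 2) for the second movie (index 1).
print(round(predict(utility, user=2, item=1), 3))
```

Because the second movie is rated much like the first one, and the third user disliked the first movie, the estimate comes out low.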
4. Hybrid Filtering: This is a mixture of the content-based and collaborative filtering
methods. The input is the user ID and the title of a movie, and the output is a list of
similar movies sorted by the expected ratings for that particular user. The expected ratings
are calculated internally: ideas from content-based and collaborative filtering are combined
to build an engine that suggests movies to the particular user and then estimates the ratings.
In the comparisons section we will see how movies are selected by the hybrid technique, which
uses both the content-based method and the collaborative filtering method. The hybrid
filtering method performs well in most cases, including scenarios where it is otherwise
difficult to achieve accurate recommendations for the users.
Hybrid Filtering Method
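The hybrid idea described above can be sketched as a two-stage function: a content-based stage picks the movies most similar to the given title, and a collaborative stage re-orders them by the rating expected for the given user. Everything named here is hypothetical: the feature vectors, the `expected` table, and the `predict_rating` callback, which stands in for whatever trained collaborative model supplies the expected ratings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_recommend(user_id, title, features, predict_rating, top_n=3, pool=5):
    """Content-based candidates, re-ranked by the user's expected rating."""
    target = features[title]
    # Stage 1 (content-based): keep the `pool` movies most similar to `title`.
    candidates = sorted(
        (t for t in features if t != title),
        key=lambda t: cosine(target, features[t]),
        reverse=True,
    )[:pool]
    # Stage 2 (collaborative): order the candidates by expected rating.
    ranked = sorted(candidates, key=lambda t: predict_rating(user_id, t), reverse=True)
    return [(t, predict_rating(user_id, t)) for t in ranked[:top_n]]


# Hypothetical content features: [action, romance, sci-fi]
features = {
    "Movie A": [1, 0, 1],
    "Movie B": [1, 0, 1],
    "Movie C": [0, 1, 0],
    "Movie D": [1, 1, 0],
    "Movie E": [0, 0, 1],
}

# Hypothetical expected ratings, standing in for a collaborative model.
expected = {
    (7, "Movie B"): 3.0,
    (7, "Movie C"): 5.0,
    (7, "Movie D"): 2.0,
    (7, "Movie E"): 4.5,
}


def predict_rating(user_id, title):
    return expected.get((user_id, title), 0.0)


print(hybrid_recommend(7, "Movie A", features, predict_rating, top_n=2, pool=3))
```

Note that "Movie C" has the highest expected rating but is not returned, because it does not pass the content-similarity stage. This shows how the two methods constrain each other.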
SYSTEM DESIGN
Dataset
1) For Content and Collaborative Based Filtering:
• Kaggle provided the data set. The Movie Recommendation System uses it as a standard
dataset.
• We used the movie dataset from 'MovieLens (Kaggle)' for the project.
• It consists of 26,000,000 ratings and 750,000 tag applications applied to 45,000 movies by
270,000 users.
• Ratings are on a scale of 1-5 and are taken from the official GroupLens source.
Visualizing the number of users who voted
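Before plotting, the number of users who voted for each movie has to be counted. The sketch below shows one way to do this and to drop movies with too few votes. The rows are made up for illustration and do not come from the dataset, and the function names and the `min_votes` threshold are assumptions.

```python
from collections import Counter

# Hypothetical (user_id, movie_id, rating) rows standing in for the ratings file.
ratings = [
    (1, 10, 4.0), (2, 10, 5.0), (3, 10, 3.0),
    (1, 20, 2.0), (2, 20, 4.0),
    (3, 30, 5.0),
]


def votes_per_movie(rows):
    """Count the distinct users who rated each movie."""
    seen = set()
    counts = Counter()
    for user_id, movie_id, _ in rows:
        if (user_id, movie_id) not in seen:
            seen.add((user_id, movie_id))
            counts[movie_id] += 1
    return counts


def filter_by_votes(rows, min_votes):
    """Keep only the ratings of movies with at least `min_votes` voters."""
    counts = votes_per_movie(rows)
    return [row for row in rows if counts[row[1]] >= min_votes]


print(votes_per_movie(ratings))
print(len(filter_by_votes(ratings, min_votes=2)))
```

Filtering out rarely rated movies in this way reduces the size of the rating matrix, which matters when the full dataset exceeds the available memory.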
Algorithms Used
K-Means Algorithm:
The K-means clustering algorithm groups data points into clusters whose members share
matching features. The degree of closeness defines the similarity, that is, how two points
are related to each other. In this algorithm we compute the centroids and then repeat the
process until the optimum centroids are found. It determines the best values for the K centre
points by an iterative process and assigns each data point to its nearest centre. The number
of clusters found from the data is denoted by 'K'. It is a simple unsupervised ML algorithm
that categorizes data points into subgroups even with very little information about the data.
K-Means algorithm
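The iterative process described above can be sketched in plain Python as follows. This is an illustrative version, not the project's implementation: the sample points are made up, and taking the first K points as the initial centroids is a simplifying assumption (library implementations usually choose the starting centroids more carefully).

```python
import math


def kmeans(points, k, max_iter=100):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    # Simplifying assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

The loop stops as soon as an assignment step leaves every point in the same cluster as before, which is the "optimum centroid" condition described in the text.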
● Jupyter Notebook
● WAMP Server
● Visual Studio Code
● Sublime Text
● MySQL
CONCEPTS REQUIREMENTS
• scikit-learn
• seaborn
• pandas
• matplotlib
• Cleaning of data
PERFORMANCE ANALYSIS
3.) Collaborative Filtering: The collaborative filtering used here is item-based. We took
three different metrics and compared the results obtained with each. A brief comparison of
the metrics is shown together with the movies recommended by each of them, subject to the
bounds set on the number of users and on the number of ratings given by users to a movie.
Metric = "cosine": Cosine similarity, or the cosine kernel, computes similarity as the
normalized dot product of X and Y: K(X, Y) = <X, Y> / (||X||*||Y||). On L2-normalized
data, this function is equivalent to linear_kernel.
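The equivalence stated above can be checked numerically. The sketch below is a plain-Python illustration with hypothetical vectors and helper names. It does not call scikit-learn, and `linear_kernel` here is a local function that simply computes the dot product.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def l2_normalize(x):
    """Scale x so that its Euclidean length is 1."""
    norm = math.sqrt(dot(x, x))
    return [a / norm for a in x]


def cosine_kernel(x, y):
    """K(X, Y) = <X, Y> / (||X|| * ||Y||)."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def linear_kernel(x, y):
    """Plain dot product, with no normalization."""
    return dot(x, y)


x = [3.0, 4.0]
y = [4.0, 3.0]

print(cosine_kernel(x, y))
print(linear_kernel(l2_normalize(x), l2_normalize(y)))
```

Both lines print the same value up to floating-point rounding, because dividing each vector by its length beforehand has the same effect as dividing the dot product by the two lengths afterwards.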
Collaborative Based Filtering Method(Metric =Cosine) Output
Metric = "cityblock": scikit-learn's distance_metrics function returns the valid pairwise
distance metrics and describes the mapping for each of the valid strings. For cityblock, the
mapping is to the Manhattan distance:
'cityblock' = metrics.pairwise.manhattan_distances
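The cityblock (Manhattan) distance is the sum of the absolute differences between two vectors. Unlike cosine similarity it is a distance, so the most similar movies are the ones with the smallest values. The sketch below uses hypothetical rating columns and function names and does not call scikit-learn.

```python
def cityblock(x, y):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def nearest_items(target, items, top_n=2):
    """Return the top_n movies closest to `target`, smallest distance first."""
    distances = [
        (name, cityblock(items[target], vector))
        for name, vector in items.items()
        if name != target
    ]
    distances.sort(key=lambda pair: pair[1])  # ascending: closer is better
    return distances[:top_n]


# Hypothetical rating columns: one list of user ratings per movie.
items = {
    "Movie T": [5, 3, 0],
    "Movie A": [4, 0, 1],
    "Movie B": [5, 3, 1],
    "Movie C": [0, 1, 5],
}

print(nearest_items("Movie T", items))
```

Because cityblock depends on the size of the rating differences and not only on their direction, it can rank neighbours differently from the cosine metric on the same data.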
Collaborative Based Filtering Method(Metric =Cityblock) Output
Here we can see that the hybrid filtering technique overcomes the issues faced by the
content-based filtering technique and the collaborative filtering method. From the root mean
square error we can conclude that the value for the hybrid filtering method is lower, so
performance is higher in the hybrid case. Collaborative filtering performs well only from a
qualitative perspective. When both the qualitative and the quantitative results matter, we
prefer the hybrid filtering technique, which addresses the flaws of both. Content-based
filtering outperforms collaborative filtering only in terms of similarity, while
collaborative filtering can recommend one item on the basis of another item of similar
interest. The overall flaws can be removed by hybrid filtering, in which two or more
techniques are combined to gain better performance with fewer of the drawbacks of each. In
general, hybrid filtering combines collaborative filtering with some other filtering
technique to avoid the ramp-up problem. It thus overcomes the major drawbacks that arise
when content-based or collaborative filtering is used alone.
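For reference, the root mean square error used in this comparison can be computed as in the sketch below. The rating lists are invented purely to show the calculation and are not results from this project. The labels `model_a` and `model_b` are placeholders for any two recommenders being compared.

```python
import math


def rmse(predicted, actual):
    """Square root of the mean squared difference between predictions and truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Made-up ratings for illustration only (not measurements from this project).
actual = [4.0, 3.0, 5.0, 2.0]
model_a = [3.5, 3.0, 4.0, 2.5]
model_b = [4.0, 3.5, 4.5, 2.0]

print(f"model_a RMSE: {rmse(model_a, actual):.3f}")
print(f"model_b RMSE: {rmse(model_b, actual):.3f}")
```

The recommender with the lower RMSE predicts ratings that are closer to the ones users actually gave, which is the sense in which a lower value means higher performance.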
The hybrid filtering recommender allows the user to select their own choices from given data
that contains attributes or user-specific values. It then recommends the best movie based on
the similarities, by calculating the accumulated weight and then applying the algorithm,
which in our case is the K-means algorithm. Expected ratings are calculated internally: ideas
from content-based and collaborative filtering are used to build an engine that suggests
movies to the particular user and then estimates the ratings.
Across the results obtained from the different algorithms and techniques, the hybrid approach
is preferred over content-based and collaborative filtering because it overcomes the
drawbacks of a single algorithm and improves the performance of the overall recommender
system. Moreover, other techniques such as classification and clustering can be used to
obtain better recommendations, which would increase the accuracy of the recommender system.
Better performance is therefore achieved by the hybrid filtering technique, which is why it
is preferred over the other two techniques.
System Framework
Comparative output
CONCLUSIONS
By combining content-based and collaborative filtering, the hybrid approach improves the
overall performance of the system and recommends movies to users according to their choices
much better than the other two systems of recommendation. It lowers the average error and
thereby increases the accuracy of the recommender system, so this system of recommendation
can also be put to better use in the future. We faced computational bounds in running the
recommender system on the large dataset, but we have done enough to distinguish between the
various recommender systems, and the comparison puts the hybrid system of recommendation on
top. Hence we can conclude that hybrid filtering makes the system much more efficient and
enhances the precision of the overall system. It is a mixture of the content-based and
collaborative filtering methods in which, even if one method fails, the other takes over and
maintains the overall accuracy of the system, increasing performance all around.