UNIT I - Introduction - Recommender Systems
UNIT I INTRODUCTION 6
Introduction and basic taxonomy of recommender systems - Traditional and non-personalized
recommender systems - Overview of data mining methods for recommender systems - Similarity
measures - Dimensionality reduction - Singular Value Decomposition (SVD)
Suggested Activities:
• Practical learning - Implement data similarity measures.
• External learning - Singular Value Decomposition (SVD) applications
Suggested Evaluation Methods:
• Quiz on recommender systems.
• Quiz on Python tools available for implementing recommender systems
INTRODUCTION:
Recommender systems, also known as recommendation systems or engines, are a
type of software application designed to provide personalized suggestions or
recommendations to users. These systems are widely used in various online platforms
and services to help users discover items or content of interest. Recommender systems
leverage data about users' preferences, behaviors, and interactions to generate accurate
and relevant recommendations.
WHAT ARE RECOMMENDER SYSTEMS?
Recommender systems are sophisticated algorithms designed to provide product-
relevant suggestions to users. Recommender systems play a paramount role in
enhancing user experiences on various online platforms, including e-commerce
websites, streaming services, and social media.
Essentially, recommender systems aim to analyze user data and behavior to make
tailored recommendations.
• Data collection: Recommender systems start by gathering data on user interactions,
preferences, and behaviors. This data can include past purchases, browsing history,
ratings, and social connections.
• Data processing: Once collected, they process the data to extract meaningful patterns and
insights. This involves techniques like data cleaning, transformation, and feature
engineering.
• Algorithm selection: Depending on the specific platform and its data, a specific
recommender algorithm is applied to generate recommendations. Common types
include collaborative filtering, content-based filtering, and hybrid methods.
• User profiling: Using historical data, recommender systems create user profiles.
These represent their preferences, interests, and behavior, allowing the system to
understand individual tastes.
• Item profiling: Similarly, items or content available on the platform are also profiled
based on their characteristics. Think of attributes like genres, keywords, or product
features.
• Recommendation generation: The next step involves algorithms matching user
profiles with item profiles. For example, collaborative filtering identifies users with
similar preferences and recommends items liked by others with similar profiles.
Content-based filtering recommends items based on the attributes of items users have
previously interacted with.
• Ranking and presentation: Finally, the recommended items are ranked based on
their relevance to the user. The top-ranked items are then presented to the user
through interfaces like recommendation lists, personalized emails, or pop-up
suggestions.
There are several types of recommender systems, each with its own approach
to generating recommendations. The basic taxonomy of recommender systems
includes:
a. Content-Based Recommender Systems:
• Overview: Content-based systems recommend items based on the features of the
items themselves and the preferences expressed by the user.
• Key Components: The system analyzes the content of items and creates user
profiles based on the features of items the user has liked or interacted with in the
past.
b. Collaborative Filtering Recommender Systems:
• Overview: Collaborative filtering relies on user-item interactions and
recommendations from other users with similar preferences to make predictions
for a target user.
• Types:
o User-Based Collaborative Filtering: Recommends items based on the
preferences of users who are similar to the target user.
o Item-Based Collaborative Filtering: Recommends items that are similar to
those the user has liked or interacted with in the past.
c. Hybrid Recommender Systems:
• Overview: Hybrid systems combine multiple recommendation techniques to
overcome the limitations of individual methods, providing more accurate and diverse
recommendations.
• Types:
o Weighted Hybrid: Assigns different weights to recommendations from
different methods and combines them.
o Switching Hybrid: Switches between different recommendation methods based on certain
conditions or user interactions.
There are some more types of recommender systems. They are:
i. Matrix Factorization Recommender Systems:
• Overview: Matrix factorization models decompose the user-item interaction matrix
into latent factors, allowing the system to make predictions based on these factors.
ii. Context-Aware Recommender Systems:
• Overview: Context-aware systems take into account additional contextual
information, such as time, location, or user activity, to enhance the relevance of
recommendations.
iii. Knowledge-Based Recommender Systems:
• Overview: Knowledge-based systems recommend items by taking into account
explicit knowledge about user preferences, requirements, and item characteristics.
iv. Deep Learning-Based Recommender Systems:
• Overview: Deep learning-based systems use neural networks to capture complex
patterns in user-item interactions and generate recommendations.
GOALS OF RECOMMENDER SYSTEMS
The two primary models are as follows:
1. Prediction version of problem: The first approach is to predict the rating value for a
user-item combination. It is assumed that training data is available, indicating user
preferencesfor items. For m users and n items, this corresponds to an incomplete m×n
matrix, where the specified (or observed) values are used for training. The missing (or
unobserved) values are predicted using this training model. This problem is also
referred to as the matrix completion problem because we have an incompletely
specified matrix of values, and the remaining values are predicted by the learning
algorithm.
2. Ranking version of problem: In practice, it is not necessary to predict the ratings of
users for specific items in order to make recommendations to users. Rather, a merchant
may wish to recommend the top-k items for a particular user, or determine the top-k
users to target for a particular item. The determination of the top-k items is more
common than the determination of top-k users, although the methods in the two cases
are exactly analogous.
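To make the two versions concrete, here is a small illustrative sketch in Python; the matrix, the predicted scores, and the `top_k_items` helper are hypothetical, not from any real library:

```python
# A tiny incomplete user-item rating matrix (3 users x 4 items).
# None marks unobserved entries -- the values the prediction version must fill in.
ratings = [
    [5, None, 3, None],   # user 0
    [4, 2, None, 1],      # user 1
    [None, 5, 4, None],   # user 2
]

def top_k_items(predicted_row, observed_row, k):
    """Ranking version: order the unobserved items for one user by predicted score."""
    candidates = [(score, item)
                  for item, (score, seen) in enumerate(zip(predicted_row, observed_row))
                  if seen is None]
    candidates.sort(key=lambda pair: -pair[0])
    return [item for _, item in candidates[:k]]

# Hypothetical model predictions for user 0, filling in the missing entries.
predicted = [5.0, 4.1, 3.0, 2.2]
print(top_k_items(predicted, ratings[0], k=2))  # items 1 and 3, best first
```

The prediction version would output the full `predicted` row; the ranking version only needs the relative order of the unobserved entries.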
Examples of Recommender Systems
In today’s digital age, we are bombarded with vast information and choices. From
online shopping to streaming services, it can be overwhelming to navigate through the
plethora of options available. This is where recommender systems come in – they help
us make sense of the endless choices by suggesting relevant options based on our
interests and preferences.
1. Netflix
Netflix’s recommendation engine is perhaps the most well-known and widely used
recommender system. It uses an algorithm to analyze a user’s viewing history, rating,
and search behavior to suggest movies and TV shows that the user is likely to enjoy.
The algorithm takes into account the genre, the actors, the director, and other factors to
make personalized recommendations for each user.
2. Amazon
Amazon’s recommendation engine suggests products based on a user’s purchase
history, search history, and browsing behavior. It makes personalized recommendations
based on the user’s prior purchases, products viewed, and items added to their shopping
cart.
3. Spotify
Spotify’s music recommendation system suggests songs, playlists, and albums
depending on a user’s listening history, liked songs, and search history. It tailors
recommendations based on the user’s listening habits, favorite genres, and favorite
artists.
4. YouTube
YouTube’s recommendation engine suggests videos based on a user’s viewing
history, liked videos, and search history. The algorithm considers factors such as the
user’s favorite channels, the length of time spent watching a video, and other viewing
habits to make personalized recommendations.
5. LinkedIn
LinkedIn’s recommendation engine suggests jobs, connections, and content based on
a user’s profile, skills, and career history. To make personalized recommendations, the
algorithm takes into account the user’s job title, industry, and location.
6. Zillow
Zillow’s recommendation system suggests real estate properties based on a user’s
search history and preferences. Users can receive personalized recommendations based on
their budget, location, and desired features.
7. Airbnb
Airbnb’s recommendation system suggests accommodations based on a user’s search
history, preferences, and reviews. Personalized recommendations are made based on factors
such as the user’s travel history, location, and desired amenities.
8. Uber
Uber’s recommendation system suggests ride options based on a user’s previous
rides and preferred options. When recommending rides, the algorithm considers factors
such as the user’s preferred vehicle type, location, and other preferences.
9. GoogleMaps
Google Maps’ recommendation system suggests places to visit, eat, and shop based on a
user’s search history and location. Personalized recommendations are generated based
on factors such as the user’s location, time of day, and preferences.
10. Goodreads
Goodreads’ recommendation engine suggests books based on a user’s reading history,
ratings, and reviews. To provide personalized recommendations, the algorithm
considers factors such as the user’s reading habits, genres, and favorite authors.
Recommender systems are used everywhere, from online shopping to entertainment and
travel. These systems have significantly
improved the user experience by suggesting relevant options based on our interests and
preferences. The success of these real-world examples showcases the power and
effectiveness of recommender systems in various industries. With advancements in
artificial intelligence, recommender systems are expected to become even more accurate and
personalized in the future.
A. Traditional Recommender Systems:
• Overview: Traditional recommender systems typically use explicit input features or
rules to generate recommendations. These systems often rely on general attributes
of items or users, and their recommendations are not personalized to the
specific preferences or behavior of individual users.
• Example Features:
o Genre of a movie
o Author of a book
o Popularity or overall ratings of items
• Methodology:
o Recommendations are made based on fixed criteria or predetermined
rules.
o Users receive the same recommendations regardless of their unique
preferences.
• Advantages:
o Simplicity and ease of implementation.
o Less reliance on individual user data.
o Suitable for scenarios where personalization is not a critical factor.
B. Non-personalized Recommender Systems:
• Overview: Non-personalized recommender systems provide the same set of
recommendations to all users, without considering individual user preferences or
behaviors. These systems often focus on providing popular or trending items that
are likely to appeal to a broad audience.
• Examples:
o "Top 10" lists or rankings
o Bestsellers
o Most viewed items
• Methodology:
o Recommendations are based on aggregate data, such as overall popularity or
global trends.
o All users receive identical recommendations.
• Advantages:
o Easy to implement and computationally efficient.
o Applicable in scenarios where personalization is not feasible or necessary.
o Can be effective for new users or when limited user data is available.
PERSONALIZED RECOMMENDER SYSTEMS
Based on the user’s data such as purchases or ratings, personalized recommenders try to
understand and predict what items or content a specific user is likely to be interested in. In
that way, every user will get customized recommendations.
What makes a good recommendation?
• Is personalized (relevant to that user),
• Is diverse (includes different user interests),
• Doesn’t recommend the same items to users for the second time, and
• Recommends available products at the right time.
There are a few types of personalized recommendation systems, including content-based filtering,
collaborative filtering, and hybrid recommenders.
TYPES OF PERSONALIZED RECOMMENDER SYSTEMS
Personalized recommender systems can be categorized into several types, each with its own
methods and techniques for providing tailored recommendations.
These include:
• Content-based filtering,
• Collaborative filtering, and
• Hybrid recommenders.
CONTENT-BASED FILTERING
• Content-based recommender systems use items or user metadata to create specific
recommendations. To do this, we look at the user’s purchase history.
• For example, if a user has already read a book from one author or a product from a
certain brand, you assume that they have a preference for that author or that brand.
Also, there is a probability that they will buy a similar product in the future.
Let’s assume that Jenny loves sci-fi books and her favorite writer is Walter Jon Williams. If
she reads the Aristoi book, then her recommended book will be Angel Station, also a sci-fi
book written by Walter Jon Williams.
Advantages of the content-based approach
• Less cold-start problem: Content-based recommendations can effectively address the
“cold-start” problem, allowing new users or items with limited interaction history to
still receive relevant recommendations.
• Transparency: Content-based filtering allows users to understand why a
recommendation is made because it’s based on the content and attributes of items
they’ve previously interacted with.
• Diversity: Considering various attributes, content-based systems can provide diverse
recommendations. For example, in a movie recommendation system,
recommendations can be based on genre, director, and actors.
• Reduced data privacy concerns: Since content-based systems primarily use item
attributes, they may not require as much user data, which can mitigate privacy
concerns associated with collecting and storing user data.
Disadvantages of the content-based approach
• The “Filter bubble”: Content filtering can recommend only content similar to the user’s
past preferences. If a user reads a book about a political ideology and books related to that
ideology are recommended to them, they will be in the “bubble of their previous
interests”.
• Limited serendipity: Content-based systems may have limited capability to recommend
items that are outside a user’s known preferences.
• In the first case scenario, 20% of items attract the attention of 70-80% of users, and 70-
80% of items attract the attention of only 20% of users. The recommender’s goal is to
introduce other products that are not visible to users at first glance.
• In the second case scenario, content-based filtering recommends products that are fitting
content-wise, yet very unpopular (i.e. people don’t buy those products for some reason;
for example, the book is bad even though it fits thematically).
• Over-specialization: If the content-based system relies too heavily on a user’s past
interactions, it can recommend items that are too similar to what the user has already seen or
interacted with, potentially missing opportunities for diversification.
COLLABORATIVE FILTERING
• Collaborative filtering is a popular technique used to provide personalized
recommendations to users based on the behavior and preferences of similar users.
• The fundamental idea behind collaborative filtering is that users who have interacted
with items in similar ways or have had similar preferences in the past are likely to
have similar preferences in the future, too.
• Collaborative filtering relies on the collective wisdom of the user community to
generate recommendations.
There are two main types of collaborative filtering: memory-based and model-based.
Memory-based recommenders
• Memory-based recommenders rely on the direct similarity between users or items to
make recommendations.
• Usually, these systems use raw, historical user interaction data, such as user-item
ratings or purchase histories, to identify similarities between users or items and
generate personalized recommendations.
• The biggest disadvantage of memory-based recommenders is that they require a lot of
data to be stored, and comparing every item/user with every item/user is extremely
computationally demanding.
• Memory-based recommenders can be categorized into two main types: user-based and
item-based collaborative filtering.
A user-based collaborative filtering recommender system
• With the user-based approach, recommendations to the target user are made by
identifying other users who have shown similar behavior or preferences. This
translates to finding users who are most similar to the target user based on their
historical interactions with items. This could be “users who are similar to you also
liked…” type of recommendations.
• But if we say that users are similar, what does that mean?
• Let’s say that Jenny and Tom both love sci-fi books. This means that, when a new
sci-fi book appears and Jenny buys that book, that same book will be recommended to
Tom, since he also likes sci-fi books.
An item-based collaborative filtering recommender system
• The idea is to find items that share similar user interactions and recommend those
items to the target user. This can include “users who liked this item also liked…” type of
recommendations.
• To illustrate with an example, let’s assume that John, Robert, and Jenny highly rated
sci-fi books Fahrenheit 451 and The Time Machine, giving them 5 stars. So, when
Tom buys Fahrenheit 451, the system automatically recommends The Time Machine
to him because it has identified it as similar based on other users’ ratings.
How to calculate user-user and item-item similarities?
• Unlike the content-based approach where metadata about users or items is used, in the
collaborative filtering memory-based approach we are looking at the user’s behavior,
e.g. whether the user liked or rated an item or whether the item was liked or rated by a certain
user.
• For example, say the idea is to recommend Robert a new sci-fi book. Let’s look at the
steps in this process:
• Create a user-item-rating matrix.
• Create a user-user similarity matrix: Cosine similarity is calculated (alternatives:
adjusted cosine similarity, Pearson similarity, Spearman rank correlation) between
every two users. This is how we get a user-user matrix. This matrix is smaller than the initial
user-item-rating matrix.
• Look up similar users: In the user-user matrix, we observe users that are most similar
to Robert.
• Candidate generation: When we find Robert’s most similar users, we look at all the
books these users read and the ratings they gave them.
• Candidate scoring: Depending on the other users’ ratings, books are ranked from the
ones they liked the most to the ones they liked the least. The results are normalized on
a scale from 0 to 1.
• Candidate filtering: We check if Robert has already bought any of these books and
eliminate those he already read.
• The item-item similarity calculation is done in an identical way and has all the same
steps as user-user similarity.
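The steps above (similarity matrix, candidate generation, scoring, filtering) can be sketched in a few lines of pure Python. The user names, ratings, and the `recommend` helper are illustrative assumptions, not from any real dataset or library:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two rating vectors (0 means 'not rated')."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# user-item-rating matrix: rows = users, columns = items, 0 = not rated
R = {
    "Robert": [5, 4, 0, 0],
    "Jenny":  [5, 5, 4, 0],
    "Tom":    [1, 0, 2, 0],
}

def recommend(target, R, k=1):
    """Score the target's unrated items by similarity-weighted neighbor ratings."""
    sims = {u: cosine(R[target], r) for u, r in R.items() if u != target}
    scores = {}
    for item, rating in enumerate(R[target]):
        if rating == 0:  # candidate generation + filtering: only unrated items
            num = sum(sims[u] * R[u][item] for u in sims if R[u][item] > 0)
            den = sum(sims[u] for u in sims if R[u][item] > 0)
            if den > 0:
                scores[item] = num / den  # candidate scoring
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("Robert", R))  # item 2 -- the book Robert's neighbors liked
```

Item-item similarity would follow the same pattern with the matrix transposed, comparing item columns instead of user rows.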
Model-based recommenders
• Model-based recommenders make use of machine learning models to generate
recommendations.
• These systems learn patterns, correlations, and relationships from historical user-item
interaction data to make predictions about a user’s preferences for items they haven’t
interacted with yet.
• There are different types of model-based recommenders, such as matrix factorization,
Singular Value Decomposition (SVD), or neural networks.
• However, matrix factorization remains the most popular one, so let’s explore it a bit
further.
Matrix factorization
• Matrix factorization is a mathematical technique used to decompose a large matrix into
the product of multiple smaller matrices.
• In the context of recommender systems, matrix factorization is commonly employed to
uncover latent patterns or features in user-item interaction data, allowing for personalized
recommendations. This latent information can be uncovered by analyzing user behavior.
• If there is feedback from the user, for example - they have watched a particular movie or
read a particular book and have given a rating, that can be represented in the form of a
matrix. In this case,
o Rows represent users,
o Columns represent items, and
o The values in the matrix represent user-item interactions (e.g., ratings, purchase
history, clicks, or binary preferences).
Since it’s almost impossible for the user to rate every item, this matrix will have many
unfilled values. This is called sparsity.
The matrix factorization process
Matrix factorization aims to approximate this interaction matrix by factorizing it into two or
more lower-dimensional matrices:
• User latent factor matrix (U), which contains information about users and their
relationships with latent factors.
• Item latent factor matrix (V), which contains information about items and their
relationships with latent factors.
The rating matrix is a product of two smaller matrices – the item-feature matrix and the user-
feature matrix. The higher the score in the matrix, the better the match between the item and
the user.
The matrix factorization process includes the following steps:
• Initialization of random user and item matrices,
• The ratings matrix is obtained by multiplying the user and the transposed item matrix,
• The goal of matrix factorization is to minimize the loss function (the difference in the
ratings of the predicted and actual matrices must be minimal). Each rating can be
described as a dot product of a row in the user matrix and a column in the item matrix.
Minimization of loss function
The regularized squared-error loss to be minimized is:

min Σ over (u,i) ∈ K of ( r(u, i) − U_u · V_i )² + λ ( ||U_u||² + ||V_i||² )

• Where K is the set of (u, i) pairs with known ratings, r(u, i) is the rating for item i by user u,
U_u and V_i are the corresponding user and item latent factor vectors, and λ is a
regularization term (used to avoid overfitting).
• In order to minimize the loss function we can apply Stochastic Gradient Descent (SGD) or
Alternating Least Squares (ALS). Both methods can be used to incrementally update
the model as new ratings come in. SGD is typically faster and slightly more accurate,
while ALS parallelizes more easily.
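As an illustration, here is a minimal SGD-based matrix factorization sketch in plain Python. The tiny rating set, factor count, and hyperparameters are all made-up values chosen so the loop converges quickly; a real system would use a library and proper tuning:

```python
import random

random.seed(0)
# K: observed ratings as (user, item, rating) triples from a 3x3 matrix
K = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 5)]
n_users, n_items, f = 3, 3, 2        # f = number of latent factors
lr, lam, epochs = 0.01, 0.02, 2000   # learning rate, regularization, iterations

# step 1: initialize random user (U) and item (V) latent factor matrices
U = [[random.random() for _ in range(f)] for _ in range(n_users)]
V = [[random.random() for _ in range(f)] for _ in range(n_items)]

def predict(u, i):
    # each rating is the dot product of a user row and an item row
    return sum(U[u][k] * V[i][k] for k in range(f))

# steps 2-3: SGD minimizes the regularized squared error on observed entries
for _ in range(epochs):
    for u, i, r in K:
        err = r - predict(u, i)
        for k in range(f):
            uk, vk = U[u][k], V[i][k]
            U[u][k] += lr * (err * vk - lam * uk)
            V[i][k] += lr * (err * uk - lam * vk)

print(round(predict(0, 0), 1))  # close to the observed rating of 5
```

After training, `predict` can also be evaluated on unobserved (u, i) pairs, which is exactly the matrix completion described earlier.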
Advantages of collaborative filtering
• Effective personalization: Collaborative filtering is highly effective in providing
personalized recommendations to users. It takes into account the behavior and preferences of
similar users to suggest items that a particular user is likely to enjoy.
• No need for item attributes: Collaborative filtering works solely based on user-item
interactions, making it applicable to a wide range of recommendation scenarios where
item features may be sparse or unavailable. This is especially useful in content-rich
platforms.
• Serendipitous (unanticipated) discoveries: Collaborative filtering can introduce users to
items they might not have discovered otherwise. By analyzing user behaviors and
identifying patterns across the user community, collaborative filtering can recommend
items that align with a user’s tastes but may not be immediately obvious to them.
Disadvantages of collaborative filtering
It’s important to note that while collaborative filtering offers these and other advantages, it
also has its limitations, including:
The “cold-start” problem:
• User cold start occurs when a new user joins the system without any prior interaction
history. Collaborative filtering relies on historical interactions to make
recommendations, so it can’t provide personalized suggestions to new users who start
with no data.
• Item cold start happens when a new item is added, and there’s no user interaction data for it.
Collaborative filtering has difficulty recommending new items since it lacks
information about how users have engaged with these items in the past.
• Sensitivity to sparse data: Collaborative filtering depends on having enough user-
item interaction data to provide meaningful recommendations. In situations where data
is sparse and users interact with only a small number of items, collaborative filtering
may struggle to find useful patterns or similarities between users and items.
• Potential for popularity bias: Collaborative filtering tends to recommend popular
items more frequently. This can lead to a “rich get richer” phenomenon, where already
popular items receive even more attention, while niche or less-known items are
overlooked.
• To address these and other limitations, recommendation systems often use hybrid
approaches that combine collaborative filtering with content-based methods or other
techniques to improve recommendation quality in the long run.
HYBRID RECOMMENDERS
Hybrid recommenders combine two or more recommendation techniques, such as
collaborative filtering and content-based filtering.
Pros of hybrid recommenders
• Improved accuracy: Hybrid systems often produce more accurate recommendations
compared to individual methods, benefiting users by offering more relevant
suggestions.
• Enhanced robustness and flexibility: Hybrid models are often more robust in
handling various recommendation scenarios. They can adapt to different data
characteristics, user behaviors, and recommendation challenges. This flexibility is
valuable in real-world recommendation systems.
• Addressing common recommendation limitations: Hybrid recommenders can
mitigate the limitations of individual recommendation techniques. For example, they
can overcome the “cold-start” problem for new users and items by incorporating
content-based recommendations, providing serendipitous suggestions, and reducing
popularity bias.
Cons of hybrid recommenders
• Just like all other recommender systems, hybrid recommenders have their downsides, too.
Some include increased implementation complexity and higher computational cost.
EVALUATION METRICS FOR RECOMMENDER SYSTEMS
• To assess the performance and effectiveness of recommender systems, you have to
take into consideration certain evaluation metrics.
• They can help you measure how well a recommendation algorithm or model is performing
and provide insights into its strengths and weaknesses.
• There are several categories of evaluation metrics, depending on the specific aspect of
recommendations being assessed.
Some common evaluation metrics include:
• Accuracy metrics assess the accuracy of the recommendations made by a system in
terms of how well they match the user’s actual preferences or behavior. Here we have
Mean Absolute Error (MAE), Root Mean Square Error (RMSE), or Mean Squared
Logarithmic Error (MSLE).
• Ranking metrics evaluate how well a recommender system ranks items for a user,
especially in top-N recommendation scenarios. Think of hit rate, average reciprocal
hit rate (ARHR), cumulative hit rate, or rating hit rate.
• Diversity metrics assess the diversity of recommended items to ensure that
recommendations are not overly focused on a narrow set of items. These include
Intra-List Diversity or Inter-List Diversity.
• Novelty metrics evaluate how well a recommender system introduces users to new or
unfamiliar items. Catalog coverage and item popularity belong to this category.
• Serendipity metrics assess the system’s ability to recommend unexpected but
interesting items to users – surprise or diversity are looked at in this case.
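For instance, MAE and RMSE from the accuracy metrics above can be computed directly. The rating lists below are invented for illustration:

```python
from math import sqrt

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the rating errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error: penalizes large errors more heavily than MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4, 3, 5, 2]          # ratings the users actually gave
predicted = [3.5, 3, 4, 4]     # ratings the model predicted
print(mae(actual, predicted))   # (0.5 + 0 + 1 + 2) / 4 = 0.875
print(rmse(actual, predicted))  # sqrt((0.25 + 0 + 1 + 4) / 4) ≈ 1.146
```

Note how the single large error (2 vs 4) pulls RMSE above MAE, which is exactly why RMSE is preferred when big misses are costly.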
You can also choose to look at some business metrics such as conversion rate, click-
through rate (CTR), or revenue impact. But, ultimately, the best way to do an online
evaluation of your recommender system is through A/B testing.
OVERVIEW OF DATA MINING METHODS:
Recommender Systems (RS) typically apply techniques and methodologies from other
neighboring areas - such as Human Computer Interaction (HCI) or Information Retrieval
(IR). However, most of these systems bear in their core an algorithm that can be
understood as a particular instance of a Data Mining (DM) technique.
Data mining methods play a crucial role in building effective recommender systems by
extracting patterns and insights from large datasets. One key aspect of recommender
systems involves measuring similarity between users, items, or both. Let's explore an
overview of data mining methods for recommender systems and common similarity
measures:
Data Mining Methods for Recommender Systems:
1. Association Rule Mining:
• Overview: Association rule mining identifies relationships or patterns in user-
item interactions. It helps discover associations between items that are frequently
co-purchased or co-viewed.
• Application: Generating recommendations based on association rules, e.g., "Users
who bought X also bought Y."
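A minimal sketch of this idea on a handful of hypothetical shopping baskets: co-occurrence counts give the support of an item pair, and dividing by the antecedent's count gives the confidence of a rule like "book → lamp":

```python
from itertools import combinations
from collections import Counter

# hypothetical transaction data: each set is one user's basket
baskets = [
    {"book", "lamp"},
    {"book", "lamp", "pen"},
    {"book", "pen"},
    {"lamp", "pen"},
]

# count how often each item pair is co-purchased (pair support)
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# confidence of the rule "book -> lamp": P(lamp | book)
book_baskets = sum(1 for b in baskets if "book" in b)
confidence = pair_counts[("book", "lamp")] / book_baskets
print(confidence)  # 2 of the 3 'book' baskets also contain 'lamp'
```

Real systems mine such rules at scale with algorithms like Apriori or FP-Growth rather than exhaustive counting.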
2. Clustering Algorithms:
• Overview: Clustering methods group users or items with similar characteristics. Users or
items within the same cluster are likely to share common preferences.
• Application: Recommending items popular within a user's cluster, assuming similar
preferences within the group.
3. Classification Algorithms:
• Overview: Classification models predict user preferences for items based on
historical interactions. These models can be trained to classify items as relevant or
irrelevant to a user.
• Application: Providing recommendations by predicting user preferences for items not
yet interacted with.
4. Matrix Factorization:
• Overview: Matrix factorization techniques decompose the user-item interaction
matrix into latent factors, capturing hidden patterns and relationships. Singular Value
Decomposition (SVD) and Alternating Least Squares (ALS) are common matrix
factorization methods.
• Application: Predicting missing values in the user-item matrix to recommend items a
user might like.
5. Deep Learning Models:
• Overview: Deep learning models, such as neural networks, can capture complex
patterns in user-item interactions. Neural collaborative filtering is an example
where embeddings are used to represent users and items.
• Application: Learning intricate user-item relationships for more accurate and
personalized recommendations.
Similarity Measures:
Different data types require different functions to measure the similarity of data points.
Differentiation between unary, binary, and quantitative data helps with most problems. Unary
data could be the number of likes for a blog post. Binary data could be likes and dislikes of a
video, and quantitative data could be a rating provided, like 4/10 stars or similar. The following
table summarises which similarity functions are suitable for different data types:

Data type    | Example                    | Suitable similarity functions
------------ | -------------------------- | ---------------------------------------
Unary        | likes on a blog post       | Jaccard similarity
Binary       | likes/dislikes of a video  | Jaccard similarity, cosine similarity
Quantitative | 4/10 star ratings          | Cosine similarity, Pearson correlation
1. Cosine Similarity:
• Definition: Measures the cosine of the angle between two vectors, representing
users or items, in a multidimensional space.
• Cosine similarity is a measure used to determine the similarity between two non-
zero vectors in a vector space. It calculates the cosine of the angle between the
vectors, representing their orientation and similarity:

cos(A, B) = (A · B) / (||A|| × ||B||)

• A · B denotes the dot product of vectors A and B, which is the sum of the element-
wise multiplication of their corresponding components.
• ||A|| represents the Euclidean norm or magnitude of vector A, calculated as the square root
of the sum of the squares of its components.
• ||B|| represents the Euclidean norm or magnitude of vector B.
The resulting value ranges from -1 to 1, where 1 indicates that the vectors are in the same
direction (i.e., completely similar), -1 indicates they are in opposite directions (i.e.,
completely dissimilar), and 0 indicates they are orthogonal or independent (i.e., no
similarity). It is particularly useful in scenarios where the magnitude of the vectors is not
significant, and the focus is on the direction or relative orientation of the vectors.
Dimensionality Independence: It is not affected by the magnitude or length of vectors. It
solely focuses on the direction or orientation of the vectors. This property makes it valuable
when dealing with high-dimensional data or sparse vectors, where the magnitude of the
vectors may not be as informative as their relative angles or orientations.
Sparse Data: It is particularly effective when working with sparse data, where vectors
have many zero or missing values. In such cases, the non-zero elements play a crucial role
in capturing the meaningful information and similarity between vectors.
• Application: In recommender systems, cosine similarity can be used to measure the
similarity between user preferences or item characteristics, aiding in generating
personalised recommendations based on similar user preferences or item profiles.
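A small, self-contained sketch of the formula above; the two rating vectors are hypothetical:

```python
from math import sqrt

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||) for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# two users' rating vectors over the same three items
alice = [5, 3, 0]
bob = [10, 6, 0]   # same taste, but on a different rating scale
print(cosine_similarity(alice, bob))  # ≈ 1.0 -- direction matters, not magnitude
```

The example illustrates the magnitude-independence property: `bob` rates twice as generously as `alice`, yet the similarity is still maximal because the vectors point the same way.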
2. Pearson Correlation Coefficient:
• Definition: Measures linear correlation between two variables, providing a
measure of the strength and direction of a linear relationship.
• The Pearson correlation coefficient, also known as Pearson’s correlation or simply
correlation coefficient, is a statistical measure that quantifies the linear
relationship between two variables. It measures how closely the data points of the
variables align on a straight line, indicating the strength and direction of the
relationship.
The Pearson correlation coefficient is denoted by the symbol “r” and takes values
between -1 and 1. The coefficient value indicates the following:
• r = 1: Perfect positive correlation. The variables have a strong positive linear
relationship, meaning that as one variable increases, the other variable also
increases proportionally.
• r = -1: Perfect negative correlation. The variables have a strong negative linear
relationship, meaning that as one variable increases, the other variable decreases
proportionally.
• r = 0: No linear correlation. There is no linear relationship between the variables,
though they may still be related in a non-linear way.
• Application: Evaluating how well users' preferences align, especially in scenarios
with numerical ratings.
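The coefficient is the cosine similarity of the mean-centered vectors, which the following sketch makes explicit (the rating vectors are illustrative):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient r in [-1, 1] for two numeric vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # center both variables
    return float(np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Ratings of two users over the same items
print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # perfectly linear -> 1.0
print(round(pearson([1, 2, 3], [6, 4, 2]), 6))         # perfectly inverse -> -1.0
```

Centering is what distinguishes Pearson correlation from plain cosine similarity: a user who rates everything one star higher than another still gets r = 1.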
3. Jaccard Similarity:
• Definition: Measures the intersection over the union of sets, quantifying the
similarity between two sets.
• It calculates the size of the intersection of the sets divided by the size of their union.
The resulting value ranges from 0 to 1, where 0 indicates no similarity and 1 indicates
complete similarity.
• In other words, to calculate the Jaccard similarity, you need to determine the number
of common elements between the sets of interest and divide it by the total number of
distinct elements across both sets.
• It is useful because it provides a straightforward and intuitive measure to quantify the
similarity between sets. Its simplicity makes it applicable in various domains and
scenarios.
• Here are some key reasons for its usefulness:
• Set Comparison: It enables the comparison of sets without considering the
specific elements or their ordering. It focuses on the presence or absence of
elements, making it suitable for cases where the structure or attributes of the
elements are not important or would need additional feature engineering,
which would slow down the system.
• Scale-Invariant: It remains unaffected by the size of the sets being compared. It
solely relies on the intersection and union of sets, making it a robust measure
even when dealing with sets of different sizes.
• Binary Data: It is particularly suitable for binary data, where elements are
either present or absent in the sets. It can be applied to scenarios where the
presence or absence of specific features or attributes is important for
comparison.
• Applications
• In the context of a recommender system, Jaccard similarity can be used to
identify users with similar item preferences and recommend items that are
highly rated or popular among those similar users. By leveraging Jaccard
similarity, the recommender can enhance the personalisation of
recommendations and help users discover relevant items based on the
preferences of users with similar tastes.
• Assessing similarity between sets of items liked or interacted with by users.
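A minimal sketch of the set-based calculation described above; the movie sets are made-up examples:

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets count as identical
    return len(a & b) / len(a | b)

# Hypothetical sets of movies liked by two users
liked_by_u1 = {"Matrix", "Inception", "Up"}
liked_by_u2 = {"Matrix", "Up", "Alien", "Heat"}
print(jaccard(liked_by_u1, liked_by_u2))  # 2 common / 5 distinct -> 0.4
```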
4. Euclidean Distance:
• Definition: Represents the straight-line distance between two points in a
multidimensional space.
• Application: Quantifying the dissimilarity or proximity between user or item vectors.
5. Manhattan Distance:
• Definition: Measures the distance between two points by summing the absolute
differences along each dimension.
• Application: Similar to Euclidean distance, but may be less sensitive to outliers.
6. Hamming Distance:
• Definition: Measures the number of positions at which corresponding bits differ in
two binary strings.
• Application: Suitable for comparing binary user profiles or item representations.
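The three distance measures above can be sketched in a few lines; the user vectors and binary profiles below are illustrative:

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between two points."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return float(np.abs(np.asarray(a, float) - np.asarray(b, float)).sum())

def hamming(a, b):
    """Number of positions where two equal-length binary profiles differ."""
    return int(sum(x != y for x, y in zip(a, b)))

u1, u2 = [5, 3, 0, 1], [4, 3, 2, 1]
print(euclidean(u1, u2))                    # sqrt(1 + 0 + 4 + 0) ≈ 2.236
print(manhattan(u1, u2))                    # 1 + 0 + 2 + 0 = 3.0
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))  # differs at 2 positions -> 2
```

Manhattan distance grows linearly with each deviation while Euclidean squares it, which is why Manhattan is often less dominated by a single outlying dimension.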
Choosing the appropriate data mining method and similarity measure depends on
the characteristics of the data, the nature of the recommendation problem, and
computational considerations. Hybrid approaches that combine multiple methods or
measures often yield more robust and accurate recommendations.
DIMENSIONALITY REDUCTION:
Overview:
Dimensionality reduction is a technique used to reduce the number of features
(dimensions) in a dataset while preserving its essential information. In the context of
recommender systems, dimensionality reduction is often applied to user-item
interaction matrices to capture latent factors that represent hidden patterns in the data.
By reducing the dimensionality, the computational complexity decreases, and the
model becomes more efficient.
Methods:
• Principal Component Analysis (PCA): PCA is a popular linear dimensionality
reduction method that transforms the original features into a new set of uncorrelated
variables (principal components) while preserving the variance in the data.
• Singular Value Decomposition (SVD): SVD is a matrix factorization technique that
decomposes a matrix into three other matrices, capturing latent factors. It is
commonly used in collaborative filtering for recommender systems.
• Non-Negative Matrix Factorization (NMF): NMF decomposes a matrix into
two lower-rank matrices with non-negative elements, making it suitable for
scenarios where non-negativity is a meaningful constraint.
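All three methods are available in scikit-learn. The snippet below is a minimal sketch on a made-up rating matrix, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.decomposition import PCA, NMF

# Toy user-item rating matrix (4 users x 5 items, non-negative, illustrative)
R = np.array([[5, 3, 0, 1, 0],
              [4, 0, 0, 1, 1],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]], dtype=float)

# PCA: project each user onto 2 uncorrelated principal components
pca = PCA(n_components=2)
users_2d = pca.fit_transform(R)
print(users_2d.shape)            # (4, 2)

# NMF: factor R ≈ W @ H with non-negative user factors W and item factors H
nmf = NMF(n_components=2, init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(R)
H = nmf.components_
print(W.shape, H.shape)          # (4, 2) (2, 5)
```

Because W and H are non-negative, each latent factor can be read as an additive "taste" profile, which is the interpretability advantage NMF offers over PCA.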
Applications in Recommender Systems:
• Reducing Sparsity: Recommender system datasets are often sparse, with many missing
values in the user-item interaction matrix. Dimensionality reduction helps in
filling in missing values by approximating the original matrix with lower-rank
approximations.
• Capturing Latent Factors: By reducing the dimensionality, latent factors
representing user preferences and item characteristics can be identified, leading
to more efficient and effective recommendations.
SINGULAR VALUE DECOMPOSITION:
When it comes to dimensionality reduction, Singular Value Decomposition (SVD) is a
popular linear algebra method for matrix factorization in machine learning. The method
shrinks the space from N dimensions to K dimensions (where K < N), reducing the
number of features. SVD constructs a matrix whose rows are users and whose columns
are items, with the elements given by the users' ratings. Singular value decomposition
decomposes this matrix into three other matrices and extracts the latent factors from the
factorization of the high-level (user-item-rating) matrix.
• Matrix U: left singular matrix (users × latent factors)
• Matrix S: diagonal matrix of singular values (shows the strength of each latent factor)
• Matrix V: right singular matrix (items × latent factors)
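The three factor matrices can be inspected directly with NumPy's built-in SVD; the toy rating matrix below is illustrative, not real data:

```python
import numpy as np

# Toy user-item rating matrix A (4 users x 3 items)
A = np.array([[5, 3, 1],
              [4, 3, 1],
              [1, 1, 5],
              [1, 2, 4]], dtype=float)

# U: users x latent factors, s: singular values, Vt: latent factors x items
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)   # (4, 3) (3,) (3, 3)

# Keep only the k strongest latent factors and rebuild a rank-k approximation
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(A_k, 1))
```

Truncating to the k largest singular values is what turns the exact factorization into a dimensionality reduction: A_k is the best rank-k approximation of A in the least-squares sense.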
From matrix factorization, the latent factors capture the characteristics of the items.
Finally, the utility matrix A is produced with shape m × n, and its low-rank
approximation reduces the dimension through the extraction of latent factors. The
approximation expresses the relationships between users and items by mapping both
into an r-dimensional latent space: vector X_i represents item i and vector Y_u
represents user u. The rating given by user u on item i is predicted as R_ui = X_i^T · Y_u.
The loss to be minimized is the squared error between the actual rating R_ui and this
predicted rating.
Regularization is used to avoid overfitting and to generalize beyond the training dataset
by adding a penalty. Here, we also add bias terms to reduce the error between the actual
and predicted values:
• (u, i): a user-item pair
• μ: the average rating of all items
• b_i: the average rating of item i minus μ
• b_u: the average rating given by user u minus μ
The objective below adds the bias terms and the regularization term:

min Σ_(u,i) ( R_ui − μ − b_u − b_i − X_i^T · Y_u )² + λ ( ||X_i||² + ||Y_u||² + b_u² + b_i² )
Introduction to truncated SVD
When it comes to matrix factorization techniques, truncated Singular Value
Decomposition (SVD) is a popular method for producing features; it factors a matrix M
into the three matrices U, Σ, and V. Another popular method is Principal Component
Analysis (PCA). Truncated SVD is similar to PCA, but SVD is computed from the data
matrix itself, while PCA's factorization is generated from the covariance matrix. Unlike
a regular SVD, truncated SVD produces a factorization in which the number of columns
(the truncation rank) can be specified. For example, given an n × n matrix, truncated
SVD generates matrices with the specified number of columns, whereas a full SVD
outputs n columns.
The advantages of truncated SVD over PCA
Truncated SVD can operate directly on a sparse matrix to generate feature matrices,
whereas PCA must operate on the entire matrix to produce the covariance matrix.
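This advantage can be seen directly with scikit-learn, whose TruncatedSVD accepts a SciPy sparse matrix as-is. The ratings below are made up, and SciPy and scikit-learn are assumed to be installed:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Sparse 5-user x 6-item rating matrix (mostly zeros, illustrative)
R = csr_matrix(np.array([[5, 0, 0, 1, 0, 0],
                         [4, 0, 0, 0, 1, 0],
                         [0, 0, 5, 4, 0, 0],
                         [0, 3, 0, 0, 4, 5],
                         [0, 0, 4, 0, 0, 2]], dtype=float))

# Truncate to 2 latent factors without densifying the matrix
svd = TruncatedSVD(n_components=2, random_state=42)
user_factors = svd.fit_transform(R)     # 5 users x 2 latent factors
print(user_factors.shape)               # (5, 2)
print(svd.components_.shape)            # (2, 6) -> item factors
```

PCA on the same data would first center it, which destroys sparsity; TruncatedSVD skips centering, which is exactly why it scales to large sparse rating matrices.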
1. Hands-on experience of Python code
2. Data Description:
The metadata includes 45,000 movies listed in the Full MovieLens Dataset, all released
on or before July 2017. Cast, crew, plot keywords, budget, revenue, posters, release
dates, languages, production companies, countries, TMDB vote counts, and vote
averages are in the dataset. Ratings are on a 1–5 scale and were obtained from the
official GroupLens website. The dataset is taken from Kaggle.
3. Recommending movies using SVD
Singular value decomposition (SVD) is a collaborative filtering method for movie
recommendation. The aim of the code implementation is to provide users with movie
recommendations derived from the latent features of the item-user matrix. The code
shows how to use the SVD latent factor model for matrix factorization.
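A full MovieLens pipeline is dataset-specific, but the core recommendation step can be sketched on a hypothetical toy rating matrix; the movie titles and ratings below are illustrative, not from the dataset:

```python
import numpy as np

# Hypothetical stand-in for the MovieLens rating matrix (0 = unrated)
movies = ["Toy Story", "Jumanji", "Heat", "Casino"]
R = np.array([[5, 4, 0, 0],
              [4, 5, 1, 0],
              [0, 1, 5, 4],
              [1, 0, 4, 5]], dtype=float)

# Factor the matrix and keep the k strongest latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # predicted ratings

# Recommend user 0 the unrated movie with the highest predicted rating
user = 0
unrated = np.where(R[user] == 0)[0]
best = unrated[np.argmax(R_hat[user, unrated])]
print("Recommend:", movies[best])
```

On real MovieLens data the zeros would be missing entries rather than true ratings, so a production system would use the bias and regularization terms discussed earlier and fit only on observed entries instead of taking a plain SVD of the filled matrix.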
Applications in Recommender Systems:
• Matrix Factorization: SVD is used to factorize the user-item interaction matrix
into lower-rank approximations, capturing latent factors that represent user
preferences and item characteristics.
• Collaborative Filtering: SVD is a key technique in collaborative filtering-
based recommender systems, where it helps in identifying latent relationships
between users and items.
• Handling Sparsity: SVD can handle sparse matrices effectively, providing a
way to impute missing values in the original matrix and improving the quality
of recommendations.
• Regularization Techniques: Regularized versions of SVD, such as Regularized
SVD, incorporate regularization terms to prevent overfitting and enhance the
generalization ability of the model.