
Prof. Sandip Eknathrao Ingle 9960575712

Unit No 2: Association Rules Mining and Recommendation Systems


[7 Hours]
What are Association Rules, Association Rule Parameters, Calculating Association Rule
Parameters, Recommendation Engines, Recommendation Engines working, Collaborative
Filtering, and Content Based Filtering.

What are Association Rules?


Association Rules are a machine learning technique used to discover interesting
relationships, patterns, and associations among items in large datasets. This method falls under
unsupervised learning because it identifies hidden patterns without pre-labeled outcomes.
The foundation of association rule mining was laid by Rakesh Agrawal, Tomasz Imieliński, and
Arun Swami in their landmark 1993 paper: "Mining Association Rules between Sets of Items in
Large Databases".
The primary purpose is to find rules that predict the occurrence of an item based on the
occurrences of other items in the dataset. Association rules are a fundamental concept in data
mining and knowledge discovery that aims to uncover interesting relationships or patterns between
items in large datasets, typically transactional databases. These patterns are often used to identify
associations between products, customer behavior, and preferences in a variety of fields such as
retail, marketing, healthcare, and web analytics.
The most common application of association rules is Market Basket Analysis (MBA),
where retailers try to discover products that customers frequently purchase together. For example,
a common association rule might be:
"If a customer buys bread, they are likely to buy butter."
Association rules are most often represented as if-then statements, which express the
relationship between two or more items.

Basic Structure
The goal is to find interesting relationships or patterns between items in large datasets.
Often used to answer:
➔ "When X happens, how likely is Y to happen?"
An association rule is usually written in the form:
X→Y
Where:
X is called the Antecedent (the "if" part)
Y is called the Consequent (the "then" part)
This rule reads as: "If X occurs, then Y is likely to occur."

Why Association Rules Are Important in Machine Learning


Unsupervised Learning: No labels or outcomes are given. The model finds patterns on its own.
Pattern Discovery: Helps in discovering which items/events occur together frequently.
Foundation for Recommender Systems: Used by Amazon, Netflix, and Flipkart to recommend
products or shows.


Importance in Machine Learning


Association rule mining is vital because:
It helps discover patterns that can be used for decision-making.
It forms the base for many real-world recommendation systems.
It improves marketing strategies by analysing customer behaviour.
It is scalable to very large datasets, such as those used in retail, healthcare, and web analysis.

Advantages vs Disadvantages of Association Rules


Advantages | Disadvantages
Simple and easy to understand | Generates too many rules (many are trivial)
Works with unlabelled data (unsupervised) | Computationally expensive on large datasets
Useful for market basket analysis | Requires careful tuning of support & confidence
Scalable to large datasets | Ignores the sequence/order of items
Improves cross-selling & business strategy | Can produce redundant or overlapping rules
Foundation for recommendation systems | Not suitable for numerical data without pre-processing
Discovers hidden, non-obvious patterns | Risk of spurious patterns (correlation ≠ causation)

Applications
1. Market Basket Analysis: Retail stores use association rules to find products that are frequently
bought together. Example: Milk and Bread, Chips and Soft Drinks.
2. E-commerce Recommendations: Websites like Amazon or Flipkart recommend products based
on association rules derived from previous user behavior.
3. Medical Diagnosis: Helps doctors find patterns in symptoms and diseases.
Example: Patients with symptom A and B are likely to have disease C.
4. Web Usage Mining: Predicts which webpages a user is likely to visit next based on browsing
history.
5. Fraud Detection: Detects unusual patterns of transactions that might indicate fraudulent
activity.
Association Rule Parameters
Key Terms Every Student Should Know
Term | Meaning
Item | A product or object (e.g., Milk, Bread)
Itemset | A group of items bought together
Transaction | A record of purchased items
Rule | An if-then statement (e.g., X → Y)
Association Rule Mining is not just about finding rules; it's about finding strong and useful rules.
To measure the quality and strength of these rules, we use certain parameters:

1. Support
2. Confidence
3. Lift

These are the three key metrics used to measure the strength and usefulness of association rules, and to evaluate and filter them.


1. Support
Definition:
Support tells us how frequently an itemset appears in the dataset. It helps in identifying rules that
are relevant to a large number of transactions.
Formula:
Support(X) = (Number of transactions containing X) / (Total number of transactions)
Purpose:
 Helps to filter out infrequent and insignificant rules.
 High support = Rule is applicable to many transactions (more useful).
Example:
If 100 transactions are recorded and 20 include both Milk and Bread,
Support=20/100=0.20 (or 20%)
2. Confidence
Definition:
Confidence measures how often items in Y appear in transactions that contain X.
In simple terms: When X occurs, how likely is Y to also occur?
Formula:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Measures the likelihood of occurrence of the consequent given the antecedent.


It represents the strength of the implication rule.
Purpose:
 Indicates the strength or reliability of the rule.
 High confidence = Strong relationship between X and Y.
Example:
If 30 out of 100 transactions have Milk, and 20 out of those 30 also have Bread:
Confidence=20/30=0.66 (or 66%)
This means that when Milk is bought, there's a 66% chance Bread is also bought.
3. Lift
Definition:
Lift measures how much more likely X and Y occur together than expected if they were
independent.
Formula:
Lift(X → Y) = Confidence(X → Y) / Support(Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
Measures how much more likely the antecedent and consequent are to occur together than if they
were independent.
Purpose:
 Shows the strength and significance of a rule beyond random chance.
 Helps to identify whether the rule is truly useful.
Interpretation:
A lift value greater than 1 indicates a positive association between X and Y.
 Lift > 1 ➔ Positive correlation (X and Y occur together more than by chance)
 Lift = 1 ➔ X and Y are independent (no real association)
 Lift < 1 ➔ Negative correlation (X and Y occur together less often than by chance)


Example:
If Confidence is 0.66 and Support for Bread is 0.40:
Lift=0.66/0.40=1.65
This means Milk and Bread are 1.65 times more likely to be bought together than randomly.
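To make the arithmetic concrete, here is a minimal Python sketch reproducing the three metrics from the counts used above. Note that the text truncates 20/30 to 0.66, which gives Lift = 1.65; exact arithmetic gives ≈ 0.67 and ≈ 1.67.

```python
# Counts taken from the Support, Confidence, and Lift examples above.
total = 100              # total transactions
count_milk = 30          # transactions containing Milk
count_both = 20          # transactions containing both Milk and Bread
support_bread = 0.40     # Support(Bread), as given in the Lift example

support = count_both / total            # Support(Milk ∪ Bread) = 0.20
confidence = count_both / count_milk    # Confidence(Milk → Bread) ≈ 0.67
lift = confidence / support_bread       # Lift(Milk → Bread) ≈ 1.67

print(f"Support = {support:.2f}, Confidence = {confidence:.2f}, Lift = {lift:.2f}")
```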
Algorithms Used in Association Rule Mining
1. Apriori Algorithm
The Apriori algorithm is one of the foundational algorithms for mining frequent itemsets and
association rules in large datasets. It is widely used in market basket analysis, where it helps to find
patterns in transaction data, like which products are frequently bought together.
 The Apriori Algorithm is one of the most popular algorithms used for mining association rules.
 It is based on the bottom-up approach, where frequent itemsets of length 1 are found first, and
then progressively longer itemsets are generated.

Working:
Apriori follows a bottom-up approach, where it starts with individual items and generates
progressively larger itemsets based on previously found frequent itemsets. It relies on the Apriori
property, which states that if an itemset is frequent, then all of its subsets must also be frequent;
conversely, if an itemset is infrequent, all of its supersets must also be infrequent.
1. Generate frequent itemsets: Start with itemsets of size 1 and progressively generate
larger itemsets by combining frequent itemsets of smaller sizes.
2. Prune itemsets: After generating the candidate itemsets, the algorithm uses a "pruning"
step to eliminate itemsets that do not meet the minimum support threshold.
3. Rule generation: From the frequent itemsets, the association rules are generated based
on the specified confidence threshold.

Algorithm
Step 1: Generate Candidate Itemsets
Step 2: Count the Support for Each Candidate Itemset
Step 3: Prune Infrequent Itemsets
Step 4: Generate New Candidate Itemsets
Step 5: Repeat Steps 2 to 4 for Larger Itemsets
Step 6: Generate Association Rules
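The following from-scratch Python sketch illustrates Steps 1 to 5 on the five-transaction dataset used in the worked example later in this unit. It is a teaching sketch, not a production implementation.

```python
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Butter", "Ice Cream"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support=0.6):
    """Return {frozenset: support} for all frequent itemsets (Steps 1-5)."""
    # Steps 1-3: candidate 1-itemsets, count support, prune infrequent ones.
    current = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in current if support(s, transactions) >= min_support}
    frequent, k = {}, 1
    while current:
        frequent.update({s: support(s, transactions) for s in current})
        # Step 4: join frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Step 5: prune candidates below the minimum support threshold.
        current = {c for c in candidates if support(c, transactions) >= min_support}
        k += 1
    return frequent

for itemset, sup in sorted(apriori(transactions).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```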

Advantages:
Easy to understand and implement.
Efficient in finding frequent itemsets.

Disadvantages:
Computationally expensive for large datasets.
The need for multiple database scans (inefficient when working with large data).
Generates a large number of candidate itemsets.
2. FP-Growth Algorithm (Frequent Pattern Growth)
 The FP-Growth Algorithm is an improvement over the Apriori algorithm.
 It uses a tree structure called FP-tree (Frequent Pattern Tree) to store compressed
information about frequent itemsets.
 Unlike Apriori, FP-Growth does not generate candidate itemsets explicitly.
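As a sketch, assuming the third-party mlxtend and pandas libraries are installed (pip install mlxtend pandas), FP-Growth can be run on the same five transactions without building the FP-tree by hand:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Ice Cream"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Mine frequent itemsets (no explicit candidate generation), then derive rules.
frequent = fpgrowth(onehot, min_support=0.6, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```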


3. Eclat Algorithm (Equivalence Class Transformation)


 The Eclat algorithm is another method for association rule mining, but instead of using
a breadth-first search like Apriori, it uses a depth-first search to find frequent
itemsets.
 It uses a vertical data format, where each itemset is represented as a list of transaction
IDs.
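A tiny sketch of the vertical format: each item maps to a "tidlist" (the set of transaction IDs that contain it), and the support of an itemset is the size of the intersection of its members' tidlists. The IDs below follow the worked example used throughout this unit.

```python
# Vertical representation: item -> set of transaction IDs (T1..T5 as 1..5).
tidlists = {
    "Milk": {1, 2, 4, 5},
    "Bread": {1, 2, 3, 4, 5},
    "Butter": {1, 3, 4, 5},
    "Ice Cream": {5},
}
n_transactions = 5

def vertical_support(items):
    """Support of an itemset = |intersection of its tidlists| / |transactions|."""
    common = set.intersection(*(tidlists[i] for i in items))
    return len(common) / n_transactions

print(vertical_support(["Milk", "Bread"]))            # 0.8
print(vertical_support(["Milk", "Bread", "Butter"]))  # 0.6
```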

4. Direct Hashing and Pruning (DHP)


 The DHP algorithm is an improvement on the Apriori algorithm designed to reduce
candidate generation and improve efficiency.
 It uses a hash table to efficiently prune unpromising itemsets and generate candidates
for frequent itemsets.

5. AIS Algorithm (Agrawal, Imielinski, and Swami)


 The AIS algorithm is one of the first algorithms designed for association rule mining.
 It works by maintaining a frequent itemset list and uses it to find possible itemsets.

6. HAN (Hypertext Association Mining) Algorithm


 The HAN algorithm is designed for mining association rules from web data
(specifically, hyperlinks in the web).
 It works with hyperlinks and web page visits to find associations between web pages.

Algorithm | Type of Approach | Key Advantage | Key Disadvantage
Apriori | Breadth-first search | Simple, intuitive, and easy to understand | Computationally expensive for large datasets
FP-Growth | Depth-first search with FP-tree | Fast and memory-efficient due to tree structure | Complex to implement and can require high memory usage
Eclat | Depth-first search with vertical data format | Fast for sparse datasets | Less efficient for dense datasets
DHP | Hash-based pruning | Reduces candidate generation, improving efficiency | Complex implementation and limited scalability
AIS | Brute-force approach | Simple to implement and understand | Inefficient for large datasets
HAN | Web data mining | Tailored for mining web data | Limited to web data, not general-purpose


Calculating Association Rule Parameters


Example:
Consider the following dataset with 5 transactions:
Transaction ID Items Purchased
T1 {Milk, Bread, Butter}
T2 {Milk, Bread}
T3 {Bread, Butter}
T4 {Milk, Bread, Butter}
T5 {Milk, Bread, Butter, Ice Cream}

Solution:
Given the transactions above:

Items: {Milk, Bread, Butter, Ice Cream}


Define Support, Confidence, and Lift:
 Support: The frequency of an itemset appearing in the dataset.
 Confidence: The likelihood of finding item Y when item X is found.
 Lift: Measures the strength of the rule, considering the support of the items independently.
We will apply support and confidence thresholds to filter rules (typically support > 0.3 and confidence > 0.5; this example uses a stricter minimum support of 0.6).
Support for 1-itemsets (total transactions = 5):
Item | Count | Support
Milk | 4 | 4/5 = 0.8
Bread | 5 | 5/5 = 1.0
Butter | 4 | 4/5 = 0.8
Ice Cream | 1 | 1/5 = 0.2

Calculate the Support for individual items (1-itemsets). Support is defined as:
Support(X) = (Number of transactions containing X) / (Total number of transactions)
Total number of transactions = 5


Support for each 1-itemset:
 {Milk}: Appears in transactions T1, T2, T4, T5.
Support = 4/5=0.80
 {Bread}: Appears in transactions T1, T2, T3, T4, T5.
Support = 5/5=1.00
 {Butter}: Appears in transactions T1, T3, T4, T5.
Support = 4/5=0.80
 {Ice Cream}: Appears only in transaction T5.
Support = 1/5=0.20

Prune Infrequent 1-itemsets


Assume the minimum support threshold is 60% (0.60).
Discard {Ice Cream} as it has a support of 20%, which is below the threshold.
The frequent 1-itemsets are: {Milk}, {Bread}, {Butter}

Candidate 2-itemsets
The candidate 2-itemsets are: {Milk, Bread}, {Milk, Butter}, {Bread, Butter}
Calculate Support for 2-itemsets
Now, calculate the Support for each 2-itemset:
 {Milk, Bread}: Appears in transactions T1, T2, T4, T5.
Support = 4/5=0.80
 {Milk, Butter}: Appears in transactions T1, T4, T5.
Support = 3/5=0.60
 {Bread, Butter}: Appears in transactions T1, T3, T4, T5.
Support = 4/5=0.80
Pair | Transactions | Count | Support
Milk, Bread | T1, T2, T4, T5 | 4 | 4/5 = 0.8
Milk, Butter | T1, T4, T5 | 3 | 3/5 = 0.6
Milk, Ice Cream | T5 | 1 | 1/5 = 0.2
Bread, Butter | T1, T3, T4, T5 | 4 | 4/5 = 0.8
Bread, Ice Cream | T5 | 1 | 1/5 = 0.2
Butter, Ice Cream | T5 | 1 | 1/5 = 0.2
(Pairs involving Ice Cream are shown only for reference; Apriori never generates them, because Ice Cream was pruned in the previous step.)

Prune Infrequent 2-itemsets


We assume the minimum support threshold is still 60% (0.60).
{Milk, Bread} has a support of 80%, so it is frequent.
{Milk, Butter} has a support of 60%, so it is frequent.
{Bread, Butter} has a support of 80%, so it is frequent.
None of the 2-itemsets are pruned because all of them meet the minimum support threshold.
The frequent 2-itemsets are: {Milk, Bread}, {Milk, Butter}, {Bread, Butter}

Candidate 3-itemsets
The candidate 3-itemset is: {Milk, Bread, Butter}
Calculate Support for 3-itemset
Now, calculate the Support for the 3-itemset:
 {Milk, Bread, Butter}: Appears in transactions T1, T4, T5.
Support = 3/5=0.60
Triplet | Transactions | Count | Support
Milk, Bread, Butter | T1, T4, T5 | 3 | 3/5 = 0.6
Milk, Bread, Ice Cream | T5 | 1 | 1/5 = 0.2
Milk, Butter, Ice Cream | T5 | 1 | 1/5 = 0.2
Bread, Butter, Ice Cream | T5 | 1 | 1/5 = 0.2
(Triplets involving Ice Cream are shown only for reference; the only Apriori candidate is {Milk, Bread, Butter}.)
Prune Infrequent 3-itemsets
We assume the minimum support threshold is still 60% (0.60).
 {Milk, Bread, Butter} has a support of 60%, so it is frequent.
So, the frequent 3-itemset is: {Milk, Bread, Butter}


Generate Association Rules


Now, let's generate association rules from the frequent itemsets. For each rule X → Y we calculate:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Lift(X → Y) = Confidence(X → Y) / Support(Y)

Generate Association Rules from Frequent Itemsets
From 2-itemsets
{Milk, Bread}
 Rule 1: Milk → Bread
Confidence = 0.8 / 0.8 = 1.0
Lift = 1.0 / 1.0 = 1.0
 Rule 2: Bread → Milk
Confidence = 0.8 / 1.0 = 0.8
Lift = 0.8 / 0.8 = 1.0
{Milk, Butter}
 Rule 3: Milk → Butter
Confidence = 0.6 / 0.8 = 0.75
Lift = 0.75 / 0.8 = 0.9375
 Rule 4: Butter → Milk
Confidence = 0.6 / 0.8 = 0.75
Lift = 0.75 / 0.8 = 0.9375
{Bread, Butter}
 Rule 5: Bread → Butter
Confidence = 0.8 / 1.0 = 0.8
Lift = 0.8 / 0.8 = 1.0
 Rule 6: Butter → Bread
Confidence = 0.8 / 0.8 = 1.0
Lift = 1.0 / 1.0 = 1.0

From 3-itemset {Milk, Bread, Butter}
 Rule 7: Milk & Bread → Butter
Confidence = 0.6 / 0.8 = 0.75
Lift = 0.75 / 0.8 = 0.9375
 Rule 8: Milk & Butter → Bread
Confidence = 0.6 / 0.6 = 1.0
Lift = 1.0 / 1.0 = 1.0
 Rule 9: Bread & Butter → Milk
Confidence = 0.6 / 0.8 = 0.75
Lift = 0.75 / 0.8 = 0.9375

Rule | Confidence | Lift
Milk → Bread | 1.0 | 1.0
Bread → Milk | 0.8 | 1.0
Milk → Butter | 0.75 | 0.9375
Butter → Milk | 0.75 | 0.9375
Bread → Butter | 0.8 | 1.0
Butter → Bread | 1.0 | 1.0
Milk & Bread → Butter | 0.75 | 0.9375
Milk & Butter → Bread | 1.0 | 1.0
Bread & Butter → Milk | 0.75 | 0.9375
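Continuing the from-scratch Apriori sketch from earlier in this unit (it defines transactions and apriori), the following snippet enumerates rules from each frequent itemset and reproduces the confidence and lift values in the table above:

```python
from itertools import combinations

def rules_from(frequent):
    """Yield (X, Y, confidence, lift) for every rule X → Y that can be
    formed by splitting a frequent itemset of size >= 2."""
    for itemset in [s for s in frequent if len(s) >= 2]:
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = frequent[itemset] / frequent[antecedent]
                lift = confidence / frequent[consequent]
                yield antecedent, consequent, confidence, lift

for x, y, conf, lift in rules_from(apriori(transactions)):
    print(f"{sorted(x)} → {sorted(y)}: confidence={conf:.2f}, lift={lift:.4f}")
```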

Recommendation Engines
A recommendation engine is a system designed to suggest items or content to users based
on their preferences, behaviours, or past interactions. These engines are used across a variety of
platforms and industries, including e-commerce, media streaming, social networks, and more. The
core goal of a recommendation engine is to personalize the user experience by providing relevant
suggestions that enhance user engagement and satisfaction.
These systems, also known as recommenders, give customers recommendations based on their
behaviour patterns and their similarity to people with shared preferences. They use statistical
modelling, machine learning, and behavioural and predictive analytics algorithms to personalize
the web experience.
E-commerce companies, social media platforms and content-based websites frequently use
recommendation engines to generate product recommendations and relevant content that matches
the characteristics of a particular web visitor. They are also used to suggest products that
complement what a shopper has ordered. Search engines are a popular type of recommender, using
a searcher's query and personal data, such as their location and browsing history, to generate
relevant results.

Types of Recommendation Systems


Type | Principle | Example
Content-Based Filtering | Recommends items similar to what the user liked in the past | Netflix suggesting similar movies
Collaborative Filtering | Recommends based on the preferences of similar users | Amazon recommending based on similar shoppers
Hybrid Systems | Combines multiple approaches for better results | YouTube recommendations

How Recommendation Engines Work


1. Collaborative Filtering (CF)
 Idea:
Leverages the behaviour and opinions of other users.
 Two Main Types:
1. User-based Collaborative Filtering
 Find users similar to the current user and suggest what they liked.
2. Item-based Collaborative Filtering
 Find items similar to what the user liked.
 Techniques:
o Neighbourhood methods (k-Nearest Neighbours).
o Matrix Factorization (SVD, ALS).
o Deep Learning approaches.
 Advantages:
o Captures community taste.
o No need for item metadata.


 Challenges:
o Cold start for new users/items.
o Sparsity problem (few interactions).
o Scalability for large datasets.
 Matrix Factorization breaks the big user-item matrix into 2 smaller matrices:
o One represents user preferences.
o One represents item features.

User-Item Matrix R ≈ User Matrix (U) × Item Matrix (V)ᵀ
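A minimal illustrative sketch of this factorization (plain gradient descent on the observed entries, not a library ALS or SVD routine; the ratings below are made up):

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)   # 0 means "unrated"
observed = R > 0
k = 2                                       # number of latent factors
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # user-preference matrix
V = rng.normal(scale=0.1, size=(R.shape[1], k))  # item-feature matrix

lr, reg = 0.01, 0.02
for _ in range(5000):
    E = observed * (R - U @ V.T)            # error on observed ratings only
    U += lr * (E @ V - reg * U)             # gradient step for users
    V += lr * (E.T @ U - reg * V)           # gradient step for items

print(np.round(U @ V.T, 2))                 # predictions fill the unrated cells
```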

2. Content-Based Filtering (CBF)


 Idea:
Focuses on the properties of items and user profiles.
 Method:
o Create item profiles (keywords, features).
o Create user profiles based on past interactions.
o Recommend items similar to items the user liked.
 Techniques:
o TF-IDF (Term Frequency–Inverse Document Frequency).
o Cosine Similarity between item vectors.
 Advantages:
o Works well with few users.
o Personalized for individuals.
 Challenges:
o Cold-start for new users/items.
o Over-specialization (suggesting similar things).
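A small content-based sketch using these two techniques, assuming scikit-learn is available; the item descriptions are invented purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item profiles built from descriptive keywords.
items = {
    "Action Movie A": "fast car chase explosion hero fight",
    "Action Movie B": "hero fight explosion battle chase",
    "Drama Movie C": "family relationship emotional quiet story",
}
names = list(items)
vectors = TfidfVectorizer().fit_transform(items.values())  # TF-IDF profiles
sim = cosine_similarity(vectors)                           # item-item similarity

liked = names.index("Action Movie A")
ranked = sorted(((sim[liked, j], names[j]) for j in range(len(names))
                 if j != liked), reverse=True)
for score, name in ranked:
    print(f"{name}: {score:.2f}")   # Action Movie B ranks first
```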

Example of CBF:
Recommend items similar to an item the user already bought, based on past transaction patterns
(item-item similarity).
Example:
Consider the following dataset with 5 transactions:
Transaction ID Items Purchased
T1 {Milk, Bread, Butter}
T2 {Milk, Bread}
T3 {Bread, Butter}
T4 {Milk, Bread, Butter}
T5 {Milk, Bread, Butter, Ice Cream}

Step 1: Build Item-Transaction Matrix


Transaction ID Milk Bread Butter Ice Cream
T1 1 1 1 0
T2 1 1 0 0
T3 0 1 1 0
T4 1 1 1 0
T5 1 1 1 1


Step 2: Calculate Support (Item Frequency)


Item Support Count
Milk 4 (T1, T2, T4, T5)
Bread 5 (T1, T2, T3, T4, T5)
Butter 4 (T1, T3, T4, T5)
Ice Cream 1 (T5)

Compute Cosine Similarity between Items


Formula (cosine similarity between the items' binary transaction vectors):
sim(A, B) = count(A ∩ B) / √(count(A) × count(B))
Similarity between Milk & Bread
 Co-occurrence count = 4 (T1, T2, T4, T5)
 Support Milk = 4
 Support Bread = 5
sim = 4 / √(4 × 5) = 4 / 4.472 ≈ 0.894

Similarity between Milk & Butter
 Co-occurrence count = 3 (T1, T4, T5)
 Support Milk = 4
 Support Butter = 4
sim = 3 / √(4 × 4) = 3 / 4 = 0.75

Similarity between Milk & Ice Cream
 Co-occurrence count = 1 (T5)
 Support Milk = 4
 Support Ice Cream = 1
sim = 1 / √(4 × 1) = 1 / 2 = 0.5

Similarity between Bread & Butter
 Co-occurrence count = 4 (T1, T3, T4, T5)
 Support Bread = 5
 Support Butter = 4
sim = 4 / √(5 × 4) ≈ 0.894

Similarity between Bread & Ice Cream
 Co-occurrence count = 1 (T5)
 Support Bread = 5
 Support Ice Cream = 1
sim = 1 / √(5 × 1) ≈ 0.447

Similarity between Butter & Ice Cream
 Co-occurrence count = 1 (T5)
 Support Butter = 4
 Support Ice Cream = 1
sim = 1 / √(4 × 1) = 0.5

Build the Similarity Matrix


Milk Bread Butter Ice Cream
Milk 1.0 0.894 0.75 0.5
Bread 0.894 1.0 0.894 0.447
Butter 0.75 0.894 1.0 0.5
Ice Cream 0.5 0.447 0.5 1.0
Make Recommendations
Let’s say a user bought Milk.
Recommend items with highest similarity to Milk.

Item Similarity with Milk


Bread 0.894 ✅ (Recommend!)
Butter 0.75 ✅ (Recommend!)
Ice Cream 0.5 ❌ (Lower similarity)

Content-Based Recommendations:
 If customer buys Milk, recommend:
o Bread (Similarity 0.894)
o Butter (Similarity 0.75)
 Avoid recommending Ice Cream (low similarity 0.5).
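The similarity matrix and the ranking above can be reproduced with a few lines of NumPy applied to the binary item-transaction matrix from Step 1:

```python
import numpy as np

# Rows = transactions T1..T5; columns = Milk, Bread, Butter, Ice Cream.
X = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 1, 1, 0],
              [1, 1, 1, 1]])
items = ["Milk", "Bread", "Butter", "Ice Cream"]

co = X.T @ X                          # co-occurrence counts (diagonal = support)
norms = np.sqrt(np.diag(co))          # sqrt of each item's support count
sim = co / np.outer(norms, norms)     # cosine similarity matrix

milk = items.index("Milk")
for j in np.argsort(-sim[milk]):
    if j != milk:
        print(f"{items[j]}: {sim[milk, j]:.3f}")  # Bread 0.894, Butter 0.750, ...
```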

3. Hybrid Recommendation Systems


 Idea:
Combine multiple recommendation strategies to improve performance.
 Combination Methods:
o Weighted (combine scores).
o Switching (choose strategy based on context).
o Feature augmentation (use one model’s output as input to another).
 Examples:
Netflix combines collaborative and content-based methods.
 Advantages:
o Handles cold start and sparsity better.
o Improved accuracy.
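A tiny weighted-hybrid sketch; the collaborative scores below are hypothetical, and the content-based scores are borrowed from the similarity example above:

```python
# Blend two score sources with a tunable weight alpha.
content_scores = {"Bread": 0.894, "Butter": 0.75, "Ice Cream": 0.5}
collab_scores = {"Bread": 0.60, "Butter": 0.90, "Ice Cream": 0.20}

alpha = 0.5  # weight on the content-based component
hybrid = {item: alpha * content_scores[item] + (1 - alpha) * collab_scores[item]
          for item in content_scores}

for item, score in sorted(hybrid.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {score:.3f}")   # Butter edges out Bread at alpha = 0.5
```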
Evaluation of Recommendation Systems
 Metrics:
o Precision and Recall
(How many recommended items are relevant?)
o F1-Score
(Balance between precision and recall).
o Mean Average Precision (MAP)
(Ranking quality).
o Normalized Discounted Cumulative Gain (NDCG)
(Graded relevance of recommendations).


o Coverage
(Fraction of items recommended).
 Offline vs. Online Evaluation:
o Offline: Testing on historical data (train-test split).
o Online: A/B testing on live users.
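A minimal offline-evaluation sketch for precision and recall at k, using hypothetical recommended and relevant item sets:

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and Recall@k for one user."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    return hits / k, hits / len(relevant)

recommended = ["Bread", "Butter", "Ice Cream", "Jam"]  # ranked model output
relevant = {"Bread", "Butter", "Jam"}                  # held-out ground truth
p, r = precision_recall_at_k(recommended, relevant, k=3)
print(f"Precision@3 = {p:.2f}, Recall@3 = {r:.2f}")    # 0.67 and 0.67
```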

Challenges in Recommendation Systems


 Cold Start Problem
o No past behavior for new users/items.
 Scalability
o Billions of users/items (e.g., Spotify, Netflix).
 Sparsity
o User-item interaction matrix is very sparse.
 Diversity vs. Accuracy
o Need to balance recommending what user wants vs. new exploration.

Differences between Recommendation Systems

Feature | Content-Based | Collaborative | Hybrid | Knowledge-Based
How It Works | Suggests items similar to what you liked before | Suggests items that similar users liked | Combines content-based and collaborative filtering | Suggests based on your needs, not past behavior
Example | Recommends more action movies if you watched one | "People who bought this also bought..." | Netflix uses both user behavior & movie features | Travel site suggests hotels based on budget
Key Focus | Item features (genre, brand, price) | User behavior (ratings, purchases) | Best of both for better accuracy | Explicit user input (filters, forms)
Data Required | Item details + your past interactions | Lots of user ratings or purchases | Both item details + user behavior | User preferences (e.g., budget, location)
Advantages | Works well for new users with some history | Finds surprising, diverse suggestions | High accuracy and balanced results | No need for historical user data
Disadvantages | Limited if user history is small (cold start) | Needs many users and ratings (suffers cold start) | Complex to build and maintain | May not adapt well over time
When to Use | When you have rich item features and user clicks | When you have lots of user activity data | When accuracy is critical (e.g., Netflix, Amazon) | When users have specific needs (travel, real estate)


Question Bank
1. Define Association Rules. Give an example using items from a shopping basket.
2. What are the three main parameters used to evaluate Association Rules?
3. A rule says: {Milk} ⇒ {Bread}
Given: Support(Milk) = 60% , Support(Bread) = 70% , Support(Milk ∩ Bread) = 50%
Calculate the Confidence of the rule.
4. If the Confidence of {Milk} ⇒ {Bread} is 83%, and the Support(Bread) is 70%,
Calculate the Lift of the rule.
5. Explain the meaning of Support, Confidence, and Lift with an example in e-commerce.
6. What is a Recommendation Engine? List three industries where it's used.
7. Differentiate between Content-Based Filtering and Collaborative Filtering in tabular form
(write at least 6 points).
8. In Recommendation Engines, what does it mean when we say “personalization”?
9. Numerical:
Suppose a user rated the following movies:
Action Movie A: 5 stars
Action Movie B: 4.5 stars
Sci-Fi Movie C: 1 star
According to Content-Based Filtering, should we recommend another action movie or a sci-fi
movie? Why?
10. Explain how Collaborative Filtering uses "users who are similar" to make
recommendations.
11. Given the following purchase data:
User Milk Bread Butter
U1 1 1 0
U2 1 0 1
U3 0 1 1
Which product would Collaborative Filtering recommend to U1?
12. Describe user-based Collaborative Filtering and item-based Collaborative Filtering.
13. Given user similarity scores: User A and User B similarity: 0.9
User A and User C similarity: 0.4
If User B likes Item X, should we recommend Item X to User A? Explain.
14. A user watches 5 Sci-Fi movies and rates them highly but gives low ratings to Drama
movies. According to Content-Based Filtering, what genre should be recommended next?
15. If the cosine similarity between a user profile and Movie A is 0.85 and with Movie B is
0.65, which movie should be recommended? Why?
