Unit 2
Association Rules: Basic Structure
The goal of association rule mining is to find interesting relationships or patterns between items in large datasets.
Often used to answer:
➔ "When X happens, how likely is Y to happen?"
An association rule is usually written in the form:
X→Y
Where:
X is called the Antecedent (the "if" part)
Y is called the Consequent (the "then" part)
This rule reads as: "If X occurs, then Y is likely to occur."
Applications
1. Market Basket Analysis: Retail stores use association rules to find products that are frequently
bought together. Example: Milk and Bread, Chips and Soft Drinks.
2. E-commerce Recommendations: Websites like Amazon or Flipkart recommend products based
on association rules derived from previous user behavior.
3. Medical Diagnosis: Helps doctors find patterns in symptoms and diseases.
Example: Patients with symptom A and B are likely to have disease C.
4. Web Usage Mining: Predicts which webpages a user is likely to visit next based on browsing
history.
5. Fraud Detection: Detects unusual patterns of transactions that might indicate fraudulent
activity.
Association Rule Parameters
Key Terms Every Student Should Know
Term | Meaning
Item | A product or object (e.g., Milk, Bread)
Itemset | A group of items bought together
Transaction | A record of the items purchased together in one basket
Rule | An if-then statement (e.g., X → Y)
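These terms map directly onto simple data structures. A minimal illustration in Python (the item names here are just placeholders):

# A transaction is one customer's basket, stored as a set of items.
transaction = {"Milk", "Bread", "Butter"}
# An itemset is any group of items we want to reason about.
itemset = {"Milk", "Bread"}
# A rule X -> Y can be represented as a pair of itemsets (antecedent, consequent).
rule = ({"Milk"}, {"Bread"})
# "Does this transaction contain the itemset?" is just a subset test:
print(itemset.issubset(transaction))  # True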
Association Rule Mining is not just about finding rules; it's about finding strong and useful rules.
To measure the quality and strength of these rules, we use certain parameters:
1. Support
2. Confidence
3. Lift
These are the three key metrics used to measure the strength and usefulness of association rules and to filter out weak ones.
1. Support
Definition:
Support tells us how frequently an itemset appears in the dataset. It helps in identifying rules that
are relevant to a large number of transactions.
Formula:
Support(X) = (Number of transactions containing X) / (Total number of transactions)
Purpose:
Helps to filter out infrequent and insignificant rules.
High support = Rule is applicable to many transactions (more useful).
Example:
If 100 transactions are recorded and 20 include both Milk and Bread,
Support=20/100=0.20 (or 20%)
2. Confidence
Definition:
Confidence measures how often items in Y appear in transactions that contain X.
In simple terms: When X occurs, how likely is Y to also occur?
Formula:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Example:
If 30 transactions contain Milk and 20 of those also contain Bread,
Confidence(Milk → Bread) = 20/30 ≈ 0.66 (or 66%)
3. Lift
Definition:
Lift measures how much more likely the antecedent and consequent are to occur together than if they were independent.
Formula:
Lift(X → Y) = Confidence(X → Y) / Support(Y)
Purpose:
Shows the strength and significance of a rule beyond random chance.
Helps to identify whether the rule is truly useful.
Interpretation:
Lift > 1 ➔ Positive correlation (X and Y occur together more than by chance)
Lift = 1 ➔ X and Y are independent (no real association)
Lift < 1 ➔ Negative correlation (X and Y occur together less often than by chance)
Example:
If Confidence is 0.66 and Support for Bread is 0.40:
Lift=0.66/0.40=1.65
This means Milk and Bread are bought together 1.65 times more often than would be expected by chance.
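To make the three metrics concrete, here is a minimal Python sketch that computes them from raw transactions (the five-transaction dataset below is made up for illustration):

transactions = [
    {"Milk", "Bread"}, {"Milk", "Bread"}, {"Milk"}, {"Bread"}, {"Milk", "Bread", "Butter"},
]

def support(itemset, transactions):
    # Fraction of all transactions that contain every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    # Of the transactions containing X, the fraction that also contain Y
    return support(X | Y, transactions) / support(X, transactions)

def lift(X, Y, transactions):
    # Co-occurrence of X and Y relative to what independence would predict
    return confidence(X, Y, transactions) / support(Y, transactions)

X, Y = {"Milk"}, {"Bread"}
print(support(X | Y, transactions))    # 0.6
print(confidence(X, Y, transactions))  # 0.75
print(lift(X, Y, transactions))        # 0.9375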
Algorithms Used in Association Rule Mining
1. Apriori Algorithm
The Apriori algorithm is one of the foundational and most popular algorithms for mining frequent itemsets and association rules in large datasets. It is widely used in market basket analysis, where it helps to find patterns in transaction data, such as which products are frequently bought together. It follows a bottom-up approach: frequent itemsets of length 1 are found first, and progressively longer itemsets are then generated from them.
Working:
Apriori relies on the Apriori property: if an itemset is frequent, then all of its subsets must also be frequent. Conversely, if an itemset is infrequent, all of its supersets must also be infrequent. This property lets the algorithm discard large parts of the search space early. The main phases are:
1. Generate frequent itemsets: Start with itemsets of size 1 and progressively generate
larger itemsets by combining frequent itemsets of smaller sizes.
2. Prune itemsets: After generating the candidate itemsets, the algorithm uses a "pruning"
step to eliminate itemsets that do not meet the minimum support threshold.
3. Rule generation: From the frequent itemsets, the association rules are generated based
on the specified confidence threshold.
Algorithm
Step 1: Generate Candidate Itemsets
Step 2: Count the Support for Each Candidate Itemset
Step 3: Prune Infrequent Itemsets
Step 4: Generate New Candidate Itemsets
Step 5: Repeat Steps 2 to 4 for Larger Itemsets
Step 6: Generate Association Rules
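A minimal Python sketch of Steps 1 to 5, assuming transactions are given as a list of sets (rule generation is omitted for brevity); this is a teaching sketch rather than an optimized implementation:

from itertools import combinations

def apriori(transactions, min_support):
    # Returns every frequent itemset (as a frozenset) mapped to its support.
    n = len(transactions)
    # Step 1: candidate 1-itemsets
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Steps 2-3: count support and prune infrequent candidates
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        survivors = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(survivors)
        # Steps 4-5: join surviving k-itemsets into (k+1)-itemset candidates,
        # keeping only those whose k-subsets are all frequent (Apriori property)
        k += 1
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in survivors for s in combinations(c, k - 1))]
    return frequent

Each pass of the while loop corresponds to one full scan of the database, which is exactly why Apriori becomes expensive on large datasets (see the disadvantages below).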
Advantages:
Easy to understand and implement.
Finds all frequent itemsets systematically, pruning the search space with the Apriori property.
Disadvantages:
Computationally expensive for large datasets.
Requires multiple scans of the database, which is inefficient for large data.
Generates a large number of candidate itemsets.
2. FP-Growth Algorithm (Frequent Pattern Growth)
The FP-Growth Algorithm is an improvement over the Apriori algorithm.
It uses a tree structure called the FP-tree (Frequent Pattern Tree) to store compressed information about frequent itemsets.
Unlike Apriori, FP-Growth does not generate candidate itemsets explicitly and typically needs only two scans of the database.
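In practice one usually calls a library rather than hand-coding the tree. A sketch assuming the open-source mlxtend library is installed (function names as documented for mlxtend.frequent_patterns; check your installed version):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [["Milk", "Bread", "Butter"],
                ["Milk", "Bread"],
                ["Bread", "Butter"],
                ["Milk", "Bread", "Butter"],
                ["Milk", "Bread", "Butter", "Ice Cream"]]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Mine frequent itemsets without explicit candidate generation
frequent = fpgrowth(df, min_support=0.6, use_colnames=True)

# Derive association rules with a confidence threshold
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])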
Example:
Apply the Apriori algorithm to the following five transactions, using a minimum support threshold of 60% (0.60).
Transaction ID | Items Purchased
T1 | {Milk, Bread, Butter}
T2 | {Milk, Bread}
T3 | {Bread, Butter}
T4 | {Milk, Bread, Butter}
T5 | {Milk, Bread, Butter, Ice Cream}
Solution:
Calculate the Support for individual items (1-itemsets). Support is defined as:
Support(X) = (Number of transactions containing X) / (Total number of transactions)
Item | Transactions it appears in | Count | Support
Milk | T1, T2, T4, T5 | 4 | 4/5 = 0.8
Bread | T1, T2, T3, T4, T5 | 5 | 5/5 = 1.0
Butter | T1, T3, T4, T5 | 4 | 4/5 = 0.8
Ice Cream | T5 | 1 | 1/5 = 0.2
Milk, Bread, and Butter meet the 60% threshold; Ice Cream (0.20) does not, so it is pruned.
Candidate 2-itemsets
The candidate 2-itemsets formed from the frequent 1-itemsets are: {Milk, Bread}, {Milk, Butter}, {Bread, Butter}
Calculate Support for 2-itemsets
The supports of all pairs are shown below (pairs containing the pruned item Ice Cream are included for reference):
Pair | Transactions it appears in | Count | Support
Milk, Bread | T1, T2, T4, T5 | 4 | 4/5 = 0.8
Milk, Butter | T1, T4, T5 | 3 | 3/5 = 0.6
Milk, Ice Cream | T5 | 1 | 1/5 = 0.2
Bread, Butter | T1, T3, T4, T5 | 4 | 4/5 = 0.8
Bread, Ice Cream | T5 | 1 | 1/5 = 0.2
Butter, Ice Cream | T5 | 1 | 1/5 = 0.2
Applying the 60% threshold, the frequent 2-itemsets are {Milk, Bread} (0.80), {Milk, Butter} (0.60), and {Bread, Butter} (0.80); every pair containing Ice Cream falls below it.
Candidate 3-itemsets
The only candidate 3-itemset whose 2-item subsets are all frequent is {Milk, Bread, Butter}; the remaining triplets all contain Ice Cream:
Triplet | Transactions it appears in | Count | Support
Milk, Bread, Butter | T1, T4, T5 | 3 | 3/5 = 0.6
Milk, Bread, Ice Cream | T5 | 1 | 1/5 = 0.2
Milk, Butter, Ice Cream | T5 | 1 | 1/5 = 0.2
Bread, Butter, Ice Cream | T5 | 1 | 1/5 = 0.2
With the minimum support threshold of 60% (0.60), {Milk, Bread, Butter} (support 0.60) is the only frequent 3-itemset. Since only one frequent 3-itemset remains, no 4-itemset candidates can be generated and the algorithm stops.
Generate Association Rules from Frequent Itemsets
From 2-itemsets
(using the 1-itemset supports: Support(Milk) = 0.8, Support(Bread) = 1.0, Support(Butter) = 0.8)
{Milk, Bread}
Rule 1: Milk → Bread
Confidence=0.8/0.8=1.0
Lift=1.0/1.0=1.0
Rule 2: Bread → Milk
Confidence=0.8/1.0=0.8
Lift=0.8/0.8=1.0
{Milk, Butter}
Rule 3: Milk → Butter
Confidence=0.6/0.8=0.75
Lift=0.75/0.8=0.9375
Rule 4: Butter → Milk
Confidence=0.6/0.8=0.75
Lift=0.75/0.8=0.9375
{Bread, Butter}
Rule 5: Bread → Butter
Confidence=0.8/1.0=0.8
Lift=0.8/0.8=1.0
Rule 6: Butter → Bread
Confidence=0.8/0.8=1.0
Lift=1.0/1.0=1.0
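These six rules can be verified mechanically. A short self-contained Python check against the five transactions, reusing the standard definitions of confidence and lift:

transactions = [{"Milk", "Bread", "Butter"},
                {"Milk", "Bread"},
                {"Bread", "Butter"},
                {"Milk", "Bread", "Butter"},
                {"Milk", "Bread", "Butter", "Ice Cream"}]

def sup(s):
    # Support of an itemset: fraction of transactions containing it
    return sum(s <= t for t in transactions) / len(transactions)

for X, Y in [({"Milk"}, {"Bread"}), ({"Bread"}, {"Milk"}),
             ({"Milk"}, {"Butter"}), ({"Butter"}, {"Milk"}),
             ({"Bread"}, {"Butter"}), ({"Butter"}, {"Bread"})]:
    conf = sup(X | Y) / sup(X)       # Confidence(X -> Y)
    lift = conf / sup(Y)             # Lift(X -> Y)
    print(f"{X} -> {Y}: confidence={conf:.4f}, lift={lift:.4f}")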
Recommendation Engines
A recommendation engine is a system designed to suggest items or content to users based on their preferences, behaviours, or past interactions. These engines are used across a variety of platforms and industries, including e-commerce, media streaming, and social networks. The core goal of a recommendation engine is to personalize the user experience by providing relevant suggestions that enhance user engagement and satisfaction.
These systems, also known as recommenders, give customers recommendations based on their behaviour patterns and on similarities to people with shared preferences. They use statistical modelling, machine learning, and behavioural and predictive analytics algorithms to personalize the web experience.
E-commerce companies, social media platforms, and content-based websites frequently use recommendation engines to generate product recommendations and relevant content matching the characteristics of a particular web visitor. They are also used to suggest products that complement what a shopper has ordered. Search engines are a popular type of recommender, using a searcher's query and personal data, such as their location and browsing history, to generate relevant results.
Challenges:
o Cold start for new users/items.
o Sparsity problem (few interactions).
o Scalability for large datasets.
Matrix Factorization, a widely used collaborative-filtering technique, breaks the large user-item matrix into two smaller matrices:
o One represents user preferences.
o One represents item features.
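A minimal NumPy sketch of this factorization, trained by stochastic gradient descent on the observed entries only (the ratings, learning rate, and latent dimension here are illustrative assumptions):

import numpy as np

# Illustrative user-item rating matrix (0 = unrated)
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 5.0]])
n_users, n_items = R.shape
k = 2  # number of latent features

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))  # user-preference factors
Q = rng.normal(scale=0.1, size=(n_items, k))  # item-feature factors

lr, reg = 0.01, 0.02
for _ in range(2000):
    for u, i in zip(*R.nonzero()):            # update only on observed ratings
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 2))  # predicted ratings, including the missing cells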
Example of CBF (Content-Based Filtering):
Recommend items similar to an item the user already bought, based on item-item similarity computed from past transaction patterns.
Example:
Consider the following dataset with 5 transactions:
Transaction ID | Items Purchased
T1 | {Milk, Bread, Butter}
T2 | {Milk, Bread}
T3 | {Bread, Butter}
T4 | {Milk, Bread, Butter}
T5 | {Milk, Bread, Butter, Ice Cream}
Content-Based Recommendations:
If a customer buys Milk, recommend:
o Bread (Similarity 0.894)
o Butter (Similarity 0.75)
Avoid recommending Ice Cream (low similarity 0.5).
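The similarity values quoted above are consistent with item-item cosine similarity computed over the five transactions; a short Python verification:

import numpy as np

# Item vectors over transactions T1..T5 (1 = item appears in the transaction)
vectors = {
    "Milk":      np.array([1, 1, 0, 1, 1]),
    "Bread":     np.array([1, 1, 1, 1, 1]),
    "Butter":    np.array([1, 0, 1, 1, 1]),
    "Ice Cream": np.array([0, 0, 0, 0, 1]),
}

def cosine(a, b):
    # Cosine of the angle between two item vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

milk = vectors["Milk"]
for name in ("Bread", "Butter", "Ice Cream"):
    print(name, round(cosine(milk, vectors[name]), 3))
# Bread 0.894, Butter 0.75, Ice Cream 0.5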
Evaluating Recommendation Engines
o Coverage: the fraction of all available items that the system ever recommends.
Offline vs. Online Evaluation:
o Offline: Testing on historical data (train-test split).
o Online: A/B testing on live users.
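A minimal sketch of the offline approach: hold out some of each user's interactions, recommend from the rest, and measure precision@k against the held-out items (all names and data below are hypothetical):

# Hypothetical held-out test data: items each user actually interacted with
test_items = {"U1": {"Butter", "Jam"}, "U2": {"Milk"}}
# Top-2 recommendations produced by some model on the training split
recommended = {"U1": ["Butter", "Bread"], "U2": ["Milk", "Butter"]}

def precision_at_k(recs, relevant, k=2):
    # Fraction of the top-k recommendations the user actually interacted with
    hits = sum(item in relevant for item in recs[:k])
    return hits / k

for user in test_items:
    print(user, precision_at_k(recommended[user], test_items[user]))
# U1 0.5, U2 0.5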
Question Bank
1. Define Association Rules. Give an example using items from a shopping basket.
2. What are the three main parameters used to evaluate Association Rules?
3. A rule says: {Milk} ⇒ {Bread}
Given: Support(Milk) = 60% , Support(Bread) = 70% , Support(Milk ∩ Bread) = 50%
Calculate the Confidence of the rule.
4. If the Confidence of {Milk} ⇒ {Bread} is 83%, and the Support(Bread) is 70%,
Calculate the Lift of the rule.
5. Explain the meaning of Support, Confidence, and Lift with an example in e-commerce.
6. What is a Recommendation Engine? List three industries where it's used.
7. Differentiate between Content-Based Filtering and Collaborative Filtering in tabular form
(write at least 6 points).
8. In Recommendation Engines, what does it mean when we say “personalization”?
9. Numerical:
Suppose a user rated the following movies:
Action Movie A: 5 stars
Action Movie B: 4.5 stars
Sci-fi Movie C: 1 star
According to Content-Based Filtering, should we recommend another action movie or a sci-fi
movie? Why?
10. Explain how Collaborative Filtering uses "users who are similar" to make
recommendations.
11. Given the following purchase data:
User | Milk | Bread | Butter
U1 | 1 | 1 | 0
U2 | 1 | 0 | 1
U3 | 0 | 1 | 1
Which product would Collaborative Filtering recommend to U1?
12. Describe user-based Collaborative Filtering and item-based Collaborative Filtering.
13. Given the following user similarity scores:
User A and User B: 0.9
User A and User C: 0.4
If User B likes Item X, should we recommend Item X to User A? Explain.
14. A user watches 5 Sci-Fi movies and rates them highly but gives low ratings to Drama
movies. According to Content-Based Filtering, what genre should be recommended next?
15. If the cosine similarity between a user profile and Movie A is 0.85 and with Movie B is
0.65, which movie should be recommended? Why?