FML
Definition:
Supervised Learning: A type of learning where the model is trained on a labeled dataset (input and correct output given).
Unsupervised Learning: A type of learning where the model finds patterns from data without labeled outputs.
✅ Definition of DBSCAN:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that groups points lying in dense regions into clusters and labels points in low-density regions as noise (outliers).
✅ Key Concepts:
Epsilon (ε): The maximum distance between two points for them to be considered neighbors.
MinPts: Minimum number of points required to form a dense region (cluster).
Core Point: A point that has at least MinPts points (including itself) within its ε-radius.
Border Point: A point that is within the ε-radius of a core point but has fewer than MinPts
neighbors of its own.
Noise Point (Outlier): A point that is neither a core point nor a border point.
✅ How DBSCAN Works:
1. Pick an unvisited point and count how many points lie within its ε-radius.
o If it has at least MinPts points within ε → mark as core point and start a new cluster.
o Then add all density-connected points (reachable through other core points) to that cluster.
2. Repeat until every point is visited; points that belong to no cluster are labeled noise.
✅ Advantages of DBSCAN:
Does not require the number of clusters to be specified in advance.
Can find clusters of arbitrary shape.
Handles noise and outliers by labeling them explicitly.
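To see these definitions in action, here is a minimal sketch using scikit-learn's DBSCAN. The toy points and the eps/min_samples values are illustrative assumptions, not values from these notes.

```python
# Minimal DBSCAN sketch with scikit-learn.
# The toy points, eps, and min_samples below are assumed for illustration.
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point.
X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],   # dense region A
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],   # dense region B
    [4.5, 0.5],                           # isolated point (noise)
])

# eps plays the role of ε, min_samples the role of MinPts.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1 -1]; label -1 marks a noise point
```

Each point in the two tight groups has 3 neighbors within ε (counting itself), so it becomes a core point; the isolated point gets label -1.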
✅ Definition:
Association Rule Mining is a data mining technique used to find interesting relationships (patterns)
among items in large datasets.
It is mainly used in market basket analysis to understand what items are frequently bought together.
✅ Basic Terms:
Transaction: A record of items bought together (e.g., one customer's shopping cart).
Itemset: A collection of one or more items (e.g., {Bread, Butter, Jam}).
✅ Purpose:
To discover rules of the form {Bread, Butter} → {Jam}.
Which means: if a customer buys Bread and Butter, they are likely to buy Jam too.
✅ Important Metrics:
1. Support:
o Formula: Support(A) = (Number of transactions containing A) / (Total number of transactions)
2. Confidence:
o Formula: Confidence(A → B) = Support(A ∪ B) / Support(A)
✅ Example:
TID Items Bought
1 Milk, Bread
2 Milk, Diaper, Beer
3 Milk, Bread, Diaper
4 Bread, Butter
5 Milk, Bread, Diaper
For the rule Milk → Diaper:
Support = 3/5 = 0.6 (3 of the 5 transactions contain both Milk and Diaper)
Confidence = 3/4 = 0.75 (4 transactions have Milk, and in 3 of them Diaper is also present)
✅ 1. Support:
Definition:
Support tells us how frequently an item or itemset appears in the total transactions.
It helps identify frequent patterns in the data.
Formula: Support(A) = (Number of transactions containing A) / (Total number of transactions)
Example:
Suppose there are 5 transactions:
TID Items Bought
1 Milk, Bread
2 Milk, Diaper, Beer
3 Milk, Bread, Diaper
4 Bread, Butter
5 Milk, Bread, Diaper
Milk and Bread appear together in 3 of the 5 transactions, so:
Support({Milk, Bread}) = 3/5 = 0.6 or 60%
✅ 2. Confidence:
Definition:
Confidence tells how often item B appears in transactions that contain item A.
It measures the strength of the rule.
Formula: Confidence(A → B) = Support(A ∪ B) / Support(A) = (Transactions containing both A and B) / (Transactions containing A)
Example (continued), for the rule Milk → Diaper:
Transactions containing Milk = 4
Transactions containing both Milk and Diaper = 3
Confidence=3/4=0.75 or 75%
So, in 75% of the cases where Milk is bought, Diaper is also bought.
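As a quick sketch, both metrics can be computed by hand in Python for the 5-transaction table above. The helper functions are hand-rolled for illustration, not a library API.

```python
# Support and confidence for the 5-transaction example above.
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Diaper", "Beer"},
    {"Milk", "Bread", "Diaper"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Diaper"},
]

def support(itemset):
    # Fraction of transactions containing every item in `itemset`.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Support(A ∪ B) / Support(A)
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"Milk", "Diaper"}))       # 3/5 = 0.6
print(confidence({"Milk"}, {"Diaper"}))  # 0.75
```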
✅ Definition:
The Apriori algorithm is a classic algorithm used for Association Rule Mining.
It is used to find frequent itemsets and then generate association rules from those itemsets.
✅ Steps:
1. Set a minimum support threshold.
2. Scan the dataset to find frequent 1-itemsets (single items meeting minimum support).
3. Generate candidate 2-itemsets from the frequent 1-itemsets and keep those meeting minimum support.
4. Repeat for larger itemsets until no new frequent itemsets are found.
5. From the frequent itemsets, generate strong association rules using minimum confidence.
✅ Example Dataset:
TID Items Bought
1 Milk, Bread
2 Milk, Diaper, Beer
3 Milk, Bread, Diaper
4 Bread, Butter
5 Milk, Bread, Diaper
✅ Step-by-Step Apriori (assume minimum support = 2 transactions, i.e., 40%):
1-itemsets:
Milk → 4 times ✅
Bread → 4 times ✅
Diaper → 3 times ✅
Beer → 1 time ❌
Butter → 1 time ❌
2-itemsets (candidates built from the frequent 1-itemsets):
{Milk, Bread} → 3 times ✅
{Milk, Diaper} → 3 times ✅
{Bread, Diaper} → 2 times ✅
Frequent 2-itemsets:
{Milk, Bread}, {Milk, Diaper}, {Bread, Diaper}
3-itemset:
{Milk, Bread, Diaper} → 2 times ✅
✅ Final Output:
Frequent Itemsets:
{Milk}, {Bread}, {Diaper}, {Milk, Bread}, {Milk, Diaper}, {Bread, Diaper}, {Milk, Bread, Diaper}
Example of a strong rule (minimum confidence 75%):
Milk → Bread (Confidence = 3/4 = 0.75)
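The level-by-level search can be written as a compact, pure-Python sketch (a teaching toy, not an optimized implementation). It uses the same dataset and the 2-transaction (40%) minimum support assumed above.

```python
# Compact Apriori sketch: grow itemsets level by level, keeping only
# candidates that meet the minimum support.
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Diaper", "Beer"},
    {"Milk", "Bread", "Diaper"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Diaper"},
]
MIN_SUPPORT = 2  # absolute count, i.e. 40% of 5 transactions

def count(itemset):
    # Number of transactions containing every item in `itemset`.
    return sum(set(itemset) <= t for t in transactions)

# Level 1: frequent individual items.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if count([i]) >= MIN_SUPPORT]
all_frequent = list(frequent)

# Levels 2, 3, ...: build size-k candidates from frequent (k-1)-itemsets.
k = 2
while frequent:
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in sorted(candidates, key=sorted) if count(c) >= MIN_SUPPORT]
    all_frequent.extend(frequent)
    k += 1

for s in all_frequent:
    print(sorted(s), count(s))
```

Running it reproduces the walkthrough above: the singles Milk, Bread, and Diaper; the three frequent pairs; and the triple {Milk, Bread, Diaper}.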
✅ Definition of Clustering:
Clustering is an unsupervised learning technique in which data points are grouped into clusters
based on similarity.
The goal is to group similar data points together and separate dissimilar ones.
Each group is called a cluster, and data points in the same cluster have similar properties.
✅ Example:
If we have data about customer purchases, clustering can help identify customer groups such as budget shoppers, regular customers, and premium buyers.
✅ Types of Clustering Methods:
1. Partitioning Methods:
o Divide the data into k groups, each represented by a center.
o Example: K-Means, K-Medoids
2. Hierarchical Clustering:
o Builds a tree (dendrogram) of clusters.
o Two types: Agglomerative (bottom-up) and Divisive (top-down).
3. Density-Based Methods:
o Group points in dense regions and treat sparse regions as noise.
o Example: DBSCAN
4. Grid-Based Methods:
o Divide the data space into a grid of cells and cluster the cells.
5. Model-Based Clustering:
o Assume the data is generated by a statistical model and fit that model.
o Example: EM (Expectation-Maximization)
✅ Summary Table:
Method Example
Partitioning K-Means, K-Medoids
Hierarchical Agglomerative, Divisive
Density-Based DBSCAN
Grid-Based (divides the space into cells)
Model-Based EM (Expectation-Maximization)
✅ Partitioning Algorithms:
The two main partitioning-based clustering algorithms are:
1. K-Means
2. K-Medoids
Let’s say we have this dataset of students with their Math and Science scores:
Student Math Science
A 85 90
B 82 88
C 30 40
D 25 38
E 70 65
Steps in K-Means:
1. Choose the number of clusters k (here, k = 2).
2. Randomly initialize k centroids.
3. Assign each student to the nearest centroid.
4. Recompute each centroid as the mean of the points assigned to it.
5. Repeat steps 3 and 4 until the assignments no longer change.
✅ Output:
Students A and B form a high-score cluster, and C and D form a low-score cluster. Student E falls into whichever cluster has the closer centroid; here (70, 65) is nearer to the average of A and B than to the average of C and D, so E joins the high-score cluster.
✅ Advantages:
Easy to implement.
Fast and scales well to large datasets.
✅ Disadvantages:
The number of clusters k must be chosen in advance.
Sensitive to the initial centroids and to outliers.
Works poorly for non-spherical or unevenly sized clusters.
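A minimal scikit-learn sketch of K-Means on the student table above; k = 2 and the random seed are assumptions for illustration.

```python
# K-Means on the Math/Science scores from the example above.
import numpy as np
from sklearn.cluster import KMeans

scores = np.array([
    [85, 90],  # A
    [82, 88],  # B
    [30, 40],  # C
    [25, 38],  # D
    [70, 65],  # E
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scores)
print(km.labels_)           # cluster index per student; E lands with A and B
print(km.cluster_centers_)  # mean Math/Science score of each cluster
```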
Q6. Explain the K-Medoids algorithm and how it differs from K-Means.
✅ What is K-Medoids?
K-Medoids is a partitioning-based clustering algorithm similar to K-Means, but instead of using the
mean (centroid) of data points, it uses medoids (actual data points) as the center of a cluster.
✅ Definition of Medoid:
A medoid is the most centrally located point in a cluster (i.e., the point with the minimum total
distance to all other points in the same cluster).
✅ Steps in K-Medoids:
1. Randomly select k data points as the initial medoids.
2. Assign each data point to the nearest medoid using a distance metric (e.g., Manhattan or
Euclidean).
3. For each cluster, try replacing the medoid with a non-medoid point and check if the overall
cost (total distance) reduces.
4. If the cost reduces, keep the swap; repeat steps 2 and 3 until the medoids no longer change.
✅ Example:
For the 1-D dataset {1, 2, 3, 10, 11, 12} with k = 2, the algorithm settles on the medoids 2 and 11.
Resulting clusters are grouped around actual data points (2 and 11): {1, 2, 3} and {10, 11, 12}.
✅ K-Means vs. K-Medoids:
Feature K-Means K-Medoids
Cluster center Mean of the points (centroid) Actual data point (medoid)
Output clusters May include non-data centers Always data points as centers
Sensitivity to outliers High Low
✅ Advantages of K-Medoids:
More robust to noise and outliers than K-Means.
Cluster centers are actual, interpretable data points.
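The swap idea can be shown with a brute-force sketch on the small assumed 1-D dataset above ({1, 2, 3, 10, 11, 12}, chosen to be consistent with the medoids 2 and 11). Real implementations such as PAM swap medoids iteratively instead of trying every pair.

```python
# Brute-force K-Medoids (k = 2) on an assumed 1-D toy dataset.
from itertools import combinations

data = [1, 2, 3, 10, 11, 12]

def cost(medoids):
    # Total distance from every point to its nearest medoid.
    return sum(min(abs(p - m) for m in medoids) for p in data)

# Try every pair of data points as medoids and keep the cheapest.
best = min(combinations(data, 2), key=cost)
print(best)  # (2, 11): the medoids are actual data points

# Assign each point to its nearest medoid.
clusters = {m: [p for p in data if min(best, key=lambda c: abs(p - c)) == m]
            for m in best}
print(clusters)  # {2: [1, 2, 3], 11: [10, 11, 12]}
```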
Association Rules are a data mining technique used to find interesting patterns or relationships
between items in large datasets, especially in market basket analysis.
✅ Purpose:
Business decisions
Product recommendations
Cross-selling strategies
Example:
If a customer buys bread and butter → Then they also buy milk.
1. Support:
o How often the itemset appears in the dataset.
2. Confidence:
o How often item B is bought in transactions that contain item A.
o Example: Out of 5 times bread was bought, if 4 times milk was also bought,
Confidence = 4/5 = 0.8 (80%)
3. Lift:
o How much more likely item B is purchased when item A is purchased.
o Formula: Lift(A → B) = Confidence(A → B) / Support(B) (see the worked example below).
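Worked example (assuming, for illustration, that Milk appears in 60% of all transactions, i.e., Support(Milk) = 0.6):
Lift(Bread → Milk) = Confidence(Bread → Milk) / Support(Milk) = 0.8 / 0.6 ≈ 1.33
Since 1.33 > 1, customers who buy Bread are more likely than average to also buy Milk.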
✅ Applications:
Market basket analysis, product recommendations, and cross-selling.
✅ Summary Example:
TID Items Bought
1 Milk, Bread
3 Bread, Butter
The Apriori principle is a key concept in association rule mining: if an itemset is frequent, then all of its subsets must also be frequent (and, conversely, if an itemset is infrequent, all of its supersets are infrequent). This property makes it possible to find frequent itemsets (combinations of items that appear together frequently) efficiently.
How it Works:
1. Find frequent 1-itemsets: identify individual items that appear frequently in the dataset.
2. Generate candidates: use these frequent 1-itemsets to generate larger itemsets (2-itemsets, 3-itemsets, etc.).
3. Prune: filter out candidate itemsets that do not meet the minimum support threshold; by the Apriori principle, any candidate containing an infrequent subset can be discarded immediately.
4. Repeat: continue the process for larger itemsets (3-itemsets, 4-itemsets) using the frequent itemsets found in the previous step, until no new frequent itemsets are found.
✅ Example:
TID Items
T2 Milk, Bread
T3 Milk, Butter
T4 Bread, Butter
1. Frequent 1-itemsets:
Milk appears in T2 and T3, Bread in T2 and T4, and Butter in T3 and T4, so all three items meet the support threshold and are frequent.
✅ Advantage of the Apriori Principle:
Efficiency: It reduces the search space by eliminating itemsets that are not frequent.
✅ Limitations:
Combinatorial Explosion: As the size of the itemset increases, the number of candidate
itemsets grows exponentially.
Slow for Large Datasets: It can be computationally expensive for large datasets.
✅ Applications of Apriori:
Market Basket Analysis: Identifying products that are often bought together (e.g., bread and
butter).
Q8. Demonstrate the process of building association rules from a transaction dataset.
✅ Transaction Dataset:
TID Items
T2 Milk, Bread
T3 Milk, Butter
T4 Bread, Butter
Minimum Support: Defines how frequently an itemset should appear in the dataset to be
considered frequent. Let's assume Support ≥ 60% (at least 3 transactions).
Minimum Confidence: Defines how often an association rule holds true. Let's assume
Confidence ≥ 75%.
Frequent 1-itemsets:
Identify individual items that appear frequently in the dataset.
Frequent 2-itemsets:
Now, identify pairs of items that appear frequently together.
Now, using the frequent itemsets, we generate association rules that satisfy the minimum
confidence.
The Lift of a rule measures how much more likely the consequent is to occur, given the antecedent: Lift(A → B) = Confidence(A → B) / Support(B). Lift > 1 indicates a positive association.
After calculating support, confidence, and lift, filter the association rules based on your thresholds.
Rules to keep:
{Milk} → {Bread}
{Bread} → {Milk}
{Bread} → {Butter}
{Butter} → {Bread}
These are the association rules that describe the relationships between items in the dataset.
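As a final sketch, the whole rule-building pipeline can be run programmatically. The code below reuses the complete 5-transaction Milk/Bread/Diaper dataset from the Apriori example earlier in these notes, with minimum support 40% and minimum confidence 75% as assumptions.

```python
# End-to-end sketch: frequent itemsets -> association rules with
# support, confidence, and lift.
from itertools import combinations

transactions = [
    {"Milk", "Bread"},
    {"Milk", "Diaper", "Beer"},
    {"Milk", "Bread", "Diaper"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Diaper"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.75

def support(itemset):
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

items = sorted({i for t in transactions for i in t})
# Brute-force all frequent itemsets of size >= 2 (fine at this scale).
frequent = [set(c) for k in range(2, len(items) + 1)
            for c in combinations(items, k) if support(c) >= MIN_SUPPORT]

for itemset in frequent:
    # Every non-empty proper subset can serve as an antecedent.
    for k in range(1, len(itemset)):
        for antecedent in map(set, combinations(sorted(itemset), k)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= MIN_CONFIDENCE:
                lift = conf / support(consequent)
                print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
                      f"support={support(itemset):.2f}, "
                      f"confidence={conf:.2f}, lift={lift:.2f}")
```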