Unit 2
Association rule learning is one of the important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, and more. Market basket analysis is a technique used by large retailers to discover associations between items. We can understand it with the example of a supermarket: products that are frequently purchased together are placed together.
For example, if a customer buys bread, he is likely to also buy butter, eggs, or milk, so these products are stored on the same shelf or close to each other.
Association rule learning can be implemented with the following algorithms:
1. Apriori
2. Eclat
3. F-P Growth
An association rule is a statement of the form "If A, then B". The "If" element is called the antecedent, and the "then" element is called the consequent. A relationship in which we find an association between two single items is known as single cardinality; as the number of items in the antecedent and consequent grows, the cardinality increases accordingly. To measure the associations between thousands of data items, several metrics are used. These metrics are given below:
o Support
o Confidence
o Lift
Support
Support is the frequency of an item or itemset, i.e., how frequently it appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X. For an itemset X and a total of T transactions, it can be written as:
Support(X) = Freq(X) / T
Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often items X and Y occur together in the dataset given that X occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:
Confidence(X → Y) = Freq(X ∪ Y) / Freq(X)
Lift
Lift is the ratio of the observed support to the support expected if X and Y were independent of each other:
Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
It has three possible ranges of values:
o Lift = 1: X and Y are independent of each other; there is no association between the itemsets.
o Lift > 1: the two itemsets are positively dependent; the higher the value, the stronger the association.
o Lift < 1: one item is a substitute for the other, which means one item has a negative effect on the occurrence of the other.
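To make these metrics concrete, here is a minimal Python sketch (the transactions and item names are invented for illustration) that computes support, confidence, and lift for a single candidate rule such as {bread} → {butter}:

```python
# Toy example: compute support, confidence, and lift for the rule {bread} -> {butter}.
# The transactions below are illustrative only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"bread"}, {"butter"}

sup_X = support(X, transactions)        # Support(X)
sup_Y = support(Y, transactions)        # Support(Y)
sup_XY = support(X | Y, transactions)   # Support(X ∪ Y)

confidence = sup_XY / sup_X             # Confidence(X -> Y)
lift = sup_XY / (sup_X * sup_Y)         # Lift(X -> Y)

print(f"Support:    {sup_XY:.2f}")
print(f"Confidence: {confidence:.2f}")
print(f"Lift:       {lift:.2f}")
```

Running this gives a lift above 1, meaning bread and butter co-occur more often than independence would predict.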
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. It uses a breadth-first search and a hash tree to count candidate itemsets efficiently.
It is mainly used for market basket analysis and helps to understand the products that can be bought
together. It can also be used in the healthcare field to find drug reactions for patients.
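The following short Python sketch illustrates the level-wise idea behind Apriori on the same kind of toy data. It omits the subset-pruning step of the full algorithm, so it is a simplified illustration rather than a complete implementation:

```python
from itertools import combinations

# Minimal level-wise (Apriori-style) frequent-itemset search on toy data.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
min_support = 0.4  # illustrative threshold

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level 1: frequent single items.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
all_frequent = list(frequent)

k = 2
while frequent:
    # Candidate generation: unions of frequent (k-1)-itemsets that have size k.
    candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
    # Pruning: keep only candidates that meet the minimum support.
    frequent = [c for c in candidates if support(c) >= min_support]
    all_frequent.extend(frequent)
    k += 1

for itemset in all_frequent:
    print(set(itemset), round(support(itemset), 2))
```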
Eclat Algorithm
Eclat stands for Equivalence Class Transformation. This algorithm uses a depth-first search technique to find frequent itemsets in a transaction database, and it generally executes faster than the Apriori algorithm.
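As a contrast with Apriori's horizontal layout, here is a tiny illustration (made-up data) of the vertical, TID-set representation that Eclat works on, where the support of an itemset comes from intersecting transaction-ID sets rather than rescanning the database:

```python
# Eclat-style vertical representation: item -> set of transaction IDs (TIDs).
transactions = {
    1: {"bread", "butter", "milk"},
    2: {"bread", "butter"},
    3: {"bread", "eggs"},
    4: {"milk", "eggs"},
}

# Build the vertical (TID-set) layout.
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

# Support of {bread, butter} = size of the intersection of their TID sets.
common = tidsets["bread"] & tidsets["butter"]
print(common)                           # {1, 2}
print(len(common) / len(transactions))  # support = 0.5
```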
F-P Growth Algorithm
The F-P Growth algorithm stands for Frequent Pattern growth, and it is an improved version of the Apriori algorithm. It represents the database in the form of a tree structure known as a frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.
Applications of Association Rule Learning
Association rule learning has various applications in machine learning and data mining. Some popular applications are listed below:
o Market Basket Analysis: It is one of the popular examples and applications of association rule
mining. This technique is commonly used by big retailers to determine the association between
items.
o Medical Diagnosis: Association rules help in identifying the probability of illness for a particular disease, which supports the diagnosis and treatment of patients.
o Protein Sequence: Association rules help in determining the synthesis of artificial proteins.
o Association rule learning is also used for catalog design, loss-leader analysis, and many other applications.
Multilevel Association Rule Mining
In this section, we discuss the concepts of Multilevel Association Rule mining along with its algorithms, applications, and challenges.
Data mining is the process of extracting hidden patterns from large data sets. One of its fundamental techniques is association rule mining, which is used to identify relationships between items in a dataset. These relationships can then be used to make predictions about future occurrences of those items.
Multilevel Association Rule mining is an extension of Association Rule mining and a powerful tool for discovering patterns and trends at multiple levels of abstraction.
Association rule mining is used to discover relationships between items in a dataset. An association rule
is a statement of the form "If A, then B," where A and B are sets of items. The strength of an association
rule is measured using two measures: support and confidence. Support measures the frequency of the
occurrence of the items in the rule, and confidence measures the reliability of the rule.
Apriori algorithm is a popular algorithm for mining association rules. It is an iterative algorithm that
works by generating candidate itemsets and pruning those that do not meet the support and confidence
thresholds.
Multilevel Association Rule mining is a technique that extends Association Rule mining to discover
relationships between items at different levels of granularity. Multilevel Association Rule mining can be
classified into two types: multi-dimensional Association Rule and multi-level Association Rule.
Multi-dimensional Association Rule
This type is used to find relationships between items in different dimensions of a dataset. For example, in a sales dataset, multi-dimensional Association Rule mining can be used to find relationships between products, regions, and time.
Multi-level Association Rule
Multilevel Association Rule mining is important because data at lower levels may not exhibit any meaningful patterns, yet it can contain valuable insights. The goal is to find such hidden information within and across levels of abstraction.
There are several algorithms for Multilevel Association Rule mining, including partition-based,
agglomerative, and hybrid approaches.
Partition-based algorithms divide the data into partitions based on some criteria, such as the level of
granularity, and then mine Association Rules within each partition. Agglomerative algorithms start with
the smallest itemsets and then gradually merge them into larger itemsets, until a set of rules is obtained.
Hybrid algorithms combine the strengths of partition-based and agglomerative approaches.
Multilevel Association Rule mining uses different approaches to find relationships between items at different levels of granularity. There are three approaches: Uniform Support, Reduced Support, and Group-based Support. These are briefly explained below.
Uniform Support
In this approach, a single minimum support threshold is used for all levels. It is simple, but it may miss meaningful associations at lower levels.
Reduced Support
In this approach, the minimum support threshold is lowered at lower levels to avoid missing important associations. It uses different search techniques, such as level-by-level independence and level-cross filtering by single item or k-itemset.
Group-based Support
In this approach, the user or a domain expert sets the support and confidence thresholds for a specific group or product category.
For example, if an expert wants to study the purchase patterns of laptops and clothes, a low support threshold can be set for these groups so that attention is given to these items' purchase patterns.
Applications of Multilevel Association Rule Mining
Retail
Multilevel Association Rule mining helps retailers gain insights into customer buying behavior and preferences, optimize product placement and pricing, and improve supply chain management.
Healthcare Management
Multilevel Association Rule mining helps healthcare providers identify patterns in patient behavior,
diagnose diseases, identify high-risk patients, and optimize treatment plans.
Fraud Detection
Multilevel Association Rule mining helps companies identify fraudulent patterns, detect anomalies, and
prevent fraud in various industries such as finance, insurance, and telecommunications.
Web Usage Mining
Multilevel Association Rule mining helps web-based companies gain insights into user preferences, optimize website design and layout, and personalize content for individual users by analyzing data at different levels of abstraction.
Social Network Analysis
Multilevel Association Rule mining helps social network providers identify influential users, detect communities, and optimize network structure and design by analyzing social network data at different levels of abstraction.
Challenges of Multilevel Association Rule Mining
Multilevel Association Rule mining poses several challenges, including high dimensionality, large data set size, and scalability issues.
High dimensionality
It is the problem of dealing with data sets that have a large number of attributes.
Large data set size
It is the problem of dealing with data sets that have a large number of records.
Scalability
It is the problem of dealing with data sets that are too large to fit into memory.
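To see why scalability becomes a problem as the number of items grows, note that n distinct items give 2^n − 1 possible non-empty itemsets; the quick check below (purely illustrative) shows how fast this count explodes:

```python
# Number of non-empty candidate itemsets grows exponentially with the number of items.
for n_items in (10, 20, 30, 40):
    print(n_items, "items ->", 2 ** n_items - 1, "possible itemsets")
```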
RBF
In a mathematical context, an RBF (radial basis function) is a real-valued function whose value depends on the distance between an input point and a fixed reference point.
In a network context, an RBF network is an artificial neural network in which we use radial basis functions as the activation functions of the neurons.
2.1. Definition
An RBF is a mathematical function, say φ, that measures the distance between an input point (or vector) x and a given fixed point (or vector) of interest (a center or reference point) c:
φ(x) = f(||x − c||)
Here, ||·|| can be any distance function, such as the Euclidean distance. Further, the function f depends on the specific application and the desired set of properties. For the case of vectors, we call this function an RBK (radial basis kernel).
We use RBFs in mathematics, signal processing, computer vision, and machine learning. In these fields, radial functions are used to approximate functions that either lack a closed form or are too complex to evaluate directly. In most cases, this approximating function is a generic neural network.
2.2. Types
An RBF measures the similarity between a given data point and an agreed reference point. We can then use this similarity score to take specific actions, such as determining the activation of a neural network node. Here, the similarity decreases as the distance between the data point and the reference point increases.
Depending upon the function definition, we can have different RBFs. One of the most commonly used RBFs is the Gaussian RBF. It is given by:
φ(x) = exp(−β ||x − c||²)
Here, the parameter β controls the spread of the Gaussian curve: a smaller value of β results in a broader curve, while a larger value of β leads to a narrower curve.
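As a quick illustration (the input point, center, and β value below are arbitrary), the Gaussian RBF can be computed in a few lines of Python:

```python
import numpy as np

def gaussian_rbf(x, center, beta=1.0):
    """Gaussian RBF: exp(-beta * ||x - center||^2). Larger beta -> narrower curve."""
    return np.exp(-beta * np.linalg.norm(x - center) ** 2)

x = np.array([1.0, 2.0])   # input point (arbitrary)
c = np.array([0.0, 0.0])   # center / reference point (arbitrary)
print(gaussian_rbf(x, c, beta=0.5))   # similarity decreases as x moves away from c
```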
Other types of RBFs include the Multiquadric, Inverse Multiquadric, and
Thin Plate Splines. Each RBF has its characteristics and can be suitable for
specific applications or tasks. In the realm of neural networks, we often use
RBFs as activation functions in the network’s hidden layer.
To summarize this section, RBF represents a basis function that measures
the similarity between the input data and a reference point, influencing the
network’s output.
3.2. Structure
Now, we move ahead and describe the typical RBF network structure.
The RBF network consists of the following three layers:
1. the input layer (usually one)
2. the hidden layer (strictly one)
3. the output layer (usually one)
Training
The RBF network training process involves two main steps:
1. Determine the centers (and widths) of the hidden-layer RBF units, typically with an unsupervised method such as k-means clustering.
2. Train the output-layer weights, typically with a supervised method such as linear least squares or gradient descent.
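Below is a minimal sketch of this two-step procedure, assuming NumPy and scikit-learn are available; the toy dataset, the number of centers, and the β value are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy 1-D regression problem: learn y = sin(x) from noisy samples.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Step 1: pick the RBF centers with k-means (unsupervised).
n_centers, beta = 10, 2.0
centers = KMeans(n_clusters=n_centers, n_init=10, random_state=0).fit(X).cluster_centers_

def rbf_features(X):
    """Gaussian RBF activations of each input with respect to every center."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-beta * dists ** 2)

# Step 2: solve for the output-layer weights by linear least squares (supervised).
Phi = rbf_features(X)
weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Prediction on new points.
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(rbf_features(X_test) @ weights)
```

Here k-means plays the unsupervised role of step 1, and the least-squares solve plays the supervised role of step 2.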
Splines
In the previous lecture, we discussed linear regression, which connects the dependent and independent variables with a straight line, but a straight line cannot always describe the data. Polynomial regression can fit curved relationships, but we saw that the more polynomial terms we add, the more prone the model is to overfitting.
To fit the complex shapes that describe real data, we need a way to design complex functions without overfitting. For that, we use a new method called spline regression. In ordinary regression the whole dataset is treated as one piece, but in spline regression we split the dataset into many parts, which we call bins. The points at which we divide the data are called knots, and we fit a different function in each bin. These separate functions used in the different bins are called piecewise step functions.
Why Splines?
We already discussed that linear regression fits only a straight line, which is why we turned to polynomial regression, but polynomial regression can lead to overfitting. The need for a model that combines the good properties of both linear and polynomial regression, without the risk of overfitting, is what motivates splines.
To fit a spline, we must determine where to break up the polynomial. The points where this division occurs are called knots. In the example above, each P_x represents a knot. The knots at the ends of the curve are known as boundary knots, while the knots within the curve are known as internal knots. Two common ways to choose knot locations are:
• Placing knots at uniform quantiles of the data
• Cross-validation
Types of Splines
The mathematics behind splines can seem complicated without some background; for the full details, we refer you to The Elements of Statistical Learning, 2nd Edition by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
Cubic Splines
A cubic spline fits a separate cubic polynomial in each bin and joins the pieces at the knots smoothly. This means that the first and second derivatives of these functions must be continuous at the knots.
Natural Splines
Polynomial functions and other kinds of splines tend to fit poorly near the ends of the data. This variability can have large consequences, particularly when extrapolating beyond the boundary knots. Natural splines address this by constraining the function to be linear beyond the boundary knots.
Smoothing Splines
A smoothing spline adds a roughness penalty to the cost function used to fit the spline: the cost is penalized if the variability (wiggliness) of the fitted function is high. Smoothing splines are useful when the data are noisy and an unpenalized fit would chase the noise.
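As a small illustration of a smoothing spline fit, the sketch below uses SciPy's UnivariateSpline on noisy synthetic data; the data and the smoothing factor s are made up for illustration:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)

# Noisy samples of a smooth underlying function.
x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

# A smoothing spline: the smoothing factor `s` penalizes a wiggly fit.
# s=0 would interpolate every point (risking overfitting); larger s gives a smoother curve.
spline = UnivariateSpline(x, y, k=3, s=2.0)

x_new = np.linspace(0, 10, 5)
print(spline(x_new))   # smoothed predictions at new points
```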
Curse of Dimensionality
To see the effect of adding more and more features, suppose we are building several machine learning models to analyze the performance of a racing driver:
i) Model_1 consists of only 2 features.
ii) Model_2 consists of 4 features, say the weather and the max speed of the car, including the above two.
iii) Model_3 consists of 8 features, say the driver's experience, number of wins, car condition, and the driver's physical fitness, including all of the above features.
There are several domains where we can see the effect of this phenomenon. Machine learning is one such domain; others include numerical analysis, sampling, combinatorics, data mining, and databases. As the title suggests, here we will look at its effect only in machine learning.
The curse of dimensionality refers to the difficulties that arise when analyzing
or modeling data with many dimensions. These problems can be summarized in
the following points:
• Data Sparsity: Data points become increasingly spread out, making it hard
to find patterns or relationships.
• Computational Complexity: The computational burden of algorithms
increases exponentially.
• Overfitting: Models become more likely to memorize the training data
without generalizing well.
• Distortion of Distance Metrics: Traditional distance metrics become less
reliable in measuring proximity.
• Visualization Challenges: Projecting high-dimensional data onto lower
dimensions leads to loss of information.
Dealing with high-dimensional data also raises several practical considerations:
• Data Preprocessing: Identifying relevant features and reducing dimensionality is crucial for effective analysis.
• Algorithmic Efficiency: Algorithms need to be scalable and efficient to
handle the complexity of high-dimensional spaces.
• Domain-Specific Challenges: Each domain faces unique challenges in
high-dimensional spaces, requiring tailored approaches.
• Interpretability Issues: Understanding the decision-making process of
high-dimensional models becomes increasingly difficult.
• Data Storage Requirements: Efficient data storage and retrieval strategies are essential for managing large volumes of high-dimensional data.
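A small, purely illustrative experiment makes the distance-distortion point concrete: for random points in a unit hypercube, the relative gap between the nearest and farthest distances from the origin shrinks as the dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(42)

# For random points in the unit hypercube, compare distances from the origin
# as the dimensionality d grows: the relative contrast (max - min) / min shrinks.
for d in (2, 10, 100, 1000):
    points = rng.random((1000, d))
    dists = np.linalg.norm(points, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:4d}  relative contrast = {contrast:.3f}")
```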
Interpolation
Interpolation refers to methods, used in mathematics and computer science, that construct new data points within the range of a discrete set of known data points, or that determine a formula for a function passing through a given set of points (x, y).
Interpolation Meaning
Interpolation is a method of deriving a simple function from a given discrete data set such that the function passes through the provided data points. This helps to determine values in between the given data points. The method is needed whenever we must compute the value of a function for an intermediate value of the independent variable. In short, interpolation is the process of determining unknown values that lie between known data points. It is widely used to predict unknown values for geographical data such as noise level, rainfall, elevation, and so on.
Interpolation Methods
Nearest-neighbor interpolation
Linear interpolation
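The sketch below illustrates both methods in Python: np.interp performs linear interpolation, while the nearest-neighbor variant here is a small hand-rolled illustration (the sample points are made up):

```python
import numpy as np

# Known data points (x must be increasing for np.interp).
xp = np.array([0.0, 1.0, 2.0, 3.0])
fp = np.array([0.0, 2.0, 1.0, 3.0])

x_query = np.array([0.5, 1.5, 2.25])

# Linear interpolation: the value varies linearly between neighboring known points.
linear = np.interp(x_query, xp, fp)

# Nearest-neighbor interpolation: take the value of the closest known x.
nearest_idx = np.abs(x_query[:, None] - xp[None, :]).argmin(axis=1)
nearest = fp[nearest_idx]

print(linear)    # [1.   1.5  1.5 ]
print(nearest)   # [0. 2. 1.]
```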
Support Vector Machine (SVM)
Support Vector Machine (SVM) is a powerful and versatile supervised machine learning algorithm used
for classification and regression tasks. It's particularly effective in cases where the data is separable into
distinct classes or has a clear margin of separation. SVM aims to find the optimal hyperplane that best
separates the data points of different classes while maximizing the margin between them.
1. Basic Concept:
• At its core, SVM works by finding the hyperplane that maximizes the margin between
the classes. A hyperplane in an n-dimensional space is a flat affine subspace of
dimension n-1.
• In simple terms, for a 2-dimensional dataset, the hyperplane is a line, and for a 3-
dimensional dataset, it's a plane. In higher dimensions, it's a hyperplane.
• SVM operates by mapping input data into a higher-dimensional feature space where the
data points can be separated by a hyperplane. This is done using a kernel function that
computes the inner products of the feature vectors in the higher-dimensional space.
2. Margin and Support Vectors:
• The margin is the distance between the hyperplane and the nearest data points from each class; these nearest points are known as support vectors.
• Support vectors are the data points closest to the hyperplane and play a crucial role in determining its position and orientation.
• SVM aims to find the hyperplane with the maximum margin, i.e., the one whose distance to the nearest points of each class is as large as possible.
3. Kernel Trick:
• In many cases, the data may not be linearly separable in the original feature space. The
kernel trick allows SVM to implicitly map the input data into a higher-dimensional space
without explicitly computing the transformation.
• Popular kernel functions include linear, polynomial, radial basis function (RBF), and
sigmoid kernels. These kernels enable SVM to handle complex decision boundaries and
nonlinear relationships in the data.
4. Optimization Objective:
• The optimization objective of SVM is to maximize the margin, which is equivalent to minimizing ||w||²/2 subject to every training point being classified correctly, i.e., y_i(w·x_i + b) ≥ 1.
• In cases where the data is not linearly separable or is noisy, SVM can be extended to use a soft margin, allowing for some misclassifications. This is known as Soft Margin SVM.
• The soft margin formulation introduces a penalty parameter (C) that controls the trade-off between maximizing the margin and minimizing the classification error. A smaller value of C allows a wider margin but may permit more misclassifications, while a larger value of C narrows the margin to reduce training errors, at the risk of overfitting (see the short sketch after this list).
6. Multiclass Classification:
• SVM inherently supports binary classification, but several strategies can be used to
extend it to multiclass classification problems. One common approach is the one-vs-all
(OvA) or one-vs-rest (OvR) strategy, where separate binary classifiers are trained for
each class, and the class with the highest confidence score is chosen as the predicted
class.
7. Applications:
• SVM has a wide range of applications in various domains, including text classification,
image recognition, bioinformatics, finance, and more.
• In text classification, SVM can be used for sentiment analysis, spam detection, and
document categorization.
• In image recognition, SVM can classify images into different categories based on their
features extracted from pixels.
• In bioinformatics, SVM is used for protein classification, gene expression analysis, and
disease prediction.
8. Advantages:
• SVM works well with both linearly separable and nonlinearly separable data when an appropriate kernel function is used.
• Because the underlying optimization problem is convex, SVM provides a globally optimal solution for the chosen kernel and parameters.
9. Disadvantages:
• The choice of kernel function and its parameters can significantly impact the
performance of the SVM model, and selecting the appropriate kernel requires domain
knowledge and experimentation.
• SVM models are not very interpretable compared to some other machine learning
algorithms like decision trees.
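As referenced in point 4, here is a minimal scikit-learn sketch of training a soft-margin SVM with an RBF kernel; the synthetic two-ring dataset and the C and gamma values are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic two-class data that is not linearly separable (two noisy rings).
angles = rng.uniform(0, 2 * np.pi, 400)
radii = np.where(np.arange(400) < 200, 1.0, 2.5) + 0.2 * rng.standard_normal(400)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = (np.arange(400) >= 200).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale features, then fit a soft-margin SVM with an RBF kernel.
# C controls the margin/error trade-off; gamma controls the RBF kernel width.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)

print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
print("support vectors per class:", clf.n_support_)
```

Increasing C pushes the model to classify more training points correctly at the cost of a narrower margin, while decreasing gamma widens the RBF kernel's region of influence.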
In summary, Support Vector Machine is a versatile and powerful algorithm for classification and
regression tasks, capable of handling both linear and nonlinear relationships in the data. Its
effectiveness, especially in high-dimensional spaces, makes it a popular choice for various machine
learning applications. However, it's essential to carefully select the appropriate kernel function and tune
the model parameters to achieve optimal performance.