Week 8 Notes_DM
For example, suppose we have two classes that we need to separate efficiently. Each class can have multiple features. Using only a single feature to classify them may result in some overlap, as shown in the figure below. So we keep increasing the number of features until the classes can be separated properly.
Suppose we have two sets of data points belonging to two different classes that we
want to classify. As shown in the given 2D graph, when the data points are plotted on
the 2D plane, there's no straight line that can separate the two classes of the data
points completely. Hence, in this case, LDA (Linear Discriminant Analysis) is used
which reduces the 2D graph into a 1D graph in order to maximize the separability
between the two classes.
Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new axis and projects the data onto this new axis in a way that maximizes the separation of the two categories, hence reducing the 2D graph into a 1D graph.
In the above graph, it can be seen that a new axis (in red) is generated and plotted in
the 2D graph such that it maximizes the distance between the means of the two classes
and minimizes the variation within each class. In simple terms, this newly generated
axis increases the separation between the data points of the two classes. After
generating this new axis using the above-mentioned criteria, all the data points of the
classes are plotted on this new axis and are shown in the figure given below.
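For a two-class problem this criterion can be written compactly as the Fisher criterion. The notation below (a projection direction w, projected class means m1 and m2, projected within-class scatters s1^2 and s2^2, and the between-class and within-class scatter matrices SB and SW that appear later in these notes) is standard notation added here for illustration, not taken from the original figure:

\[
J(w) \;=\; \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}
      \;=\; \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w},
\qquad
w^{*} \;=\; \arg\max_{w} J(w).
\]

LDA chooses the axis w* that maximizes this ratio, which is exactly the criterion described above: a large distance between the class means and a small variation within each class.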
Assumption of LDA
The key assumptions behind LDA (each input feature is roughly Gaussian, the classes have similar variance, and there are no extreme outliers) are reflected in the data-preparation steps listed later in these notes.
LDA vs PCA
LDA is a very similar approach to principal component analysis (PCA): both are linear transformation techniques for dimensionality reduction, but there are some differences between them:
PCA is unsupervised and ignores the class labels, whereas LDA is supervised and uses the class labels.
PCA finds the directions (principal components) that maximize the variance of the data, whereas LDA finds the directions (linear discriminants) that maximize the separation between the classes.
PCA can return up to as many components as there are features, whereas LDA can return at most (number of classes - 1) linear discriminants.
For example, suppose we have a dataset of soap that includes various features such as the weight and volume of the soap, people's preference scores, odor, color, contrast, etc.
Once the target (dependent) variable is decided, the other related information can be drawn from the existing dataset to check how effective each feature is for that target variable. In this way the dimensionality of the data is reduced, and only the important, related features remain in the new dataset.
LDA projects the features from a higher-dimensional space onto a lower-dimensional space. Let's look at how LDA achieves this:
Step#1 Computes the mean vectors for each class
Step#2 Computes the within-class scatter matrix SW and the between-class scatter matrix SB
Step#3 Computes the eigenvalues and eigenvectors for SW (scatter matrix within class) and SB (scatter matrix between class)
Step#4 Sorts the eigenvalues in descending order and selects the top k
Step#5 Creates a new matrix containing the eigenvectors that correspond to the k largest eigenvalues
Step#6 Obtains the new features (i.e. the linear discriminants) by taking the dot product of the data and this matrix.
1. Outlier Treatment: Outliers should be removed from the data. Outliers introduce skewness, which in turn influences the computation of the mean and variance and finally has an impact on the LDA computations.
2. Equal Variance: Standardize the input data so that it has a mean of 0 and a standard deviation of 1.
3. Gaussian distribution: Perform a univariate analysis of each input feature, and if a feature does not exhibit a Gaussian distribution, transform it so that it looks more Gaussian (for example, log and root transforms for exponential-like distributions).
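These preparation steps can be sketched in a few lines of Python. This is a minimal illustration only; the 3-standard-deviation outlier rule, the log transform, and the random data are assumptions made here, not part of the original notes:

import numpy as np
from sklearn.preprocessing import StandardScaler

# made-up skewed data standing in for raw input features
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(200, 3))

# 1. Outlier treatment: drop rows lying more than 3 standard deviations from the mean
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
X = X[(z < 3).all(axis=1)]

# 3. Gaussian distribution: a log transform makes exponential-like features look more Gaussian
X = np.log1p(X)

# 2. Equal variance: standardize so every feature has mean 0 and standard deviation 1
X = StandardScaler().fit_transform(X)

print(X.mean(axis=0).round(3), X.std(axis=0).round(3))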
1. Compute the mean vectors for the different classes from the dataset.
2. Compute the scatter matrices (in-between-class and within-class scatter matrices).
3. Compute the eigenvectors and corresponding eigenvalues for the scatter matrices.
4. Sort the eigenvectors by decreasing eigenvalues and choose k eigenvectors with
the largest eigenvalues.
5. Use this eigenvector matrix to transform the samples onto the new subspace.
First, calculate the mean vectors for all classes inside the dataset.
After calculating the mean vectors, the within-class and between-class scatter matrices
can be calculated.
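A small NumPy sketch of these two calculations, using a tiny made-up two-class dataset (the numbers are illustrative only):

import numpy as np

# made-up data: 10 samples, 2 features, two classes labelled 0 and 1
X = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0],
              [9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

n_features = X.shape[1]
overall_mean = X.mean(axis=0)

# mean vector of each class
mean_vectors = {c: X[y == c].mean(axis=0) for c in np.unique(y)}

# within-class scatter SW and between-class scatter SB
S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for c, m in mean_vectors.items():
    X_c = X[y == c]
    S_W += (X_c - m).T @ (X_c - m)            # scatter of each class around its own mean
    d = (m - overall_mean).reshape(-1, 1)
    S_B += len(X_c) * (d @ d.T)               # class-mean scatter, weighted by class size

print("S_W:\n", S_W)
print("S_B:\n", S_B)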
Select linear discriminants for the new feature subspace
After calculating the eigenvectors and eigenvalues, we sort the eigenvectors from
highest to lowest depending on their corresponding eigenvalue and then choose the
top k eigenvectors, where k is the number of dimensions we want to keep.
Finally, the samples are transformed onto the new subspace: Y = X * W, where X is the data matrix and W is the matrix of the selected eigenvectors.
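Continuing with the same toy data as above, the remaining steps (eigen-decomposition of SW^-1 SB, sorting, selecting the top k eigenvectors, and the projection Y = X * W) can be sketched as follows; the data and the choice k = 1 are illustrative assumptions:

import numpy as np

# same made-up two-class data as in the scatter-matrix sketch above
X = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0],
              [9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

n_features = X.shape[1]
overall_mean = X.mean(axis=0)
S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for c in np.unique(y):
    X_c = X[y == c]
    m = X_c.mean(axis=0)
    S_W += (X_c - m).T @ (X_c - m)
    d = (m - overall_mean).reshape(-1, 1)
    S_B += len(X_c) * (d @ d.T)

# eigenvalues and eigenvectors of inv(S_W) @ S_B
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

# sort eigenvectors by eigenvalue in descending order and keep the top k
k = 1                                    # two classes allow at most one useful discriminant
order = np.argsort(eig_vals.real)[::-1]
W = eig_vecs[:, order[:k]].real          # matrix of the selected eigenvectors

# project the samples onto the new subspace: Y = X * W
Y = X @ W
print(Y.ravel().round(3))                # 10 samples reduced to 1 linear discriminant each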
Let's implement the LDA on the Iris dataset. This dataset contains information about
the size of the petals and sepals of three different species of flowers. Before
implementing the LDA on the given dataset, ensure you have installed the following
modules on your system.
pandas
NumPy
matplotlib
sklearn
seaborn
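The loading code itself does not appear in these notes; a minimal sketch, assuming the dataset is loaded with scikit-learn's built-in load_iris function (an assumption, though it matches the dataset attributes used below):

# importing the module
from sklearn.datasets import load_iris

# loading the Iris dataset as a Bunch (dictionary-like) object
dataset = load_iris()

# listing the keys available in the dataset object
print(dataset.keys())
# typically includes 'data', 'target', 'target_names', 'DESCR' and 'feature_names'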
Output:
You can explore each of these on your own, but here we will just go through DESCR
because it contains the details about the dataset.
# information about dataset
print(dataset['DESCR'])
Next, you can find the dataset's statistics by using the DataFrame describe() function.
# importing the module
import pandas as pd
# converting the dataset into a pandas dataframe
data = pd.DataFrame(dataset.data, columns=dataset.feature_names)
# descriptive statistics
data.describe()
Output:
The DataFrame's statistics include, for each column, the count, mean, standard deviation, minimum and maximum values, and quartiles.
There are 4 input variables in our dataset, so it is impossible to visualize them in one
graph. Let's apply LDA with 2 components so that the same data can be visualized
using the 2D plot.
# input and output variables
X = dataset.data
y = dataset.target
target_names = dataset.target_names
# importing the required module
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# initializing the model with 2 components
lda = LinearDiscriminantAnalysis(n_components=2)
# fitting the dataset
X_r2 = lda.fit(X, y).transform(X)
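The plotting code is not shown in the notes; a minimal matplotlib sketch (reusing X_r2, y and target_names from the snippets above) that would produce a comparable 2D scatter plot:

# importing the plotting module
import matplotlib.pyplot as plt

# scatter plot of the two linear discriminants, one colour per class
plt.figure()
for i, target_name in enumerate(target_names):
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], label=target_name)
plt.xlabel('Linear Discriminant 1')
plt.ylabel('Linear Discriminant 2')
plt.legend()
plt.title('LDA of the Iris dataset (2 components)')
plt.show()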
This graph shows that there are three types of output classes. The LDA has helped us
to visualize these three clusters in a 2D plot.
Prior Probabilities
For a categorical target variable, each modelling node can estimate posterior
probabilities for each class, which are defined as the conditional probabilities of the
classes given the input variables. By default, the posterior probabilities are based on
implicit prior probabilities that are proportional to the frequencies of the classes in the
training set. Prior probabilities should be specified when the sample proportions of the
classes in the training set differ substantially from the proportions in the operational
data to be scored, either through sampling variation or deliberate bias. For example,
when the purpose of the analysis is to detect a rare class, it is a common practice to
use a training set in which the rare class is over-represented. If no prior probabilities
are used, the estimated posterior probabilities for the rare class will be too high. If you
specify correct priors, the posterior probabilities will be correctly adjusted no matter
what the proportions in the training set are.
Increasing the prior probability of a class increases the posterior probability of the
class, moving the classification boundary for that class so that more cases are
classified into the class. Changing the prior will have a more noticeable effect if the
original posterior is near 0.5 than if it is near zero or one. For linear logistic regression
and linear normal-theory discriminant analysis, classification boundaries are
hyperplanes; increasing the prior for a class moves the hyperplanes for that class
farther from the class mean, but decreasing the prior moves the hyperplanes closer to
the class mean. But changing the priors does not change the angles of the hyperplanes.
If you specify prior probabilities, the training set can be obtained by nonproportional sampling or weighted by a frequency variable in any manner that you want.
If you specify prior probabilities, the posterior probabilities computed by the
modeling nodes are always adjusted for the priors.
If you specify prior probabilities, the profit and loss summary statistics are
always adjusted for priors and therefore provide valid model comparisons,
assuming that you specify valid decision consequences.
The posterior probabilities are adjusted by re-weighting them with the ratio of the new priors to the old priors and then renormalising over the classes:
Post(i,t) = [ OldPost(i,t) * Prior(t) / OldPrior(t) ] / sum over all classes u of [ OldPost(i,u) * Prior(u) / OldPrior(u) ]
where
t is an index for target values (classes)
i is an index for cases
OldPrior(t) is the old prior probability or implicit prior probability for target t
OldPost(i,t) is the posterior probability based on OldPrior(t)
Prior(t) is the new prior probability desired for target t
Post(i,t) is the posterior probability based on Prior(t)
For classification, each case i is assigned to the class with the greatest posterior
probability, that is, the class t for which Post(i,t) is maximized.
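A short Python sketch of this adjustment and the resulting classification; the posterior and prior values are made up for illustration, and the formula is the re-weighting given above:

import numpy as np

# made-up posteriors for 3 cases over 2 classes, estimated under the implicit priors
old_post = np.array([[0.70, 0.30],
                     [0.40, 0.60],
                     [0.90, 0.10]])
old_prior = np.array([0.50, 0.50])   # implicit priors from an oversampled training set
new_prior = np.array([0.95, 0.05])   # priors believed to hold in the operational data

# Post(i,t) = OldPost(i,t) * Prior(t) / OldPrior(t), renormalised so posteriors sum to 1
weighted = old_post * (new_prior / old_prior)
post = weighted / weighted.sum(axis=1, keepdims=True)

# each case i is assigned to the class t with the greatest adjusted posterior
print(post.round(3))
print("predicted class per case:", post.argmax(axis=1))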
Specifying prior probabilities affects the following outputs:
Posterior probabilities
Classification
Decisions
Misclassification rate
Expected profit or loss
Profit and loss summary statistics, including the relative contribution of each class.
Classification is a predictive modeling problem that involves predicting the class label
for an observation. There may be many class labels, so-called multi-class
classification problems, although the simplest and perhaps most common type of
classification problem has two classes and is referred to as binary classification. Most
data mining or machine learning algorithms designed for classification assume that
there is an equal number of examples for each observed class. This is not always the
case in practice, and datasets that have a skewed class distribution are referred to as
imbalanced classification problems.
In addition to assuming that the class distribution is balanced, most algorithms also assume that all prediction errors made by a classifier, so-called misclassifications, cost the same. This is typically not the case for binary classification problems, especially those that have an imbalanced class distribution.
For imbalanced classification problems, the examples from the majority class are
referred to as the negative class and assigned the class label 0. Those examples from
the minority class are referred to as the positive class and are assigned the class label
1.
The reason for this negative vs. positive naming convention is because the examples
from the majority class typically represent a normal or no-event case, whereas
examples from the minority class represent the exceptional or event case.
Examples:
Bank Loan Problem: Consider a problem where a bank wants to determine whether to
give a loan to a customer or not. Denying a loan to a good customer is not as bad as
giving a loan to a bad customer that may never repay it.
We can see from these examples that misclassification errors are undesirable in general, but one type of misclassification is much worse than the other. Specifically, predicting positive cases as negative is more harmful, more expensive, or worse in whatever way we want to measure the context of the target domain.
The confusion matrix is best understood using a binary classification problem with negative and positive classes, typically assigned the 0 and 1 class labels respectively. The columns of the table represent the actual class to which examples belong, and the rows represent the predicted class (although the meaning of rows and columns can be, and often is, interchanged with no loss of meaning). A cell in the table is the count of the number
of examples that meet the conditions of the row and column, and each cell has a
specific common name.
Now, we can consider the same table with the same rows and columns and assign a
cost to each of the cells. This is called a cost matrix.
Cost Matrix: A matrix that assigns a cost to each cell in the confusion matrix.
The example below is a cost matrix where we use the notation C() to indicate the cost; the first value represents the predicted class and the second value represents the
actual class. The names of each cell from the confusion matrix are also listed as
acronyms, e.g. False Positive is FP.
An intuition from this matrix is that the cost of misclassification must always be higher than the cost of correct classification; otherwise, the total cost could be minimized by always predicting one class. For
example, we might assign no cost to correct predictions in each class, a cost of 5 for
False Positives and a cost of 88 for False Negatives.
We can define the total cost of a classifier using this framework as the cost-weighted
sum of the False Negatives and False Positives.
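A short sketch of this cost-weighted sum in Python, using scikit-learn's confusion_matrix and the illustrative costs mentioned above (0 for correct predictions, 5 for a False Positive, 88 for a False Negative); the labels below are made up:

from sklearn.metrics import confusion_matrix

# made-up true labels and predictions for a binary problem (0 = negative, 1 = positive)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

# confusion_matrix with labels=[0, 1] returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# total cost = C(False Positive) * FP + C(False Negative) * FN
cost_fp, cost_fn = 5, 88
total_cost = cost_fp * fp + cost_fn * fn
print("FP:", fp, "FN:", fn, "total cost:", total_cost)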
In some problem domains, defining the cost matrix might be obvious. In an insurance
claim example, the costs for a false positive might be the monetary cost of follow-up
with the customer to the company and the cost of a false negative might be the cost of
the insurance claim.
In other domains, defining the cost matrix might be challenging. For example, in a
cancer diagnostic test example, the cost of a false positive might be the monetary cost
of performing subsequent tests, whereas what is the equivalent dollar cost for letting a
sick patient go home and get sicker? In such cases, a cost matrix may be definable by a domain expert or an economist, or it may not be possible to define one at all.
The goal of association rule mining is to find rules that will predict the occurrence of an item (Item Y) based on the occurrence of other items (Item X) in the transaction. For example: predict the chance of a user buying a phone cover (Item Y) if they have already bought the phone (Item X), and if the chance is high enough, recommend the phone cover to someone who is buying the phone. There is a good chance of discovering strong rules in big data, but keep in mind that the implication means co-occurrence, which does not necessarily mean causality! We cannot assert that buying one item is the cause of buying the other when the items are just frequently bought together.
Here is our data, which consists of 5 transactions made by our customers. Each transaction shows the products bought together in that transaction.
Given a set of transactions, the goal of association rule mining is to find the rules that
allow us to predict the occurrence of a specific item based on the occurrences of the
other items in the transaction.
An association rule consists of two parts, an antecedent (if) and a consequent (then).
An antecedent is something found in data, and a consequent is something located in
conjunction with the antecedent. For a quick understanding, consider the following
association rule: "If a customer buys bread, he's 70% likely to also buy milk."
Bread is the antecedent in the given association rule, and milk is the consequent.
Terminologies:
Item-set: A collection of one or more items. For example: {Bread, Milk} is an item-set.
k-item-set: An item-set that contains k items. For example: {Bread, Milk} is a 2-item-set.
Support Count: An indication of how frequently the item-set appears in the database, i.e. the frequency of occurrence of the item-set. For example: {Bread, Milk} occurs 3 times in our data set.
Support: The fraction of transactions that contain the item-set. For example: Support({Bread, Milk}) = 3/5 = 60%.
Confidence: For a rule X => Y, confidence shows the percentage of transactions containing X in which Y is also bought. So confidence is the number of transactions with both X and Y divided by the total number of transactions having X.
For example: Confidence for Bread => Milk = 3 / 4 = 75%, which means that 75% of the transactions that contain X (Bread) also contain Y (Milk).
Form of Association Rule: X=> Y [Support, Confidence], where X and Y are sets of
items in the transaction data.
For example: Bread => Milk [Support=60%, Confidence= 75%], where support shows
that in 60% of transactions bread and milk are purchased together, confidence shows
that 75% of customers who purchase bread also purchase milk
The Goal of Association Rule Mining: The goal of association rule mining is to find
all association rules having support ≥ the minimum_support threshold and confidence ≥ the minimum_confidence threshold.
Lift: Lift gives the correlation between X and Y in the rule X => Y. It shows how the item-set X affects the item-set Y. Lift(X=>Y) = Confidence of the rule (X=>Y) / Support(Y).
Lift for the rule {Bread} => {Milk}: Confidence of the rule (75%) / Support(Milk) (4/5 = 80%) = 0.75 / 0.80 = 0.9375 (93.75%).
Evaluate the rule using the value of the lift:
If the rule has a lift of 1, then X and Y are independent and no useful rule can be derived from them.
If the lift is < 1, then the presence of X has a negative effect on the presence of Y.
If the lift is > 1, then X and Y are dependent on each other, and the degree of dependence is given by the lift value.
Support and Confidence measure how interesting the rule is. Support is also used for
efficient discovery of association rules. Confidence, on the other hand, measures the
reliability of the inference made by a rule. For a given rule X->Y, the higher the
confidence, the more likely it is for Y to be present in transactions that contain X.
Confidence also provides an estimate of the conditional probability of Y given X.
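A small Python sketch that computes support, confidence and lift directly from a list of transactions. The five transactions below are made up only so that the bread/milk numbers match the text (support 60%, confidence 75%, lift about 0.94); they are not the original data table:

# made-up transactions chosen to reproduce the bread/milk example in the text
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Milk", "Diaper"},
    {"Bread", "Milk", "Beer"},
    {"Bread", "Diaper"},
    {"Milk", "Diaper"},
]

def support(itemset):
    # fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    # support of X and Y together divided by the support of X
    return support(X | Y) / support(X)

def lift(X, Y):
    # confidence of X => Y divided by the support of Y
    return confidence(X, Y) / support(Y)

X, Y = {"Bread"}, {"Milk"}
print("support:", support(X | Y))        # 0.6  -> 60%
print("confidence:", confidence(X, Y))   # 0.75 -> 75%
print("lift:", round(lift(X, Y), 4))     # 0.9375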
FP-Growth is a frequent pattern mining algorithm based on a depth-first search across the item-set lattice.
The data source is compressed using the FP-tree data structure. This algorithm
operates in two stages. These are as follows:
FP-tree construction
Extraction of frequent itemsets
In multi-relational association rules, each rule element consists of one entity but several relationships; these relationships represent indirect relationships between the entities.
Imagine you have a database of the items customers purchase from a store. The Apriori algorithm helps to uncover interesting relationships and patterns in this data. It does this by finding the sets of items that frequently occur together.
The following are the main steps of the algorithm:
1. Set the minimum support threshold - min frequency required for an itemset to be
"frequent".
2. Identify frequent individual items - count the occurrence of each individual item.
3. Generate candidate itemsets of size 2 - create pairs of frequent items discovered.
4. Prune infrequent itemsets - eliminate itemsets that do not meet the threshold levels.
5. Generate itemsets of larger sizes - combine the frequent itemsets to form itemsets of size 3, 4, and so on.
6. Repeat the pruning process - keep eliminating the itemsets that do not meet the
threshold levels.
7. Iterate till no more frequent itemsets can be generated.
8. Generate association rules that express the relationship between them - calculate
measures to evaluate the strength & significance of these rules.
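A compact, self-contained sketch of these steps in plain Python (no external library). The transactions, min_sup and min_conf below are made up for illustration and are not the data table used in the worked example that follows; the code favours readability over efficiency:

from itertools import combinations

# made-up transactions and a minimum support count
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_sup = 2

def support_count(itemset):
    # number of transactions containing every item of the candidate itemset
    return sum(itemset <= t for t in transactions)

# steps 1-2: frequent individual items (1-itemsets)
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]): support_count({i}) for i in items
             if support_count({i}) >= min_sup}]

# steps 3-7: generate larger candidates from the previous level, prune, repeat
k = 2
while frequent[-1]:
    prev = list(frequent[-1])
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    level = {c: support_count(c) for c in candidates
             if support_count(c) >= min_sup
             and all(frozenset(s) in frequent[-1] for s in combinations(c, k - 1))}
    frequent.append(level)
    k += 1

# step 8: association rules whose confidence meets a threshold
min_conf = 0.7
for level in frequent[1:]:
    for itemset, count in level.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = count / support_count(lhs)
                if conf >= min_conf:
                    print(set(lhs), "=>", set(itemset - lhs), f"conf={conf:.2f}")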
With 6 transactions and a support threshold of 50%: 0.5 * 6 = 3, so min_sup = 3.
2. Prune Step: The table below shows that item I5 does not meet min_sup = 3, so it is deleted; only I1, I2, I3, and I4 meet the min_sup count.
3. Join Step: Form the 2-itemsets. From the first table, find out the occurrences of each 2-itemset.
4. Prune Step: The next table shows that the item sets {I1, I4} and {I3, I4} do not meet min_sup, so they are deleted.
5. Join and Prune Step: Form the 3-itemsets. From the first table, find out the occurrences of each 3-itemset. From the previous table, find out the 2-itemset subsets that meet min_sup.
For itemset {I1, I2, I3}, the subsets {I1, I2}, {I1, I3}, and {I2, I3} all occur, as shown in step 4, so {I1, I2, I3} is frequent. For itemset {I1, I2, I4}, among the subsets {I1, I2}, {I1, I4}, and {I2, I4}, the subset {I1, I4} is not frequent, as it does not occur in step 4; thus {I1, I2, I4} is not frequent and is deleted.