UNIT 5 NOTES DWM

The document discusses two primary forms of data analysis: classification and prediction, which are used to extract models for understanding data classes and predicting future trends. It explains decision tree induction, detailing the structure and benefits of decision trees, and introduces Bayesian classification, emphasizing its probabilistic approach and applications in various fields. Additionally, it covers rule-based classification and backpropagation for training neural networks, highlighting the importance of these techniques in data mining and machine learning.


Classification and Prediction

There are two forms of data analysis that can be used to extract models
describing important classes or to predict future data trends. These two
forms are as follows:

1. Classification
2. Prediction

We use classification and prediction to extract a model that represents
the data classes and can be used to predict future data trends. This kind
of analysis gives us a good understanding of the data at a large scale.
Classification models predict categorical class labels, and prediction
models predict continuous-valued functions. For example, we can
build a classification model to categorize bank loan applications as
either safe or risky or a prediction model to predict the expenditures
in dollars of potential customers on computer equipment given their
income and occupation.
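As a small, hedged illustration of this distinction (not part of the original notes), the sketch below uses scikit-learn (an assumed library choice) to fit a classifier that outputs a categorical label and a regressor that outputs a continuous value; the feature values and labels are invented.

# Classification predicts a categorical class label; prediction/regression
# predicts a continuous value. Data and feature meanings are made up.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Each row: [income_in_thousands, years_employed] (hypothetical features)
X = [[30, 1], [80, 10], [45, 3], [120, 15], [25, 1], [60, 7]]

# Classification model: loan application is "risky" or "safe"
y_class = ["risky", "safe", "risky", "safe", "risky", "safe"]
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[50, 4]]))       # outputs a class label ("risky" or "safe")

# Prediction model: expected expenditure in dollars (continuous value)
y_spend = [300, 1500, 600, 2200, 250, 1100]
reg = DecisionTreeRegressor().fit(X, y_spend)
print(reg.predict([[50, 4]]))       # outputs a continuous dollar value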

Decision Tree Induction


A decision tree is a structure that includes a root node, branches, and leaf
nodes. Each internal node denotes a test on an attribute, each branch
denotes the outcome of a test, and each leaf node holds a class label. The
topmost node in the tree is the root node.

The following decision tree is for the concept buy_computer that indicates
whether a customer at a company is likely to buy a computer or not. Each
internal node represents a test on an attribute. Each leaf node represents
a class.

The benefits of having a decision tree are as follows –

 It does not require any domain knowledge.


 It is easy to comprehend.
 The learning and classification steps of a decision tree are simple and fast.

Decision Tree Induction Algorithm

In 1980, the machine learning researcher J. Ross Quinlan developed a decision
tree algorithm known as ID3 (Iterative Dichotomiser). Later, he presented
C4.5, the successor of ID3. Both ID3 and C4.5 adopt a greedy approach:
there is no backtracking, and the trees are constructed in a top-down,
recursive, divide-and-conquer manner.

Generating a decision tree from the training tuples of data partition D

Algorithm: Generate_decision_tree

Input:
Data partition D, which is a set of training tuples
and their associated class labels.
attribute_list, the set of candidate attributes.
Attribute_selection_method, a procedure to determine the
splitting criterion that best partitions the data
tuples into individual classes. This criterion includes a
splitting_attribute and possibly either a split-point or a
splitting subset.

Output:
A decision tree

Method:
create a node N;

if tuples in D are all of the same class C then
    return N as a leaf node labeled with class C;

if attribute_list is empty then
    return N as a leaf node labeled with
    the majority class in D; // majority voting

apply Attribute_selection_method(D, attribute_list)
    to find the best splitting_criterion;
label node N with splitting_criterion;

if splitting_attribute is discrete-valued and
    multiway splits allowed then // not restricted to binary trees
    attribute_list = attribute_list - splitting_attribute; // remove splitting attribute

for each outcome j of splitting_criterion
    // partition the tuples and grow subtrees for each partition
    let Dj be the set of data tuples in D satisfying outcome j; // a partition
    if Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    else
        attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;
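For readers who prefer running code, here is a minimal Python sketch of the same top-down, recursive, divide-and-conquer induction for discrete-valued attributes, using information gain (as in ID3) as the attribute-selection method. The data layout (a list of dictionaries plus a class key) and the tiny dataset are assumptions made for illustration, not part of the original algorithm text.

import math
from collections import Counter

def entropy(rows, class_key):
    counts = Counter(r[class_key] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr, class_key):
    total = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == value]
        remainder += (len(subset) / total) * entropy(subset, class_key)
    return entropy(rows, class_key) - remainder

def generate_decision_tree(rows, attributes, class_key):
    classes = [r[class_key] for r in rows]
    if len(set(classes)) == 1:               # all tuples in the same class C
        return classes[0]                    # leaf labeled with class C
    if not attributes:                       # attribute list is empty
        return Counter(classes).most_common(1)[0][0]    # majority voting
    # attribute selection method: pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: info_gain(rows, a, class_key))
    node = {best: {}}
    remaining = [a for a in attributes if a != best]     # remove splitting attribute
    for value in set(r[best] for r in rows):             # each outcome of the split
        subset = [r for r in rows if r[best] == value]   # partition Dj
        node[best][value] = generate_decision_tree(subset, remaining, class_key)
    return node

# Tiny made-up training set for the buy_computer concept
data = [
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "no",  "buys_computer": "yes"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
]
print(generate_decision_tree(data, ["age", "student"], "buys_computer"))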

Bayesian Classification

Bayesian classification in data mining is a statistical technique used to classify data based on
probabilistic reasoning. It is a type of probabilistic classification that uses Bayes' theorem to
predict the probability of a data point belonging to a certain class. The Bayesian
classification is a powerful technique for probabilistic inference and decision-making and is
widely used in various applications such as medical diagnosis, spam classification, fraud
detection, etc.

Introduction to Bayesian Classification in Data Mining

Bayesian classification in data mining is a statistical approach to data classification that uses
Bayes' theorem to make predictions about a class of a data point based on observed data. It is
a popular data mining and machine learning technique for modelling the probability of
certain outcomes and making predictions based on that probability.

The basic idea behind Bayesian classification in data mining is to assign a class label to a
new data instance based on the probability that it belongs to a particular class, given the
observed data. Bayes' theorem provides a way to compute this probability by multiplying the
prior probability of the class (based on previous knowledge or assumptions) by the likelihood
of the observed data given that class (conditional probability).

Several types of Bayesian classifiers exist, such as naive Bayes, Bayesian network classifiers,
Bayesian logistic regression, etc. Bayesian classification is preferred in many applications
because it allows for the incorporation of new data (just by updating the prior probabilities)
and can update the probabilities of class labels accordingly.

This is important when new data is constantly being collected, or the underlying distribution
may change over time. In contrast, other classification techniques, such as decision trees or
support vector machines, do not easily accommodate new data and may require re-training of
the entire model to incorporate new information. This can be computationally expensive and
time-consuming.

Bayesian classification is a powerful tool for data mining and machine learning and is widely
used in many applications, such as spam filtering, text classification, and medical diagnosis.
Its ability to incorporate prior knowledge and uncertainty makes it well-suited for real-world
problems where data is incomplete or noisy and accurate predictions are critical.
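As a hedged, minimal sketch of a naive Bayes classifier in practice (one of the Bayesian classifiers mentioned above), scikit-learn's GaussianNB can be used; the numbers below are invented for illustration.

from sklearn.naive_bayes import GaussianNB

# Hypothetical features: [age, annual_income]; labels: buys_computer yes/no
X = [[25, 40000], [47, 90000], [35, 60000], [52, 110000], [23, 30000]]
y = ["no", "yes", "no", "yes", "no"]

model = GaussianNB().fit(X, y)
print(model.predict([[40, 80000]]))        # predicted class label
print(model.predict_proba([[40, 80000]]))  # posterior probability of each class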

Bayes’ Theorem in Data Mining

Bayes' theorem is used in Bayesian classification in data mining, which is a technique for
predicting the class label of a new instance based on the probabilities of different class labels
and the observed features of the instance. In data mining, Bayes' theorem is used to compute
the probability of a hypothesis (such as a class label or a pattern in the data) given some
observed event (such as a set of features or attributes). It is named after Reverend Thomas
Bayes, an 18th-century British mathematician who first formulated it.

Bayes' theorem states that the probability of a hypothesis H given some observed event E is
proportional to the likelihood of the evidence given the hypothesis, multiplied by the prior
probability of the hypothesis, as shown below -

P(H∣E) = P(E∣H) ∗ P(H) / P(E)

where P(H∣E) is the posterior probability of the hypothesis given the event
E, P(E∣H) is the likelihood or conditional probability of the event given the
hypothesis, P(H) is the prior probability of the hypothesis, and P(E) is the
probability of the event.

What is Prior Probability?

Prior probability is a term used in probability theory and statistics that refers to the
probability of a hypothesis or event before any evidence or data is considered. It represents
our prior belief or expectation about the likelihood of a hypothesis or event, based on
previous knowledge or assumptions.

For example, suppose we are interested in the probability of a certain disease in a population. Our
prior probability might be based on previous studies or epidemiological data and might be
relatively low if the disease is rare. As we collect data from medical tests or patient
symptoms, we can update our probability estimate using Bayes' theorem to reflect the new
evidence.

What is Posterior Probability?

The posterior probability is a term used in Bayesian inference to refer to the updated
probability of a hypothesis, given some observed event or data. It is calculated using Bayes'
theorem, which combines the prior probability of the hypothesis with the likelihood of the
event to produce an updated or posterior probability.

The posterior probability is important in Bayesian inference because it reflects the latest
information about the hypothesis based on the observed data. It can be used to make
decisions or predictions and updated further as new data becomes available.
Formula Derivation

Bayes' theorem is derived from the definition of conditional probability. The conditional
probability of an event E given a hypothesis H is defined as the joint probability of E and H,
divided by the probability of H, as shown below -

P(E∣H) = P(E∩H) / P(H)

We can rearrange this equation to solve for the joint probability of E and H -

P(E∩H) = P(E∣H) ∗ P(H)

Similarly, we can use the definition of conditional probability to write the conditional
probability of H given E, as shown below -

P(H∣E) = P(H∩E) / P(E)

Based on the commutative property of joint probability, we can write -

P(H∩E) = P(E∩H)

We can substitute the expression for P(H∩E) from the first equation into the second equation
to obtain -

P(H∣E) = P(E∣H) ∗ P(H) / P(E)

This is the formula for Bayes' theorem for hypothesis H and event E. It states that the
probability of hypothesis H given event E is proportional to the likelihood of the event given
the hypothesis, multiplied by the prior probability of the hypothesis, and divided by the
probability of the event.

Applications of Bayes’ Theorem

Bayes' theorem or Bayesian classification in data mining has a wide range of applications in
many fields, including statistics, machine learning, artificial intelligence, natural language
processing, medical diagnosis, image and speech recognition, and more. Here are some
examples of its applications -

 Spam filtering - Bayes' theorem is commonly used in email spam filtering, where it
helps to identify emails that are likely to be spam based on the text content and
other features.
 Medical diagnosis - Bayes' theorem can be used to diagnose medical conditions
based on the observed symptoms, test results, and prior knowledge about the
prevalence and characteristics of the disease.
 Risk assessment - Bayes' theorem can be used to assess the risk of events such as
accidents, natural disasters, or financial market fluctuations based on historical data
and other relevant factors.
 Natural language processing - Bayes' theorem can be used to classify documents,
sentiment analysis, and topic modeling in natural language processing applications.
 Recommendation systems - Bayes' theorem can be used in recommendation
systems like e-commerce websites to suggest products or services to users based on
their previous behavior and preferences.
 Fraud detection - Bayes' theorem can be used to detect fraudulent behavior, such as
credit card or insurance fraud, by analyzing patterns of transactions and other data.

Examples

Problem - Suppose a medical test for a certain disease has a false positive rate of 5% and a
false negative rate of 2%. If a person has the disease, there is a 2% chance that the test will
come back negative; if a person does not, there is a 5% chance that the test will come back
positive. Suppose the disease affects 1% of the population. If a person tests positive for the
disease, what is the probability that they have the disease?

Solution - To solve this problem using Bayes' theorem, we can start by defining some events:

 D - the event that a person has the disease


 ~D - the event that a person does not have the disease
 T - the event that a person tests positive for the disease
 ~T - the event that a person tests negative for the disease

We are interested in the probability of event D given the event T, which we can write
as P(D∣T). Using Bayes' theorem, we can write -

P(D∣T) = P(T∣D) ∗ P(D) / P(T)

The first term on the right-hand side of the equation is the probability of a positive test result
given that the person has the disease, which we can calculate as -

P(T∣D) = 1 − 0.02 = 0.98

(the false negative rate is 2%, which means that if a person has the disease, there is a
2% chance that the test will come back negative)

The second term is the prior probability of the person having the disease, which is given
as 1% -

P(D) = 0.01 (prior probability of the disease in the given population)

The third term is the probability of a positive test result, which we can calculate using the law
of total probability, as shown below -

 P(T) = P(T∣D) ∗ P(D) + P(T∣~D) ∗ P(~D) (it is the sum of the probabilities of both
scenarios in which a person tests positive, whether or not they have the disease)
 P(T) = 0.98 ∗ 0.01 + 0.05 ∗ 0.99 = 0.0593
Substituting these values into the first equation, we get -

 P(D∣T) = 0.98 ∗ 0.01 / 0.0593 ≈ 0.165

So the probability that a person has the disease, given that they test positive for it, is
approximately 16.5%. This shows that even with a fairly accurate test, a positive result
is not a guarantee of having the disease when the disease itself is rare, and further
testing or confirmation may be necessary.
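The arithmetic above can be checked with a few lines of Python; this is simply the worked example re-expressed in code.

# Bayes' theorem applied to the medical-test example above.
p_disease = 0.01                   # P(D): prior probability of the disease
p_pos_given_disease = 1 - 0.02     # P(T|D): false negative rate is 2%
p_pos_given_no_disease = 0.05      # P(T|~D): false positive rate is 5%

# Law of total probability: P(T)
p_pos = p_pos_given_disease * p_disease + p_pos_given_no_disease * (1 - p_disease)

# Posterior probability: P(D|T)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_pos, 4), round(p_disease_given_pos, 3))   # 0.0593 0.165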

Rule Based Classification

IF-THEN Rules

Rule-based classifier makes use of a set of IF-THEN rules for classification.


We can express a rule in the following form −

IF condition THEN conclusion

Let us consider a rule R1,

R1: IF age = youth AND student = yes THEN buy_computer = yes

Points to remember −
 The IF part of the rule is called the rule antecedent or precondition.
 The THEN part of the rule is called the rule consequent.
 The antecedent part (the condition) consists of one or more attribute tests,
and these tests are logically ANDed.
 The consequent part consists of the class prediction.
Note − We can also write rule R1 as follows −
R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)

If the condition holds true for a given tuple, then the antecedent is
satisfied.

Rule Extraction

Here we will learn how to build a rule-based classifier by extracting IF-


THEN rules from a decision tree.

Points to remember −

To extract a rule from a decision tree −

 One rule is created for each path from the root to the leaf node.
 To form a rule antecedent, each splitting criterion is logically ANDed.
 The leaf node holds the class prediction, forming the rule consequent.
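As a hedged illustration, scikit-learn's export_text can print every root-to-leaf path of a learned tree, and each printed path reads directly as an IF-THEN rule; the small encoded dataset below is invented.

from sklearn.tree import DecisionTreeClassifier, export_text

# Encoded features: age (0 = youth, 1 = middle_aged, 2 = senior), student (0 = no, 1 = yes)
X = [[0, 1], [0, 0], [1, 0], [2, 1], [2, 0]]
y = ["yes", "no", "yes", "yes", "no"]          # buys_computer

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["age", "student"]))
# Each root-to-leaf path in the printout (nested "|---" tests ending in "class: ...")
# is one rule: the ANDed tests form the antecedent, the leaf class the consequent.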

Rule Induction Using Sequential Covering Algorithm

The Sequential Covering Algorithm can be used to extract IF-THEN rules from
the training data. We do not need to generate a decision tree first. In
this algorithm, each rule for a given class covers many of the tuples of
that class.

Some sequential covering algorithms are AQ, CN2, and RIPPER. As per the
general strategy, the rules are learned one at a time. Each time a rule is
learned, the tuples covered by the rule are removed, and the process
continues with the remaining tuples.

Note − Decision tree induction can be considered as learning a set of rules
simultaneously, because the path to each leaf in a decision tree corresponds
to a rule.

In the sequential learning algorithm, rules are learned for one class at a
time. When learning a rule for a class Ci, we want the rule to cover only
tuples from class Ci and no tuples from any other class.

Classification by Backpropagation

Classification by backpropagation is a type of supervised learning algorithm that is
used to train a neural network to classify data into different classes. The
backpropagation algorithm is based on the idea of adjusting the weights and biases
of a network in order to minimize the error between the predicted output and the
actual output.

The backpropagation algorithm works by taking a set of training examples and
feeding them through the neural network. The output of the network is compared to
the desired output, and the error is calculated using a cost function such as mean
squared error.

The error is then propagated backwards through the network, with each neuron in
the network adjusting its weights and biases based on its contribution to the error.
This is done using a gradient descent algorithm, where the weights and biases are
adjusted in the direction that reduces the error.

The backpropagation algorithm is an iterative process that continues until the error
is minimized or until a predetermined number of iterations is reached. The final set
of weights and biases is then used to classify new data.
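Below is a hedged, minimal NumPy sketch of these steps: a small one-hidden-layer network, mean squared error, and gradient descent via backpropagation. The XOR-style dataset, layer sizes, learning rate, and iteration count are all assumptions chosen only to illustrate the mechanics.

import numpy as np

# Tiny 2-4-1 network trained by backpropagation on a made-up XOR-style dataset.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden weights/biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output weights/biases
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5                                             # learning rate

for _ in range(20000):
    # Forward pass: feed the training examples through the network
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Error between predicted and actual output (mean squared error gradient)
    err = out - y

    # Backward pass: propagate the error, layer by layer
    d_out = err * out * (1 - out)             # contribution of the output neuron
    d_h = (d_out @ W2.T) * h * (1 - h)        # contribution of each hidden neuron

    # Gradient descent: adjust weights and biases in the direction that reduces error
    W2 -= lr * (h.T @ d_out);  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())   # predictions should move toward [0, 1, 1, 0]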


Support Vector Machine Algorithm

Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed Support
Vector Machine. Consider the below diagram, in which two different categories are
classified using a decision boundary or hyperplane:

Types of SVM
SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which means that if a
dataset can be classified into two classes by using a single straight line, then such
data is termed linearly separable data, and the classifier used is called a Linear
SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separable data, which
means that if a dataset cannot be classified by using a straight line, then such data is
termed non-linear data, and the classifier used is called a Non-linear SVM
classifier.

Hyperplane and Support Vectors in the SVM algorithm:


Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in
n-dimensional space, but we need to find out the best decision boundary that helps to
classify the data points. This best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features present in the dataset:
if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the
hyperplane is a 2-dimensional plane.

We always create the hyperplane that has the maximum margin, i.e., the maximum distance
between the hyperplane and the nearest data points of each class.

Support Vectors:

The data points or vectors that are closest to the hyperplane and that affect the position
of the hyperplane are termed support vectors. Since these vectors support the hyperplane,
the name support vector is used.

How does SVM work?


Linear SVM:

The working of the SVM algorithm can be understood by using an example. Suppose we
have a dataset that has two tags (green and blue), and the dataset has two features x1
and x2. We want a classifier that can classify the pair(x1, x2) of coordinates in either
green or blue. Consider the below image:
Since this is a 2-d space, we can easily separate these two classes by using a straight
line. But there can be multiple lines that can separate these classes. Consider the
below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the closest points of
the lines from both classes. These points are called support vectors. The distance
between the vectors and the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with the maximum margin is called the optimal
hyperplane.

Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but for non-
linear data, we cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data,
we have used two dimensions x and y, so for non-linear data, we will add a third
dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space will become as below image:

So now, SVM will divide the datasets into classes in the following way. Consider the below

image:

Since we are in 3-d space, the separating boundary looks like a plane parallel to the
x-axis. If we convert it back to 2-d space with z = 1, the boundary becomes a circle of
radius 1 for the non-linear data.
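A hedged sketch of this idea in code: add the third feature z = x² + y² by hand and fit a linear SVM in the lifted 3-d space (scikit-learn and NumPy assumed; a kernel SVM would do this implicitly).

import numpy as np
from sklearn.svm import SVC

# Made-up circular data: class 0 near the origin, class 1 on a ring of radius 2.
rng = np.random.default_rng(1)
inner = rng.normal(scale=0.4, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, 50)
outer = np.c_[2 * np.cos(angles), 2 * np.sin(angles)]
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

Z = np.c_[X, (X ** 2).sum(axis=1)]       # add the third dimension z = x^2 + y^2
clf = SVC(kernel="linear").fit(Z, y)     # a straight (flat) boundary now suffices
print(clf.score(Z, y))                   # close to 1.0: separable in the lifted space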

How to calculate the distance from a point to a line?

 In our case, the hyperplane is w1*x1 + w2*x2 + b = 0,
 thus w = (w1, w2) and x = (x1, x2).
 The distance from a point x to this line is |w1*x1 + w2*x2 + b| / sqrt(w1² + w2²),
that is, |w·x + b| / ||w||.
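A short sketch of this computation with invented numbers:

import numpy as np

# Distance from a point x to the hyperplane w.x + b = 0 is |w.x + b| / ||w||.
# The weights, bias, and point below are made-up values for illustration.
w = np.array([3.0, 4.0])     # (w1, w2)
b = -12.0
x = np.array([2.0, 1.0])     # (x1, x2)

distance = abs(w @ x + b) / np.linalg.norm(w)
print(distance)              # |3*2 + 4*1 - 12| / 5 = 2 / 5 = 0.4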

🐢 Lazy Learners
Lazy learners are a type of learning algorithm that delays the generalization process until a query is
made. In simple terms, they do not build a model during training, but instead store the training
data and use it at prediction time.

Key Characteristics of Lazy Learners:

 No model is built in advance


 High query-time cost (slower prediction)
 Low training time
 Use instance-based learning – they compare new data with stored instances

How It Works:

Lazy learners memorize the training data. When a new instance needs to be classified, they
compare it with the stored instances (usually using a similarity or distance measure) and make a
prediction based on the closest matches.
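A hedged k-nearest-neighbours sketch of this behaviour (scikit-learn assumed, data invented): fit() mostly just stores the training instances, and the distance comparisons happen at prediction time.

from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.5]]
y_train = ["A", "A", "B", "B", "A"]

knn = KNeighborsClassifier(n_neighbors=3)   # compare against the 3 closest stored instances
knn.fit(X_train, y_train)                   # lazy: essentially just stores the data
print(knn.predict([[1.1, 1.9]]))            # majority vote of the 3 nearest -> ['A']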
Examples of Lazy Learners:

1. K-Nearest Neighbors (K-NN):


o Stores all training samples.
o At prediction, finds the k nearest neighbors and assigns the class based on majority
voting.
2. Case-Based Reasoning (CBR):
o Solves new problems based on the solutions of similar past problems.
3. Locally Weighted Regression (LWR):
o Performs regression based on local data near the query point.

Advantages:

 Simple to implement.
 Can adapt to changes in data easily (just add new data).
 No training phase → fast for updating.

Disadvantages:

 Slow classification time (has to search through data for each prediction).
 High memory usage (since all data is stored).
 Sensitive to irrelevant or redundant features (especially K-NN).

📌 Lazy Learner vs Eager Learner


Feature           | Lazy Learner | Eager Learner
Model Building    | No           | Yes
Training Time     | Fast         | Slower
Prediction Time   | Slow         | Fast
Examples          | K-NN, CBR    | Decision Trees, Naive Bayes

Prediction Accuracy and Error Measures

In data mining, especially in classification and regression tasks, it’s essential to evaluate how
well a model performs. This is done using accuracy and error metrics that measure the
difference between predicted and actual values.
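As a hedged illustration, the common measures can be computed with scikit-learn's metrics module; the true and predicted values below are invented.

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)

# Classification: compare predicted class labels with the actual labels
y_true = ["yes", "no", "yes", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]
print(accuracy_score(y_true, y_pred))       # 0.8: four of five labels are correct
print(confusion_matrix(y_true, y_pred))     # counts of actual vs. predicted classes

# Prediction / regression: compare predicted continuous values with actual values
v_true = [100.0, 150.0, 200.0]
v_pred = [110.0, 140.0, 195.0]
print(mean_absolute_error(v_true, v_pred))  # 8.33...
print(mean_squared_error(v_true, v_pred))   # 75.0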

What is Rule-Based Classification?

Rule-based classification uses rules of the form:

IF <condition> THEN <class>


 The condition part (IF) is a conjunction of attribute tests (e.g., Age > 30 AND Income =
High).

 The class part (THEN) specifies the predicted class for instances satisfying the condition

Process of Rule-Based Classification

1. Rule Generation – Extract rules from the training data.


2. Rule Pruning – Eliminate overly specific or redundant rules.
3. Rule Ordering – Order the rules by priority or accuracy.
4. Classification – Apply rules to classify new instances.

Example

Let’s say you are classifying customers as “Buy” or “Don’t Buy” a product.

Rule 1:
IF Age > 25 AND Income = High THEN Class = Buy

Rule 2:
IF Age <= 25 AND Student = Yes THEN Class = Buy

Rule 3:
IF Income = Low THEN Class = Don’t Buy
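A minimal hedged sketch of applying these rules in order (the default class used when no rule fires is an assumption for illustration):

def classify(customer):
    if customer["age"] > 25 and customer["income"] == "High":     # Rule 1
        return "Buy"
    if customer["age"] <= 25 and customer["student"] == "Yes":    # Rule 2
        return "Buy"
    if customer["income"] == "Low":                               # Rule 3
        return "Don't Buy"
    return "Don't Buy"   # assumed default class when no rule applies

print(classify({"age": 30, "income": "High",   "student": "No"}))   # Buy (Rule 1)
print(classify({"age": 22, "income": "Medium", "student": "Yes"}))  # Buy (Rule 2)
print(classify({"age": 40, "income": "Low",    "student": "No"}))   # Don't Buy (Rule 3)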

Rule Generation Algorithms

Some popular algorithms that generate rule-based classifiers:

 RIPPER (Repeated Incremental Pruning to Produce Error Reduction)


 CN2
 PART (uses partial decision trees to generate rules)
 Decision Trees (can be converted into rules)

Advantages

 Easy to understand and interpret.


 Good for domains where explanation is important.
 Flexible – can handle both numeric and categorical data.

Disadvantages

 Might generate too many rules (overfitting).


 Rule conflicts need resolution (when multiple rules apply).
 Not always the most accurate compared to other classifiers like SVM or Random Forests.
