Unit 3 DM

The document discusses classification and prediction in data mining, explaining their roles in extracting models to predict data trends. It details the processes of classification, including model creation and application, as well as the challenges of data preparation such as cleaning and normalization. Additionally, it covers decision trees and backpropagation as methods for classification, highlighting their advantages and issues.

Uploaded by

shalinisakthi42

CLASSIFICATION – UNIT 3

• Classification and Prediction in Data Mining

• Classification
• Prediction
• We use classification and prediction to extract models that
describe important data classes or predict future data
trends. Classification models predict categorical class
labels, while prediction models predict continuous-valued
functions.
• What is Classification?
• Classification identifies the category, or class label, of a
new observation. First, a set of data is used as training data.
• The input data and the corresponding outputs are given
to the algorithm, so the training data set includes the input
data and their associated class labels.
• Using the training data set, the algorithm derives a model,
called the classifier.
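
As a rough illustration of deriving a classifier from labeled training data, the sketch below uses a one-nearest-neighbor rule. The technique, the feature values, and the labels "safe" and "risky" are assumptions chosen for the example; they are not taken from the slides.

```python
import math

# Hypothetical training set: (features, class label) pairs.
# Features are (age, income in thousands); all values are made up.
training_data = [
    ((25, 30), "risky"),
    ((30, 35), "risky"),
    ((45, 80), "safe"),
    ((50, 95), "safe"),
]

def classify(observation, training_data):
    """Return the label of the training record closest to the observation."""
    _, nearest_label = min(
        training_data,
        key=lambda record: math.dist(record[0], observation),
    )
    return nearest_label

print(classify((48, 90), training_data))  # closest record is (50, 95)
print(classify((27, 32), training_data))  # closest record is (25, 30)
```

Here the "model" is simply the stored training set plus a distance rule; other algorithms would derive a more compact model from the same kind of input.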
• How Does Classification Work?
• A bank loan application is a typical setting: a classifier learned
from past applications with known outcomes assigns a class label
to each new application.
• There are two stages in a data classification system:
building the classifier (model creation) and applying the
classifier for classification.
• Developing the classifier (model creation): This is the
learning stage. The classification algorithm constructs the
classifier in this stage.
• The classifier is constructed from a training set composed of
database records and their corresponding class labels.
• Each record in the training set belongs to a category, or class.
We may also refer to these records as samples, objects, or data points.
• Applying the classifier for classification:
• The classifier is used for classification at this stage. The test data are
used to estimate the accuracy of the classifier. If the accuracy is
deemed sufficient, the classification rules can be applied to new
data records. Applications of classification include:
– Sentiment Analysis: Sentiment analysis is highly helpful in social media
monitoring. We can use it to extract social media insights. Accurately
trained models provide consistent results in a fraction of the time.
– Document Classification: We can use document classification to organize
documents into sections according to their content. Document classification
is a form of text classification: a document is classified based on the words
it contains. With machine learning classification algorithms, this can be
done automatically.
– Image Classification: Image classification assigns an image to one of the
trained categories. These could be the caption of the image, a statistical
value, or a theme.
– Machine Learning Classification: It uses statistically demonstrable
algorithmic rules to execute analytical tasks that would take humans
hundreds of additional hours to perform.
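
The second stage, estimating accuracy on test data before applying the classifier to new records, might look like the following sketch. The rule-based classifier and the test records are hypothetical stand-ins for a learned model and a real held-out set.

```python
def accuracy(classifier, test_data):
    """Fraction of test records whose predicted label equals the known label."""
    correct = sum(
        1 for features, label in test_data if classifier(features) == label
    )
    return correct / len(test_data)

# Hypothetical classifier: one hand-written rule standing in for a learned model.
def rule_classifier(features):
    age, income = features
    return "safe" if income >= 50 else "risky"

# Hypothetical labeled test records that were held back from training.
test_data = [
    ((28, 32), "risky"),
    ((52, 90), "safe"),
    ((40, 60), "safe"),
    ((35, 55), "risky"),
]

print(accuracy(rule_classifier, test_data))  # 3 of the 4 records are classified correctly
```

Whether a given accuracy is sufficient depends on the application; the threshold is a judgment call, not something the code decides.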
What is Prediction?

• Prediction is another form of data analysis. It is used to
find a numerical output. As in classification, the training
data set contains the inputs and the corresponding numerical
output values.
• The algorithm derives a model, called the predictor, from
the training data set.
• The model should produce a numerical output when new data
is given. Unlike classification, prediction does not use
class labels. The model predicts a continuous-valued function
or ordered value.
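
A minimal sketch of a predictor, assuming simple least-squares linear regression as the technique (the slides do not name one). The experience and salary figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: years of experience -> salary (in thousands).
years_experience = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

slope, intercept = fit_line(years_experience, salary)

def predict(years):
    """Numerical output for a new input; there is no class label involved."""
    return intercept + slope * years

print(predict(6))  # the fitted line is 30 + 5 * years
```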
Classification and Prediction Issues

• The major issue is preparing the data for classification and
prediction. Preparing the data involves the following activities:
• Data Cleaning: Data cleaning involves removing noise and treating
missing values.
• Noise is removed by applying smoothing techniques, and the problem
of missing values is solved by replacing a missing value with the most
commonly occurring value for that attribute.
• Relevance Analysis: The database may also contain irrelevant attributes.
Correlation analysis is used to determine whether any two given attributes
are related.
• Data Transformation and Reduction: The data can be transformed by any
of the following methods.
– Normalization: Normalization involves scaling all values for a given
attribute so that they fall within a small specified range. Normalization
is used when neural networks or methods involving distance
measurements are used in the learning step.
– Generalization: The data can also be transformed by generalizing it to
higher-level concepts. For this purpose, we can use concept
hierarchies.
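
Two of the preparation steps above, replacing missing values with the most common value and min-max normalization, can be sketched as follows. The function names and sample values are hypothetical.

```python
from collections import Counter

def fill_missing_with_mode(values):
    """Replace None entries with the most commonly occurring value."""
    present = [v for v in values if v is not None]
    most_common = Counter(present).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so that they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to the lower bound.
        return [new_min for _ in values]
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values
    ]

# Hypothetical attribute columns.
credit_rating = ["fair", None, "excellent", "fair", None]
income = [20000, 50000, 80000]

print(fill_missing_with_mode(credit_rating))
print(min_max_normalize(income))
```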
Difference between Classification and Prediction

| Classification | Prediction |
|---|---|
| Classification is the process of identifying which category a new observation belongs to, based on a training data set containing observations whose category membership is known. | Prediction is the process of identifying the missing or unavailable numerical data for a new observation. |
| In classification, the accuracy depends on finding the class label correctly. | In prediction, the accuracy depends on how well a given predictor can guess the value of a predicted attribute for new data. |
| In classification, the model is known as the classifier. | In prediction, the model is known as the predictor. |
| A model, or classifier, is constructed to find the categorical labels. | A model, or predictor, is constructed that predicts a continuous-valued function or ordered value. |
| For example, grouping patients based on their medical records can be considered classification. | For example, we can think of prediction as predicting the correct treatment for a particular disease for a person. |
What are the various Issues regarding Classification and Prediction in data mining?

• Data cleaning − This refers to the pre-processing of data to remove or reduce noise
(by applying smoothing methods) and to treat missing values (e.g., by replacing a
missing value with the most commonly occurring value for that attribute, or with the
most probable value based on statistics).
• Relevance analysis − Many attributes in the data may be irrelevant to the
classification or prediction task. For instance, data recording the day of the week on
which a bank loan application was filed is unlikely to be relevant to the success of the
application. Moreover, other attributes may be redundant.
• Therefore, relevance analysis can be performed on the data to remove irrelevant
or redundant attributes from the learning process. In machine learning, this step is
known as feature selection. Including such attributes may otherwise slow down, and
possibly mislead, the learning step.
• Data transformation − The data can be generalized to higher-level
concepts. Concept hierarchies can be used for this purpose. This is
especially helpful for continuous-valued attributes. For instance,
numeric values for the attribute income can be generalized to
discrete ranges such as low, medium, and high.
• Likewise, nominal-valued attributes, such as street, can be generalized
to higher-level concepts, such as city.
• Normalization involves scaling all values for a given attribute so that they
fall within a small specified range, such as -1.0 to 1.0 or 0 to 1.0. In
methods that use distance measurements, for instance, this prevents
attributes with initially large ranges (such as income) from outweighing
attributes with initially smaller ranges.
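
Generalization through a concept hierarchy could be sketched as below. The income cutoffs and the street-to-city mapping are hypothetical; in practice they would come from domain knowledge or from the data.

```python
def generalize_income(income, low_cutoff=30000, high_cutoff=70000):
    """Map a numeric income to a discrete concept (cutoffs are assumptions)."""
    if income < low_cutoff:
        return "low"
    if income < high_cutoff:
        return "medium"
    return "high"

# Hypothetical concept hierarchy for a nominal attribute: street -> city.
street_to_city = {
    "Main Street": "Springfield",
    "Oak Avenue": "Springfield",
    "Harbor Road": "Portside",
}

def generalize_street(street):
    """Replace a street with its higher-level concept, the city."""
    return street_to_city.get(street, "unknown")

print([generalize_income(v) for v in (18000, 45000, 120000)])
print(generalize_street("Oak Avenue"))
```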
CLASSIFICATION BY DECISION TREE INDUCTION

A decision tree is a structure that includes a root node, branches,
and leaf nodes.
Each internal node denotes a test on an attribute, each branch
denotes the outcome of a test, and each leaf node holds a class
label.
The topmost node in the tree is the root node.

A decision tree for the concept buy_computer indicates whether a
customer at a company is likely to buy a computer or not.
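
The structure described above can be sketched with nested dictionaries: internal nodes hold an attribute test, branches are keyed by test outcome, and leaves hold a class label. The particular buy_computer tree below is an assumed illustration, since the figure itself is not reproduced in the text.

```python
# Internal node: {"attribute": name, "branches": {outcome: subtree}}
# Leaf node: a class label string ("yes" or "no").
# The attributes and outcomes here are assumptions for illustration.
buy_computer_tree = {
    "attribute": "age",  # root node
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"no": "no", "yes": "yes"},
        },
        "middle_aged": "yes",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"excellent": "no", "fair": "yes"},
        },
    },
}

def classify_with_tree(tree, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node

customer = {"age": "youth", "student": "yes", "credit_rating": "fair"}
print(classify_with_tree(buy_computer_tree, customer))
```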
• The benefits of having a decision tree are as
follows −
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a
decision tree are simple and fast.
• Tree Pruning
• Tree pruning removes branches that reflect anomalies in the training
data due to noise or outliers. Pruned trees are smaller and less
complex.
• Tree Pruning Approaches
• There are two approaches to pruning a tree −
• Pre-pruning − The tree is pruned by halting its construction early.
• Post-pruning − This approach removes a sub-tree from a fully grown tree.
• Decision Tree Induction
• A decision tree is a supervised learning method used in data mining
for classification and regression tasks. It is a tree-structured model
that supports decision-making.
• The decision tree builds classification or regression models in the
form of a tree structure. It splits a data set into smaller and smaller
subsets while the tree is incrementally developed.
• The final tree consists of decision nodes and leaf nodes. A
decision node has at least two branches. The leaf nodes represent a
classification or decision.
• Why are decision trees useful?
• They enable us to analyze the possible consequences of a decision
thoroughly.
• They provide a framework for quantifying the values of outcomes and the
probability of achieving them.
• They help us make the best decisions based on existing data and best
estimates.
• Decision tree algorithm:
• The decision tree algorithm may appear long, but the basic technique
is quite simple:
• The algorithm takes three parameters: D, attribute_list, and
Attribute_selection_method.
• D is a data partition.
• Initially, D is the entire set of training tuples and their associated class
labels (the input training data).
• The parameter attribute_list is the set of attributes describing the tuples.
• Attribute_selection_method specifies a heuristic procedure for choosing the
attribute that "best" discriminates the given tuples according to class.
• The Attribute_selection_method procedure applies an attribute selection
measure.
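
The three parameters can be seen at work in the sketch below, which assumes information gain as the attribute selection measure (one common choice; the slides do not specify a measure). The training tuples are hypothetical, and the tree uses a nested-dictionary layout chosen for this example.

```python
import math
from collections import Counter

def entropy(D):
    """Entropy of the class labels in partition D."""
    counts = Counter(label for _, label in D)
    total = len(D)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def partition(D, attribute):
    """Split D into sub-partitions, one per outcome of the attribute test."""
    parts = {}
    for record, label in D:
        parts.setdefault(record[attribute], []).append((record, label))
    return parts

def select_by_information_gain(D, attribute_list):
    """Attribute selection method: pick the attribute with the highest gain."""
    def gain(attribute):
        parts = partition(D, attribute).values()
        remainder = sum(len(p) / len(D) * entropy(p) for p in parts)
        return entropy(D) - remainder
    return max(attribute_list, key=gain)

def generate_decision_tree(D, attribute_list, attribute_selection_method):
    labels = [label for _, label in D]
    if len(set(labels)) == 1:      # all tuples share one class: make a leaf
        return labels[0]
    if not attribute_list:         # no attributes left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = attribute_selection_method(D, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    return {
        "attribute": best,
        "branches": {
            outcome: generate_decision_tree(part, remaining, attribute_selection_method)
            for outcome, part in partition(D, best).items()
        },
    }

# Hypothetical training tuples with their class labels.
D = [
    ({"age": "youth", "student": "no"}, "no"),
    ({"age": "youth", "student": "yes"}, "yes"),
    ({"age": "middle_aged", "student": "no"}, "yes"),
    ({"age": "middle_aged", "student": "yes"}, "yes"),
    ({"age": "senior", "student": "no"}, "no"),
    ({"age": "senior", "student": "yes"}, "yes"),
]
tree = generate_decision_tree(D, ["age", "student"], select_by_information_gain)
print(tree)
```

Passing the selection method as a parameter keeps the recursion independent of the measure, so a different heuristic could be swapped in without changing the tree-building code.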
• Advantages of using decision trees:
• A decision tree does not require the data to be scaled.
• Missing values in the data do not affect the process of building a
decision tree to any considerable extent.
• A decision tree model is intuitive and simple to explain to the technical
team as well as to stakeholders.
CLASSIFICATION BY BACKPROPAGATION

• Backpropagation in Data Mining

• Last Updated : 05 Jan, 2023
• Backpropagation is an algorithm that propagates the errors from the
output nodes back toward the input nodes. It is therefore referred to as the
backward propagation of errors. It is used in many applications of neural
networks in data mining, such as character recognition and signature verification.
• Neural Network:
• Neural networks are an information processing paradigm inspired by the
human nervous system. Just as the human nervous system has biological
neurons, neural networks have artificial neurons, which are mathematical
functions modeled on biological neurons. The human brain is estimated to
have tens of billions of neurons (recent estimates put the figure near
86 billion), each connected to thousands of other neurons. Each neuron
receives a signal through a synapse, which controls the effect of the signal
on the neuron.
Backpropagation:
Backpropagation is a widely used algorithm for
training feedforward neural networks. It
computes the gradient of the loss function with
respect to the network weights, and does so far
more efficiently than naively computing the
gradient with respect to each weight separately.
This efficiency makes it possible to use gradient
methods to train multi-layer networks, updating
the weights to minimize the loss; gradient descent
and its variants, such as stochastic gradient
descent, are often used.
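
The gradient computation can be written out with the chain rule. The notation here is an assumption introduced for this sketch: $o_i$ is the output of unit $i$, $w_{ij}$ the weight from unit $i$ to unit $j$, $\mathrm{net}_j = \sum_i w_{ij} o_i$, $o_j = f(\mathrm{net}_j)$ for a differentiable activation $f$, $t_j$ the desired output, $\eta$ the learning rate, and a squared-error loss $E$.

```latex
\begin{align}
E &= \tfrac{1}{2} \sum_j (o_j - t_j)^2 \\
\frac{\partial E}{\partial w_{ij}}
  &= \frac{\partial E}{\partial o_j}\,
     \frac{\partial o_j}{\partial \mathrm{net}_j}\,
     \frac{\partial \mathrm{net}_j}{\partial w_{ij}}
   = \delta_j\, o_i \\
\delta_j &= (o_j - t_j)\, f'(\mathrm{net}_j)
  && \text{for an output unit } j \\
\delta_j &= f'(\mathrm{net}_j) \sum_k w_{jk}\, \delta_k
  && \text{for a hidden unit } j \\
w_{ij} &\leftarrow w_{ij} - \eta\, \delta_j\, o_i
\end{align}
```

The efficiency comes from reuse: each hidden-unit $\delta_j$ is built from the $\delta_k$ of the layer after it, so one backward pass yields the gradient for every weight.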
Features of Backpropagation:
It is the gradient descent method, as used in a
simple perceptron network with differentiable units.
It differs from other networks in the
process by which the weights are calculated during the
learning period of the network.
Training is done in three stages:
the feed-forward of the input training pattern
the calculation and backpropagation of the error
the updating of the weights
• Working of Backpropagation:
• Neural networks use supervised learning to
generate output vectors from the input vectors
that the network operates on. The network compares
the generated output to the desired output and
computes an error if the generated output does
not match the desired output vector. It then
adjusts the weights according to this error
to bring the output closer to the desired output.
• Backpropagation Algorithm:
• Step 1: Inputs X arrive through the preconnected path.
• Step 2: The input is modeled using real-valued weights W. The
weights are usually initialized randomly.
• Step 3: Calculate the output of each neuron from the input
layer, through the hidden layer, to the output layer.
• Step 4: Calculate the error in the outputs:
• Backpropagation Error = Actual Output – Desired Output
• Step 5: From the output layer, go back to the hidden layer to adjust
the weights to reduce the error.
• Step 6: Repeat the process until the desired output is achieved.
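
The six steps can be traced in a small sketch. The network size (two inputs, two hidden units, one output), the sigmoid activation, the learning rate, the epoch count, and the OR-function training data are all assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Step 3: compute hidden and output activations (last weight is a bias)."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output

def train_step(x, desired, w_hidden, w_out, learning_rate):
    hidden, actual = forward(x, w_hidden, w_out)
    error = actual - desired                      # Step 4
    delta_out = error * actual * (1.0 - actual)   # error times sigmoid derivative
    # Step 5: propagate the error back to the hidden layer (uses old w_out).
    delta_hidden = [
        delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for j in range(len(hidden))
    ]
    for j, h in enumerate(hidden + [1.0]):
        w_out[j] -= learning_rate * delta_out * h
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] -= learning_rate * delta_hidden[j] * v

def total_loss(data, w_hidden, w_out):
    return sum(0.5 * (forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data)

# Step 1: inputs X with desired outputs (logical OR, chosen for illustration).
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

# Step 2: weights are initialized randomly (seeded so runs are repeatable).
random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]

loss_before = total_loss(data, w_hidden, w_out)
for epoch in range(2000):                         # Step 6: repeat
    for x, desired in data:
        train_step(x, desired, w_hidden, w_out, learning_rate=0.5)
loss_after = total_loss(data, w_hidden, w_out)
print(loss_before, loss_after)
```

A fixed epoch count stands in for "until the desired output is achieved"; a real stopping rule would more likely check the loss against a tolerance.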
• Need for Backpropagation:
• Backpropagation stands for “backward propagation of errors” and is very useful for
training neural networks. It is fast, simple, and easy to implement. The basic
algorithm has no parameters to tune apart from the number of inputs, although in
practice settings such as the learning rate still have to be chosen.
Backpropagation is a flexible method because it requires no prior knowledge
about the network.
• Types of Backpropagation
• There are two types of backpropagation networks.
• Static backpropagation: A static backpropagation network maps
static inputs to static outputs. These networks can solve
static classification problems such as OCR (Optical Character Recognition).
• Recurrent backpropagation: Recurrent backpropagation is used
for fixed-point learning. Activations in recurrent backpropagation are fed forward
until a fixed value is reached. Static backpropagation provides an instant
mapping, while recurrent backpropagation does not.
• Advantages:
• It is simple, fast, and easy to program.
• Only the number of inputs is tuned, not any other parameter.
• It is flexible and efficient.
• Users do not need to learn any special functions.
• Disadvantages:
• It is sensitive to noisy data and irregularities. Noisy data can
lead to inaccurate results.
• Performance is highly dependent on the input data.
• Training can take a long time.
• The matrix-based approach is preferred over a mini-batch approach.
