
CLASSIFICATION – UNIT 3

• Classification and Prediction in Data Mining

• Classification
• Prediction
• We use classification and prediction to extract a model that
represents the data classes and predicts future data trends.
Classification models predict categorical class labels, while
prediction models predict continuous-valued functions.
• What is Classification?
• Classification is the task of identifying the category or class label of a
new observation. First, a set of data is used as training data.
• The set of input data and the corresponding outputs are given
to the algorithm, so the training data set includes the input
data and their associated class labels.
• Using the training dataset, the algorithm derives a model, or
the classifier.
• How does Classification Work?
• The working of classification can be illustrated with a bank loan
application example.
• There are two stages in the data classification system:
creating the classifier (model creation) and applying the classifier.
• Developing the classifier (model creation): This stage is the
learning stage or the learning process. The classification algorithms
construct the classifier in this stage.
• A classifier is constructed from a training set composed of
database records and their corresponding class labels.
• Each record in the training set belongs to a category or class. We
may also refer to these records as samples, objects, or data points.
• Applying the classifier for classification:
• The classifier is used for classification at this stage. The test data are
used here to estimate the accuracy of the classification rules. If
the accuracy is deemed sufficient, the rules can be applied to the
classification of new data records. Applications include:
– Sentiment Analysis:
– Sentiment analysis is highly helpful in social media monitoring. We can use it to
extract social media insights.
– Accurately trained models provide consistent results in a fraction of the time
that manual analysis would take.
– Document Classification:
– We can use document classification to organize documents into sections
according to their content.
– Document classification refers to text classification; we can classify the words in
the entire document, and with the help of machine learning classification
algorithms, we can execute it automatically.
– Image Classification: Image classification assigns an image to one of a set of
trained categories. These could be the caption of the image, a statistical value,
or a theme.
– Machine Learning Classification: It uses statistically demonstrable algorithmic
rules to execute analytical tasks that would take humans hundreds more
hours to perform.
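
As a rough illustration of the two stages, here is a minimal scikit-learn sketch; the tiny loan dataset, feature choices, and labels below are invented for illustration and are not from the slides.

```python
# Minimal sketch of the two stages: (1) build the classifier from training
# records with known class labels, (2) apply it to test data and estimate
# accuracy. The loan data below is invented for illustration.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Stage 1: model creation (learning step)
# Features: [age, income in thousands]; class labels: "safe" / "risky".
X_train = [[25, 30], [40, 80], [35, 60], [22, 20], [50, 100], [28, 25]]
y_train = ["risky", "safe", "safe", "risky", "safe", "risky"]
classifier = DecisionTreeClassifier().fit(X_train, y_train)

# Stage 2: applying the classifier to held-out test records
X_test = [[30, 35], [45, 90]]
y_test = ["risky", "safe"]
y_pred = classifier.predict(X_test)            # predicted categorical labels
print("predicted:", list(y_pred))
print("estimated accuracy:", accuracy_score(y_test, y_pred))
```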
What is Prediction?

• Prediction is another process of data analysis. It is used to
find a numerical output. As in classification, the training
dataset contains the inputs and the corresponding numerical
output values.
• The algorithm derives the model or a predictor according to
the training dataset.
• The model should find a numerical output when the new data
is given. Unlike in classification, this method does not have a
class label. The model predicts a continuous-valued function
or ordered value.
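
A minimal sketch of prediction as described above, assuming a simple linear regressor from scikit-learn; the experience-to-salary data is made up for illustration.

```python
# Prediction: the training set has inputs and continuous-valued outputs
# (no class labels); the derived predictor returns a numerical output.
from sklearn.linear_model import LinearRegression

X_train = [[1], [3], [5], [7], [9]]   # years of experience (input attribute)
y_train = [30, 45, 60, 75, 90]        # salary in thousands (numerical output)

predictor = LinearRegression().fit(X_train, y_train)
print(predictor.predict([[4]]))       # numerical output for new data (about 52.5)
```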
Classification and Prediction Issues

• The major issue is preparing the data for classification and
prediction. Preparing the data involves the following activities,
such as:
• Data Cleaning: Data cleaning involves removing the noise and treatment
of missing values.
• The noise is removed by applying smoothing techniques, and the problem
of missing values is solved by replacing a missing value with the most
commonly occurring value for that attribute.
• Relevance Analysis: The database may also have irrelevant attributes.
Correlation analysis is used to know whether any two given attributes are
related.
• Data Transformation and reduction: The data can be transformed by any
of the following methods.
– Normalization: Normalization involves scaling all values of a given
attribute so that they fall within a small specified range. It is used
when neural networks or methods involving distance measurements
are used in the learning step.
– Generalization: The data can also be transformed by generalizing it to
higher-level concepts. For this purpose, we can use concept
hierarchies.
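
The following pandas sketch illustrates the cleaning and normalization steps just listed; the column names and values are assumptions made for the example.

```python
# Data preparation sketch: fill a missing value with the attribute's most
# common value, then min-max normalize a numeric attribute. Data is invented.
import pandas as pd

df = pd.DataFrame({
    "credit_rating": ["fair", "excellent", None, "fair", "fair"],  # missing value
    "income": [20000, 95000, 47000, 31000, 62000],
})

# Data cleaning: replace the missing value with the most commonly
# occurring value (mode) of that attribute.
df["credit_rating"] = df["credit_rating"].fillna(df["credit_rating"].mode()[0])

# Normalization: scale income so all values fall within [0, 1].
df["income_norm"] = (df["income"] - df["income"].min()) / \
                    (df["income"].max() - df["income"].min())
print(df)
```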
Difference between Classification and Prediction

• Classification is the process of identifying which category a new
observation belongs to, based on a training data set containing
observations whose category membership is known. Prediction is the
process of identifying the missing or unavailable numerical data for a
new observation.
• In classification, the accuracy depends on finding the class label
correctly. In prediction, the accuracy depends on how well a given
predictor can guess the value of a predicted attribute for new data.
• In classification, the model can be known as the classifier. In
prediction, the model can be known as the predictor.
• In classification, a model or classifier is constructed to find the
categorical labels. In prediction, a model or predictor is constructed to
predict a continuous-valued function or ordered value.
• For example, grouping patients based on their medical records can be
considered classification, whereas predicting the correct treatment for
a particular disease for a person can be considered prediction.
What are the various Issues regarding Classification and Prediction in data mining?

• Data cleaning −
• This refers to the pre-processing of data to remove or reduce noise by applying
smoothing methods, and to handle missing values (e.g., by replacing a missing
value with the most commonly occurring value for that attribute, or with the
most probable value based on statistics).
• Relevance analysis − There may be attributes in the data that are irrelevant to
the classification or prediction task. For instance, data recording the day of the
week on which a bank loan application was filed is unlikely to be relevant to the
success of the application. Moreover, some attributes may be redundant.
• Therefore, relevance analysis can be performed on the data to remove any
irrelevant or redundant attributes from the learning process. In machine
learning, this step is known as feature selection. Including such attributes may
otherwise slow down, and possibly mislead, the learning step.
• Data transformation − The data can be transformed by normalization or
generalization.
• Normalization involves scaling all values of a given attribute so that they fall
within a small specified range, such as -1.0 to 1.0 or 0 to 1.0. In methods that
apply distance measurements, for instance, this prevents attributes with
initially large ranges (such as income) from outweighing attributes with
initially smaller ranges (such as binary attributes).
• Generalization raises the data to higher-level concepts. Concept hierarchies
can be used for this purpose, and this is especially helpful for
continuous-valued attributes. For instance, numerical values of the attribute
income can be generalized to discrete ranges such as low, medium, and high.
• Likewise, nominal-valued attributes, such as street, can be generalized to
higher-level concepts, such as city.
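
A rough sketch of relevance analysis (via correlation) and generalization (via a concept hierarchy for income); the synthetic data, thresholds, and labels are assumptions for illustration.

```python
# Relevance analysis and generalization sketch on synthetic data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.normal(50000, 15000, 200),
    "day_of_week": rng.integers(0, 7, 200),           # probably irrelevant
})
df["approved"] = (df["income"] > 50000).astype(int)   # class label

# Relevance analysis: correlation of each attribute with the class;
# a near-zero value suggests the attribute can be removed (feature selection).
print(df.corr()["approved"])

# Generalization via a concept hierarchy: continuous income -> low/medium/high.
df["income_level"] = pd.cut(df["income"],
                            bins=[-np.inf, 35000, 65000, np.inf],
                            labels=["low", "medium", "high"])
print(df[["income", "income_level"]].head())
```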
CLASSIFICATION BY DECISION TREE INDUCTION

• A decision tree is a structure that includes a root node, branches,
and leaf nodes.
• Each internal node denotes a test on an attribute, each branch
denotes the outcome of a test, and each leaf node holds a class
label.
• The topmost node in the tree is the root node.
• The following decision tree is for the concept buy_computer, which
indicates whether a customer at a company is likely to buy a
computer or not.
• Each internal node represents a test on an attribute. Each leaf
node represents a class.
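
Since the buy_computer figure itself is not reproduced here, the sketch below builds a small decision tree for a similar concept with scikit-learn; the records and attribute values are invented, and categorical attributes are one-hot encoded because scikit-learn trees expect numeric inputs.

```python
# Decision tree sketch for a buy_computer-style concept (invented records).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "age":     ["youth", "youth", "middle_aged", "senior", "senior", "middle_aged"],
    "student": ["no",    "yes",   "no",          "yes",    "no",     "yes"],
    "buys_computer": ["no", "yes", "yes", "yes", "no", "yes"],
})

X = pd.get_dummies(data[["age", "student"]])       # encode categorical attributes
y = data["buys_computer"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))   # root, branches, leaves
```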
• The benefits of having a decision tree are as
follows −
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a
decision tree are simple and fast.
• Tree Pruning
• Tree pruning is performed in order to remove anomalies in the training
data due to noise or outliers. The pruned trees are smaller and less
complex.
• Tree Pruning Approaches
• There are two approaches to prune a tree −
• Pre-pruning − The tree is pruned by halting its construction early.
• Post-pruning − This approach removes a sub-tree from a fully grown tree.
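
A small scikit-learn sketch of the two approaches, assuming synthetic data; pre-pruning is approximated by limiting depth and post-pruning by cost-complexity pruning (ccp_alpha).

```python
# Pruning sketch: compare an unpruned tree, a pre-pruned tree (early halt via
# max_depth) and a post-pruned tree (cost-complexity pruning of a grown tree).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pre_pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

print("leaves - full:", full.get_n_leaves(),
      "pre-pruned:", pre_pruned.get_n_leaves(),
      "post-pruned:", post_pruned.get_n_leaves())
```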
• Decision Tree Induction
• A decision tree is a supervised learning method used in data mining
for classification and regression tasks. It is a tree that helps us in
decision making.
• The decision tree creates classification or regression models in the
form of a tree structure. It separates a data set into smaller subsets
while, at the same time, the tree is incrementally developed.
• The final tree is a tree with decision nodes and leaf nodes. A
decision node has at least two branches, and the leaf nodes show a
classification or decision.
• Why are decision trees useful?
• They enable us to analyze the possible consequences of a decision
thoroughly.
• They provide a framework to measure the values of outcomes and the
probability of achieving them.
• They help us make the best decisions based on existing data and best
assumptions.
• Decision tree algorithm:
• The decision tree algorithm may appear long, but its basic technique is
quite simple. It proceeds as follows (a short Python sketch is given after
this list):
• The algorithm takes three parameters: D, attribute_list, and
Attribute_selection_method.
• Generally, we refer to D as a data partition.
• Initially, D is the entire set of training tuples and their associated class
labels (the input training data).
• The parameter attribute_list is the set of attributes describing the tuples.
• Attribute_selection_method specifies a heuristic procedure for choosing the
attribute that "best" discriminates the given tuples according to class.
• The Attribute_selection_method procedure applies an attribute selection
measure (such as information gain).
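
A simplified Python sketch of this procedure, using information gain as the assumed Attribute_selection_method; it omits pruning, continuous attributes, and other refinements of the full algorithm.

```python
# Simplified decision tree induction: D is a list of (attribute_dict, label)
# tuples, attribute_list the candidate attributes; the attribute with the
# highest information gain is chosen at each node.
import math
from collections import Counter

def entropy(D):
    counts = Counter(label for _, label in D)
    return -sum(c / len(D) * math.log2(c / len(D)) for c in counts.values())

def best_attribute(D, attribute_list):
    def info_gain(attr):
        expected = 0.0
        for v in set(row[attr] for row, _ in D):
            subset = [(r, l) for r, l in D if r[attr] == v]
            expected += len(subset) / len(D) * entropy(subset)
        return entropy(D) - expected
    return max(attribute_list, key=info_gain)

def generate_tree(D, attribute_list):
    labels = [label for _, label in D]
    if len(set(labels)) == 1:               # all tuples in one class -> leaf
        return labels[0]
    if not attribute_list:                  # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(D, attribute_list)
    node = {attr: {}}
    for value in set(row[attr] for row, _ in D):
        subset = [(r, l) for r, l in D if r[attr] == value]
        remaining = [a for a in attribute_list if a != attr]
        node[attr][value] = generate_tree(subset, remaining)
    return node

D = [({"age": "youth", "student": "no"}, "no"),
     ({"age": "youth", "student": "yes"}, "yes"),
     ({"age": "senior", "student": "no"}, "no"),
     ({"age": "middle_aged", "student": "no"}, "yes")]
print(generate_tree(D, ["age", "student"]))
```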
• Advantages of using decision trees:
• A decision tree does not need scaling of the data.
• Missing values in the data also do not influence the process of building a
decision tree to any considerable extent.
• A decision tree model is intuitive and simple to explain to the technical
team as well as to stakeholders.
CLASSIFICATION BY BACKPROPAGATION

• Backpropagation in Data Mining

• Backpropagation is an algorithm that propagates the errors from the
output nodes back to the input nodes; hence it is referred to as the
backward propagation of errors. It is used in many neural network
applications in data mining, such as character recognition and
signature verification.
• Neural Network:
• Neural networks are an information processing paradigm inspired by the
human nervous system. Just as the human nervous system has biological
neurons, neural networks have artificial neurons, which are mathematical
functions derived from biological neurons. The human brain is estimated to
have about 10 billion neurons, each connected to an average of 10,000
other neurons. Each neuron receives a signal through a synapse, which
controls the effect of the signal on the neuron.
Backpropagation:
Backpropagation is a widely used algorithm for
training feedforward neural networks. It
computes the gradient of the loss function with
respect to the network weights, and it is far more
efficient than naively computing the gradient with
respect to each weight individually. This
efficiency makes it feasible to use gradient
methods to train multi-layer networks and to
update the weights to minimize the loss; variants such as
gradient descent or stochastic gradient descent
are often used.
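
As a short aside (notation assumed here, not taken from the slides), the weight update performed by these gradient methods can be written as, with learning rate η and loss L:

```latex
w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial L}{\partial w_{ij}}
```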
Features of Backpropagation:
• It is a gradient descent method, as used in the case of a simple
perceptron network with differentiable units.
• It differs from other networks in the way the weights are calculated
during the learning period of the network.
• Training is done in three stages:
– the feed-forward of the input training pattern,
– the calculation and backpropagation of the error,
– the updating of the weights.
• Working of Backpropagation:
• Neural networks use supervised learning to generate output vectors
from input vectors. The network compares the generated output with
the desired output and produces an error report if they do not match.
It then adjusts the weights according to this error report so that the
desired output is obtained.
• Backpropagation Algorithm:
• Step 1: Inputs X arrive through the preconnected path.
• Step 2: The inputs are modeled using actual weights W. The weights are
usually chosen randomly.
• Step 3: Calculate the output of each neuron from the input layer,
through the hidden layer, to the output layer.
• Step 4: Calculate the error in the outputs:
Backpropagation Error = Actual Output – Desired Output
• Step 5: From the output layer, go back to the hidden layer and adjust
the weights so that the error is reduced.
• Step 6: Repeat the process until the desired output is achieved.
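
A compact NumPy sketch of these steps for a tiny 2-2-1 network with sigmoid units; the architecture, learning rate, and toy data (an OR-like target) are assumptions for illustration.

```python
# Backpropagation sketch: feed-forward (Step 3), output error (Step 4),
# backward pass and weight adjustment (Step 5), repeated (Step 6).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # Step 1: inputs
t = np.array([[0.], [1.], [1.], [1.]])                   # desired outputs (OR)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros((1, 2))       # Step 2: random weights
W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))
eta = 0.5                                                # learning rate

for _ in range(5000):                                    # Step 6: repeat
    h = sigmoid(X @ W1 + b1)                             # Step 3: feed-forward
    y = sigmoid(h @ W2 + b2)
    error = y - t                                        # Step 4: output error
    delta_out = error * y * (1 - y)                      # Step 5: backpropagate
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    W2 -= eta * h.T @ delta_out
    b2 -= eta * delta_out.sum(axis=0, keepdims=True)
    W1 -= eta * X.T @ delta_hid
    b1 -= eta * delta_hid.sum(axis=0, keepdims=True)

print(np.round(y, 2))   # outputs should move close to the OR targets
```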
• Need for Backpropagation:
• Backpropagation is “backpropagation of errors” and is very useful for training
neural networks. It’s fast, easy to implement, and simple. Backpropagation does
not require any parameters to be set, except the number of inputs.
Backpropagation is a flexible method because no prior knowledge of the network
is required.
• Types of Backpropagation
• There are two types of backpropagation networks:
• Static backpropagation: A static backpropagation network is designed to map
static inputs to static outputs. These networks can solve static classification
problems such as OCR (Optical Character Recognition).
• Recurrent backpropagation: Recurrent backpropagation is another network,
used for fixed-point learning. Activation in recurrent backpropagation is fed
forward until a fixed value is reached. Static backpropagation provides an
instant mapping, while recurrent backpropagation does not.
• Advantages:
• It is simple, fast, and easy to program.
• Only the number of inputs needs to be specified; no other parameters
require tuning.
• It is flexible and efficient.
• There is no need for users to learn any special functions.
• Disadvantages:
• It is sensitive to noisy data and irregularities. Noisy data can
lead to inaccurate results.
• Performance is highly dependent on the input data.
• Training can take a long time.
• A matrix-based approach is preferred over a mini-batch approach.
