DM - 06 Mar 2025
Classification and prediction are two data mining methods that help to analyze new data and to explore unknown data.
Classification is the process of finding a good model that can categorize the available data and predict the class of unknown data.
Examples of Classification -
o A bank loan officer wants to analyze the data in order to know which customers (loan applicants) are risky and which are safe.
o A marketing manager at a company needs to analyze whether a customer with a given profile will buy a new computer.
In prediction, we identify or predict the missing or unavailable data for a new observation, based on the previous data that we have and on future assumptions. In prediction, the output is a continuous value.
Examples of Prediction -
o Suppose the marketing manager needs to predict how much a given customer will spend during a sale at his company.
o Predicting the value of a house depending on facts such as the number of rooms, the total area, etc.
Regression is generally used for prediction.
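As a minimal sketch of prediction via regression, the snippet below fits an ordinary least-squares line with one feature (total area) to predict a continuous house price; the data points are illustrative, not taken from the text.

```python
# Ordinary least squares with a single feature: predict house price (a
# continuous value) from total area. The toy data below is illustrative.

def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

areas = [50, 70, 90, 110]       # total area in square metres (illustrative)
prices = [100, 140, 180, 220]   # price in thousands (illustrative)

a, b = fit_line(areas, prices)
print(round(a * 80 + b))        # predicted price for an 80 m^2 house -> 160
```

Unlike classification, the output here is not a class label but a number on a continuous scale.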
How does Classification Work?
There are two stages in the data classification system:
o Classifier model creation:- The classification algorithm constructs the classifier in this stage. A classifier is constructed from a training set composed of database records and their corresponding class names. Each category that makes up the training set is referred to as a class. We may also refer to these records as samples, objects, or data points.
o Application of classifier for classification:- The test data are used here to estimate the accuracy of the classification algorithm. If the accuracy is deemed sufficient, the classification rules can be applied to new data records. Example applications are:- document classification, sentiment analysis, image classification, etc.
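The two stages above can be sketched in code: build a classifier from a labelled training set, then estimate its accuracy on held-out test records. The 1-nearest-neighbour rule and the toy loan records are illustrative assumptions, not the method prescribed by the text.

```python
# Stage 1: construct a classifier from a training set of (features, class)
# records. Stage 2: apply it to test records to estimate accuracy.
# Features here are hypothetical (income in thousands, age).

def classify(record, training_set):
    """Predict the class of `record` by its nearest training neighbour."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(training_set, key=lambda row: sq_dist(row[0], record))
    return nearest[1]

# Stage 1: training set, e.g. loan applicants labelled risky or safe
training_set = [((20, 25), "risky"), ((90, 45), "safe"),
                ((30, 30), "risky"), ((80, 50), "safe")]

# Stage 2: estimate accuracy on test records whose true labels are known
test_set = [((25, 28), "risky"), ((85, 48), "safe")]
correct = sum(classify(x, training_set) == y for x, y in test_set)
print(correct / len(test_set))   # fraction of test records classified correctly
```

If the measured accuracy is acceptable, the same `classify` call is then used on genuinely new, unlabelled records.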
Data Classification Process: The data classification process can be categorized into five steps:
o Create the goals, strategy, workflows, and architecture of data classification.
o Classify the confidential details that we store.
o Label the data using marks (data labelling).
o Use the results to improve data protection and compliance.
o Data is complex, and classification is a continuous process.
What is Data Classification Lifecycle?
The data classification life cycle produces an excellent structure for controlling the flow of data in an enterprise. Businesses need to account for data security and compliance at each level. With the help of data classification, we can perform it at every stage, from origin to deletion. The data life cycle has the following stages:
1. Origin: It produces sensitive data in various formats, with emails, Excel, Word, Google documents, social media, and websites.
2. Role-based practice: Role-based security restrictions apply to all delicate data by tagging based on in-house protection policies and agreement rules.
3. Storage: Here, we have the obtained data, including access controls and encryption.
4. Sharing: Data is continually distributed among agents, consumers, and co-workers from various devices and platforms.
5. Archive: Here, data is eventually archived within an industry's storage systems.
6. Publication: Through the publication of data, it can reach customers. They can then view and download it in the form of dashboards.
Issues regarding classification and prediction:
1. Data Cleaning: Data cleaning involves removing the noise and treatment of missing values. The noise is removed by applying smoothing techniques, and the problem of missing values is solved by replacing a missing value with the most commonly occurring value for that attribute.
2. Relevance Analysis: The database may also have irrelevant attributes. Correlation analysis is used to know whether any two given attributes are related.
3. Data Transformation and Reduction: The data can be transformed by any of the following methods.
o Normalization: The data is transformed using normalization. Normalization involves scaling all values for a given attribute to make them fall within a small specified range. Normalization is used when neural networks or methods involving measurements are used in the learning step.
o Generalization: The data can also be transformed by generalizing it to the higher concept. For this purpose, we can use the concept hierarchies.
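Min-max normalization, the scaling described above, can be sketched as follows; the attribute values are illustrative.

```python
# Min-max normalization: scale all values of an attribute so they fall
# within a small specified range, [0, 1] by default.

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    old_min, old_max = min(values), max(values)
    return [(v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
            for v in values]

incomes = [20_000, 50_000, 80_000]     # illustrative attribute values
print(min_max_normalize(incomes))      # -> [0.0, 0.5, 1.0]
```

Keeping all attributes in the same small range prevents an attribute with large raw values from dominating distance-based or neural-network learning.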
Comparison of Classification and Prediction Methods
Here are the criteria for comparing the methods of classification and prediction:
Accuracy: The accuracy of the classifier can be referred to as the ability of the classifier to predict the class label correctly, and the accuracy of the predictor can be referred to as how well a given predictor can estimate the unknown value.
Speed: The speed of the method depends on the computational cost of generating and using the classifier or predictor.
Robustness: Robustness is the ability to make correct predictions or classifications. In the context of data mining, robustness is the ability of the classifier or predictor to make correct predictions from incoming unknown data.
Scalability: Scalability refers to an increase or decrease in the performance of the classifier or predictor based on the given data.
Interpretability: Interpretability is how readily we can understand the reasoning behind predictions or classifications made by the predictor or classifier.
(Figure: a decision tree splitting on age, with branches young, middle-aged, and senior, and leaf nodes labelled yes or no.)
1. Entropy: Entropy measures the randomness (impurity) of a dataset. A very random dataset has high entropy; a less random dataset has less entropy.
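The contrast between a very random and a less random dataset can be checked with a short entropy computation; the class labels below are illustrative.

```python
# Entropy of a set of class labels: sum over classes of -p * log2(p).
# A 50/50 split is maximally random (1 bit); a pure set has entropy 0.

from math import log2

def entropy(labels):
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())

print(entropy(["yes", "no", "yes", "no"]))    # very random dataset -> 1.0
print(entropy(["yes", "yes", "yes", "yes"]))  # pure dataset        -> 0.0
```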
2. Information Gain: Information gain is the decrease in entropy when the dataset is split on an attribute. If E1 is the entropy before the split and E2 is the size-weighted entropy of the partitions after the split, then Information Gain = E1 - E2, where E1 > E2 for a useful split.
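The definition E1 - E2 can be sketched directly, weighting each partition's entropy by its relative size; the toy labels and the perfect two-way split are illustrative.

```python
# Information gain = E1 - E2: entropy of the whole label set minus the
# size-weighted entropy of the partitions produced by a split.

from math import log2

def entropy(labels):
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return sum(-p * log2(p) for p in probs)

def information_gain(labels, partitions):
    e1 = entropy(labels)                                  # before the split
    e2 = sum(len(part) / len(labels) * entropy(part)      # weighted by size
             for part in partitions)
    return e1 - e2                                        # E1 > E2 if useful

labels = ["yes", "yes", "no", "no"]
# A perfect split separates the classes completely, so E2 = 0 and gain = E1.
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # -> 1.0
```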
Input:
1) Data partition, D, which is a set of training tuples and their associated class labels;
2) attribute_list, the set of candidate attributes;
3) Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting attribute and, possibly, a split point or splitting subset.
Output: A decision tree.
Method:
(Step 1) Create a node N;
(Step 2) If the tuples in D are all of the same class, C, then
(Step 3) Return N as a leaf node labeled with the class C;
(Step 4) If attribute_list is empty then
(Step 5) Return N as a leaf node labeled with the majority class in D; // majority voting
(Step 6) Apply Attribute_selection_method(D, attribute_list) to find the "best" splitting criterion;
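The steps above can be sketched as a recursive function. For brevity this sketch simplifies Step 6 to "pick the first remaining attribute"; a real attribute-selection method would use information gain. Tuples are represented as dicts paired with a class label; the tiny dataset is illustrative.

```python
# A sketch of the decision-tree induction method: leaf on a pure class
# (Steps 2-3), leaf on majority vote when attributes run out (Steps 4-5),
# otherwise split and recurse on each attribute value (Step 6, simplified).

from collections import Counter

def generate_tree(D, attribute_list):
    labels = [label for _, label in D]
    # Steps 2-3: all tuples of the same class C -> leaf labeled C
    if len(set(labels)) == 1:
        return labels[0]
    # Steps 4-5: attribute_list empty -> leaf labeled with majority class
    if not attribute_list:
        return Counter(labels).most_common(1)[0][0]
    # Step 6 (simplified): select a splitting attribute, then recurse
    best = attribute_list[0]
    remaining = [a for a in attribute_list if a != best]
    tree = {best: {}}
    for value in {tup[best] for tup, _ in D}:
        subset = [(tup, label) for tup, label in D if tup[best] == value]
        tree[best][value] = generate_tree(subset, remaining)
    return tree

D = [({"outlook": "sunny"}, "no"), ({"outlook": "overcast"}, "yes")]
print(generate_tree(D, ["outlook"]))
# -> {'outlook': {'sunny': 'no', 'overcast': 'yes'}}
```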
Worked Example: Decision tree construction on the Play Tennis dataset

Training set S contains 14 examples (D1 to D14) with attributes Outlook, Temperature, Humidity, and Wind, and class PlayTennis:

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

S = [9+, 5-]
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94

Step 1: Choose the root attribute.

Values(Outlook) = Sunny, Overcast, Rain
  S_Sunny    = {D1, D2, D8, D9, D11}  = [2+, 3-], Entropy = 0.971
  S_Overcast = {D3, D7, D12, D13}     = [4+, 0-], Entropy = 0
  S_Rain     = {D4, D5, D6, D10, D14} = [3+, 2-], Entropy = 0.971
  Gain(S, Outlook) = 0.94 - (5/14)(0.971) - (4/14)(0) - (5/14)(0.971) = 0.246

Similarly:
  Gain(S, Humidity)    = 0.151
  Gain(S, Wind)        = 0.048
  Gain(S, Temperature) = 0.029

Outlook gives the highest information gain, so it becomes the root node. The Overcast branch is pure ([4+, 0-]) and becomes a leaf labeled Yes.

Step 2: Split the Sunny branch. S_Sunny = [2+, 3-], Entropy = 0.971.

Values(Humidity) = High, Normal
  High   = {D1, D2, D8} = [0+, 3-], Entropy = 0
  Normal = {D9, D11}    = [2+, 0-], Entropy = 0
  Gain(S_Sunny, Humidity) = 0.971 - (3/5)(0) - (2/5)(0) = 0.971

  Gain(S_Sunny, Temperature) = 0.571
  Gain(S_Sunny, Wind)        = 0.019

Humidity is chosen: High -> No, Normal -> Yes.

Step 3: Split the Rain branch. S_Rain = [3+, 2-], Entropy = 0.971.

Values(Wind) = Weak, Strong
  Weak   = {D4, D5, D10} = [3+, 0-], Entropy = 0
  Strong = {D6, D14}     = [0+, 2-], Entropy = 0
  Gain(S_Rain, Wind) = 0.971 - (3/5)(0) - (2/5)(0) = 0.971

  Gain(S_Rain, Humidity) = 0.971 - (2/5)(1.0) - (3/5)(0.918) = 0.020

Wind is chosen: Strong -> No, Weak -> Yes.

Final decision tree:

Outlook = Sunny    -> Humidity (High: No, Normal: Yes)
Outlook = Overcast -> Yes
Outlook = Rain     -> Wind (Strong: No, Weak: Yes)
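The root-level gain values of this standard Play Tennis example can be checked numerically; the sketch below assumes the usual 14-example dataset (D1 to D14) and recomputes Entropy(S) and the information gain of each attribute at the root.

```python
# Recompute Entropy(S) and the root information gains for the standard
# 14-example Play Tennis dataset.

from math import log2

# (Outlook, Temperature, Humidity, Wind, PlayTennis) for D1..D14
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
attrs = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    total = len(rows)
    probs = [sum(r[4] == c for r in rows) / total for c in ("Yes", "No")]
    return sum(-p * log2(p) for p in probs if p > 0)

def gain(rows, attr):
    i = attrs[attr]
    return entropy(rows) - sum(
        len(part) / len(rows) * entropy(part)
        for v in {r[i] for r in rows}
        for part in [[r for r in rows if r[i] == v]])

print(round(entropy(data), 2))           # Entropy(S) -> 0.94
for a in attrs:
    print(a, round(gain(data, a), 3))    # Outlook has the largest gain
```

Outlook's gain comes out near 0.247 (the text's 0.246 reflects rounding the subset entropies first), confirming it as the root attribute.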