Lec.7.intro.D.S. Fall 2023
Introduction to Classification
• Classification (supervised learning) is a
form of data analysis that extracts models describing
important data classes.
[Figure labels: Training Data, Classification Algorithms, Classifier. Unseen record: (Jeff, Professor, 4). Tenured?]

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
Classification
- The primary task performed by classifiers is to assign
labels to objects.
- Labels in classification are pre-determined, unlike in
clustering, where we discover the structure and then assign
labels.
- Classification problems are solved with supervised learning methods.
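
As a minimal sketch of assigning pre-determined labels, the snippet below applies a hand-written classifier to the four records of the tenure table and to the unseen record (Jeff, Professor, 4). The rule inside `classify` is a hypothetical stand-in for a learned model; it is an assumption of this sketch and is not given in these slides.

```python
# Labeled records from the tenure table: (name, rank, years, tenured).
records = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def classify(rank, years):
    """Hypothetical classifier (an assumption, not taken from the slides)."""
    return "yes" if rank == "Professor" or years > 6 else "no"


# Assign a label to every record and compare it with the known label.
predictions = [classify(rank, years) for _, rank, years, _ in records]
correct = sum(
    predicted == actual
    for predicted, (_, _, _, actual) in zip(predictions, records)
)
accuracy = correct / len(records)

# Label the unseen record (Jeff, Professor, 4).
jeff_label = classify("Professor", 4)

print(predictions, accuracy, jeff_label)
```

Because the labels ("yes"/"no") are fixed in advance and the known labels are available for comparison, the quality of the classifier can be measured directly. That is not possible in clustering, where no labels are given.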
Classification Basic Concepts: Decision Trees
Decision Trees
• Decision trees are a flexible method commonly used in
classification applications.
simply a classification.
[Figure: tree diagram labeling the root node, a parent node, a child node, and the branches]
Trees
[Figure: test node "Gender" with two branches, "Female" and "Male"]
Branch: the outcome of a test
• Branches refer to the outcome of a decision. When
the decision is numerical, the "greater than" branch
is usually shown on the right and the "less than" branch on the
left.
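
The left/right convention for a numerical decision can be sketched as follows. The function name `branch_for`, the attribute (age), and the threshold (30) are made up for illustration. Sending a value equal to the threshold to the left branch is also an assumption of this sketch.

```python
def branch_for(value, threshold):
    """Return the branch that a numeric test sends a value down.

    Convention from the slide: "less than" goes left, "greater than" goes
    right. Values equal to the threshold go left here (an assumption).
    """
    return "left" if value <= threshold else "right"


# Hypothetical numeric test on age with threshold 30.
for age in (25, 30, 42):
    print(age, branch_for(age, 30))
```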
Advantages of Decision Trees
• Easy to understand.
• Map nicely to a set of production rules.
• Applied to real problems.
• Able to process both numerical and
categorical data.
Disadvantages of Decision Trees
• Output attribute must be categorical.
• Limited to one output attribute.
• Decision tree algorithms are unstable (slight
variations in the training set can result in
different attribute selections).
• Trees created from numeric datasets can be
complex, as attribute splits for numeric data are
typically binary.
From Trees to rules
Decision trees map neatly to a set of production rules, which is
one advantage of DTs.
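
One way to see the mapping: every path from the root to a leaf becomes one IF ... THEN rule. The sketch below walks a toy tree and collects those rules. The tree itself (its attributes, thresholds, and class labels) and the helper name `tree_to_rules` are hypothetical and chosen only for illustration.

```python
# Toy tree (hypothetical). An internal node is a pair
# (attribute, {branch_outcome: subtree}); a leaf is a class label string.
tree = ("Gender", {
    "= Female": "class A",
    "= Male": ("Age", {"<= 30": "class A", "> 30": "class B"}),
})


def tree_to_rules(node, conditions=()):
    """Collect one production rule per root-to-leaf path."""
    if isinstance(node, str):
        # Leaf reached: the tests along the path form the rule's premise.
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node}"]
    attribute, branches = node
    rules = []
    for outcome, subtree in branches.items():
        # Each branch outcome adds one condition to the path.
        rules.extend(
            tree_to_rules(subtree, conditions + (f"{attribute} {outcome}",))
        )
    return rules


rules = tree_to_rules(tree)
for rule in rules:
    print(rule)
```

The number of rules equals the number of leaves, so a small tree gives a short, readable rule set.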
From Trees to rules (please write suitable rules
for the following tree and hand in a printed copy of these
rules at the next lecture)
IF Temperature is not between -10 and 60 THEN Survival Difficult
Whether water is present or not?