
Decision Tree

https://www.saedsayad.com/decision_tree.htm

1
Classification and Prediction
 Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts.
 The model is derived from the analysis of a set of training data (i.e., data objects whose class labels are known).
 The model is then used to predict the class label of objects for which the class label is unknown.

2
Decision Tree Induction
 A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute value, each branch represents an outcome of the test, and each leaf represents a class or a class distribution.
 At each node, the algorithm chooses the “best” attribute for partitioning the data into individual classes.
 Constructing a decision tree classifier requires no domain knowledge or parameter setting, which makes decision trees appropriate for exploratory knowledge discovery.
 Decision trees can easily be converted to classification rules.
 Decision trees can handle multidimensional data.

4
Decision Tree Induction: Training Dataset

This dataset follows an example from Quinlan’s ID3 (the “playing tennis” example).

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31…40   high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31…40   low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31…40   medium   no       excellent      yes
31…40   high     yes      fair           yes
>40     medium   no       excellent      no
5
Output: A Decision Tree for “buys_computer”

The induced tree tests age at the root and branches on its three values:

age?
  <=30   -> student?
              no  -> no
              yes -> yes
  31..40 -> yes
  >40    -> credit_rating?
              excellent -> no
              fair      -> yes
6
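
As noted on the previous slide, a tree like this converts directly to classification rules: each root-to-leaf path is one rule. A minimal Python sketch of the tree above (the function name classify is illustrative, not from the slides):

```python
def classify(age, student, credit_rating):
    """Apply the buys_computer tree above; each branch is one rule."""
    if age == "<=30":
        # IF age <= 30 AND student = yes THEN buys_computer = yes
        return "yes" if student == "yes" else "no"
    if age == "31..40":
        # IF age in 31..40 THEN buys_computer = yes
        return "yes"
    # age > 40: the decision depends on credit rating
    # IF age > 40 AND credit_rating = fair THEN buys_computer = yes
    return "yes" if credit_rating == "fair" else "no"

print(classify("<=30", "yes", "fair"))  # -> yes
```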
Algorithm for Decision Tree Induction
 Basic algorithm (a greedy algorithm):
– The tree is constructed in a top-down, recursive, divide-and-conquer manner.
– At the start, all the training examples are at the root.
– Attributes are categorical (continuous-valued attributes are discretized in advance).
– Examples are partitioned recursively based on the selected attributes.
– Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain), as sketched in the code below.

7
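
A minimal, self-contained Python sketch of this greedy recursion, assuming each example is a (feature_dict, label) pair; the structure follows the bullet points above rather than Quinlan’s exact pseudocode:

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information needed to classify a tuple (see the next slides)."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attrs):
    """Top-down recursive divide-and-conquer induction on categorical data."""
    labels = [y for _, y in rows]
    # Stop when the partition is pure or no test attributes remain;
    # the leaf then predicts the majority class of the partition.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]

    def weighted_entropy(a):
        # Entropy after splitting on a: maximizing information gain
        # is the same as minimizing this quantity.
        parts = [[y for x, y in rows if x[a] == v]
                 for v in {x[a] for x, _ in rows}]
        return sum(len(p) / len(rows) * entropy(p) for p in parts)

    best = min(attrs, key=weighted_entropy)
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree([(x, y) for x, y in rows if x[best] == v], rest)
                   for v in {x[best] for x, _ in rows}}}
```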
Attribute Selection Measure for ID3: Information Gain

 Select the attribute with the highest information gain.
 Let p_i be the probability that an arbitrary tuple in D belongs to class C_i.
 Expected information (entropy) needed to classify a tuple in D:

   Info(D) = -Σ_{i=1..m} p_i log2(p_i)

 Information needed (after using A to split D into v partitions) to classify D:

   Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) × Info(D_j)

 Information gained by branching on attribute A:

   Gain(A) = Info(D) - Info_A(D)
9
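
These formulas translate almost line for line into Python. A minimal sketch, assuming a partition is represented simply as a list of class labels and a split as a list of such lists:

```python
import math
from collections import Counter

def info(labels):
    """Info(D) = -sum_i p_i * log2(p_i), the entropy of a label list."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_after_split(partitions):
    """Info_A(D): weighted entropy after splitting D into D_1 .. D_v."""
    n = sum(len(p) for p in partitions)
    return sum(len(p) / n * info(p) for p in partitions)

def gain(labels, partitions):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_after_split(partitions)
```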
Attribute Selection: Information Gain

 Class P: buys_computer = “yes” (9 tuples); Class N: buys_computer = “no” (5 tuples).

   Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

 Splitting on age partitions D as follows:

   age     p_i  n_i  I(p_i, n_i)
   <=30    2    3    0.971
   31…40   4    0    0
   >40     3    2    0.971

   Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

 Here (5/14) I(2,3) means that “age <=30” covers 5 of the 14 samples, with 2 yes’es and 3 no’s. Hence

   Gain(age) = Info(D) - Info_age(D) = 0.246

 Similarly (from the training dataset on slide 5):

   Gain(income) = 0.029
   Gain(student) = 0.151
   Gain(credit_rating) = 0.048

10
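
The slide’s numbers for age can be reproduced with the info, info_after_split, and gain helpers sketched after the previous slide (the variable names here are illustrative):

```python
# Class labels of the 14 training tuples, grouped by the age attribute
by_age = {
    "<=30":  ["no", "no", "no", "yes", "yes"],   # 2 yes, 3 no
    "31…40": ["yes", "yes", "yes", "yes"],       # 4 yes, 0 no
    ">40":   ["yes", "yes", "no", "yes", "no"],  # 3 yes, 2 no
}
labels = [l for part in by_age.values() for l in part]  # 9 yes, 5 no

i_d = info(labels)
i_age = info_after_split(list(by_age.values()))
print(f"Info(D)      = {i_d:.3f}")    # 0.940
print(f"Info_age(D)  = {i_age:.3f}")  # 0.694
# 0.246 as on the slide (it subtracts the rounded values; exact gain ~ 0.247)
print(f"Gain(age)    = {round(i_d, 3) - round(i_age, 3):.3f}")
```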
Gain Ratio for Attribute Selection (C4.5)

• The information gain measure is biased towards attributes with a large number of values.
• C4.5 (a successor of ID3) uses the gain ratio to overcome this problem:

   SplitInfo_A(D) = -Σ_{j=1..v} (|D_j| / |D|) × log2(|D_j| / |D|)

– GainRatio(A) = Gain(A) / SplitInfo_A(D)
• Ex.: income splits the 14 tuples into partitions of sizes 4 (high), 6 (medium), and 4 (low), so

   SplitInfo_income(D) = -(4/14) log2(4/14) - (6/14) log2(6/14) - (4/14) log2(4/14) = 1.557

– GainRatio(income) = 0.029 / 1.557 = 0.019
• The attribute with the maximum gain ratio is selected as the splitting attribute.
11
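
A minimal sketch of the split information computation, verifying the numbers above (partition sizes 4, 6, 4 for income = high, medium, low):

```python
import math

def split_info(sizes):
    """SplitInfo_A(D) = -sum_j |D_j|/|D| * log2(|D_j|/|D|)."""
    n = sum(sizes)
    return -sum(s / n * math.log2(s / n) for s in sizes)

si = split_info([4, 6, 4])        # income: high=4, medium=6, low=4
print(f"{si:.3f}")                # 1.557
print(f"{0.029 / si:.3f}")        # 0.019 -> GainRatio(income)
```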
Comparing Attribute Selection Measures

• Both measures, in general, return good results, but:
– Information gain is biased towards multivalued attributes.
– Gain ratio tends to prefer unbalanced splits in which one partition is much smaller than the others.

12
Thank you for your attention.

Any Questions?

13
