
Machine Learning
H Dip in DAB/AI
CCT College Dublin

Decision Trees and Random Forest
Week 3

Lecturer: Dr. Muhammad Iqbal
Email: miqbal@cct.ie
©CCT College Dublin 2022
Agenda
• Decision Tree Classifier
• Classification and Regression Trees
• Example of a Decision Tree
• General Structure Based on a Recursive Approach
• Decision Tree Splitting
• Categorical Attributes: Computing the Gini Index
• CART Algorithm
• Random Forest
• How Does the Algorithm Work?
Decision Tree Classifier
• A decision tree is a graph in the shape of a tree: a sequential diagram that shows all of the potential decision options and their associated results.

• Starting from the root of the tree, every internal node represents the attribute a decision is made on; each branch of a node represents how a choice leads to the next node; and each terminal node, the leaf, represents the outcome yielded.

• For example, we may have made a couple of decisions that brought us to the action of learning decision trees to solve our advertising problem.
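As a quick illustration (my own hypothetical example, not from the slides), such a tree maps directly onto nested conditionals: the outer test is the root node, each branch leads to another test or to a leaf, and each returned value is a leaf outcome.

    # Minimal sketch: a hand-written decision tree for an advertising-style question.
    # The names and thresholds below are made up for illustration only.
    def should_run_campaign(expected_ctr, budget_left):
        if expected_ctr >= 0.02:          # root node: test on expected click-through rate
            if budget_left >= 1000:       # internal node: test on remaining budget
                return "run campaign"     # leaf node (outcome)
            return "run a smaller test"   # leaf node
        return "do not run"               # leaf node

    print(should_run_campaign(0.03, 5000))  # -> "run campaign"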
Decision Trees
• A decision tree is a collection of decision nodes, connected by branches, extending downward from the root node to terminating leaf nodes.
• Beginning with the root node, attributes are tested at decision nodes, and each possible outcome results in a branch.
• Each branch leads to either another decision node or a leaf node.

• Example
• Credit Risk is the target variable.
• Customers are classified as either "Good Risk" or "Bad Risk".
• Predictor variables are Savings (Low, Med, High), Assets (Low, Med, High) and Income.

    Customer | Savings | Assets | Income ($1000s) | Credit Risk
    1        | Medium  | High   | 75              | Good
    2        | Low     | Low    | 50              | Bad
    3        | High    | Medium | 25              | Bad
    4        | Medium  | Medium | 50              | Good
    5        | Low     | Medium | 100             | Good
    6        | High    | High   | 60              | Good
    7        | Low     | Low    | 25              | Bad
Classification and Regression Trees

• Example
• Predict whether a customer is classified as a "Good" or "Bad" credit risk using the three predictor fields, according to the data in the table above.
• All records enter the root node, and CART evaluates the possible binary splits.

Candidate tree (data as in the table on the previous slide):

    Root node: Savings = Low, Med, High?
      Savings = Low  -> Assets = Low?     Yes -> Bad Risk    No -> Good Risk
      Savings = Med  -> Good Risk
      Savings = High -> Income <= $30K?   Yes -> Bad Risk    No -> Good Risk
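As a hedged sketch of how such a tree could be grown in code: scikit-learn's DecisionTreeClassifier is a CART-style learner that uses the Gini criterion by default. Trained on only these seven records (with the categorical predictors one-hot encoded), the tree it grows may differ in detail from the one drawn above.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    # The seven training records from the table above.
    data = pd.DataFrame({
        "Savings": ["Medium", "Low", "High", "Medium", "Low", "High", "Low"],
        "Assets":  ["High", "Low", "Medium", "Medium", "Medium", "High", "Low"],
        "Income":  [75, 50, 25, 50, 100, 60, 25],
        "Risk":    ["Good", "Bad", "Bad", "Good", "Good", "Good", "Bad"],
    })

    # One-hot encode the categorical predictors so the tree can make binary splits on them.
    X = pd.get_dummies(data[["Savings", "Assets", "Income"]])
    y = data["Risk"]

    tree = DecisionTreeClassifier(criterion="gini", random_state=0)
    tree.fit(X, y)
    print(export_text(tree, feature_names=list(X.columns)))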
Decision Trees
• Ideally, all branches terminate at pure leaf nodes: every record arriving at such a leaf has the same target class value.
• A diverse leaf node contains records with different target class values ("Good Risk" and "Bad Risk"); the algorithm may be unable to split it any further.
• For example, the subset of records with Savings = "High", Income <= $30,000, and Assets = "Low" reaches a leaf node containing 2 "Good Risk" and 3 "Bad Risk" records.
• If all of these records contain the same predictor values, there is no way to split further towards a pure leaf node.
• 3/5 of the records are "Bad Risk", so the leaf is classified as "Bad Risk" with 60% confidence.

(Data table and candidate tree as on the previous slides.)
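The 60% confidence figure is simply the majority-class proportion among the records reaching that leaf; a short check:

    from collections import Counter

    # Records reaching the diverse leaf in the example: 2 "Good Risk", 3 "Bad Risk".
    leaf_labels = ["Good", "Bad", "Bad", "Good", "Bad"]
    label, count = Counter(leaf_labels).most_common(1)[0]
    print(label, count / len(leaf_labels))   # -> Bad 0.6, i.e. "Bad Risk" with 60% confidence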
Decision Trees
Requirements for Classification

• The decision tree is a supervised classification method.
• The target variable must be categorical.
• A pre-classified target variable must be included in the training set.
• Decision trees learn by example, so the training set should contain records with varied attribute values.
• If the training set systematically lacks definable subsets, classification becomes problematic.
• There are different measures of leaf node purity.
• Classification and Regression Trees (CART) and C4.5 are two leading algorithms used in data analytics.
Decision Trees
Best Splitting
• A decision tree is constructed by partitioning the training samples into successive subsets. The partitioning process is repeated recursively on each subset.

• For each partitioning at a node, a condition test is conducted based on the value of a feature of the subset.

• When the subset shares the same class label, or no further splitting can improve the class purity of this subset, recursive partitioning at this node is finished.

• For a partitioning on a feature (numerical or categorical) with n different values, there are n different ways of binary splitting (yes or no to the condition test), not to mention other ways of splitting.

• Even without considering the order in which features are partitioned on, the number of possible trees for an m-dimensional dataset is enormous (it grows exponentially with m), so exhaustive search is infeasible.

• CART (Classification and Regression Tree) is the algorithm we will discuss in detail.
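The following is a minimal, self-contained sketch of this recursive partitioning (my own illustration, restricted to numeric features and binary "feature <= value" condition tests; it is not the full CART algorithm):

    from collections import Counter

    def gini(labels):
        """Gini impurity of a set of class labels: 1 - sum of squared class proportions."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def best_split(rows, labels):
        """Evaluate every binary test 'feature f <= value v' and keep the one with the
        lowest weighted impurity of the two child subsets."""
        best = None
        for f in range(len(rows[0])):
            for v in sorted({r[f] for r in rows}):
                yes = [i for i, r in enumerate(rows) if r[f] <= v]
                no = [i for i, r in enumerate(rows) if r[f] > v]
                if not yes or not no:
                    continue
                m = (len(yes) * gini([labels[i] for i in yes])
                     + len(no) * gini([labels[i] for i in no])) / len(rows)
                if best is None or m < best[0]:
                    best = (m, f, v, yes, no)
        return best

    def build_tree(rows, labels):
        """Recursive partitioning: stop when the subset is pure or no split improves purity."""
        if len(set(labels)) == 1:
            return labels[0]                                   # pure leaf node
        split = best_split(rows, labels)
        if split is None or split[0] >= gini(labels):
            return Counter(labels).most_common(1)[0][0]        # diverse leaf: majority class
        m, f, v, yes, no = split
        return {"test": (f, v),
                "yes": build_tree([rows[i] for i in yes], [labels[i] for i in yes]),
                "no":  build_tree([rows[i] for i in no], [labels[i] for i in no])}

    # Tiny numeric example: feature 0 separates the classes at <= 2.
    print(build_tree([[1], [2], [3], [4]], ["A", "A", "B", "B"]))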
Measures of Node Impurity
Gini Index Criteria

• Gini Index for a node t: Gini(t) = 1 – Σi [p(i | t)]²
  (NOTE: p(i | t) is the relative frequency of class i at node t.)

Finding the Best Split
1. Compute the impurity measure (P) of the node before splitting.
2. Compute the impurity measure (M) after splitting:
   • compute the impurity measure of each child node;
   • M is the weighted impurity of the children.
3. Choose the attribute test condition that produces the highest gain,
   Gain = P – M
   or, equivalently, the lowest impurity measure after splitting (M).
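A small numeric check of this three-step procedure (the class counts below are illustrative, not from the slides):

    def gini(counts):
        """Gini impurity from class counts: 1 - sum of squared class proportions."""
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    # Step 1: impurity before splitting (parent node P).
    parent = [6, 6]                      # 6 records of class C1, 6 of class C2 (illustrative)
    P = gini(parent)                     # 0.5

    # Step 2: impurity after splitting (M = weighted impurity of the children).
    children = [[5, 1], [1, 5]]          # candidate split sends most of each class to its own child
    n_total = sum(sum(c) for c in children)
    M = sum(sum(c) / n_total * gini(c) for c in children)   # about 0.278

    # Step 3: gain of this attribute test condition.
    print("Gain = P - M =", P - M)       # about 0.222; choose the test with the highest gain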
Measures of Node Impurity
Gini Index
• Gini Index for a given node t: Gini(t) = 1 – Σi [p(i | t)]²
  (NOTE: p(i | t) is the relative frequency of class i at node t.)
• Maximum (1 – 1/nc, where nc is the number of classes) when records are equally distributed among all classes.
• Minimum (0.0) when all records belong to one class.
• For a 2-class (binary) problem with class probabilities (p, 1 – p):
  Gini = 1 – p² – (1 – p)² = 2p(1 – p)

    Node counts        Computation                          Gini
    C1 = 0, C2 = 6     1 – 0² – 1²                          0.000
    C1 = 1, C2 = 5     1 – (1/6)² – (5/6)²                  0.278
    C1 = 2, C2 = 4     1 – (2/6)² – (4/6)²                  0.444
    C1 = 3, C2 = 3     1 – (3/6)² – (3/6)²                  0.500
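A short snippet reproducing the four example values in the table above:

    def gini(counts):
        """Gini index for a node given its class counts: 1 - sum of squared proportions."""
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    # The four example nodes above (counts of C1 and C2).
    for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
        print(counts, round(gini(counts), 3))
    # -> [0, 6] 0.0   [1, 5] 0.278   [2, 4] 0.444   [3, 3] 0.5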
CART Algorithm
Example

• Data Set
  • There are 14 instances of golf-playing decisions based on outlook, temperature, humidity and wind factors.

• Gini index
  • The Gini index is the splitting metric used for classification tasks in CART. It is computed as one minus the sum of the squared probabilities of each class:
    Gini = 1 – Σ (Pi)² for i = 1 to the number of classes.

Source: https://dataaspirant.com/how-decision-tree-algorithm-works/
(Worked example, slides 12–16: for each attribute, the weighted Gini index of its candidate split is computed, and splitting occurs at the attribute with the minimum Gini index. One of the resulting branches is pure (all of its records are "Yes"), so it becomes a leaf node. Source: https://dataaspirant.com/how-decision-tree-algorithm-works/)
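As a hedged numeric sketch of that step, assuming the standard 14-record play-golf data used in the linked tutorial (Outlook: Sunny 2 Yes / 3 No, Overcast 4 Yes / 0 No, Rain 3 Yes / 2 No), the weighted Gini index of splitting on Outlook can be computed as follows; the Overcast branch is the pure, all-"Yes" one:

    def gini(counts):
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    # Class counts (Yes, No) per Outlook value, assuming the standard play-golf dataset.
    outlook = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}

    total = sum(sum(c) for c in outlook.values())
    weighted = sum(sum(c) / total * gini(c) for c in outlook.values())
    print(round(weighted, 3))   # about 0.343; compare against the other attributes and
                                # split on the one with the minimum weighted Gini index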
Random Forest
• Random forest is a supervised learning algorithm that can be used for both classification and regression. It is one of the most flexible and easy-to-use algorithms.

• A forest is composed of trees; in general, the more trees there are, the more robust the forest.

• A random forest creates decision trees on randomly selected data samples, gets a prediction from each tree, and selects the best solution by means of voting.

• It also provides a good indicator of feature importance.

• Random forests have a variety of applications, such as recommendation engines, image classification and feature selection.

• They can be used, for example, to classify loyal loan applicants or identify fraudulent activity.
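A minimal scikit-learn sketch of these points, using a built-in toy dataset as a stand-in for a real application:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)          # stand-in classification task
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each of the 100 trees is grown on a random bootstrap sample;
    # predictions are combined by voting.
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)
    print("test accuracy:", forest.score(X_test, y_test))

    # The fitted model also exposes an indicator of feature importance.
    print("top importances:", sorted(forest.feature_importances_, reverse=True)[:3])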
Random Forest: Example
• Suppose you want to go on a trip and you would like to travel to a place which you will enjoy.

• So what do you do to find a place that you will like? You can search online, read reviews on travel blogs and
portals, or you can also ask your friends.

• Let’s suppose you have decided to ask your friends, and talked with them about their past travel experience
to various places. You will get some recommendations from every friend. Now you have to make a list of
those recommended places. Then, you ask them to vote (or select one best place for the trip) from the list of
recommended places you made. The place with the highest number of votes will be your final choice for the
trip.

• In the above decision process, there are two parts. First, asking your friends about their individual travel
experience and getting one recommendation out of multiple places they have visited.

• This part is like using the decision tree algorithm. In this scenario, each friend makes a selection of the places
he or she has visited so far.

• The second part, after collecting all the recommendations, is the voting procedure for selecting the best place from the list of recommendations. This whole process of getting recommendations from friends and voting on them to find the best place is like the random forest algorithm.
How does the algorithm work?
• It works in four steps:
1. Select random samples from a given dataset.
2. Construct a decision tree for each sample and get a prediction
result from each decision tree.
3. Perform a vote for each predicted result.
4. Select the prediction result with the most votes as the final
prediction.
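The four steps can be sketched by hand with scikit-learn decision trees as the base learners (a simplified illustration; the library's RandomForestClassifier additionally randomises the features considered at each split):

    import numpy as np
    from collections import Counter
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    rng = np.random.default_rng(0)
    trees = []

    # Steps 1-2: draw random (bootstrap) samples and fit one decision tree per sample.
    for _ in range(25):
        idx = rng.integers(0, len(X), size=len(X))
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

    # Steps 3-4: each tree votes on a new record; the majority vote is the final prediction.
    record = X[0:1]
    votes = [int(t.predict(record)[0]) for t in trees]
    print(Counter(votes).most_common(1)[0][0])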
Random Forests vs Decision Trees
• A random forest is a set of multiple decision trees. (Figure: working diagram of a random forest.)

• Deep decision trees may suffer from overfitting, but a random forest prevents overfitting by creating trees on random subsets.

• Decision trees are computationally faster.

• A random forest is difficult to interpret, while a decision tree is easily interpretable and can be converted to rules.
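A small experiment (synthetic data, illustrative only) that typically shows the overfitting contrast described above:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)            # one deep tree
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    # A fully grown single tree usually fits the training data perfectly but
    # generalises worse than the forest built on random subsets.
    print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
    print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))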
Resources / References
• Introduction to Machine Learning with Python, Andreas C. Müller and Sarah Guido, O'Reilly Media, Inc., October 2016.
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, Aurélien Géron, O'Reilly Media, September 2019, ISBN: 9781492032649.
• Python Machine Learning, Third Edition, Sebastian Raschka and Vahid Mirjalili, Copyright © 2017 Packt Publishing.
• Discovering Knowledge in Data: An Introduction to Data Exploration, Second Edition, Daniel Larose and Chantal Larose, John Wiley and Sons, Inc., 2014.
• UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
• Understanding Autoencoders (Part I), Jelaleddin Sultanov, AI³ | Theory, Practice, Business, Medium.
• Statlib: http://lib.stat.cmu.edu
• Some images are used from the Google search repository (https://www.google.ie/search) to enhance the level of learning.

Copyright Notice
The following material has been communicated to you by or on behalf of CCT College Dublin in accordance with the Copyright and Related Rights Act 2000 (the Act). The material may be subject to copyright under the Act, and any further reproduction, communication or distribution of this material must be in accordance with the Act. Do not remove this notice.
