Lecture 11 Slides - After

The document outlines decision trees and random forests, discussing their structures, terminology, and applications in regression and classification tasks. It explains the process of constructing decision trees using criteria like Gini impurity and the concept of ensemble methods, particularly bagging, to improve model accuracy. Additionally, it highlights the use of random forests in optimizing experimental designs by predicting outcomes based on limited experimental data.

Decision trees,
Random forests

Outline

▪ Review
▪ Decision trees
▪ Random forest
Introduction · Linear regression · Logistic regression
Feature engineering · Data statistics · Naive Bayes
KNN · Clustering · Dimensionality reduction
Neural networks · Convolutional neural networks · Decision trees
Announcements
Last topic of the course: Reinforcement learning

Quiz: 13.12; topics: dimensionality reduction, clustering, neural networks, convolutional neural networks
Last time - CNN

Example 3×3 convolution kernel (edge detection):

-1 -1 -1
-1  8 -1
-1 -1 -1
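The kernel above is the classic Laplacian-style edge-detection filter. A minimal sketch of applying it to a toy grayscale image with scipy (the image here is a made-up example, not from the slides):

```python
import numpy as np
from scipy.signal import convolve2d

# The 3x3 edge-detection kernel from the slide.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# Hypothetical grayscale image: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# Convolve; strong responses appear along the edges of the square.
response = convolve2d(image, kernel, mode="same")
print(response)
```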
Decision trees
Brief recap - data statistics
Empirical distribution
Brief recap - data statistics
Variance, covariance, correlation
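As a quick reminder of these quantities, a minimal numpy sketch on made-up data (values chosen purely for illustration):

```python
import numpy as np

# Two hypothetical feature columns, one row per sample.
x = np.array([23., 17., 43., 68., 32., 20.])
y = np.array([1.2, 0.8, 2.5, 3.1, 1.9, 1.0])

print(np.var(x))          # empirical variance of x
print(np.cov(x, y))       # 2x2 covariance matrix of (x, y)
print(np.corrcoef(x, y))  # 2x2 correlation matrix (entries in [-1, 1])
```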
Decision Trees
Introduction
What’s a decision tree?

Student attends lecture?
  True  -> Student gets high grade
  False -> Student gets average grade

Split data based on the answer to a question.


Decision Trees
Terminology

▪ Each node checks a single feature
▪ Branches are feature values
▪ Leaves indicate the prediction of the tree

(Diagram: a tree starting at the root node, branching through internal nodes, and ending in leaf nodes.)
Decision tree for regression
Called regression tree
Regression tree example
Regression tree example
Regression tree
Loss function: each leaf predicts the mean of the training targets that fall into it, and the tree is scored by the average squared error of these predictions.

Regression tree
Optimising the loss function - greedy algorithm: at each node, try every feature and threshold, and keep the split that most reduces the squared loss (a sketch follows below).

Regression tree
Stopping criteria, e.g. a maximum tree depth or a minimum number of samples per leaf.
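A minimal sketch of the greedy split search for a regression tree on a single feature, under the squared-loss criterion named above (function and variable names are my own, not from the slides):

```python
import numpy as np

def best_split_1d(x, y):
    """Greedy search for the best threshold on one feature for a regression
    tree: minimise the summed squared error when each side of the split
    predicts the mean of its targets."""
    x_sorted = np.sort(x)
    # Candidate thresholds: midpoints between consecutive sorted values.
    thresholds = np.unique((x_sorted[:-1] + x_sorted[1:]) / 2.0)
    best_t, best_loss = None, np.inf
    for t in thresholds:
        left, right = y[x < t], y[x >= t]
        if len(left) == 0 or len(right) == 0:
            continue
        loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss

# Tiny example: the target roughly jumps at x = 3.5.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 0.9, 1.0, 4.2, 3.9, 4.1])
print(best_split_1d(x, y))  # threshold near 3.5
```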
Decision Trees for classification
Called classification trees

Data: Age (continuous feature), Car type (categorical feature), Risk (class label)

Age  Car type  Risk
23   family    high
17   sports    high
43   sports    high
68   family    low
32   family    low
20   family    high

Example classification tree:
Age < 27.5
  true  -> High
  false -> Car type = sports?
             true  -> High
             false -> Low
Classification trees
Performance metric and loss function
Classification trees
Greedy approach: choose a feature and a split sequentially based on a metric, such as Gini impurity.

Gini impurity of a leaf: G = 1 − Σ_k p_k², where p_k is the fraction of samples of class k in the leaf.

Gini impurity of a node (split): the average of the leaf impurities, weighted by the fraction of samples that reach each leaf.

Note: criteria other than the Gini index (such as entropy) are also used to choose the split.
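A minimal sketch of these two quantities in Python (helper names are my own); the example counts at the bottom correspond to the splits worked out on the following slides:

```python
import numpy as np

def gini_leaf(counts):
    """Gini impurity of a leaf: 1 - sum_k p_k^2, where p_k is the
    fraction of samples of class k in the leaf."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_split(*leaf_counts):
    """Weighted Gini impurity of a split: average of the leaf impurities,
    weighted by the number of samples that end up in each leaf."""
    sizes = np.array([sum(c) for c in leaf_counts], dtype=float)
    impurities = np.array([gini_leaf(c) for c in leaf_counts])
    return float(np.sum(sizes / sizes.sum() * impurities))

# Split "Car type = sports?": true leaf 2 high / 0 low, false leaf 2 high / 2 low
print(gini_split([2, 0], [2, 2]))   # ~0.333
# Split "Age < 27.5": true leaf 3 high / 0 low, false leaf 1 high / 2 low
print(gini_split([3, 0], [1, 2]))   # ~0.222 (lower -> better split)
```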
Classification Trees
Example - constructing the tree
We will apply Gini impurity to construct the classification tree
Data: Age (continuous feature), Car type (categorical feature), Risk (class label)

Age  Car type  Risk
23   family    high
17   sports    high
43   sports    high
68   family    low
32   family    low
20   family    high
Classification trees
Criteria for choosing a feature and a split

Candidate split: Car type = sports?
  true:  2 high, 0 low
  false: 2 high, 2 low

Data: Age (continuous feature), Car type (categorical feature), Risk (class label)

Age  Car type  Risk
23   family    high
17   sports    high
43   sports    high
68   family    low
32   family    low
20   family    high
Decision Trees
Splitting continuous features

Candidate split: Age < 18.5
  true:  1 high, 0 low
  false: 3 high, 2 low

Original data              Reordered by Age
tid  Age  Risk             tid  Age  Risk
0    23   high             1    17   high
1    17   high             5    20   high
2    43   high             0    23   high
3    68   low              4    32   low
4    32   low              2    43   high
5    20   high             3    68   low
Decision Trees
Splitting continuous features

Candidate split: Age < 21.5
  true:  2 high, 0 low
  false: 2 high, 2 low

Original data              Reordered by Age
tid  Age  Risk             tid  Age  Risk
0    23   high             1    17   high
1    17   high             5    20   high
2    43   high             0    23   high
3    68   low              4    32   low
4    32   low              2    43   high
5    20   high             3    68   low
Decision Trees
Splitting continuous features

Candidate split: Age < 27.5
  true:  3 high, 0 low
  false: 1 high, 2 low

Original data              Reordered by Age
tid  Age  Risk             tid  Age  Risk
0    23   high             1    17   high
1    17   high             5    20   high
2    43   high             0    23   high
3    68   low              4    32   low
4    32   low              2    43   high
5    20   high             3    68   low

(Handwritten annotation: Gini impurity of this split = (3/6)·0 + (3/6)·(4/9) = 2/9 ≈ 0.22.)
Decision Trees
Splitting continuous features

Candidate split: Age < 37.5
  true:  3 high, 1 low
  false: 1 high, 1 low

Original data              Reordered by Age
tid  Age  Risk             tid  Age  Risk
0    23   high             1    17   high
1    17   high             5    20   high
2    43   high             0    23   high
3    68   low              4    32   low
4    32   low              2    43   high
5    20   high             3    68   low

(Handwritten annotation: Gini impurity of this split = (4/6)·0.375 + (2/6)·0.5 ≈ 0.42.)
Decision Trees
Splitting continuous features

Candidate split: Age < 55.5
  true:  4 high, 1 low
  false: 0 high, 1 low

Original data              Reordered by Age
tid  Age  Risk             tid  Age  Risk
0    23   high             1    17   high
1    17   high             5    20   high
2    43   high             0    23   high
3    68   low              4    32   low
4    32   low              2    43   high
5    20   high             3    68   low

(Handwritten annotation: Gini impurity of this split = (5/6)·0.32 + (1/6)·0 ≈ 0.27.)
Decision Trees for classification
Comparing the two best candidate splits (counts from the previous slides)

Car type = sports?
  true:  2 high, 0 low
  false: 2 high, 2 low

Age < 27.5
  true:  3 high, 0 low
  false: 1 high, 2 low

Data: Age (continuous feature), Car type (categorical feature), Risk (class label)

Age  Car type  Risk
23   family    high
17   sports    high
43   sports    high
68   family    low
32   family    low
20   family    high
Classification tree
Example - Final tree

Age < 27.5
  true  -> leaf: 3 high, 0 low  -> predict high risk
  false -> Car type = sports?
             true  -> leaf: 1 high, 0 low  -> predict high risk
             false -> leaf: 0 high, 2 low  -> predict low risk

Data: Age (continuous feature), Car type (categorical feature), Risk (class label)

Age  Car type  Risk
23   family    high
17   sports    high
43   sports    high
68   family    low
32   family    low
20   family    high
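The whole split search above can be reproduced in a few lines; the sketch below (variable and function names are my own) scores the categorical split and all the Age midpoints by weighted Gini impurity, and Age < 27.5 indeed comes out lowest:

```python
import numpy as np

ages   = np.array([23, 17, 43, 68, 32, 20])
sports = np.array([0, 1, 1, 0, 0, 0])    # 1 = sports car, 0 = family car
high   = np.array([1, 1, 1, 0, 0, 1])    # 1 = high risk, 0 = low risk

def weighted_gini(mask):
    """Weighted Gini impurity of splitting the data by a boolean mask."""
    total = 0.0
    for side in (mask, ~mask):
        if side.sum() == 0:
            continue
        p = high[side].mean()                       # fraction of "high" on this side
        total += side.mean() * (1.0 - p**2 - (1.0 - p)**2)
    return total

# Categorical split
print("Car type = sports:", weighted_gini(sports == 1))        # ~0.333

# Continuous splits: midpoints between consecutive sorted ages
for t in (np.sort(ages)[:-1] + np.sort(ages)[1:]) / 2:
    print(f"Age < {t}:", weighted_gini(ages < t))
# Age < 27.5 gives the lowest impurity (~0.222), so it is chosen as the root split.
```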
Classification tree example
penguin data set
Decision Trees
Penguins example

Culmen length ≤ 43.25  ->  Adelie | Gentoo
Decision Trees
Penguins example

Culmen length ≤ 43.25
 ├─ Culmen depth ≤ 15.3  -> Gentoo | Adelie
 └─ Culmen depth ≤ 16.55 -> Gentoo | Chinstrap


Decision Trees
Penguins example
Python output for sklearn.tree

(Figure: the same tree as rendered by sklearn; class order in the output: [Adelie, Chinstrap, Gentoo].)

Culmen length ≤ 43.25
 ├─ Culmen depth ≤ 15.3  -> Gentoo | Adelie
 └─ Culmen depth ≤ 16.55 -> Gentoo | Chinstrap
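A hedged sketch of how such a tree could be produced with sklearn.tree; the file name and column names here are assumptions (the public penguins data often uses bill_length_mm / bill_depth_mm for the same measurements):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Assumed file and column names; adapt to however the penguin data is loaded.
df = pd.read_csv("penguins.csv").dropna()
X = df[["culmen_length_mm", "culmen_depth_mm"]]
y = df["species"]                      # Adelie / Chinstrap / Gentoo

# Depth-2 tree, split with Gini impurity (the default criterion).
clf = DecisionTreeClassifier(max_depth=2, criterion="gini")
clf.fit(X, y)

plot_tree(clf, feature_names=X.columns, class_names=clf.classes_, filled=True)
plt.show()
```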


Summary - decision tree construction
▪ We select the most discriminative feature and threshold
  • Discriminative power is measured by a criterion:
    • Regression tree: average squared loss
    • Classification tree: Gini impurity
  • We create a node based on this feature

▪ We repeat for each new branch until a stopping criterion is met

▪ To ensure the tree depth is not too high (avoid overfitting), “pruning” is done (see the sketch below).

More examples on StatQuest: https://www.youtube.com/watch?v=_L39rN6gz7Y
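For reference, a minimal sketch of the sklearn parameters that limit tree growth or prune the tree (the specific values are arbitrary illustrations, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# A few common ways to keep a tree from growing too deep and overfitting:
clf = DecisionTreeClassifier(
    max_depth=4,            # stop growing below this depth
    min_samples_leaf=5,     # require at least 5 samples in every leaf
    ccp_alpha=0.01,         # minimal cost-complexity pruning strength
)

# Toy data just to make the sketch runnable.
X, y = make_classification(n_samples=200, random_state=0)
clf.fit(X, y)
print(clf.get_depth())
```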


Random forests
Recall - Decision Trees
▪ Regression trees
▪ Classification trees

For the “split” in each node we need to determine:
• The feature
• The threshold
using some criterion.

(Diagram: a tree with a root node, internal nodes, and leaf nodes.)
Decision Trees
Characteristics of decision tree induction

Pros:
• Easy to interpret and explain
• Minimal data preparation
• Automatic feature selection

Cons:
• Tend to overfit if deep
• Don’t always perform so well
Fundamental trade-off in learning

Goal of supervised ML models: generalise well on new data, based on the patterns learned from known data.

Hence, we want the test error e_test to be small.
Ensemble methods
Overview
Based on the hypothesis that combining multiple models can reduce one type of error without significantly increasing the other.

Idea:
▪ Take a collection of predictors (decision trees, for example)
▪ Combine their results to make a single predictor

Types:
▪ Bagging: train predictors in parallel on different samples of the data, then combine outputs through voting or averaging
▪ Stacking: combine model outputs using a second-stage predictor such as linear regression
▪ Boosting: train learners on the filtered output of other learners

Optional: https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
Ensemble methods
Bagging
Bagging = Bootstrap aggregating

See: https://en.wikipedia.org/wiki/Bootstrap_aggregating
Ensemble methods
Intuition on why ensemble methods could work
Suppose we have an ensemble of 25 classifiers
→ Each classifier has error rate ϵ = 0.35
→ Assume the classifiers are independent

Probability that the ensemble classifier makes a wrong prediction
= probability that a majority (at least 13) of the classifiers is wrong:

P(wrong prediction) = Σ_{i=13}^{25} (25 choose i) ϵ^i (1 − ϵ)^{25−i} ≈ 0.06
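This sum is easy to check numerically; a quick sketch using only the standard library (the 25 / 0.35 / 13 numbers are from the slide):

```python
from math import comb

eps, n = 0.35, 25
# Probability that at least 13 of the 25 independent classifiers are wrong.
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(p_wrong, 3))   # ~0.06
```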
Ensemble methods
Bootstrap

Method to generate multiple datasets with good statistical properties from an original dataset:
take the original data and, by random sampling with replacement, build Bootstrap sample 1, Bootstrap sample 2, …
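A minimal numpy sketch of this procedure: draw as many indices as there are samples, with replacement, and repeat K times (the toy data here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)           # stand-in for the original dataset (10 samples)

K = 3                          # number of bootstrap samples to generate
bootstrap_samples = [
    data[rng.integers(0, len(data), size=len(data))]   # sample indices with replacement
    for _ in range(K)
]
for k, sample in enumerate(bootstrap_samples, 1):
    print(f"Bootstrap sample {k}:", sample)
```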


Random forest
Randomization to reduce correlation

Two randomization strategies are used to select the data on which each classifier is trained.

Sampling data:
→ Select a subset of the data → each tree is trained on different data

Sampling features:
→ Select a subset of the features → corresponding nodes in different trees (usually) don’t use the same feature to split

These strategies keep the trees weakly correlated with each other, so when we average their predictions the errors tend to cancel out and we get a better overall prediction.
Random forest algorithm
Construction

1. Draw K bootstrap samples (K = the number of classifiers we train) of the same size as the original dataset, sampling with replacement (bootstrapping)

2. While constructing each decision tree, select a random set of m features out of the p available features to infer each split
Random forest
Prediction

▪ Regression: predict the average of the individual trees’ predictions
▪ Classification: predict the majority vote of the individual trees
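These construction and prediction steps roughly correspond to sklearn's random forest, where n_estimators plays the role of K and max_features the role of m; a minimal sketch on synthetic data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# K = n_estimators bootstrap-trained trees; m = max_features features tried per split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

# Classification: each tree votes, the forest predicts the majority class.
print(forest.score(X_test, y_test))
```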


Case study - application in physical experiment design
Section 3.4 of ML4Engineers book

Goal: optimize shock break-out time by varying 5 parameters (laser energy, disc thickness, etc)

Challenge: experiments are costly (human operators, potential breaks, time)

Machine learning approach: run a few experiments (~100) and use a random forest to predict the outcome for other possible sets of parameters
