Lecture 11 Slides - After
Random forests
Outline
▪ Review
▪ Decision-tree
▪ Random forest
Quiz: 13.12; topics: dimensionality reduction, clustering, neural networks, convolutional neural networks
Last time - CNN
Example 3×3 convolution kernel (an edge detector):
[ -1  -1  -1 ]
[ -1   8  -1 ]
[ -1  -1  -1 ]
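This kernel responds strongly where the centre pixel differs from its neighbours, i.e. at edges. A minimal sketch of applying it, assuming NumPy and SciPy are available (the toy image below is made up purely for illustration):

```python
import numpy as np
from scipy.signal import convolve2d

# Edge-detection kernel from the slide
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# Toy grayscale "image": a bright square on a dark background (illustrative only)
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# Convolving emphasizes pixels where the intensity changes, i.e. the edges of the square
edges = convolve2d(image, kernel, mode="same")
print(edges)
```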
Decision trees
Brief recap - data statistics
▪ Empirical distribution
▪ Variance, covariance, correlation
Decision Trees
Introduction
What’s a decision tree?
[Figure: example decision tree, starting from a root node with true/false branches]
Note: criteria other than the Gini index (such as entropy) are also used for node splits
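As a minimal sketch of the Gini criterion (the function name and the example counts are mine, not from the slides), the impurity of a node can be computed from its class counts:

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples of each class in that node."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

# Example: a node with 4 'high risk' and 2 'low risk' samples
print(gini([4, 2]))   # 1 - (4/6)^2 - (2/6)^2 ≈ 0.444
# A pure node (only one class present) has impurity 0
print(gini([3, 0]))   # 0.0
```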
Classification Trees
Example - constructing the tree
We will apply Gini impurity to construct the classification tree
[Figure: example data set with a continuous feature (Age), a categorical feature (Car type), and the class label (Risk), together with a candidate split node showing its true/false branches and class counts]
Splitting continuous features
Reorder the data depending on Age, then evaluate candidate thresholds (the midpoints between consecutive Age values).

Original data (tid, Age, Risk):    Reordered by Age (tid, Age, Risk):
0   23   high                      1   17   high
1   17   high                      5   20   high
2   43   high                      0   23   high
3   68   low                       4   32   low
4   32   low                       2   43   high
5   20   high                      3   68   low

Candidate split Age < 27.5: true branch 3 high / 0 low, false branch 1 high / 2 low
Splitting continuous features (continued)
Candidate split Age < 37.5: true branch 3 high / 1 low, false branch 1 high / 1 low
Candidate split Age < 55.5: true branch 4 high / 1 low, false branch 0 high / 1 low
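To make the threshold search above concrete, here is a small sketch (the helper function and variable names are my own): it sorts the samples by Age, takes the midpoints between consecutive values as candidate thresholds, and scores each split by the weighted Gini impurity of its two branches.

```python
ages  = [23, 17, 43, 68, 32, 20]
risks = ["high", "high", "high", "low", "low", "high"]

def gini(labels):
    """Gini impurity of a group of samples, given their class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Sort by Age and take midpoints between consecutive values as candidate thresholds
pairs = sorted(zip(ages, risks))
thresholds = [(pairs[i][0] + pairs[i + 1][0]) / 2 for i in range(len(pairs) - 1)]

for t in thresholds:
    left  = [r for a, r in pairs if a < t]    # "true" branch: Age < t
    right = [r for a, r in pairs if a >= t]   # "false" branch
    score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
    print(f"Age < {t:4.1f}: weighted Gini = {score:.3f}")
```

Running this, the split Age < 27.5 has the lowest weighted impurity among the five candidates, so it would be chosen for this node.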
Decision Trees for classification
Candidate split on the categorical feature: Car type = sports (true/false branches).
[Figure: resulting tree on the (Age, Car type, Risk) data, with leaves labelled high risk and low risk]
Classification tree example: the penguin data set
Decision Trees
Penguins example
[Figure: penguin species, labelled Adelie and Gentoo]
Classification trees
Internal Node Internal Node
For the “split” in each node Internal Node Leaf Node Internal Node Leaf Node
need to determine:
• The feature Leaf Node Leaf Node Leaf Node Leaf Node
• The threshold
using some criteria
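For instance, a hedged sketch of fitting a classification tree to the penguin data from the previous slides, assuming seaborn and scikit-learn are installed (the feature set and tree depth are my own choices, not necessarily the lecture's):

```python
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load the penguins data set and drop rows with missing values
df = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X, y = df[features], df["species"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Each internal node picks the feature and threshold that minimize Gini impurity
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=features))
print("test accuracy:", tree.score(X_test, y_test))
```

`export_text` prints, for every internal node, the feature and threshold that the chosen criterion selected.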
Decision Trees
Characteristics of decision tree induction
Ensemble methods
Idea:
▪ Take a collection of predictors (decision trees, for example)
▪ Combine their results to make a single predictor
Types:
▪ Bagging: train predictors in parallel on different samples of the data, then combine their outputs through voting or averaging
▪ Stacking: combine model outputs using a second-stage predictor such as linear regression
▪ Boosting: train learners sequentially, each one on the filtered output of the previous learners (focusing on the examples they got wrong)
Optional: https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
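A rough sketch of how these three families look in scikit-learn (the base estimators and parameter values below are illustrative choices, not the ones from the lecture):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier, StackingClassifier, AdaBoostClassifier

# Bagging: many trees trained in parallel on bootstrap samples, combined by voting
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25)

# Stacking: base models whose outputs are combined by a second-stage (meta) model
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("logreg", LogisticRegression())],
    final_estimator=LogisticRegression(),
)

# Boosting: learners trained sequentially, each focusing on the previous learners' mistakes
boosting = AdaBoostClassifier(n_estimators=25)

# Each of these exposes the usual fit / predict interface, e.g. bagging.fit(X, y)
```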
Ensemble methods
Bagging
Bagging = Bootstrap aggregating
See: https://en.wikipedia.org/wiki/Bootstrap_aggregating
Ensemble methods
Intuition on why ensemble methods could work
Suppose we have an ensemble of 25 classifiers
→ Each classifier has error rate ϵ = 0.35
→ Assume the classifiers are independent
With majority voting, the ensemble predicts wrongly only if at least 13 of the 25 classifiers are wrong:

$P(\text{wrong prediction}) = \sum_{i=13}^{25} \binom{25}{i}\,\epsilon^{i}(1-\epsilon)^{25-i} \approx 0.06$
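A quick numerical check of this value (a minimal sketch using only the Python standard library):

```python
from math import comb

eps = 0.35   # error rate of each individual classifier
n = 25       # ensemble size

# Majority vote is wrong when 13 or more of the 25 classifiers are wrong
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(f"{p_wrong:.3f}")   # ≈ 0.06, far below the individual error rate of 0.35
```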
Ensemble methods
Bootstrap
Method to generate multiple datasets with good statistical properties from an original dataset
[Figure: original data → random sampling → bootstrap sample 1]
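A minimal sketch of drawing one bootstrap sample with NumPy (the array below reuses the Age values from the earlier example purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
original = np.array([23, 17, 43, 68, 32, 20])   # e.g. the Age column from the earlier example

# A bootstrap sample: draw n indices uniformly *with replacement*
n = len(original)
indices = rng.integers(0, n, size=n)
bootstrap_sample = original[indices]

print(bootstrap_sample)   # some values appear more than once, others not at all
```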
Random forest
Two randomization strategies are used to select the data on which each classifier is trained.
Sampling data:
→ Select a subset of the data → each tree is trained on different data
Sampling features:
→ Select a subset of the features → corresponding nodes in different trees (usually) don't use the same feature to split
1. Draw K bootstrap samples of size n (the size of the original dataset) from the original dataset, with replacement (bootstrapping)
2. While constructing each decision tree, select a random set of m features out of the p available features to infer the split
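These two steps map directly onto scikit-learn's RandomForestClassifier parameters; a hedged sketch (the iris data set is used only as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# n_estimators = K bootstrap samples / trees; max_features = m features tried at each split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

# Prediction: the trees vote and the forest returns the majority class
print("test accuracy:", forest.score(X_test, y_test))
```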
Random forest
Prediction
Goal: optimize the shock break-out time by varying 5 parameters (laser energy, disc thickness, etc.)
Machine learning approach: run a few (~100) experiments and use a random forest to predict the outcome for other possible sets of parameters
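As a hedged illustration of that workflow (the parameter ranges, the response function, and the "larger is better" assumption below are all invented stand-ins for the real experiment): fit a random-forest regressor on the ~100 evaluated parameter settings, then use it to screen candidate settings that were never run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# ~100 "experiments": 5 input parameters each (e.g. laser energy, disc thickness, ...).
# The response below is a made-up placeholder for the measured shock break-out time.
X_done = rng.uniform(0.0, 1.0, size=(100, 5))
y_done = X_done[:, 0] * 2.0 - X_done[:, 1] + 0.1 * rng.normal(size=100)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_done, y_done)

# Predict the outcome for candidate parameter settings that were never run,
# and pick the most promising one (assuming larger is better, purely for illustration)
X_candidates = rng.uniform(0.0, 1.0, size=(1000, 5))
predicted = model.predict(X_candidates)
best = X_candidates[np.argmax(predicted)]
print("most promising candidate parameters:", best)
```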