ML-Unit I - Ensemble Methods
Random Forest: Bootstrapping
[Figure] Row sampling: rows of the original dataset (features x1 to x5 plus a Vote label) are drawn into a bootstrapped dataset of shape (50 x 6). Sampling is done with replacement, so the same row (e.g. 40, 42, 2, 9, 0.26 with Vote 1) can appear more than once; sampling without replacement would not allow such repeats.
Random Forest: Bootstrapping
[Figure] Feature (column) sampling: each bootstrapped dataset keeps only a random subset of the features. Dataset 1 keeps x1, x3, x4 plus the Vote label; Dataset 2 keeps x2, x3, x5 plus the Vote label. Bootstrapped dataset shape: (1000 x 3).
Random Forest: Bootstrapping
[Figure] Combining row and feature sampling: rows of the original dataset are drawn with replacement and only a random subset of features is kept (e.g. x1, x3, x4 plus Vote in one sample; x1, x2, x5 plus Vote in another). Bootstrapped dataset shape: (50 x 6).
Random Forest: Bootstrapping
Aggregation:
● We aggregate the predictions of all the individual decision trees (see the sketch after this list).
○ In Classification: we take a majority vote.
○ In Regression: we average the predictions.
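A minimal sketch of bootstrapping plus aggregation, assuming a synthetic 50-row, 5-feature dataset and scikit-learn decision trees (the ensemble size and the 3-feature subset are illustrative, not values from the slides):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the "original dataset": 50 rows, features x1..x5, binary Vote label.
X, y = make_classification(n_samples=50, n_features=5, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))             # row sampling WITH replacement
    cols = rng.choice(X.shape[1], size=3, replace=False)   # feature (column) sampling
    tree = DecisionTreeClassifier(random_state=0).fit(X[idx][:, cols], y[idx])
    trees.append((tree, cols))

# Aggregation: majority vote for classification (averaging would be used for regression).
votes = np.array([tree.predict(X[:, cols]) for tree, cols in trees])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble training accuracy:", (y_pred == y).mean())
```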
Random Forest: How does it perform so well?
● Each individual weak learner is exposed to only 1k instances.
● The other 2k instances remain unseen by that decision tree (see the sketch below).
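A minimal sketch of this point, assuming the full dataset has 3,000 rows (the 1k seen / 2k unseen split quoted above) and each tree draws a 1,000-row bootstrap sample:

```python
import numpy as np

# Assumption: 3,000 total rows, 1,000-row bootstrap sample per tree (the slide's 1k/2k split).
rng = np.random.default_rng(0)
n_total, n_sample = 3_000, 1_000

idx = rng.integers(0, n_total, size=n_sample)   # rows drawn (with replacement) for one tree
seen = np.unique(idx)                           # duplicates collapse, so even fewer unique rows
print("rows this tree has seen: ", len(seen))
print("rows unseen by this tree:", n_total - len(seen))   # roughly 2k+ rows stay unseen
```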
Adaboost: weighted voting
In AdaBoost, the final prediction is a weighted combination of the weak learners. Suppose the three weak learners have weights α1 = 2, α2 = 3, α3 = 2, and on some input x they predict -1, +1 and -1 respectively:
h(x) = α1·h1(x) + α2·h2(x) + α3·h3(x)
     = 2·(-1) + 3·(+1) + 2·(-1)
     = -2 + 3 - 2 = -1
So the ensemble predicts the negative class (-1).
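The same arithmetic as a tiny Python sketch (the weights and weak-learner outputs are the toy numbers above):

```python
import numpy as np

alpha = np.array([2.0, 3.0, 2.0])   # learner weights α1, α2, α3
h_x   = np.array([-1, 1, -1])       # weak-learner predictions h1(x), h2(x), h3(x)

score = float(alpha @ h_x)          # 2*(-1) + 3*(+1) + 2*(-1) = -1
print(score, "-> predicted class:", int(np.sign(score)))
```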
Adaboost: working example
Consider the following initial dataset:

X1  X2  Y
 3   7  1
 2   9  0
 1   4  1
 9   8  0
 3   7  0

(Original Dataset)
Adaboost: working example
Each row is assigned an initial weight of 1/n (here n = 5), i.e. 0.2:

X1  X2  Y  Initial weight (= 1/n)
 3   7  1  0.2
 2   9  0  0.2
 1   4  1  0.2
 9   8  0  0.2
 3   7  0  0.2

Model 1 (the first weak learner) misclassifies two rows, so its error rate is the sum of their weights:
Therefore, error rate (model 1) = 0.2 + 0.2 = 0.4
The weights are then updated: misclassified rows are up-weighted (0.2 to about 0.24) and correctly classified rows are down-weighted (0.2 to about 0.16). For example, row (1, 4, Y = 1) was predicted as 0 and moves to 0.24, while rows (9, 8, Y = 0) and (3, 7, Y = 0) were predicted correctly and move to 0.16. These values are consistent with the standard AdaBoost update: α1 = 0.5 * ln((1 - 0.4)/0.4) ≈ 0.2, with misclassified weights multiplied by e^α1 and correct ones by e^(-α1).
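A minimal sketch of this weight-update round. The model-1 predictions below are an assumption chosen to be consistent with the slide's numbers (two misclassified rows, including (1, 4, Y = 1) predicted as 0):

```python
import numpy as np

y      = np.array([1, 0, 1, 0, 0])   # true labels from the toy table
y_pred = np.array([1, 1, 0, 0, 0])   # assumed model-1 predictions (2 mistakes)
w      = np.full(5, 1 / 5)           # initial weights = 1/n = 0.2

wrong = y_pred != y
error = w[wrong].sum()                          # 0.2 + 0.2 = 0.4
alpha = 0.5 * np.log((1 - error) / error)       # learner weight ≈ 0.2027

w_new = w * np.exp(np.where(wrong, alpha, -alpha))    # up-weight mistakes, down-weight hits
print(w_new.round(2))                   # [0.16 0.24 0.24 0.16 0.16]  (as on the slide)
print((w_new / w_new.sum()).round(3))   # normalised: ≈ [0.167 0.25 0.25 0.167 0.167]
```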
Adaboost: working example
After normalisation, each row gets a cumulative weight range that is used to resample the training set for the next weak learner. For example, row (3, 7, Y = 1) has normalised weight 0.166 and range 0 to 0.166, while the up-weighted row (1, 4, Y = 1) has weight 0.25 and range 0.416 to 0.666, so misclassified rows are more likely to be drawn into the next training set (see the sketch below).
● For new data points, the final prediction is made with the weighted-sum formula shown earlier, h(x) = α1·h1(x) + α2·h2(x) + ..., taking the sign of the result as the predicted class.
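A minimal sketch of that resampling step using the cumulative ranges. Only 0.166 (row (3, 7, Y = 1)) and 0.25 (row (1, 4, Y = 1)) appear on the slide, so the other normalised weights below are inferred assumptions:

```python
import numpy as np

rows    = ["(3,7,1)", "(2,9,0)", "(1,4,1)", "(9,8,0)", "(3,7,0)"]
weights = np.array([0.166, 0.25, 0.25, 0.167, 0.167])   # normalised weights (partly inferred)
weights = weights / weights.sum()

rng   = np.random.default_rng(0)
draws = rng.random(5)                                   # uniform numbers in [0, 1)
idx   = np.searchsorted(np.cumsum(weights), draws)      # find each draw's cumulative bin
print([rows[i] for i in idx])                           # resampled rows for the next learner
```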
Gradient Boosting
We will create three estimators for this simple dataset:

iq   cgpa  salary
 90    8     3
100    7     4
110    6     8
120    9     6
 80    5     3

● Model 1 is simply the average of the output variable, i.e. a single leaf.
● Therefore, Model 1 prediction = (3 + 4 + 8 + 6 + 3) / 5 = 4.8 for every row.
Calculate the pseudo-residuals: pseudo_residual = actual - prediction

iq   cgpa  salary  Pred1  res1
 90    8     3      4.8   -1.8
100    7     4      4.8   -0.8
110    6     8      4.8    3.2
120    9     6      4.8    1.2
 80    5     3      4.8   -1.8

● Next, we transfer these errors (the residuals) to Model 2.
● Model 2 is a decision tree built on the following dataset (see the sketch after this list):
○ Input: iq and cgpa
○ Output: res1
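A minimal sketch of this first step: Model 1 predicts the mean of salary, and the pseudo-residuals become the targets for Model 2 (NumPy only; column names follow the table above):

```python
import numpy as np

iq     = np.array([90, 100, 110, 120, 80])
cgpa   = np.array([8, 7, 6, 9, 5])
salary = np.array([3.0, 4, 8, 6, 3])

pred1 = np.full_like(salary, salary.mean())   # Model 1: the single leaf value 4.8
res1  = salary - pred1                        # pseudo-residuals: actual - prediction
print(pred1)   # [4.8 4.8 4.8 4.8 4.8]
print(res1)    # [-1.8 -0.8  3.2  1.2 -1.8]
```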
Gradient Boosting
● Model 1 prediction = 4.8 (for every row).
● Model 2: construct a decision tree on (iq, cgpa) with res1 as the target:

iq   cgpa  res1
 90    8   -1.8
100    7   -0.8
110    6    3.2
120    9    1.2
 80    5   -1.8

[Figure] Model 2 DT: root split iq <= 105; leaf predictions -1.8, -0.8, 3.2, 1.2 (see the sketch below).
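A minimal sketch of Model 2: a shallow regression tree fit on res1, then combined with Model 1 using a learning rate of 0.1 (matching the PredBoost = M1 + 0.1*M2 column later; the tree depth is an illustrative assumption). This reproduces the res2 values on the next slide:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X      = np.array([[90, 8], [100, 7], [110, 6], [120, 9], [80, 5]])  # iq, cgpa
salary = np.array([3.0, 4, 8, 6, 3])

pred1  = np.full(5, salary.mean())                       # Model 1 = 4.8
res1   = salary - pred1                                  # targets for Model 2
model2 = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, res1)

pred_boost = pred1 + 0.1 * model2.predict(X)             # M1 + 0.1*M2
res2 = salary - pred_boost                               # targets for Model 3
print(res2.round(2))   # [-1.62 -0.72  2.88  1.08 -1.62]
```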
Gradient Boosting
● Model 1 prediction = 4.8
● Model 2: our decision tree is ready.
● Model 3: construct another decision tree on the new residuals res2, where res2 = salary - (M1 + 0.1*M2), i.e. what is left after the boosted prediction of the first two models:

iq   cgpa  res2
 90    8   -1.62
100    7   -0.72
110    6    2.88
120    9    1.08
 80    5   -1.62

[Figure] Model 3 DT: root split iq <= 105, then iq <= 95 on the left branch and cgpa <= 7.5 on the right; leaf predictions -1.62, -0.72, 2.88, 1.08.
Gradient Boosting
● Model 1 prediction = 4.8
● Model 2: our decision tree (Model 2 DT) is ready.
● Model 3: construct its decision tree on res2 in the same way.
The full bookkeeping table for the example has the columns: iq | cgpa | salary | res1 | Pred2 | PredBoost (M1 + 0.1*M2) | res2 | pred3 | Final PredBoost (M1 + 0.1*M2 + 0.1*M3) | res3 (see the sketch below).
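A minimal end-to-end sketch of the three-model chain summarised by this table. The learning rate 0.1 comes from the PredBoost columns above; the tree depth and the new data point are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X      = np.array([[90, 8], [100, 7], [110, 6], [120, 9], [80, 5]])  # iq, cgpa
salary = np.array([3.0, 4, 8, 6, 3])
lr     = 0.1

pred, trees = np.full(5, salary.mean()), []      # Model 1: the mean (4.8)
for _ in range(2):                               # Models 2 and 3: trees on residuals
    res  = salary - pred                         # res1, then res2
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, res)
    trees.append(tree)
    pred = pred + lr * tree.predict(X)           # M1 + 0.1*M2, then + 0.1*M3

print("final boosted prediction:", pred.round(3))
print("res3:", (salary - pred).round(3))

# Prediction for a new, unseen point (iq = 105, cgpa = 7 is a made-up example):
x_new = np.array([[105, 7]])
y_new = salary.mean() + lr * sum(t.predict(x_new) for t in trees)
print("prediction for the new point:", y_new.round(3))
```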