lecture slide 12
Ensemble Methods
Original training data: D
Step 1: Create multiple data sets D1, D2, ..., Dt-1, Dt
Step 2: Build multiple classifiers C1, C2, ..., Ct-1, Ct (one per data set)
Step 3: Combine the classifiers into C*
Why do ensemble methods work?
Original Data 1 2 3 4 5 6 7 8 9 10
Bagging (Round 1) 7 8 10 8 2 5 10 10 5 9
Bagging (Round 2) 1 4 9 1 2 3 2 7 3 2
Bagging (Round 3) 1 8 5 10 5 5 9 6 3 7
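Each bagging round in the table above is a bootstrap sample: ten draws with replacement from the ten original records, so some records repeat and others are left out. A minimal sketch, assuming Python's standard `random` module; the name `bootstrap_sample` and the seed are illustrative choices, and the draws will not reproduce the exact rounds shown on the slide.

```python
import random

def bootstrap_sample(records, rng):
    """Draw len(records) items uniformly at random, with replacement."""
    return rng.choices(records, k=len(records))

original = list(range(1, 11))   # record ids 1..10, as in the table
rng = random.Random(0)          # arbitrary seed (an assumption, for repeatability)

# Three rounds, mirroring the three rows of the slide
rounds = [bootstrap_sample(original, rng) for _ in range(3)]
for r, sample in enumerate(rounds, start=1):
    print(f"Bagging (Round {r})", sample)
```

Because sampling is with replacement, a given record is typically missing from some rounds, which is what makes the resulting classifiers differ.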
[Figure: decision stump. The True branch of the test predicts y_left; the False branch predicts y_right.]
Bagging Example
Bagging Round 1 (stump: x <= 0.35 -> y = 1; x > 0.35 -> y = -1):
x 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.9 0.9
y 1 1 1 1 -1 -1 -1 -1 1 1
Bagging Round 2 (stump: x <= 0.7 -> y = 1; x > 0.7 -> y = 1):
x 0.1 0.2 0.3 0.4 0.5 0.5 0.9 1 1 1
y 1 1 1 -1 -1 -1 1 1 1 1
Bagging Round 3 (stump: x <= 0.35 -> y = 1; x > 0.35 -> y = -1):
x 0.1 0.2 0.3 0.4 0.4 0.5 0.7 0.7 0.8 0.9
y 1 1 1 -1 -1 -1 -1 -1 1 1
Bagging Round 4 (stump: x <= 0.3 -> y = 1; x > 0.3 -> y = -1):
x 0.1 0.1 0.2 0.4 0.4 0.5 0.5 0.7 0.8 0.9
y 1 1 1 -1 -1 -1 -1 -1 1 1
Bagging Round 5 (stump: x <= 0.35 -> y = 1; x > 0.35 -> y = -1):
x 0.1 0.1 0.2 0.5 0.6 0.6 0.6 1 1 1
y 1 1 1 -1 -1 -1 -1 1 1 1
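The stump shown for each round can be found by a brute-force search over split points. The sketch below is one way to do it, under two assumptions the slides do not state: candidate splits are midpoints between adjacent distinct x values, and the best stump is the one with the fewest training errors. The name `fit_stump` is hypothetical.

```python
def fit_stump(xs, ys):
    """Return (errors, threshold, left_label, right_label) for the stump
    with the fewest training errors (assumed criterion)."""
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between adjacent distinct x values
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for left, right in [(1, -1), (-1, 1), (1, 1), (-1, -1)]:
            errors = sum(
                (left if x <= t else right) != y for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best

# Bagging Round 1 sample, copied from the slide
xs = [0.1, 0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.6, 0.9, 0.9]
ys = [1, 1, 1, 1, -1, -1, -1, -1, 1, 1]

errors, threshold, left, right = fit_stump(xs, ys)
print(errors, round(threshold, 2), left, right)   # prints: 2 0.35 1 -1
```

On the Round 1 sample this recovers the slide's stump (x <= 0.35 predicts 1, otherwise -1), which misclassifies only the two points at x = 0.9.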
Bagging Example
Bagging Round 6 (stump: x <= 0.75 -> y = -1; x > 0.75 -> y = 1):
x 0.2 0.4 0.5 0.6 0.7 0.7 0.7 0.8 0.9 1
y 1 -1 -1 -1 -1 -1 -1 1 1 1
Bagging Round 7 (stump: x <= 0.75 -> y = -1; x > 0.75 -> y = 1):
x 0.1 0.4 0.4 0.6 0.7 0.8 0.9 0.9 0.9 1
y 1 -1 -1 -1 -1 1 1 1 1 1
Bagging Round 8 (stump: x <= 0.75 -> y = -1; x > 0.75 -> y = 1):
x 0.1 0.2 0.5 0.5 0.5 0.7 0.7 0.8 0.9 1
y 1 1 -1 -1 -1 -1 -1 1 1 1
Bagging Round 9 (stump: x <= 0.75 -> y = -1; x > 0.75 -> y = 1):
x 0.1 0.3 0.4 0.4 0.6 0.7 0.7 0.8 1 1
y 1 1 -1 -1 -1 -1 -1 1 1 1
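To see what combining buys, the nine stumps listed for Rounds 1 to 9 can be put to a majority vote. This is a sketch rather than the slides' own procedure: it uses only the nine rounds that appear here, and the helper names `stump_predict` and `bagged_predict` are hypothetical.

```python
# (threshold, left_label, right_label) for Rounds 1-9, copied from the slides
stumps = [
    (0.35, 1, -1),   # Round 1
    (0.70, 1, 1),    # Round 2
    (0.35, 1, -1),   # Round 3
    (0.30, 1, -1),   # Round 4
    (0.35, 1, -1),   # Round 5
    (0.75, -1, 1),   # Round 6
    (0.75, -1, 1),   # Round 7
    (0.75, -1, 1),   # Round 8
    (0.75, -1, 1),   # Round 9
]

def stump_predict(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right

def bagged_predict(x):
    """Majority vote: the sign of the summed +1/-1 votes."""
    total = sum(stump_predict(s, x) for s in stumps)
    return 1 if total > 0 else -1

xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
predictions = [bagged_predict(x) for x in xs]
print(predictions)   # prints: [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
```

The vote reproduces the labels seen in the samples above (1 for x <= 0.3 and x >= 0.8, -1 in between), a pattern no single stump can represent because it needs two split points.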
A cartoon depiction of how bagging works. Suppose we train an ‘8’ detector on the dataset depicted above, containing an ‘8’, a ‘6’ and a
‘9’. Suppose we make two different resampled datasets. The bagging training procedure is to construct each of these datasets by
sampling with replacement. The first dataset omits the ‘9’ and repeats the ‘8’. On this dataset, the detector learns that a loop on top of the
digit corresponds to an ‘8’. On the second dataset, we repeat the ‘9’ and omit the ‘6’. In this case, the detector learns that a loop on the
bottom of the digit corresponds to an ‘8’. Each of these individual classification rules is brittle, but if we average their output then the
detector is robust, achieving maximal confidence only when both loops of the ‘8’ are present.
Boosting
Original Data 1 2 3 4 5 6 7 8 9 10
Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3
Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2
Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4
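Error rate of classifier C_i, where w_j is the weight of training example (x_j, y_j) and \delta(\cdot) is 1 when its argument is true and 0 otherwise:
$$\epsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big)$$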
Importance of a classifier:
$$\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \epsilon_i}{\epsilon_i}\right)$$
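A small numeric check of the two formulas above, as a sketch. The data mirror Round 1 of the AdaBoost example later in these slides: ten examples with uniform weight 0.1, of which the first three are misclassified. The function names `error_rate` and `importance` are hypothetical.

```python
import math

def error_rate(weights, predictions, labels):
    """epsilon = (1/N) * sum of the weights of the misclassified examples."""
    n = len(labels)
    wrong = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    return wrong / n

def importance(epsilon):
    """alpha = 0.5 * ln((1 - epsilon) / epsilon)."""
    return 0.5 * math.log((1 - epsilon) / epsilon)

labels      = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
predictions = [-1, -1, -1, -1, -1, -1, -1, 1, 1, 1]   # stump: x <= 0.75 -> -1, else 1
weights     = [0.1] * 10                              # uniform initial weights

epsilon = error_rate(weights, predictions, labels)
alpha = importance(epsilon)
print(round(epsilon, 3), round(alpha, 3))   # prints: 0.03 1.738
```

The result agrees with the alpha of 1.738 listed for Round 1 in the example's summary table. Note that alpha grows as epsilon shrinks and would be negative for epsilon above 0.5.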
AdaBoost Algorithm
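Weight update (i indexes the training example, j the boosting round):
$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$
where Z_j is the normalization factor that makes the updated weights sum to 1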
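The update can be traced on concrete numbers. The sketch below applies it once, starting from uniform weights of 0.1 and using the Round 1 predictions and alpha (1.738) from the AdaBoost example in these slides; `update_weights` is a hypothetical helper name.

```python
import math

def update_weights(weights, predictions, labels, alpha):
    """Scale each weight by exp(-alpha) if the example was classified
    correctly and by exp(alpha) otherwise, then divide by Z."""
    scaled = [
        w * math.exp(-alpha if p == y else alpha)
        for w, p, y in zip(weights, predictions, labels)
    ]
    z = sum(scaled)   # normalization factor Z_j
    return [w / z for w in scaled]

labels      = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
predictions = [-1, -1, -1, -1, -1, -1, -1, 1, 1, 1]   # Round 1 classifier output
weights     = [0.1] * 10                              # Round 1 weights
alpha       = 1.738                                   # Round 1 alpha from the summary

new_weights = update_weights(weights, predictions, labels, alpha)
print([round(w, 3) for w in new_weights])
# prints: [0.311, 0.311, 0.311, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
```

The three misclassified examples end up with weight 0.311 and the rest with about 0.01, matching the Round 2 row of the weights table in the example.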
If any intermediate round produces an error rate
higher than 50%, the weights are reverted
to 1/N and the resampling procedure is repeated
Classification:
$$C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big)$$
AdaBoost Example
[Figure: decision stump. The True branch of the test predicts y_left; the False branch predicts y_right.]
AdaBoost Example
Boosting Round 2:
x 0.1 0.1 0.2 0.2 0.2 0.2 0.3 0.3 0.3 0.3
y 1 1 1 1 1 1 1 1 1 1
Boosting Round 3:
x 0.2 0.2 0.4 0.4 0.4 0.4 0.5 0.6 0.6 0.7
y 1 1 -1 -1 -1 -1 -1 -1 -1 -1
Summary:
Round   Split Point   Left Class   Right Class   alpha
1       0.75          -1           1             1.738
2       0.05          1            1             2.7784
3       0.3           1            -1            4.1195
AdaBoost Example
Weights
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
2 0.311 0.311 0.311 0.01 0.01 0.01 0.01 0.01 0.01 0.01
3 0.029 0.029 0.029 0.228 0.228 0.228 0.228 0.009 0.009 0.009
Classification
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 -1 -1 -1 -1 -1 -1 -1 1 1 1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 -1 -1 -1 -1 -1 -1 -1
Sum 5.16 5.16 5.16 -3.08 -3.08 -3.08 -3.08 0.397 0.397 0.397
Predicted class (sign of sum) 1 1 1 -1 -1 -1 -1 1 1 1
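The Sum row can be recomputed from the summary table. With labels in {-1, +1}, choosing the class with the larger total alpha is the same as taking the sign of the alpha-weighted sum of the stump outputs. A sketch using the three split points and alphas listed above; `adaboost_score` is a hypothetical helper name.

```python
# Values copied from the summary table: (split point, left class, right class)
stumps = [(0.75, -1, 1), (0.05, 1, 1), (0.3, 1, -1)]
alphas = [1.738, 2.7784, 4.1195]

def adaboost_score(x):
    """Alpha-weighted sum of the stump predictions for input x."""
    return sum(
        a * (left if x <= split else right)
        for a, (split, left, right) in zip(alphas, stumps)
    )

xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
sums = [adaboost_score(x) for x in xs]
signs = [1 if s > 0 else -1 for s in sums]

print([round(s, 3) for s in sums])
print(signs)   # prints: [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
```

The sums come out to about 5.16, -3.08 and 0.397 for the three groups of x values, as in the table. The margin for x >= 0.8 is small because only Round 1 and the constant Round 2 stump vote for the positive class there.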