Bagging and Boosting
Bias-Variance trade-off
The next chart might be familiar to some of you, but it represents quite
well the relationship and trade-off between bias and variance with
respect to the test error rate.
The relationship between the variance and bias of a model is such that a
reduction in variance results in an increase in bias, and vice versa. To
achieve optimal performance, the model must be positioned at an
equilibrium point, where the test error rate is minimized, and the variance
and bias are appropriately balanced.
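For reference, this trade-off is often summarized by the standard textbook decomposition of the expected test error (a general result, not specific to the chart above):

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2$$

Here the last term is the irreducible error, which no model can remove.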
#2: Bagging
With this purpose in mind, the idea is to train multiple models of the same
learning algorithm on random subsets of the original training data. Those
random subsets are called bags and can contain any combination of the data.
Each of those datasets is then used to fit an individual model, which
produces individual predictions for the given data. Those predictions are
then aggregated into one final classifier. The idea of this method is really
close to our initial toy example with the cats and dogs.
The calculation of the final ensemble aggregation uses either a simple
average for regression problems or a simple majority vote for
classification problems. For that, each model trained on one of the random
samples produces a prediction for the given data. For the average, those
predictions are simply summed up and divided by the number of created
bags.
A simple majority vote works similarly but uses the predicted classes
instead of numeric values. The algorithm identifies the class with the most
votes and takes that majority as the final aggregation. This is again very
similar to our toy example, where two out of three algorithms predicted a
picture to be a dog, and the final aggregation was therefore a dog
prediction.
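As a minimal sketch in plain NumPy (the prediction arrays are made up purely for illustration), both aggregation rules could look like this:

```python
import numpy as np

# Hypothetical predictions of three bagged models for four samples.
# Regression: the numeric predictions are simply averaged.
reg_preds = np.array([
    [2.1, 0.5, 3.3, 1.0],   # model 1
    [1.9, 0.7, 3.1, 1.2],   # model 2
    [2.0, 0.6, 3.5, 0.8],   # model 3
])
final_regression = reg_preds.mean(axis=0)   # simple average over the bags

# Classification: the class predicted most often wins (majority vote).
clf_preds = np.array([
    ["dog", "cat", "dog", "cat"],   # model 1
    ["dog", "dog", "cat", "cat"],   # model 2
    ["cat", "dog", "dog", "cat"],   # model 3
])
final_classification = [
    max(set(column), key=list(column).count)   # most frequent class per sample
    for column in clf_preds.T
]

print(final_regression)       # [2.0, 0.6, 3.3, 1.0]
print(final_classification)   # ['dog', 'dog', 'dog', 'cat']
```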
Random Forest
A famous extension to the bagging method is the random forest
algorithm, which uses the idea of bagging but also draws random subsets of
the features, not only subsets of the entries. Plain bagging, on the other
hand, takes all given features into account.
base_estimator: In the first parameter, you have to provide the underlying
algorithm that should be fitted on the random subsets in the bagging
procedure (see the sketch below). This could be, for example, Logistic
Regression, Support Vector Classification, Decision Trees, and many more.
After setting the scene, this model object works like many other models
and can be trained using the fit() method with the X and y data from the
training set. The corresponding predictions on test data can then be made
using predict().
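Putting this together, a minimal sketch with scikit-learn's BaggingClassifier could look as follows. The synthetic data and the chosen base estimator are illustrative assumptions, and note that newer scikit-learn versions (1.2+) name the first parameter estimator instead of base_estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data, just to make the sketch runnable.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The underlying algorithm trained on each random bag; this parameter is
# called `estimator` in scikit-learn >= 1.2 and `base_estimator` in older versions.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,        # number of bags / individual models
    max_samples=0.8,        # fraction of the training set drawn for each bag
    random_state=42,
)

bagging.fit(X_train, y_train)          # train on the training set
predictions = bagging.predict(X_test)  # aggregated (majority-vote) predictions
print(bagging.score(X_test, y_test))   # accuracy on the test set
```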
#3: Boosting
In contrast to bagging, boosting trains its models sequentially, where each
new model tries to correct the errors of the previous, weaker one. For that,
it uses the misclassified entries of the weak model together with some
other random data to create a new model. Therefore, the different training
sets are not purely randomly chosen but are mainly influenced by the
wrongly classified entries of the previous model. The steps for this
technique are the following:
1. Train an initial weak model on a random subset of the training data.
2. Identify the entries that this model misclassified.
3. Build the next model on a subset that emphasizes those misclassified entries, together with some other random data.
4. Repeat the previous two steps for the desired number of iterations.
5. Combine the individual models into one final classifier.
In the following, we will look at a similar code example, but for boosting.
Obviously, there exist multiple boosting algorithms. Besides the gradient
boosting methodology, AdaBoost is one of the most popular.
base_estimator: Similar to Bagging, you need to define which
underlying algorithm you would like to use.
learning_rate: Finally, the learning rate controls how much each new
model contributes to the ensemble. Normally there is a trade-off between
the number of iterations and the value of the learning rate. In other words:
when taking smaller values of the learning rate, you should consider more
estimators, so that the ensemble of weak classifiers continues to improve
(see the sketch after this list).
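A minimal sketch with scikit-learn's AdaBoostClassifier, under the same assumptions as the bagging example above (illustrative synthetic data and base estimator, and the estimator/base_estimator naming depending on the scikit-learn version):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data, as in the bagging sketch above.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A shallow tree ("decision stump") is the classic weak learner for AdaBoost.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # `base_estimator` in older versions
    n_estimators=200,       # more iterations pair with a smaller learning rate
    learning_rate=0.5,      # contribution of each new model to the ensemble
    random_state=42,
)

boosting.fit(X_train, y_train)
predictions = boosting.predict(X_test)
print(boosting.score(X_test, y_test))
```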
Now that we have briefly learned how bagging and boosting work, I would
like to put the focus on comparing both methods against each other.
Similarities
Ensemble methods
From a general point of view, the similarities between both techniques
start with the fact that both are ensemble methods that aim to use
multiple learners instead of a single model to achieve better results.
Purpose
Finally, it is fair to say that both aim to produce higher stability and
better predictions for the data.
Differences
Implications
It was shown that the main idea of both methods is to use multiple models
together to achieve better predictions than single learning models.
However, there is no general rule for choosing between bagging and
boosting, since both have advantages and disadvantages.
While bagging decreases the variance and reduces overfitting, it will only
rarely produce a better bias. Boosting, on the other hand, decreases the
bias but might produce models that are more overfitted than bagged ones.
Bagging and boosting both use all given features and only select the
entries randomly. Random forest, on the other hand, is an extension of
bagging that also creates random subsets of the features. As a result,
random forest is used more often in practice than plain bagging.
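To make this difference concrete, here is a minimal sketch contrasting the two estimators in scikit-learn (the parameter values are illustrative assumptions): plain bagging keeps every feature available, while a random forest additionally samples a random subset of features at each split.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Plain bagging: random subsets of the entries (rows), all features are kept.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,      # random subset of the entries for each bag
    max_features=1.0,     # every feature is available to every model (the default)
    random_state=42,
)

# Random forest: bagging of trees plus a random subset of features at each split.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # only sqrt(n_features) candidate features per split
    random_state=42,
)
```

Both objects are then trained with fit() and used with predict(), exactly as shown in the earlier sketches.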