
Ensemble Learning: Bagging and Boosting

#1: Introduction and main idea: ensemble learning

So when should we use ensemble learning? Clearly, when we see overfitting or underfitting in our models. Let's begin with the key concept of bagging and boosting, which both belong to the family of ensemble learning techniques:

The main idea behind ensemble learning is to use multiple algorithms and models together for the same task. While a single model relies on only one algorithm to create predictions, bagging and boosting methods aim to combine several of them to achieve better predictions with higher consistency than any individual learner.

Example: Image classification

The essential concept is best illustrated with a simple image-classification example. Suppose a collection of images, each labelled with the kind of animal it shows, is available for training a model. In a traditional modelling approach, we would try several techniques and compare their accuracy to choose one over the other. Imagine we used logistic regression, a decision tree, and a support vector machine here, each performing differently on the given data set.

In this example, a specific record was predicted as a dog by the logistic regression and decision tree models, while the support vector machine identified it as a cat. Since the various models have distinct advantages and disadvantages for particular records, the key idea of ensemble learning is to combine all three models instead of selecting only the one approach that showed the highest accuracy.

The procedure is called aggregation or voting: it combines the predictions of all underlying models to come up with one prediction that is assumed to be more precise than any sub-model on its own.
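To make this concrete, here is a minimal sketch of such a voting ensemble using scikit-learn's VotingClassifier. The synthetic data set and the choice of the three sub-models simply mirror the toy example above; the original article does not prescribe this exact code.

```python
# Hard voting over three different classifiers (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# "hard" voting = simple majority vote over the predicted classes
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("svm", SVC()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```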

Bias-Variance trade-off

The following chart might be familiar to some of you, but it represents quite well the relationship and the trade-off between bias and variance with respect to the test error rate.

The relationship between the variance and the bias of a model is such that a reduction in variance typically comes with an increase in bias, and vice versa. To achieve optimal performance, the model must be positioned at an equilibrium point, where the test error rate is minimized and variance and bias are appropriately balanced.
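For completeness, this trade-off is usually formalized by the standard decomposition of the expected test error under squared loss (the formula is not spelled out in the article itself, but it is the usual way to express what the chart shows):

E[(y - \hat{f}(x))^2] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2

where \sigma^2 denotes the irreducible noise that no model can remove.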

Ensemble learning can help to balance both extreme cases and lead to a more stable prediction. One method is called bagging and the other is called boosting.

#2: Bagging (bootstrap aggregation)

Let us focus first on the bagging technique, short for bootstrap aggregation. Bootstrap aggregation addresses the right-hand extreme of the previous chart by reducing the variance of the model in order to avoid overfitting.

To this end, the idea is to train multiple models of the same learning algorithm on random subsets of the original training data. Those random subsets are called bags; they are drawn with replacement and can therefore contain any combination of the data, including repeated entries. Each of those datasets is then used to fit an individual model, which produces individual predictions for the given data. Those predictions are then aggregated into one final classifier. The idea of this method is really close to our initial toy example with the cats and dogs.

By using random subsets of the data, the risk of overfitting is reduced, because averaging the results of the sub-models smooths out their individual errors. All models can be trained in parallel and are aggregated together afterward.

The final ensemble aggregation uses either a simple average for regression problems or a simple majority vote for classification problems. For that, each model trained on a random sample produces a prediction for the given input. For the average, those predictions are simply summed up and divided by the number of created bags.

Simple majority voting works similarly but uses the predicted classes instead of numeric values. The algorithm counts the predictions per class and takes the majority class as the final aggregation. This is again very similar to our toy example, where two out of three algorithms predicted a picture to be a dog and the final aggregation was therefore a dog prediction.
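As a tiny illustration of both aggregation rules (the numbers and labels below are made up for demonstration only):

```python
# Averaging for regression and majority voting for classification.
from collections import Counter
import numpy as np

# Regression: average the numeric predictions of the individual bags
reg_predictions = [2.0, 2.5, 3.0]
print(np.mean(reg_predictions))  # 2.5

# Classification: simple majority vote over the predicted classes
clf_predictions = ["dog", "dog", "cat"]
print(Counter(clf_predictions).most_common(1)[0][0])  # dog
```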

Random Forest

A famous extension of the bagging method is the random forest algorithm, which builds on the idea of bagging but also draws random subsets of the features, not only subsets of the entries. Plain bagging, on the other hand, takes all given features into account.

Code example for bagging

The parameters described below correspond to a bagging implementation such as scikit-learn's BaggingClassifier:

- base_estimator: You have to provide the underlying algorithm that should be used on the random subsets in the bagging procedure as the first parameter. This could be, for example, logistic regression, support vector classification, decision trees, or many more.

- n_estimators: The number of estimators defines the number of bags you would like to create here, and the default value is 10.

- max_samples: The maximum number of samples defines how many samples should be drawn from X to train each base estimator. The default value is 1.0, which means that all existing entries should be used. You could also say that you want only 80% of the entries by setting it to 0.8.

After setting the scene, this model object works like many other models and can be trained using the fit() procedure with X and y data from the training set. The corresponding predictions on test data can be made using predict().
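A minimal sketch of how this could look, assuming scikit-learn's BaggingClassifier with a decision tree as base estimator and a synthetic data set (note that scikit-learn >= 1.2 names the first parameter estimator rather than base_estimator, so it is passed positionally here):

```python
# Bagging with decision trees on a synthetic data set (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # base_estimator: the underlying algorithm
    n_estimators=10,           # number of bags / sub-models
    max_samples=0.8,           # draw 80% of the entries for each bag
    random_state=42,
)
bagging.fit(X_train, y_train)          # train on the training set
print(bagging.predict(X_test[:5]))     # predictions for new data
print("Accuracy:", bagging.score(X_test, y_test))
```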

#3: Boosting

Boosting is a slight variation of the bagging algorithm and uses sequential processing instead of parallel calculations. While bagging aims to reduce the variance of the model, the boosting method aims to reduce the bias to avoid underfitting the data. With that idea in mind, boosting also uses a random subset of the data to create a first, average-performing model.

For that, it combines the misclassified entries of the weak model with some other random data to create a new model. Therefore, the subsequent models are not chosen purely at random but are mainly influenced by the wrongly classified entries of the previous model. The steps for this technique are the following:

1. Train the initial (weak) model
You create a subset of the data and train a weak learning model, which is assumed to be the final ensemble model at this stage. You then analyze the results on the given training data set and can identify those entries that were misclassified.

2. Update weights and train a new model
You create a new random subset of the original training data but weight those misclassified entries higher. This dataset is then used to train a new model.

3. Aggregate the new model with the ensemble model
The next model should perform better on the more difficult entries and will be combined (aggregated) with the previous one into the new final ensemble model.

Essentially, we can repeat this process multiple times and continuously update the ensemble model until our predictive power is good enough. The key idea is clearly to create models that are also able to predict the more difficult data entries. This can then lead to a better fit of the model and reduce the bias.
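The loop below is a simplified, from-scratch sketch of these three steps in the style of AdaBoost; the variable names, the decision-tree weak learner, and the synthetic data are illustrative choices, not taken from the original article.

```python
# AdaBoost-style boosting loop: reweight misclassified entries each round.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_signed = np.where(y == 1, 1, -1)      # labels encoded as {-1, +1}

n_rounds = 10
weights = np.full(len(y), 1 / len(y))   # step 1 starts with uniform weights
models, alphas = [], []

for _ in range(n_rounds):
    # 1. Train a weak model on the (weighted) data
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)

    # 2. Weighted error and the model's voting weight (alpha)
    err = np.clip(weights[pred != y_signed].sum() / weights.sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase the weights of misclassified entries for the next round
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()

    # 3. Add the new model to the ensemble
    models.append(stump)
    alphas.append(alpha)

# Final ensemble: weighted vote of all weak models
aggregated = sum(a * m.predict(X) for a, m in zip(alphas, models))
print("Training accuracy:", np.mean(np.sign(aggregated) == y_signed))
```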

In comparison to bagging, this technique uses weighted voting or weighted averaging, based on coefficients of the individual models that are taken into account together with their predictions. Therefore, boosting can reduce underfitting, but it may also tend to overfit at times.

Code example for boosting

In the following, we will look at a similar code example, but for boosting. Obviously, there exist multiple boosting algorithms. Besides the gradient boosting methodology, AdaBoost is one of the most popular.

- base_estimator: Similar to bagging, you need to define which underlying algorithm you would like to use.

- n_estimators: The number of estimators defines the maximum number of iterations at which the boosting is terminated. It is called the "maximum" number because the algorithm will stop on its own if good performance is achieved earlier.

- learning_rate: Finally, the learning rate controls how much each new model contributes to the existing ensemble. Normally there is a trade-off between the number of iterations and the value of the learning rate: when taking smaller values of the learning rate, you should consider more estimators, so that your base model (the weak classifier) continues to improve.
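A minimal sketch using these parameters, assuming scikit-learn's AdaBoostClassifier with a shallow decision tree as weak learner and synthetic data (as before, newer scikit-learn versions name the first parameter estimator instead of base_estimator, so it is passed positionally):

```python
# AdaBoost with decision stumps on a synthetic data set (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # the weak base learner
    n_estimators=50,                      # maximum number of boosting rounds
    learning_rate=0.5,                    # contribution of each new model
    random_state=42,
)
boosting.fit(X_train, y_train)
print(boosting.predict(X_test[:5]))
print("Accuracy:", boosting.score(X_test, y_test))
```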

The fit() and predict() procedures work similarly to the previous bagging example. As you can see, it is easy to use such functions from existing libraries. But of course, you can also implement your own algorithms to build both techniques.

#4: Conclusion: differences & similarities

Now that we have briefly seen how bagging and boosting work, I would like to focus on comparing both methods against each other.

Similarities

- Ensemble methods: In a general view, the similarities between the two techniques start with the fact that both are ensemble methods that aim to use multiple learners instead of a single model to achieve better results.

- Multiple samples & aggregation: To do that, both methods generate random samples and multiple training data sets. Bagging and boosting also both arrive at the final decision by aggregating the underlying models, either by averaging the results or by majority voting.

- Purpose: Finally, both aim to produce higher stability and better predictions for the data.

Differences

- Data partition | whole data vs. bias: While bagging uses random bags out of the training data for all models independently, boosting puts higher importance on the entries misclassified by previous models when building the next ones. The data partition is therefore different.

- Models | independent vs. sequential: Bagging creates independent models that are aggregated together. Boosting, however, updates the existing ensemble with new models in a sequence, so each model is affected by the previous builds.

- Goal | variance vs. bias: Another difference is that bagging aims to reduce the variance, while boosting tries to reduce the bias. Therefore, bagging can help to decrease overfitting, and boosting can reduce underfitting.

- Function | weighted vs. non-weighted: The final prediction uses an equally weighted average or an equally weighted vote in the bagging technique. Boosting uses a weighted majority vote or a weighted average, giving more weight to models with better performance on the training data.

Implications

It was shown that the main idea of both methods is to use multiple models together to achieve better predictions compared to single learning models. However, there is no general rule for choosing between bagging and boosting, since both have advantages and disadvantages.

While bagging decreases the variance and reduces overfitting, it will only rarely produce a better bias. Boosting, on the other hand, decreases the bias but might be more overfitted than bagged models.

Coming back to the bias-variance trade-off figure, I tried to visualize the extreme cases in which each method seems appropriate. However, this does not mean that they achieve their results without any drawbacks. The aim should always be to keep bias and variance in a reasonable balance.

Bagging and boosting both use all given features and select only the entries randomly. Random forest, on the other hand, is an extension of bagging that also creates random subsets of the features. Therefore, random forest is used more often in practice than plain bagging.
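A short sketch of that difference, assuming scikit-learn's RandomForestClassifier; the max_features setting shown here is what restricts each split to a random subset of the features, and the data set is again synthetic.

```python
# Random forest: bagging over trees plus random feature subsets per split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,      # number of bagged trees
    max_features="sqrt",   # random subset of features considered at each split
    random_state=42,
)
forest.fit(X, y)
print("Training accuracy:", forest.score(X, y))
```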
