Ensemble Learning in Machine Learning
Introduction
Machine learning is great! But there’s one thing that makes it even better: ensemble learning.
Bagging, boosting and stacking are the three most popular ensemble learning techniques, each of which can improve a model’s predictive accuracy. Each technique serves a different purpose, and which one to use depends on several factors. Although each technique is different, many of us find it hard to distinguish between them. Knowing when or why we should use each technique is difficult.
In this blog, I’ll explain the difference between bagging, boosting and stacking, along with each technique’s advantages and disadvantages. By the end of this article, you will understand what ensemble learning is in machine learning, how each technique works, and when to use which. By understanding the differences, you’ll be able to choose the best method for your own problem.
What is Ensemble Learning in Machine Learning?
Ensemble learning is a technique in which multiple individual models are combined to produce a single, stronger model, giving improved results across various tasks in machine learning and data analysis.
A well-known example of an ensemble method, specifically of the bagging technique, is the random forest algorithm. A random forest is built from many decision trees. A single decision tree tends to learn its training data too closely, which leads to overfitting. Because of this, a single decision tree can’t be relied on for making predictions. Instead, many decision trees, each trained on a different bootstrapped sample of the data, are employed to form a random forest. The resulting random forest has a lower variance than any individual tree.
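To make this concrete, here is a minimal sketch (not from the original article) that compares a single decision tree with a random forest on a synthetic dataset, assuming scikit-learn is available:

```python
# A minimal sketch: comparing a single decision tree to a random forest
# (bagged decision trees) on a toy dataset, assuming scikit-learn is installed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

tree = DecisionTreeClassifier(random_state=42)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

# The forest of bagged trees typically generalizes better than a single tree.
print("Single tree accuracy:", cross_val_score(tree, X, y, cv=5).mean())
print("Random forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```

On a dataset like this, the forest typically scores higher in cross-validation than the single tree, reflecting its lower variance.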
The success of bagging led to the development of other ensemble techniques such as boosting and stacking.
The many real-life machine learning applications show these ensemble methods’ importance. These systems are crucial because they have the ability to impact human lives and business revenues. Therefore, ensuring the accuracy of machine learning models is essential. Inaccurate predictions can have serious consequences for many businesses or organizations; at worst, they can cause real harm to the people who depend on them. This is why it is so important to build accurate and reliable machine learning models.
A problem in machine learning is that individual models tend to perform poorly. In other words, they tend to have low prediction accuracy. To mitigate this problem, we combine multiple models into a single model with better overall performance.
The individual models that we combine are known as weak learners. We call them weak learners because they have either a high bias or a high variance, and because of that they cannot learn effectively and perform poorly on their own.
A high-bias model results from not learning the data well enough. It does not capture the distribution of the data, so its future predictions will be unrelated to the data and therefore inaccurate.
A high-variance model results from learning the data too well. Its predictions vary with each new data point, so it cannot predict unseen points accurately.
Both high-bias and high-variance models thus cannot generalize properly, which is why weak learners perform poorly by themselves.
As we know from the bias-variance trade-off, an underfit model has high bias and
low variance, whereas an overfit model has high variance and low bias. In either
case, there is no balance between bias and variance. For there to be a balance,
both the bias and the variance need to be low. Ensemble learning tries to strike this balance. It aims to reduce the bias if we have a weak model with high bias and low variance, and it aims to reduce the variance if we have a weak model with high variance and low bias. This way, the resulting model will be much more balanced, with low bias and low variance. Such a model is known as a strong learner: it is more generalized than the weak learners and can make accurate predictions on data it has never seen before.
Bagging is used to reduce the variance of weak learners. Boosting is used to reduce
the bias of weak learners. Stacking is used to improve the overall accuracy of strong
learners.
Bagging
We use bagging for combining weak learners of high variance. Bagging aims to produce a model with lower variance than the individual weak models. These weak learners are homogeneous, meaning they are of the same type. Bagging, also known as bootstrap aggregating, consists of two parts: bootstrapping and aggregating.
Bootstrapping
Bootstrapping involves resampling subsets of data with replacement from an initial dataset. In other words, subsets of data are taken from the initial dataset. These subsets of data are called bootstrapped datasets or, simply, bootstraps. Resampling ‘with replacement’ means an individual data point can be sampled multiple times. Each bootstrapped dataset is then used to train a weak learner.
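As a rough illustration (a sketch with made-up toy data, not code from this article), the following shows how subsets can be resampled with replacement using NumPy:

```python
# A minimal sketch of bootstrapping: drawing subsets with replacement
# from an initial dataset using NumPy (toy data for illustration only).
import numpy as np

rng = np.random.default_rng(seed=0)
dataset = np.arange(10)          # a toy "initial dataset" of 10 points
n_bootstraps = 3

for i in range(n_bootstraps):
    # Sampling with replacement: the same point can appear more than once.
    indices = rng.integers(0, len(dataset), size=len(dataset))
    bootstrap = dataset[indices]
    print(f"Bootstrap {i + 1}: {bootstrap}")
```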
Aggregating
The individual weak learners are trained independently of each other. Each weak learner makes its own prediction, and these predictions are aggregated at the end to get the overall prediction. The predictions are aggregated using either max voting or averaging.
Max Voting
Max voting is commonly used for classification problems. It consists of taking the mode of the predictions (the most frequently occurring prediction). It is called voting because, as in election voting, the premise is that ‘the majority rules’. Each model makes a prediction, and a prediction from each model counts as a single ‘vote’. The most frequent prediction (the one with the most votes) becomes the overall prediction of the combined model.
Averaging
It is generally used for regression problems. It involves taking the average of the
predictions. The resulting average is used as the overall prediction for the combined
model.
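The two aggregation strategies can be sketched in a few lines of NumPy (the prediction values below are made up purely for illustration):

```python
# A minimal sketch of the two aggregation strategies described above,
# using toy predictions (illustrative values, not from the article).
import numpy as np

# Max voting (classification): take the most frequent class label per sample.
class_preds = np.array([
    [1, 0, 1],   # predictions of model 1 for three samples
    [1, 1, 1],   # predictions of model 2
    [0, 0, 1],   # predictions of model 3
])
majority_vote = np.array([np.bincount(col).argmax() for col in class_preds.T])
print("Max voting result:", majority_vote)          # -> [1 0 1]

# Averaging (regression): take the mean of the predicted values per sample.
reg_preds = np.array([
    [2.1, 3.0],  # predictions of model 1 for two samples
    [1.9, 3.4],  # predictions of model 2
    [2.0, 3.2],  # predictions of model 3
])
print("Averaging result:", reg_preds.mean(axis=0))  # -> [2.0 3.2]
```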
Steps of Bagging
The steps of bagging are as follows:
1. We start with an initial training dataset containing a number of data points.
2. From this dataset, we create several subsets of data (the bootstraps), taking a subset of N sample points from the initial dataset for each subset. Each subset is taken with replacement, which means that a specific data point can be sampled more than once.
3. For each subset of data, we train a weak learner, and the weak learners are trained independently of each other. These models are homogeneous, meaning that they are of the same type.
4. Each model makes its own prediction.
5. The predictions are aggregated into a single prediction. For this, either max voting or averaging is used (see the sketch after this list).
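Here is a minimal sketch of these steps, assuming scikit-learn’s BaggingClassifier with decision trees as the homogeneous weak learners and a synthetic toy dataset:

```python
# A minimal sketch of the bagging steps above using scikit-learn's
# BaggingClassifier (toy synthetic dataset assumed for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 10 homogeneous weak learners, each trained on a bootstrapped subset.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=10,
    bootstrap=True,       # sample each subset with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)

# Predictions from the individual trees are aggregated by voting.
print("Bagging accuracy:", bagging.score(X_test, y_test))
```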
Boosting
We use boosting for combining weak learners with high bias. Boosting aims to produce a model with a lower bias than that of the individual models. Like in bagging, the weak learners are homogeneous, but unlike bagging they are trained sequentially, so that each learner improves on the errors of the previous learners in the sequence. A sample of data
is first taken from the initial dataset. This sample is used to train the first model, and
the model makes its prediction. The samples can either be correctly or incorrectly
predicted. The samples that are wrongly predicted are reused for training the next
model. In this way, subsequent models can improve on the errors of previous
models.
Unlike bagging, which aggregates prediction results at the end, boosting aggregates
the results at each step. They are aggregated using weighted averaging.
Weighted averaging involves giving all models different weights depending on their
predictive power. In other words, it gives more weight to the model with the highest
predictive power. This is because the learner with the highest predictive power is considered the most reliable, so its vote should count for more.
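A tiny sketch of weighted averaging, with made-up predictions and weights (illustrative only):

```python
# A minimal sketch of weighted averaging: each model's prediction is
# weighted by its (assumed, illustrative) predictive power.
import numpy as np

predictions = np.array([0.80, 0.60, 0.90])   # predictions from three learners
weights = np.array([0.5, 0.2, 0.3])          # higher weight = stronger learner

# Stronger learners contribute more to the combined result.
combined = np.average(predictions, weights=weights)
print("Combined prediction:", combined)       # 0.5*0.8 + 0.2*0.6 + 0.3*0.9 = 0.79
```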
Steps of Boosting
Boosting works with the following steps:
1. We take several subsets of data from the initial training dataset.
2. Using the first subset, we train the first weak learner.
3. We test the trained weak learner using the training data. As a result of the testing, some data points will be incorrectly predicted.
4. Each data point with the wrong prediction is sent into the second subset of data, and this subset is updated.
5. Using this updated subset, we train and test the second weak learner.
6. We continue with the following subsets until the total number of subsets is reached.
7. We now have the total prediction. The overall prediction has already been aggregated at each step, so there is no need to calculate it separately (a code sketch follows these steps).
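In practice, these steps are carried out by boosting algorithms such as AdaBoost. Below is a minimal sketch using scikit-learn’s AdaBoostClassifier with shallow decision trees as the high-bias weak learners (toy dataset assumed):

```python
# A minimal sketch of boosting in practice using scikit-learn's
# AdaBoostClassifier (one common boosting algorithm; toy dataset assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow trees are high-bias weak learners; boosting trains them
# sequentially, focusing each new tree on previously misclassified points.
boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)
boosting.fit(X_train, y_train)
print("Boosting accuracy:", boosting.score(X_test, y_test))
```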
Stacking
Stacking works a little differently from bagging and boosting. It aims to create a single robust model from multiple heterogeneous strong learners. It does this by training a second-level model, known as a meta-model, on a new dataset.
Individual heterogeneous models are first trained using the initial dataset. These models make predictions, and those predictions form a single new dataset. This new dataset is used to train the meta-model, which makes the final prediction by combining the predictions of all the individual models.
Steps of Stacking
The steps of Stacking are as follows:
1. We use the initial training data to train each of the heterogeneous base models.
2. Each base model makes its own predictions.
3. The predictions of the base models are combined into a new dataset, which is used to train the meta-model.
4. Using the results of the meta-model, we make the final prediction. The results of the meta-model serve as the overall prediction of the stacked ensemble (a code sketch follows these steps).
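Here is a minimal sketch of stacking with scikit-learn’s StackingClassifier, assuming a random forest and an SVM as the heterogeneous base learners and logistic regression as the meta-model (toy dataset assumed):

```python
# A minimal sketch of stacking: heterogeneous base learners plus a
# meta-model, using scikit-learn's StackingClassifier (toy dataset assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous strong learners trained on the initial dataset.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# Their predictions form a new dataset used to train the meta-model.
stacking = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),   # the meta-model
)
stacking.fit(X_train, y_train)
print("Stacking accuracy:", stacking.score(X_test, y_test))
```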
When should you use each technique? As a rule of thumb, if you are looking to reduce underfitting or bias, you use boosting; if you are looking to reduce overfitting or variance, you use bagging.
Bagging and boosting both work with homogeneous weak learners, whereas stacking works with heterogeneous models, which are typically strong learners.
All three of these methods can work with either classification or regression
problems.
However, it is not advisable to use boosting for reducing variance, because boosting will do a worse job at that than bagging. Similarly, bagging should not be used to reduce bias or underfitting, because bagged models remain prone to bias and bagging does little to reduce it.
Stacked models have the advantage of better prediction accuracy than bagging or
boosting. But because they combine bagged or boosted models, they have the
disadvantage of needing much more time and computational power. If you are
looking for faster results, it’s advisable not to use stacking. However, stacking is the best choice when getting the highest possible prediction accuracy is worth the extra cost.
Conclusion
Bagging, boosting and stacking are the three main ensemble learning techniques in machine learning. They play a crucial role in enhancing model accuracy and mitigating the risks associated with inaccurate predictions. Here are the key insights from this article:
- Bagging trains models in parallel, while boosting trains them sequentially.
- Bagging reduces the variance of weak learners, boosting reduces their bias, and stacking improves the overall accuracy of strong learners.
- Stacking combines heterogeneous models through a meta-model, at the cost of extra training time and computational power.