Ensemble Learning in Machine Learning
Introduction
Machine learning is great! But there’s one thing that makes it even better: ensemble learning.
Bagging, boosting and stacking are the three most popular ensemble learning techniques, each of which can improve a model’s predictive accuracy. Each technique serves a different purpose, and which one to use depends on several factors. Although each technique is different, many of us find it hard to distinguish between them. Knowing when or why we should use each technique is difficult.
In this blog, I’ll explain the difference between bagging, boosting and stacking, along with each technique’s advantages and disadvantages. By the end of this article, you will understand what ensemble learning is in machine learning, how each technique works, and when to use which. By understanding the differences, you’ll be able to choose the best method for your own problem.
What is Ensemble Learning in Machine Learning?
Ensemble learning is a technique in which multiple individual models are combined to produce a single, stronger model, giving improved results across various tasks in machine learning and data analysis.
A well-known example of an ensemble method, specifically of the bagging technique, is the random forest algorithm. A random forest is built from many decision trees. A single decision tree tends to learn its training data too closely, which leads to overfitting. Because of this, a single decision tree can’t be relied on for making predictions. Instead, many decision trees, each trained on a different bootstrapped sample of the data, are employed to form a random forest. The resulting random forest has a lower variance than any individual tree.
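To make this concrete, here is a minimal sketch (not from the original article) that compares a single decision tree with a random forest on a synthetic dataset, assuming scikit-learn is available:

```python
# A minimal sketch: comparing a single decision tree to a random forest
# (bagged decision trees) on a toy dataset, assuming scikit-learn is installed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

tree = DecisionTreeClassifier(random_state=42)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

# The forest of bagged trees typically generalizes better than a single tree.
print("Single tree accuracy:", cross_val_score(tree, X, y, cv=5).mean())
print("Random forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```

On a dataset like this, the forest typically scores higher in cross-validation than the single tree, reflecting its lower variance.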
The success of bagging led to the development of other ensemble techniques such as boosting and stacking.
The many real-life machine learning applications show these ensemble methods’ importance. These systems are crucial because they have the ability to impact human lives and business revenues. Therefore, ensuring the accuracy of machine learning models is essential. Inaccurate predictions can have serious consequences for many businesses or organizations; at worst, they can cause real harm to the people who depend on them. This is why it is so important to build accurate and reliable machine learning models.
A problem in machine learning is that individual models tend to perform poorly. In other words, they tend to have low prediction accuracy. To mitigate this problem, we combine multiple models into a single model with better overall performance.
The individual models that we combine are known as weak learners. We call them weak learners because they have either a high bias or a high variance, and because of that they cannot learn effectively and perform poorly on their own.
A high-bias model results from not learning the data well enough. It does not capture the distribution of the data, so its future predictions will be unrelated to the data and therefore inaccurate.
A high-variance model results from learning the data too well. Its predictions vary with each new data point, so it cannot predict unseen points accurately.
Both high-bias and high-variance models thus cannot generalize properly, which is why weak learners perform poorly by themselves.
As we know from the bias-variance trade-off, an underfit model has high bias and
low variance, whereas an overfit model has high variance and low bias. In either
case, there is no balance between bias and variance. For there to be a balance,
both the bias and the variance need to be low. Ensemble learning tries to strike this balance. It aims to reduce the bias if we have a weak model with high bias and low variance, and it aims to reduce the variance if we have a weak model with high variance and low bias. This way, the resulting model will be much more balanced, with low bias and low variance. Such a model is known as a strong learner: it is more generalized than the weak learners and can make accurate predictions on data it has never seen before.
Bagging is used to reduce the variance of weak learners. Boosting is used to reduce
the bias of weak learners. Stacking is used to improve the overall accuracy of strong
learners.
Bagging
We use bagging for combining weak learners of high variance. Bagging aims to produce a model with lower variance than the individual weak models. These weak learners are homogeneous, meaning they are of the same type. Bagging, also known as bootstrap aggregating, consists of two parts: bootstrapping and aggregating.
Bootstrapping
Bootstrapping involves resampling subsets of data with replacement from an initial dataset. In other words, subsets of data are taken from the initial dataset. These subsets of data are called bootstrapped datasets or, simply, bootstraps. Resampling ‘with replacement’ means an individual data point can be sampled multiple times. Each bootstrapped dataset is then used to train a weak learner.
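As a rough illustration (a sketch with made-up toy data, not code from this article), the following shows how subsets can be resampled with replacement using NumPy:

```python
# A minimal sketch of bootstrapping: drawing subsets with replacement
# from an initial dataset using NumPy (toy data for illustration only).
import numpy as np

rng = np.random.default_rng(seed=0)
dataset = np.arange(10)          # a toy "initial dataset" of 10 points
n_bootstraps = 3

for i in range(n_bootstraps):
    # Sampling with replacement: the same point can appear more than once.
    indices = rng.integers(0, len(dataset), size=len(dataset))
    bootstrap = dataset[indices]
    print(f"Bootstrap {i + 1}: {bootstrap}")
```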
Aggregating
The individual weak learners are trained independently of each other. Each weak learner makes its own prediction, and these predictions are aggregated at the end to get the overall prediction. The predictions are aggregated using either max voting or averaging.
Max Voting
Max voting is commonly used for classification problems. It consists of taking the mode of the predictions (the most frequently occurring prediction). It is called voting because, as in election voting, the premise is that ‘the majority rules’. Each model makes a prediction, and a prediction from each model counts as a single ‘vote’. The most frequent prediction (the one with the most votes) becomes the overall prediction of the combined model.
Averaging
It is generally used for regression problems. It involves taking the average of the
predictions. The resulting average is used as the overall prediction for the combined
model.
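The two aggregation strategies can be sketched in a few lines of NumPy (the prediction values below are made up purely for illustration):

```python
# A minimal sketch of the two aggregation strategies described above,
# using toy predictions (illustrative values, not from the article).
import numpy as np

# Max voting (classification): take the most frequent class label per sample.
class_preds = np.array([
    [1, 0, 1],   # predictions of model 1 for three samples
    [1, 1, 1],   # predictions of model 2
    [0, 0, 1],   # predictions of model 3
])
majority_vote = np.array([np.bincount(col).argmax() for col in class_preds.T])
print("Max voting result:", majority_vote)          # -> [1 0 1]

# Averaging (regression): take the mean of the predicted values per sample.
reg_preds = np.array([
    [2.1, 3.0],  # predictions of model 1 for two samples
    [1.9, 3.4],  # predictions of model 2
    [2.0, 3.2],  # predictions of model 3
])
print("Averaging result:", reg_preds.mean(axis=0))  # -> [2.0 3.2]
```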
Steps of Bagging
The steps of bagging are as follows:
1. We start with an initial training dataset containing a number of data points.
2. From this dataset, we create several subsets of data (the bootstraps), taking a subset of N sample points from the initial dataset for each subset. Each subset is taken with replacement, which means that a specific data point can be sampled more than once.
3. For each subset of data, we train a weak learner, and the weak learners are trained independently of each other. These models are homogeneous, meaning that they are of the same type.
4. Each model makes its own prediction.
5. The predictions are aggregated into a single prediction. For this, either max voting or averaging is used (see the sketch after this list).
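Here is a minimal sketch of these steps, assuming scikit-learn’s BaggingClassifier with decision trees as the homogeneous weak learners and a synthetic toy dataset:

```python
# A minimal sketch of the bagging steps above using scikit-learn's
# BaggingClassifier (toy synthetic dataset assumed for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 10 homogeneous weak learners, each trained on a bootstrapped subset.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=10,
    bootstrap=True,       # sample each subset with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)

# Predictions from the individual trees are aggregated by voting.
print("Bagging accuracy:", bagging.score(X_test, y_test))
```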
Boosting
We use boosting for combining weak learners with high bias. Boosting aims to produce a model with a lower bias than that of the individual models. Like in bagging, the weak learners are homogeneous, but unlike bagging they are trained sequentially, so that each learner improves on the errors of the previous learners in the sequence. A sample of data
is first taken from the initial dataset. This sample is used to train the first model, and
the model makes its prediction. The samples can either be correctly or incorrectly
predicted. The samples that are wrongly predicted are reused for training the next
model. In this way, subsequent models can improve on the errors of previous
models.
Unlike bagging, which aggregates prediction results at the end, boosting aggregates
the results at each step. They are aggregated using weighted averaging.
Weighted averaging involves giving all models different weights depending on their
predictive power. In other words, it gives more weight to the model with the highest
predictive power. This is because the learner with the highest predictive power is considered the most reliable, so its vote should count for more.
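A tiny sketch of weighted averaging, with made-up predictions and weights (illustrative only):

```python
# A minimal sketch of weighted averaging: each model's prediction is
# weighted by its (assumed, illustrative) predictive power.
import numpy as np

predictions = np.array([0.80, 0.60, 0.90])   # predictions from three learners
weights = np.array([0.5, 0.2, 0.3])          # higher weight = stronger learner

# Stronger learners contribute more to the combined result.
combined = np.average(predictions, weights=weights)
print("Combined prediction:", combined)       # 0.5*0.8 + 0.2*0.6 + 0.3*0.9 = 0.79
```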
Steps of Boosting
Boosting works with the following steps:
1. We take several subsets of data from the initial training dataset.
2. Using the first subset, we train the first weak learner.
3. We test the trained weak learner using the training data. As a result of the testing, some data points will be incorrectly predicted.
4. Each data point with the wrong prediction is sent into the second subset of data, and this subset is updated.
5. Using this updated subset, we train and test the second weak learner.
6. We continue with the following subsets until the total number of subsets is reached.
7. We now have the total prediction. The overall prediction has already been aggregated at each step, so there is no need to calculate it separately (a code sketch follows these steps).
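In practice, these steps are carried out by boosting algorithms such as AdaBoost. Below is a minimal sketch using scikit-learn’s AdaBoostClassifier with shallow decision trees as the high-bias weak learners (toy dataset assumed):

```python
# A minimal sketch of boosting in practice using scikit-learn's
# AdaBoostClassifier (one common boosting algorithm; toy dataset assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow trees are high-bias weak learners; boosting trains them
# sequentially, focusing each new tree on previously misclassified points.
boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)
boosting.fit(X_train, y_train)
print("Boosting accuracy:", boosting.score(X_test, y_test))
```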
Stacking
Stacking works a little differently from bagging and boosting. It aims to create a single robust model from multiple heterogeneous strong learners. It does this by training a second-level model, known as a meta-model, on a new dataset.
Individual heterogeneous models are first trained using the initial dataset. These models make predictions, and those predictions form a single new dataset. This new dataset is used to train the meta-model, which makes the final prediction by combining the predictions of all the individual models.
Steps of Stacking
The steps of Stacking are as follows:
1. We use the initial training data to train each of the heterogeneous base models.
2. Each base model makes its own predictions.
3. The predictions of the base models are combined into a new dataset, which is used to train the meta-model.
4. Using the results of the meta-model, we make the final prediction. The results of the meta-model serve as the overall prediction of the stacked ensemble (a code sketch follows these steps).
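Here is a minimal sketch of stacking with scikit-learn’s StackingClassifier, assuming a random forest and an SVM as the heterogeneous base learners and logistic regression as the meta-model (toy dataset assumed):

```python
# A minimal sketch of stacking: heterogeneous base learners plus a
# meta-model, using scikit-learn's StackingClassifier (toy dataset assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous strong learners trained on the initial dataset.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# Their predictions form a new dataset used to train the meta-model.
stacking = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),   # the meta-model
)
stacking.fit(X_train, y_train)
print("Stacking accuracy:", stacking.score(X_test, y_test))
```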
When should you use each technique? As a rule of thumb, if you are looking to reduce underfitting or bias, you use boosting; if you are looking to reduce overfitting or variance, you use bagging.
Bagging and boosting both work with homogeneous weak learners, whereas stacking works with heterogeneous models, which are typically strong learners.
All three of these methods can work with either classification or regression
problems.
However, it is not advisable to use boosting for reducing variance, because boosting will do a worse job at that than bagging. Similarly, bagging should not be used to reduce bias or underfitting, because bagged models remain prone to bias and bagging does little to reduce it.
Stacked models have the advantage of better prediction accuracy than bagging or
boosting. But because they combine bagged or boosted models, they have the
disadvantage of needing much more time and computational power. If you are
looking for faster results, it’s advisable not to use stacking. However, stacking is the best choice when getting the highest possible prediction accuracy is worth the extra cost.
Conclusion
Bagging, boosting and stacking are the three main ensemble learning techniques in machine learning. They play a crucial role in enhancing model accuracy and mitigating the risks associated with inaccurate predictions. Here are the key insights from this article:
- Bagging trains models in parallel, while boosting trains them sequentially.
- Bagging reduces the variance of weak learners, boosting reduces their bias, and stacking improves the overall accuracy of strong learners.
- Stacking combines heterogeneous models through a meta-model, at the cost of extra training time and computational power.