
Ensemble Learning

Instructor: Dr. Umara Zahid


MSCS Fall 2022
Agenda
• Motivation
• Introduction to Ensemble Learning
• Basic Ensemble Techniques
  • Max Voting
  • Averaging
  • Weighted Average
• Advanced Ensemble Techniques
  • Stacking
  • Blending
  • Bagging
  • Boosting
• Algorithms based on Bagging and Boosting
  • Bagging meta-estimator
  • Random Forest
  • AdaBoost
  • GBM
  • XGB
  • Light GBM
  • CatBoost
Motivation
• Suppose you have to buy something. How do you do it?
• Example: You want to buy a new car (two approaches)
  1. You walk up to the first car shop and purchase one based on the advice of the dealer. Is that really how you do it?
  2. More likely, you would browse a few web portals where people have posted their reviews, compare different car models, and check their features and prices. You would probably also ask your friends and colleagues for their opinion. (In short, you wouldn't jump straight to a conclusion, but would instead make a decision after considering the opinions of other people as well.)
• These are review- or opinion-based decisions
Another Example (It's not just a buying/selling related problem)
• You are a movie director and you have created a short movie
• Now you want to take preliminary feedback (ratings) on the movie before making it public
• What are the possible ways by which you can do that?
  1. Ask one of your friends to rate the movie
     • Biased review
  2. Ask 5 colleagues to rate the movie
     • Unbiased review, but from a small number of people / subject matter experts
  3. Ask 50 people to rate the movie
     • More generalized review
• Inferences:
  1. A diverse group of people is likely to make better decisions than individuals
  2. Similarly, a diverse set of machine learning models is likely to make better decisions than single models
• This diversification in Machine Learning is achieved by a technique called Ensemble Learning
What is Ensemble Learning?
• Ensemble methods are machine learning techniques that combine several base models in order to produce one optimal predictive model (What?)
• Ensemble learning techniques attempt to improve the performance of predictive models by improving their accuracy (Why?)
• Ensemble Learning is a process in which multiple machine learning models (such as classifiers) are strategically constructed and combined to solve a particular problem (How?)
• In another way (problem symptom):
  • To reduce the variance of certain ML models, such as neural networks, multiple models are trained instead of a single model, and the predictions from these models are combined.
Basic Ensemble Techniques
1. Max Voting
2. Averaging
3. Weighted Average
Max Voting
• In this technique, multiple models are used to make predictions for each
data point. The predictions by each model are considered as a ‘vote’. The
predictions which we get from the majority of the models are used as the
final prediction
• Considering previous example, you asked 5 of your colleagues to rate your
movie (out of 5); suppose three of them rated it as 4 while two of them
gave it a 5. Since the majority gave a rating of 4, the final rating will be
taken as 4

• The max voting method is generally used for classification problems
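Below is a minimal max-voting sketch using scikit-learn's VotingClassifier with voting='hard'; the synthetic dataset and the particular base models (logistic regression, decision tree, kNN) are illustrative assumptions, not part of the slides.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset (an assumption for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# voting='hard' returns the class label predicted by the majority of the base models
vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier(random_state=42)),
                ('knn', KNeighborsClassifier())],
    voting='hard')
vote.fit(X_train, y_train)
print("Max-voting accuracy:", vote.score(X_test, y_test))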


Averaging
• Similar to the max voting technique, multiple predictions are made
for each data point in averaging
• In this method, we take an average of predictions from all the models
and use it to make the final prediction
• For example, the averaging method would take the average of all the
values
• (5+4+5+4+4)/5 = 4.4
• Averaging can be used for making predictions in regression problems
or while calculating probabilities for classification problems
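A minimal averaging sketch for a regression problem; the dataset and the three base regressors are placeholders chosen only to illustrate averaging the predictions.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [LinearRegression(), DecisionTreeRegressor(random_state=0), KNeighborsRegressor()]
# Each column holds one model's predictions for the test set
preds = np.column_stack([m.fit(X_train, y_train).predict(X_test) for m in models])

# The ensemble prediction is the plain average of the individual predictions
avg_pred = preds.mean(axis=1)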
Weighted Average
• This is an extension of the averaging method
• All models are assigned different weights defining the importance of
each model for prediction.
• For instance, if two of your colleagues are critics, while others have no
prior experience in this field, then the answers by these two friends
are given more importance as compared to the other people

• The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) + (4*0.18)] = 4.41 (the weights sum to 1, so no further division is needed)
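A minimal weighted-average sketch with NumPy, reusing the ratings from the example above; the specific weights (0.23 for the two critics, 0.18 for the others) are the illustrative values from the slide.

import numpy as np

# Ratings from five colleagues; the two critics get higher weights
ratings = np.array([5, 4, 5, 4, 4])
weights = np.array([0.23, 0.23, 0.18, 0.18, 0.18])   # weights sum to 1

# np.average computes sum(ratings * weights) / sum(weights)
print(np.average(ratings, weights=weights))   # 4.41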
Advanced Ensemble Techniques
• Stacking
• Blending
• Bagging
• Boosting
Stacking
• Train multiple base models (Level-0 models) on your training data.
• Then, you use their predictions to create a new dataset (called meta-features).
• A new model — called the meta-model (Level-1 model) — is trained on these
meta-features to make the final prediction
• Important Detail:
• To avoid data leakage, stacking typically uses k-fold cross-validation when
generating predictions for the meta-model.
• Example:
• Base models: Decision Tree, SVM, Logistic Regression
• Meta-model: Random Forest trained on predictions from the base models
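A minimal stacking sketch using scikit-learn's StackingClassifier, following the example above (Decision Tree, SVM and Logistic Regression as base models, Random Forest as meta-model); the synthetic dataset and parameter values are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# cv=5 generates out-of-fold predictions for the meta-model, which avoids data leakage
stack = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=0)),
                ('svm', SVC()),
                ('lr', LogisticRegression(max_iter=1000))],
    final_estimator=RandomForestClassifier(random_state=0),
    cv=5)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))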
Stacking
• Stacking uses predictions from multiple
models (for example decision tree, knn or
svm) to build a new model. This model is
used for making predictions on the test set.
• Step 1: The train set is split into 10 parts
• Step 2: A base model (suppose a decision
tree) is fitted on 9 parts and predictions are
made for the 10th part. This is done for
each part of the train set
• Step 3: Using this model, predictions are
made on the test set
Continued…
• Steps 2 to 3 are repeated for another base
model (say knn) resulting in another set of
predictions for the train set and test set
• The predictions from the train set are used as
features to build a new model
• This model is used to make final predictions on the test set (using the base models' test-set predictions as features)
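A sketch of these steps done manually with cross_val_predict (10 folds, matching Step 1); the dataset, the choice of decision tree and kNN as base models, and logistic regression as the meta-model are assumptions for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# Steps 1-2: out-of-fold predictions on the train set (each part is predicted by a
# model fitted on the other 9 parts)
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=10) for m in base_models])

# Step 3: refit each base model on the full train set and predict the test set
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict(X_test) for m in base_models])

# The train-set predictions become features for the new (meta) model
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)
final_predictions = meta_model.predict(test_meta)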
Blending
• Split the training set into two parts:
• Train set (e.g., 70%)
• Holdout set (e.g., 30%)
• Train base models on the train set.
• Use these trained models to predict on the holdout set.
• Train the meta-model on these holdout predictions and use it for final
predictions.
• Important Detail:
• Blending is simpler and faster, but it wastes some data (the holdout set is
not used to train base models).
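A minimal blending sketch under the split described above (70% train, 30% holdout of the training data); the dataset and the particular base and meta models are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Split the training data into a train part (70%) and a holdout part (30%)
X_tr, X_hold, y_tr, y_hold = train_test_split(X_train_full, y_train_full,
                                              test_size=0.3, random_state=0)

base_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]
for m in base_models:
    m.fit(X_tr, y_tr)                         # base models see only the train part

# Predictions on the holdout set become the meta-model's training features
hold_meta = np.column_stack([m.predict(X_hold) for m in base_models])
test_meta = np.column_stack([m.predict(X_test) for m in base_models])

meta_model = LogisticRegression()
meta_model.fit(hold_meta, y_hold)
print("Blending accuracy:", meta_model.score(test_meta, y_test))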
Blending
• Blending follows the same approach as stacking but uses only a
holdout (validation) set from the train set to make predictions
• In other words, unlike stacking, the predictions are made on the
holdout set only
• The holdout set and the predictions are used to build a model which
is run on the test set.
Blending
• Step 1: The train set is split into training
and validation sets
• Step 2: Model(s) are fitted on the
training set. The predictions are made
on the validation set and the test set
• Step 3: The validation set and its predictions are used as features to build a new model. This model is used to make final predictions on the test set, using the test-set predictions (meta-features) from the base models.
• Example split: training set 70%, validation set 10% (for tuning), test set 20%
Key Differences Between Stacking and Blending
• Use blending if you're quickly experimenting or working with large datasets.
• Use stacking for more robust, high-performance models, especially in competitions or production.
Bagging
• The idea behind bagging is to combine the results of multiple models (for instance, several decision trees) to get a generalized result
• Here's a question: if you create all the models on the same set of data and combine them, will that be useful? There is a high chance that these models will give the same result since they receive the same input. So how can we solve this problem? One of the techniques is bootstrapping.
• Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement. The size of each subset is the same as the size of the original set.
• The bagging (Bootstrap Aggregating) technique uses these subsets (bags) to get a fair idea of the distribution of the complete set. The size of the subsets created for bagging may be smaller than the original set.
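A tiny bootstrap-sampling sketch with NumPy; the toy "dataset" of ten observations is an assumption used only to show sampling with replacement.

import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)                  # a toy "original dataset" of 10 observations

# Sampling with replacement: the bootstrap sample has the same size as the
# original set, so some observations repeat and others are left out
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)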
Steps of Bagging
• Multiple subsets are created from the
original dataset, selecting observations
with replacement.
• A base model (weak model) is created on
each of these subsets.
• The models run in parallel and are
independent of each other.
• The final predictions are determined by
combining the predictions from all the
models
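These steps are what scikit-learn's BaggingClassifier automates; a minimal sketch follows, with the dataset, base model and number of estimators chosen only for illustration (the estimator parameter is named base_estimator in older scikit-learn versions).

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each fitted on its own bootstrap sample; predictions are combined
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50,
                        bootstrap=True,     # sample with replacement
                        n_jobs=-1,          # base models are independent, so train in parallel
                        random_state=0)
bag.fit(X_train, y_train)
print("Bagging accuracy:", bag.score(X_test, y_test))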
Boosting
• Boosting is a Machine Learning technique that can be used to solve complex, data-driven, real-world problems
• Boosting is an ensemble learning technique that uses a set of Machine Learning algorithms to convert weak learners into strong learners in order to increase the accuracy of the model
Difference between Bagging and
Boosting
• Parallel ensemble, popularly known as bagging
  • The weak learners are produced in parallel during the training phase
  • The performance of the model can be increased by training a number of weak learners in parallel on bootstrapped data sets
  • Examples: Random Forest, Bagging meta-estimator
• Sequential ensemble, popularly known as boosting
  • The weak learners are produced sequentially during the training phase
  • The performance of the model is improved by assigning a higher weight to the previously misclassified samples
  • Examples: AdaBoost, Gradient Boosting, XGBoost
How Boosting Works?
• Step 1: The base algorithm reads the data and assigns an equal weight to each sample observation.
• Step 2: False predictions made by the base learner are identified. In the next iteration, these misclassified samples are passed to the next base learner with a higher weight placed on them.
• Step 3: Repeat Step 2 until the algorithm can correctly classify the output.
• Therefore, the main aim of Boosting is to focus more on misclassified predictions, but it can be used for regression problems as well
Adaptive Boosting (Adaboost)
• AdaBoost is implemented by combining several weak learners into a single strong
learner.
• The weak learners in AdaBoost consider a single input feature and draw out a single-split decision tree called a decision stump.
• Each observation is weighted equally while drawing out the first decision stump.
• The results from the first decision stump are analyzed, and if any observations are wrongly classified, they are assigned higher weights.
• After this, a new decision stump is drawn by treating the observations with higher weights as more significant.
• Again, if any observations are misclassified, they are given higher weights, and this process continues until all the observations fall into the right class.
• AdaBoost can be used for both classification and regression-based problems; however, it is more commonly used for classification.
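A minimal AdaBoost sketch with scikit-learn, using depth-1 trees (decision stumps) as the weak learners; the dataset and hyperparameter values are illustrative assumptions (the estimator parameter is named base_estimator in older scikit-learn versions).

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Decision stumps (max_depth=1) are the classic weak learners for AdaBoost
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100,
                         learning_rate=0.5,
                         random_state=0)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))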
Gradient Boosting
• The difference in this type of boosting is that the weights of misclassified outcomes are not incremented; instead, the Gradient Boosting method tries to optimize the loss function of the previous learner by adding a new model (a weak learner) that reduces the loss.
• The main idea is to overcome the errors in the previous learner's predictions. This type of boosting has three main components:
  • A loss function that needs to be optimized.
  • A weak learner for computing predictions and forming strong learners.
  • An additive model that adds weak learners so as to minimize the loss function.
• Like AdaBoost, Gradient Boosting can also be used for both classification and regression problems.
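A hand-rolled sketch of gradient boosting for regression with squared loss, where each new tree is fitted to the residuals (the negative gradient) of the current prediction; the dataset, learning rate, tree depth and number of rounds are assumptions chosen for illustration.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

learning_rate = 0.1
n_rounds = 100
prediction = np.full(len(y), y.mean())        # start from a constant prediction
trees = []

for _ in range(n_rounds):
    residuals = y - prediction                # negative gradient of the squared loss
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                    # the new weak learner models the remaining error
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Training MSE after boosting:", np.mean((y - prediction) ** 2))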
XGBoost
• Motivation
• XGBoost is an advanced version of the Gradient Boosting method; it literally means eXtreme Gradient Boosting. XGBoost, developed by Tianqi Chen, falls under the Distributed Machine Learning Community (DMLC).
• The main aim of this algorithm is to increase the speed and efficiency of computation. The standard Gradient Boosting algorithm computes its output at a slower rate since it analyzes the data set sequentially; XGBoost is therefore used to boost, or "extremely boost", the performance of the model.
XGBoost
• XGBoost is designed to focus on
computational speed and model
efficiency. The main features provided
by XGBoost are:
• Creates decision trees in parallel.
• Implements distributed computing methods for evaluating large and complex models.
• Uses out-of-core computing to analyze huge datasets.
• Implements cache optimization to make the best use of hardware resources.
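A minimal sketch using the xgboost Python package's scikit-learn-style wrapper, assuming the package is installed; the dataset and hyperparameter values (n_estimators, learning_rate, max_depth, tree_method) are illustrative assumptions rather than recommended settings.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=200,
                      learning_rate=0.1,
                      max_depth=4,
                      tree_method="hist",   # histogram-based split finding for speed
                      n_jobs=-1)            # parallel tree construction
model.fit(X_train, y_train)
print("XGBoost accuracy:", model.score(X_test, y_test))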
