
Classification through Ensembling Techniques:

• Ensemble learning is a machine learning technique that enhances accuracy in forecasting by merging predictions from multiple models.
• It aims to mitigate errors or biases that may exist in individual models by leveraging the collective intelligence of the ensemble.
• The underlying concept behind ensemble learning is to combine the outputs of diverse models to create a more precise prediction.
• By considering multiple perspectives and utilizing the strengths of different models, ensemble learning improves the overall performance of the learning system.
• By effectively merging predictions from multiple models, ensemble learning has proven to be a powerful tool in various domains, offering more robust and reliable forecasts.

Let's understand the concept of ensemble learning with an example.

Suppose you are a movie director and you have created a short movie on a very important and interesting topic. Now, you want to take preliminary feedback (ratings) on the movie before making it public. What are the possible ways by which you can do that?

A: You may ask one of your friends to rate the movie for you.
Now it’s entirely possible that the person you have chosen loves you very much and
doesn’t want to break your heart by providing a 1-star rating to the horrible work you
have created.

B: Another way could be by asking 5 colleagues of yours to rate the movie.


This should provide a better idea of the movie. This method may provide honest ratings
for your movie. But a problem still exists. These 5 people may not be “Subject Matter
Experts” on the topic of your movie. Sure, they might understand the cinematography,
the shots, or the audio, but at the same time may not be the best judges of dark humour.

C: How about asking 50 people to rate the movie?


Some of them can be your friends, some can be your colleagues, and some may even be total strangers.
The responses, in this case, would be more generalized and diversified, since you now have people with different sets of skills. As it turns out, this is a better approach for getting honest ratings than the previous cases.
From these examples, you can infer that a diverse group of people is likely to make better decisions than individuals. The same is true for a diverse set of models in comparison to single models. This diversification in Machine Learning is achieved by a technique called Ensemble Learning.

Some Ensemble Techniques:

• Max Voting
• Averaging
• Weighted Averaging
• Stacking
• Blending
• Bagging
• Boosting
• Error Correcting Output Codes

Max Voting:
• The max voting method is generally used for classification problems.
• In this technique, multiple models are used to make predictions for each data point. The prediction made by each model is considered as a 'vote'.
• The prediction which we get from the majority of the models is used as the final prediction.

For example, when you asked 5 of your colleagues to rate your movie (out of 5), we'll assume three of them rated it as 4 while two of them gave it a 5. Since the majority gave a rating of 4, the final rating will be taken as 4. You can consider this as taking the mode of all the predictions.

The result of max voting would be something like this:

Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
     5             4             5             4             4              4
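
As a quick sketch (assuming Python with scikit-learn is available), hard voting can be computed as the mode of the individual predictions; the ratings are taken from the table above, and the dataset and base estimators in the model-based analogue are purely illustrative assumptions.

# Minimal sketch of max (hard) voting: the final prediction is the mode
# of the individual predictions.
from statistics import mode

ratings = [5, 4, 5, 4, 4]          # the five colleagues' ratings
print(mode(ratings))               # -> 4, the majority vote

# Model-based analogue with scikit-learn's VotingClassifier (hard voting).
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('knn', KNeighborsClassifier()),
                ('dt', DecisionTreeClassifier(random_state=0))],
    voting='hard')                 # majority vote of the predicted class labels
vote.fit(X, y)
print(vote.predict(X[:3]))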

Averaging:
• Similar to the max voting technique, multiple predictions are made for each data point in averaging.
• In this method, we take an average of the predictions from all the models and use it to make the final prediction.
• Averaging can be used for making predictions in regression problems or while calculating probabilities for classification problems.

For example, in the below case, the averaging method would take the average of all the
values.
i.e. (5+4+5+4+4)/5 = 4.4

Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
     5             4             5             4             4             4.4
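
A minimal sketch of averaging, again assuming Python with scikit-learn; the ratings come from the table above, while the regression dataset and base estimators are illustrative assumptions.

# Minimal sketch of averaging: the final prediction is the mean of the
# individual predictions.
ratings = [5, 4, 5, 4, 4]
print(sum(ratings) / len(ratings))   # -> 4.4

# For regression models, scikit-learn's VotingRegressor averages the
# predictions of its base estimators.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
avg = VotingRegressor([('lr', LinearRegression()),
                       ('dt', DecisionTreeRegressor(random_state=0))])
avg.fit(X, y)
print(avg.predict(X[:3]))            # averaged predictions of the two models
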
Weighted Average:
• This is an extension of the averaging method. All models are assigned different weights that define the importance of each model for the prediction.
• For instance, if two of your colleagues are critics while the others have no prior experience in this field, then the answers from these two critics are given more importance than those of the other people.

The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) + (4*0.18)] = 4.41.

          Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
weight       0.23          0.23          0.18          0.18          0.18
rating        5             4             5             4             4            4.41
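
A minimal sketch of the weighted average, assuming Python with scikit-learn; the weights come from the table above, while the classifier setup below (dataset, models, and weights) is an illustrative assumption.

# Minimal sketch of a weighted average using the weights from the table.
ratings = [5, 4, 5, 4, 4]
weights = [0.23, 0.23, 0.18, 0.18, 0.18]
print(sum(r * w for r, w in zip(ratings, weights)))   # -> approximately 4.41

# VotingClassifier accepts a `weights` argument; with voting='soft' the
# predicted class probabilities are weighted before being averaged.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
wavg = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('knn', KNeighborsClassifier())],
    voting='soft', weights=[2, 1])    # the first model counts twice as much
wavg.fit(X, y)
print(wavg.predict_proba(X[:2]))      # weighted-average class probabilities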

Stacking:
Stacking is an ensemble learning technique that uses predictions from multiple models
(for example decision tree, knn or svm) to build a new model. This model is used for
making predictions on the test set. Below is a step-wise explanation for a simple stacked
ensemble:
1. The train set is split into 10 parts.

2. A base model (suppose a decision tree) is fitted on 9 parts and predictions are
made for the 10th part. This is done for each part of the train set.

3. The base model (in this case, decision tree) is then fitted on the whole train
dataset.
4. Using this model, predictions are made on the test set.
5. Steps 2 to 4 are repeated for another base model (say knn) resulting in another
set of predictions for the train set and test set.

6. The predictions from the train set are used as features to build a new model (the meta-model).

7. This meta-model is used to make the final predictions on the test set, using the base models' test-set predictions as features.
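
A minimal sketch of the stacked ensemble described above, assuming Python with scikit-learn; StackingClassifier handles the out-of-fold predictions (steps 1-5) and the meta-model (steps 6-7) internally. The dataset and base models are illustrative assumptions.

# Minimal sketch of stacking with scikit-learn's StackingClassifier.
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=0)),
                ('knn', KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-model
    cv=10)                       # 10 folds, matching the 10 parts above
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))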

Blending
Blending follows the same approach as stacking but uses only a holdout (validation) set
from the train set to make predictions. In other words, unlike stacking, the predictions
are made on the holdout set only. The holdout set and the predictions are used to build a
model which is run on the test set. Here is a detailed explanation of the blending process:
1. The train set is split into training and validation sets

2. Model(s) are fitted on the training set.


3. Predictions are made on the validation set and the test set.

4. The validation set and its predictions are used as features to build a new model.
5. This model is used to make the final predictions on the test set, using the test-set predictions as meta-features.
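
A minimal hand-written sketch of blending, assuming Python with scikit-learn and NumPy; the dataset, split sizes, and base models are illustrative assumptions.

# Minimal sketch of blending: base models are fitted on the training split,
# a meta-model is fitted on their predictions for the holdout (validation)
# split, and final predictions come from their test-set predictions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, random_state=42)

base_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]
for m in base_models:
    m.fit(X_train, y_train)                        # step 2

# steps 3-4: holdout and test predictions become meta-features
val_meta = np.column_stack([m.predict(X_val) for m in base_models])
test_meta = np.column_stack([m.predict(X_test) for m in base_models])

meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(val_meta, y_val)                    # step 4
print("blending accuracy:", meta_model.score(test_meta, y_test))  # step 5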

Bagging:
The idea behind bagging is combining the results of multiple models (for instance, all
decision trees) to get a generalized result.
Here's a question: if you create all the models on the same set of data and combine them, will it be useful? There is a high chance that these models will give the same result, since they are getting the same input. So how can we solve this problem? One of the techniques is bootstrapping.
Bootstrapping is a sampling technique in which we create subsets of observations from
the original dataset, with replacement. The size of the subsets is the same as the size of
the original set.
The Bagging (or Bootstrap Aggregating) technique uses these subsets (bags) to get a fair idea of the distribution of the complete set. In practice, the size of the subsets created for bagging may be less than that of the original set.
1. Multiple subsets are created from the original dataset, selecting observations with
replacement.
2. A base model (weak model) is created on each of these subsets.
3. The models run in parallel and are independent of each other.
4. The final predictions are determined by combining the predictions from all the models.
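
A minimal sketch of bagging, assuming Python with scikit-learn; the dataset, base model, and number of estimators are illustrative assumptions.

# Minimal sketch of bagging: decision trees fitted on bootstrap samples,
# with their predictions combined by voting.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag = BaggingClassifier(DecisionTreeClassifier(),  # the base (weak) model
                        n_estimators=10,           # 10 bootstrap subsets
                        bootstrap=True,            # sample with replacement
                        random_state=42)
bag.fit(X_train, y_train)
print("bagging accuracy:", bag.score(X_test, y_test))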

Boosting:
If a data point is incorrectly predicted by the first model, and then by the next model (and probably by all the models), will combining the predictions provide better results? Such situations are taken care of by boosting.
Boosting is a sequential process, where each subsequent model attempts to correct the
errors of the previous model. The succeeding models are dependent on the previous
model. Let’s understand the way boosting works in the below steps.
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.
5. Errors are calculated using the actual values and the predicted values.
6. The observations which are incorrectly predicted are given higher weights.
7. Another model is created and predictions are made on the dataset. (This model tries to correct the errors of the previous model.)
8. Similarly, multiple models are created, each correcting the errors of the previous model.
9. The final model (strong learner) is the weighted mean of all the models (weak learners).

Thus, the boosting algorithm combines a number of weak learners to form a strong
learner. The individual models would not perform well on the entire dataset, but they
work well for some part of the dataset. Thus, each model actually boosts the performance
of the ensemble.
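
A minimal sketch of boosting using AdaBoost, assuming Python with scikit-learn; the dataset and hyperparameters are illustrative assumptions.

# Minimal sketch of boosting with AdaBoost: weak learners are fitted
# sequentially, with misclassified points given higher weights.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

boost = AdaBoostClassifier(n_estimators=50,   # number of sequential weak learners
                           random_state=42)
boost.fit(X_train, y_train)
print("boosting accuracy:", boost.score(X_test, y_test))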

Error Correcting Output Codes (ECOC):


Error Correcting Output Codes (ECOC) is a method used in ensemble learning to improve
the performance of multiclass classification problems. It is particularly useful when
dealing with complex classification tasks where a single base classifier might struggle to
accurately classify all classes. ECOC allows us to break the multiclass problem down into multiple binary classification subproblems, which can be solved using simpler binary classifiers.

Here's how the Error Correcting Output Codes method works:


Encoding Classes: Each class in the original multiclass problem is assigned a unique
binary code. The length of the binary code is usually equal to the number of base
classifiers (binary classifiers) to be used in the ensemble. Each base classifier will be
responsible for distinguishing between two classes based on the binary codes.
Training Base Classifiers: For each binary classification subproblem, a separate base classifier is trained using the corresponding binary code. The binary code for the actual
class is set to 1, and for all other classes, it is set to 0. This way, each base classifier learns
to differentiate between one class and the rest (one-vs-rest strategy).
Decoding Predictions: When making predictions, each base classifier produces a binary
output. These binary outputs are then combined using the binary codes to form a
multiclass code for each instance.
Final Prediction: The final prediction is made by finding the class whose binary code is
closest to the multiclass code obtained from the base classifiers. The "closest" code is
determined using Hamming distance or other appropriate distance metrics.
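
A minimal sketch of ECOC, assuming Python with scikit-learn; OutputCodeClassifier generates a binary code book for the classes, trains one binary classifier per code bit, and decodes predictions by distance to the class codes. The dataset, base estimator, and code_size are illustrative assumptions.

# Minimal sketch of ECOC with scikit-learn's OutputCodeClassifier.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# code_size=2 gives each class a binary code of length 2 * n_classes.
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=2, random_state=42)
ecoc.fit(X_train, y_train)
print("ECOC accuracy:", ecoc.score(X_test, y_test))
print("class code book:\n", ecoc.code_book_)   # one binary code per class
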
The advantages of using ECOC in ensemble learning are:
• It allows the use of simpler binary classifiers, which can be computationally more efficient and easier to train than complex multiclass classifiers.
• It provides a natural way to handle multiclass imbalances, as the binary classifiers can be designed to focus on specific class distinctions.
• It can improve overall classification accuracy by exploiting the diversity of the base classifiers, especially when they have complementary strengths and weaknesses.

However, ECOC also has some limitations:

• Designing appropriate binary codes for each class can be challenging and might require domain knowledge or additional optimization steps.
• The performance of ECOC heavily depends on the choice of base classifiers and the encoding scheme.
• It may not always outperform other ensemble methods like Random Forest or Gradient Boosting on all types of datasets.

Random Forest Classification:


Random Forest is a popular machine learning algorithm used for both classification
and regression tasks. It's an ensemble method that combines multiple decision trees
to make more accurate predictions. Each decision tree is built on a random subset of the
data and features, and the final prediction is based on the majority vote (classification)
or average (regression) of the individual trees.

Working of Random Forest Algorithm:

The following steps explain the working of the Random Forest algorithm:
Step 1: Select random samples (with replacement) from the given training set.
Step 2: Construct a decision tree for each of these training samples.
Step 3: Each decision tree produces a prediction; the predictions are combined by majority voting (for classification) or by averaging (for regression).
Step 4: Finally, the most voted prediction is selected as the final prediction result.

This combination of multiple models is called an Ensemble. An ensemble uses two methods:
Bagging: Creating different training subsets from the sample training data with replacement is called Bagging. The final output is based on majority voting.
Boosting: Combining weak learners into strong learners by creating sequential models such that the final model has the highest accuracy is called Boosting. Examples: AdaBoost, XGBoost.

Bagging: From the principle mentioned above, we can understand that Random Forest uses the bagging technique. Now, let us understand this concept in detail. Bagging, also known as Bootstrap Aggregation, is the method used by Random Forest. The process begins with the original data, from which random samples are drawn with replacement; each such sample is known as a Bootstrap Sample, and this step is known as Bootstrapping. The models are then trained individually on these samples, yielding different results, and combining those results is known as Aggregation. In the last step, all the results are combined and the generated output is based on majority voting. This overall procedure is known as Bagging and is carried out using an ensemble classifier.

Example: Suppose there is a dataset that contains multiple fruit images, and this dataset is given to the Random Forest classifier. The dataset is divided into subsets and each subset is given to a decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point arrives, the Random Forest classifier predicts the final decision based on the majority of those results.
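
A minimal sketch of Random Forest classification, assuming Python with scikit-learn; the dataset and hyperparameters are illustrative assumptions.

# Minimal sketch of a Random Forest classifier: many decision trees are
# trained on bootstrap samples and random feature subsets, and their
# votes are combined into the final prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(n_estimators=100,      # number of decision trees
                            max_features='sqrt',   # random feature subset per split
                            random_state=42)
rf.fit(X_train, y_train)
print("random forest accuracy:", rf.score(X_test, y_test))
print("first predictions:", rf.predict(X_test[:5]))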
