
UNIT - III

Ensemble Learning and Random Forests: Introduction, Voting Classifiers, Bagging and Pasting, Random Forests, Boosting, Stacking.

Support Vector Machine: Linear SVM Classification, Nonlinear SVM Classification, SVM Regression, Naïve Bayes Classifiers.

Ensemble learning

Ensemble learning is a powerful technique in machine learning where multiple models are combined to improve the overall performance and accuracy of predictions. It is based on the idea that a group of models working together can often outperform a single model.
Types of Ensemble Learning Techniques

1. Bagging (Bootstrap Aggregating)

2. Boosting

3. Stacking (Stacked Generalization)

4. Voting

5. Random Forest
Voting classifier

A Voting Classifier is an ensemble learning technique that combines multiple machine learning models to improve classification accuracy. It works by aggregating predictions from different models and making the final decision based on majority voting (hard voting) or probability averaging (soft voting).

Types of Voting Classifiers

 Hard Voting
 Soft Voting

Hard Voting

Hard voting is also known as majority voting. The models predict the output class independently of each other, and the output class is the one that receives the majority of the votes.

EXAMPLE:

Suppose three classifiers predict the output classes (A, A, B). The majority predicted A, so A will be the final prediction.
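As a minimal sketch, this majority-vote rule can be written in a few lines of Python; the predictions list below is the hypothetical (A, A, B) from the example:

from collections import Counter

# Hypothetical predictions from three independent classifiers
predictions = ["A", "A", "B"]

# Hard voting: the class with the most votes is the final prediction
winner, votes = Counter(predictions).most_common(1)[0]
print(winner)  # -> A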
Soft Voting

In soft voting, the output class is predicted from the average probability assigned to each class. Soft voting combines the predicted probabilities of each model and picks the class with the highest total probability.

Each base classifier independently assigns a probability to each class. In the end, the average of the probabilities for each class is calculated, and the final output is the class with the highest average probability.

EXAMPLE:

Suppose, given some input, the prediction probabilities of three models for class A are (0.30, 0.47, 0.53) and for class B are (0.20, 0.32, 0.40). The average for class A is 0.4333 and for class B is 0.3067, so the winner is clearly class A, because it has the highest probability when averaged across the classifiers.
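A minimal scikit-learn sketch of both voting modes is given below, assuming scikit-learn is available; the choice of base models and the synthetic dataset are illustrative assumptions, not part of these notes:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
]

# voting="hard" takes the majority class; voting="soft" averages the
# predicted class probabilities and picks the class with the highest mean
hard_clf = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)
soft_clf = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)

print(hard_clf.predict(X[:5]))
print(soft_clf.predict_proba(X[:5]))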

RANDOM FOREST ALGORITHM

Random Forest builds a number of decision trees, each tree makes a prediction, and the algorithm then takes a vote over all the trees to make the final prediction.

Why use Random Forest?

o It takes less training time compared to other algorithms.

o It predicts output with high accuracy, and it runs efficiently even for large datasets.

o It can also maintain accuracy when a large proportion of the data is missing.

STEPS FOR RANDOM FOREST ALGORITHM


Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected data points (subsets).

Step-3: Choose the number N of decision trees that you want to build.

Step-4: Repeat Steps 1 & 2 until N trees are built.

Step-5: For a new data point, get the prediction of each decision tree, and assign the new data point to the category that wins the majority of votes.
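A minimal scikit-learn sketch of these steps follows; n_estimators plays the role of N (the number of trees), and the Iris dataset and hyperparameters are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each tree is grown on a random subset of the data; the forest
# predicts by majority vote over all N trees
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))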

Applications of Random Forest
1. Banking: The banking sector mainly uses this algorithm to identify loan risk.

2. Medicine: With the help of this algorithm, disease trends and disease risks can be identified.

3. Land Use: Areas of similar land use can be identified with this algorithm.

4. Marketing: Marketing trends can be identified using this algorithm.

Bagging
Bootstrap Aggregating, also known as bagging, is a machine learning ensemble meta-
algorithm designed to improve the stability and accuracy of machine learning algorithms used
in statistical classification and regression.

It creates multiple instances of the same model by training each instance on a randomly
drawn subset of the training data with replacement (bootstrap sampling). The final
prediction is made by aggregating the outputs of all models.

It decreases the variance and helps to avoid overfitting. It is usually applied to decision tree
methods.

Bagging is a special case of the model averaging approach.

Implementation Steps of Bagging

1. Bootstrap Sampling: Create multiple random subsets of the training data with
replacement.

2. Model Training: Train an individual model on each subset.

3. Aggregation:

o Regression: Take the average of all model predictions.

o Classification: Use majority voting (most common class label).


[Figure: An illustration of the concept of bootstrap aggregating (bagging)]
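The implementation steps above can be sketched with scikit-learn's BaggingClassifier; the decision-tree base estimator and the synthetic dataset are assumptions of this sketch:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# bootstrap=True draws each subset with replacement (bootstrap sampling);
# the predictions of the 50 trees are aggregated by majority vote
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=0.8,
    bootstrap=True,
    random_state=0,
).fit(X, y)
print(bagging.predict(X[:5]))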

Example of Bagging

The Random Forest model uses bagging with decision trees, which are models with higher variance. It also selects a random subset of features when growing each tree. Several such random trees make a Random Forest.
Pasting
Pasting is similar to bagging but differs in one key aspect: it uses sampling without replacement, meaning each subset of the data is unique and contains no repeated samples.

How It Works

1. Random Subset Selection: Create multiple random subsets of the training data without
replacement.

2. Model Training: Train an individual model on each subset.

3. Aggregation:

o Regression: Take the average of all model predictions.

o Classification: Use majority voting.

Advantages of Pasting

✅ Reduces variance, though it is less effective than bagging when dealing with high-variance models.
✅ Uses the full dataset more efficiently, as there are no repeated samples.
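As a sketch, pasting can be obtained from the same scikit-learn BaggingClassifier simply by setting bootstrap=False, i.e. sampling without replacement; the dataset and parameters are again illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

pasting = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=0.8,   # each model sees 80% of the data, with no repeats
    bootstrap=False,   # sampling WITHOUT replacement -> pasting
    random_state=0,
).fit(X, y)
print(pasting.predict(X[:5]))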
Boosting
Boosting is an ensemble modelling technique designed to create a strong classifier by
combining multiple weak classifiers. The process involves building models sequentially,
where each new model aims to correct the errors made by the previous ones.

 Initially, a model is built using the training data.

 Subsequent models are then trained to address the mistakes of their predecessors.

 To do this, boosting assigns weights to the data points in the original dataset.

 Higher weights: Instances that were misclassified by the previous model receive
higher weights.

 Lower weights: Instances that were correctly classified receive lower weights.

 Training on weighted data: The subsequent model learns from the weighted dataset,
focusing its attention on harder-to-learn examples (those with higher weights).

 This iterative process continues until:

o The entire training dataset is accurately predicted, or

o A predefined maximum number of models is reached.
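A minimal NumPy sketch of one such reweighting round, using the AdaBoost-style update rule on hypothetical +1/-1 labels (all numbers here are illustrative assumptions):

import numpy as np

# Hypothetical true labels and one weak learner's predictions (+1 / -1)
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, 1, 1])       # two mistakes

w = np.full(len(y_true), 1 / len(y_true))  # equal initial weights
err = np.sum(w[y_pred != y_true])          # weighted error rate
alpha = 0.5 * np.log((1 - err) / err)      # this learner's say (AdaBoost)

# Misclassified points get heavier, correct ones lighter, then normalize
w *= np.exp(-alpha * y_true * y_pred)
w /= w.sum()
print(w)  # the two misclassified examples now carry more weight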

Boosting Algorithms
There are several boosting algorithms. The original ones, proposed by Robert Schapire and Yoav Freund, were not adaptive and could not take full advantage of the weak learners.

Schapire and Freund then developed AdaBoost, an adaptive boosting algorithm that won the
prestigious Gödel Prize. AdaBoost was the first really successful boosting algorithm
developed for the purpose of binary classification.

AdaBoost is short for Adaptive Boosting and is a very popular boosting technique that
combines multiple “weak classifiers” into a single “strong classifier”.

Algorithm:

1. Initialise the dataset and assign equal weight to each data point.
2. Provide this as input to the model and identify the wrongly classified data points.

3. Increase the weights of the wrongly classified data points and decrease the weights of the correctly classified data points, then normalize the weights of all data points.

4. If the required results have been obtained, go to Step 5; otherwise, go to Step 2.

5. End
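A minimal scikit-learn sketch of AdaBoost follows; depth-1 trees ("decision stumps") are the usual weak classifiers, and the synthetic dataset is an assumption of this sketch:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# 100 stumps trained sequentially; each round reweights the data
# as in steps 1-3 above
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    random_state=0,
).fit(X, y)
print(ada.score(X, y))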

What is Stacking Ensemble Technique?


Stacking is another ensemble technique. It is an extension of the voting ensemble technique in that different learning models, called base models, are used. On top of them, another model is trained which uses the predictions of the base models for the final outcome. This extra learning model is known as the meta-model.
[Figure: Stacking]

However, stacking faces an overfitting problem, because the same training dataset is used to train the base models and the predictions on that same training dataset are then used to train the meta-model. To solve this problem, stacking offers two methods:

1. Blending

2. k-fold

Blending

In this method, we divide the training dataset into two parts. The first part is used to train the base models; the base models then predict on the second part, and those predictions are used to train the meta-model.

K-Folds

In this method, we divide the training dataset into k parts (folds). The base models are trained on k-1 folds, and the remaining fold is used by the base models to produce predictions, which are then used to train the meta-model. Repeating this over all folds gives the meta-model out-of-fold predictions for the entire training set.
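A minimal scikit-learn sketch of stacking follows; the cv=5 parameter implements the k-fold scheme just described, so the meta-model is trained on out-of-fold predictions rather than on predictions the base models made for data they were trained on. The base models, meta-model, and dataset are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(max_depth=3)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),  # the meta-model
    cv=5,                                  # k-fold out-of-fold predictions
).fit(X, y)
print(stack.score(X, y))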
