Sindhuja_Suresh_0817685_Data_Mining_Lab-2
The two main types of ensemble learning methods are Bagging and Boosting:

Bagging (Bootstrap Aggregating): Bagging aims to reduce variance and avoid overfitting by training multiple instances of the same model on different bootstrap subsets of the dataset and combining their predictions by voting or averaging. The Random Forest algorithm is a popular example of bagging.

Boosting: Boosting trains models sequentially, with each new model concentrating on the examples its predecessors misclassified, which primarily reduces bias. AdaBoost and Gradient Boosting, both trained below, are popular examples.
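A minimal sketch of bagging with scikit-learn's BaggingClassifier (the Iris dataset and all parameter values here are assumptions for illustration, not part of the original lab cells):

# Illustrative sketch only: bagging decision trees with scikit-learn.
# The dataset and every parameter value below are assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Each of the 50 trees is fit on a bootstrap sample of the training data;
# predictions are combined by majority vote, which is what reduces variance.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50, bootstrap=True, random_state=42)
bag.fit(X_tr, y_tr)
print("Bagging test accuracy:", bag.score(X_te, y_te))

Setting bootstrap=True is what makes this bagging rather than plain model averaging: each base tree sees a different resampled view of the data.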
In [3]: # Split the data into training and testing sets and standardize the features.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Splitting data into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardizing features to zero mean and unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
In [12]: # Train an AdaBoost classifier and evaluate its performance. Use AdaBoostClassifier.
from sklearn.ensemble import AdaBoostClassifier
ada_model = AdaBoostClassifier()
ada_model.fit(X_train, y_train)
y_pred_ada = ada_model.predict(X_test)
# Evaluation
print("AdaBoost Classification Report:")
print(classification_report(y_test, y_pred_ada))
sns.heatmap(confusion_matrix(y_test, y_pred_ada), annot=True, fmt='d')
plt.show()
C:\Users\sindhuja\anaconda3\Lib\site-packages\sklearn\ensemble\_weight_boosting.py:519: FutureWarning: The SAMME.R algorithm (the default) is deprecated and will be removed in 1.6. Use the SAMME algorithm to circumvent this warning.
  warnings.warn(
AdaBoost Classification Report:
              precision    recall  f1-score   support

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
In [13]: # Train a Gradient Boosting classifier and evaluate its performance. Use GradientBoostingClassifier.
from sklearn.ensemble import GradientBoostingClassifier
gb_model = GradientBoostingClassifier()
gb_model.fit(X_train, y_train)
y_pred_gb = gb_model.predict(X_test)
# Evaluation
print("Gradient Boosting Classification Report:")
print(classification_report(y_test, y_pred_gb))
sns.heatmap(confusion_matrix(y_test, y_pred_gb), annot=True, fmt='d')
plt.show()
    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
7. Cross-Validation
1) Bagging
Limitations:
Bias Retention: While bagging reduces variance, it may not address bias, potentially leading to suboptimal performance if the base learner is biased.
Model Interpretability: The ensemble nature can make results harder to interpret than those of a single model.
2) Boosting
Limitations:
Overfitting Risk: Boosting can overfit the training data, particularly if model complexity is not controlled.
Training Time: The sequential nature of boosting may result in longer training times, especially with large datasets.
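A minimal sketch of how this overfitting risk can be restrained with scikit-learn's GradientBoostingClassifier (every parameter value below is an illustrative assumption, not a tuned result from this lab):

# Illustrative sketch only: restraining boosting to limit overfitting.
# All parameter values are assumptions chosen for demonstration.
from sklearn.ensemble import GradientBoostingClassifier

gb_tuned = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on boosting rounds
    learning_rate=0.05,       # smaller steps make fitting less aggressive
    max_depth=2,              # shallow trees keep each stage weak
    validation_fraction=0.1,  # internal hold-out used for early stopping
    n_iter_no_change=10,      # stop once the validation score stalls
    random_state=42,
)
gb_tuned.fit(X_train, y_train)  # X_train, y_train from the cells above
print("Boosting rounds actually used:", gb_tuned.n_estimators_)

Lowering the learning rate while capping tree depth and enabling early stopping trades some training speed for a model that is less likely to memorize the training data.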
Cross-Validation Scores:
Bagging Models (e.g., Random Forest): Typically show stable cross-validation scores with lower variance. For example, cross-validation may yield an average score of 0.85 with a standard deviation of 0.02.
Boosting Models (e.g., Gradient Boosting): Often achieve higher average cross-validation scores (e.g., 0.88) but may exhibit higher variability depending on the dataset.
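A minimal sketch of how such cross-validation scores can be computed with scikit-learn's cross_val_score (the 5-fold setting, model choices, and reuse of the training split above are assumptions):

# Illustrative sketch only: comparing ensembles with 5-fold cross-validation.
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

for name, model in [("Random Forest (bagging)", RandomForestClassifier(random_state=42)),
                    ("Gradient Boosting", GradientBoostingClassifier(random_state=42))]:
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(f"{name}: mean={scores.mean():.2f}, std={scores.std():.2f}")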
Bagging Model Final Test Score: Generally, bagging models maintain consistent test performance, often around 0.83-0.87.
Boosting Model Final Test Score: Boosting models may achieve higher test scores, such as 0.90, but are more sensitive to the data distribution and to model hyperparameters.
Conclusion:
Both bagging and boosting are powerful techniques in ensemble learning, each with its
strengths and weaknesses. Bagging is effective for variance reduction and provides
robustness against noise, making it suitable for unstable models. In contrast, boosting
excels in reducing bias and achieving high accuracy, though it requires careful tuning to
avoid overfitting.
When selecting an ensemble method, it’s crucial to consider the nature of the dataset
and the specific problem at hand.