Class Adv Classification V

Ensemble classifiers (EC) utilize multiple base classifiers generated from training data through various manipulation methods, including training set resampling, feature selection, class label manipulation, and learning algorithm variation. Techniques such as Bagging and Boosting enhance classifier accuracy by reducing variance and focusing on misclassified records, respectively. Random Forests, a specific ensemble method for decision trees, combines predictions from multiple trees grown on bootstrap samples and random input vectors to improve classification performance.


Ensemble Classifiers (EC)

• An ensemble classifier constructs a set of ‘base classifiers’ from the training data
• Methods for constructing an EC
• Manipulating training set
• Manipulating input features
• Manipulating class labels
• Manipulating learning algorithms
Ensemble Classifiers (EC)
• Manipulating training set
• Multiple training sets are created by resampling the data
according to some sampling distribution
• Sampling distribution determines how likely it is that an
example will be selected for training – may vary from one trial
to another
• Classifier is built from each training set using a particular
learning algorithm
• Examples: Bagging & Boosting
Ensemble Classifiers (EC)

• Manipulating input features


• Subset of input features chosen to form each training set
• Subset can be chosen randomly or based on inputs given by
Domain Experts
• Good for data that has redundant features
• Random Forest is an example which uses DT as its base
classifiers
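
A minimal sketch of this feature-subset idea, assuming scikit-learn decision trees as the base classifiers and integer class labels 0..K-1; the function names are illustrative, not from the slides:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

def fit_feature_subset_ensemble(X, y, n_members=25, subset_size=None):
    """Train each base classifier on a randomly chosen subset of input features."""
    n_features = X.shape[1]
    subset_size = subset_size or max(1, int(np.sqrt(n_features)))
    members = []
    for _ in range(n_members):
        feats = rng.choice(n_features, size=subset_size, replace=False)
        clf = DecisionTreeClassifier().fit(X[:, feats], y)
        members.append((clf, feats))            # remember which features this member saw
    return members

def predict_majority(members, X_new):
    """Majority vote across members, each using only its own feature subset."""
    preds = np.array([clf.predict(X_new[:, feats]) for clf, feats in members])
    # assumes integer class labels so a vote count per column is possible
    return np.array([np.bincount(col).argmax() for col in preds.T])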
Ensemble Classifiers (EC)

• Manipulating class labels


• Used when the number of classes is sufficiently large
• Training data is transformed into a binary class problem by randomly
partitioning the class labels into 2 disjoint subsets, A0 & A1
• Re-labelled examples are used to train a base classifier
• By repeating the class relabelling and model building steps several times,
an ensemble of base classifiers is obtained
• How is a new tuple classified? (see the sketch below)
• Example – error-correcting output coding (pp 307)
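
A minimal sketch of this relabelling scheme (a simplified cousin of error-correcting output coding), assuming scikit-learn decision trees as the binary base classifiers; the helper names and data handling are illustrative only:

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fit_label_partition_ensemble(X, y, n_rounds=15):
    classes = np.unique(y)
    ensemble = []
    for _ in range(n_rounds):
        # randomly partition the class labels into two disjoint subsets A0 and A1
        in_A1 = rng.random(len(classes)) < 0.5
        if in_A1.all() or not in_A1.any():
            in_A1[0] = not in_A1[0]           # keep both subsets non-empty
        A1, A0 = classes[in_A1], classes[~in_A1]
        y_bin = np.isin(y, A1).astype(int)    # relabel: 1 if the true class falls in A1
        ensemble.append((DecisionTreeClassifier().fit(X, y_bin), A0, A1))
    return ensemble

def classify(ensemble, x):
    """Each base classifier votes for every original class in the subset it predicts."""
    votes = Counter()
    for clf, A0, A1 in ensemble:
        subset = A1 if clf.predict(x.reshape(1, -1))[0] == 1 else A0
        votes.update(subset.tolist())
    return votes.most_common(1)[0][0]         # class with the most votes wins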
Ensemble Classifiers (EC)

• Manipulating learning algorithm


• Learning algorithms can be manipulated in such a way that
applying the algorithm several times on the same training data
may result in different models
• Example – ANN can produce different models by changing
network topology or the initial weights of links between
neurons
• Example – ensemble of DTs can be constructed by introducing
randomness into the tree growing procedure – instead of
choosing the best split attribute at each node, we randomly
choose one of the top k attributes
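
A small sketch of the ANN variant described above, assuming scikit-learn's MLPClassifier as the neural network; only the random seed (and hence the initial link weights) changes between ensemble members:

import numpy as np
from sklearn.neural_network import MLPClassifier

def fit_ann_ensemble(X, y, n_members=10):
    # same training data, same algorithm; only the initialization differs per member
    return [MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                          random_state=seed).fit(X, y)
            for seed in range(n_members)]

def predict_majority(members, X_new):
    votes = np.array([m.predict(X_new) for m in members])            # shape (n_members, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])  # majority vote per sample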
Ensemble Classifiers (EC)

• The first 3 approaches are generic – they can be applied to any classifier
• Fourth approach depends on the type of classifier used
• Base classifiers can be generated sequentially or in
parallel
Typical Ensemble Procedure
Ensemble Classifiers

• Ensemble methods work better with ‘unstable classifiers’
– Classifiers that are sensitive to minor perturbations in the
training set
• Examples:
– Decision trees
– Rule-based
– Artificial neural networks
Bias-Variance Decomposition
Bias Vs Variance
Bias: Example
Examples of Ensemble Methods

• How to generate an ensemble of classifiers?


– Bagging
– Boosting
– Random Forests
Bagging
• Also known as bootstrap aggregation

Original Data 1 2 3 4 5 6 7 8 9 10
Bagging (Round 1) 7 8 10 8 2 5 10 10 5 9
Bagging (Round 2) 1 4 9 1 2 3 2 7 3 2
Bagging (Round 3) 1 8 5 10 5 5 9 6 3 7

• Sampling uniformly with replacement


• Build classifier on each bootstrap sample
• 0.632 bootstrap
• Each bootstrap sample Di contains approx. 63.2% of the
original training data
• Remaining (36.8%) are used as test set

Example taken from Tan et al. book “Introduction to Data Mining”
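
A quick numeric check of the 0.632 figure quoted above (illustrative only): the expected fraction of distinct examples in a bootstrap sample of size n is 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632:

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
sample = rng.integers(0, n, size=n)        # draw n indices uniformly with replacement
print(len(np.unique(sample)) / n)          # empirically ≈ 0.632
print(1 - (1 - 1 / n) ** n)                # analytic value, approaches 1 - 1/e ≈ 0.632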
Bagging: Bootstrap Aggregation
• Analogy: Diagnosis based on multiple doctors’ majority vote
• Training
– Given a set D of d tuples, at each iteration i, a training set Di of d tuples is
sampled with replacement from D (i.e., bootstrap)
– A classifier model Mi is learned for each training set Di
• Classification: classify an unknown sample X
– Each classifier Mi returns its class prediction
– The bagged classifier M* counts the votes and assigns the class with the
most votes to X
• Prediction: can be applied to the prediction of continuous values by taking
the average value of each prediction for a given test tuple
• Accuracy
– Often significantly better than a single classifier derived from D
– For noisy data: not considerably worse, more robust
– Proven to improve prediction accuracy
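
A minimal bagging sketch following the training/classification steps above, assuming scikit-learn decision trees as the base learner and integer class labels; each fitted tree plays the role of one model Mi learned from bootstrap sample Di:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

def bagging_fit(X, y, k=50):
    """Learn one model Mi from each of k bootstrap samples Di."""
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)                      # sample d tuples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X_new):
    """Bagged classifier M*: assign the class with the most votes to each tuple."""
    votes = np.array([m.predict(X_new) for m in models])      # shape (k, n_new)
    return np.array([np.bincount(col).argmax() for col in votes.T])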
Bagging

• Accuracy of bagging:

• Works well for small data sets


• Example:

X 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
y 1 1 1 -1 -1 -1 -1 1 1 1

Actual
Class labels

Example taken from Tan et al. book “Introduction to Data Mining”
Bagging
• Decision stump: a single-level binary decision tree
• Entropy-based splits considered: x <= 0.35 or x <= 0.75
• Accuracy at most 70%
Bagging

Accuracy of the ensemble classifier: 100%

Example taken from Tan et al. book “Introduction to Data Mining”
Bagging- Final Points
• Works well if the base classifiers are unstable
• Increased accuracy because it reduces the variance
of the individual classifier
• Does not focus on any particular instance of the
training data
• Therefore, less susceptible to model over-fitting
when applied to noisy data
• What if we want to focus on particular instances of the training data?
Boosting

• An iterative procedure to adaptively change the distribution of training
data by focusing more on previously misclassified records
– Initially, all N records are assigned equal weights
– Unlike bagging, weights may change at the end of a
boosting round
Boosting: Motivation
• Hard to design an accurate classifier that generalizes well
• Easy to find many rules of thumb, or weak classifiers
• A classifier is weak if it is only slightly better than random guessing
• Example: if an email contains the word “money”, classify it as spam;
otherwise classify it as not spam
• Likely to be better than random guessing
• Can we combine several weak classifiers to produce an accurate classifier?
• AdaBoost (1996) was the first practical boosting algorithm
Boosting
• Records that are wrongly classified will have their
weights increased
• Records that are classified correctly will have their
weights decreased
Original Data 1 2 3 4 5 6 7 8 9 10
Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3
Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2
Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4

• Example 4 is hard to classify
• Its weight is increased, therefore it is more likely to be chosen again in
subsequent rounds

Example taken from Tan et al. book “Introduction to Data Mining”
Boosting
• Equal weights are assigned to each training tuple (1/d
for round 1)
• After a classifier Mi is learned, the weights are
adjusted to allow the subsequent classifier Mi+1 to
“pay more attention” to tuples that were
misclassified by Mi.
• Final boosted classifier M* combines the votes of
each individual classifier
• Weight of each classifier’s vote is a function of its
accuracy
• Adaboost – popular boosting algorithm
Adaboost
• Input:
– Training set D containing d tuples
– k rounds
– A classification learning scheme
• Output:
– A composite model
Adaboost
• Data set D containing d class-labeled tuples (X1,y1),
(X2,y2), (X3,y3),….(Xd,yd)
• Initially assign equal weight 1/d to each tuple
• To generate k base classifiers, we need k rounds or
iterations
• In round i, tuples from D are sampled with replacement to form Di (of size d)
• Each tuple’s chance of being selected depends on its
weight
Adaboost
• Base classifier Mi, is derived from training tuples of Di
• Error of Mi is tested using Di
• Weights of training tuples are adjusted depending on
how they were classified
– Correctly classified: Decrease weight
– Incorrectly classified: Increase weight
• Weight of a tuple indicates how hard it is to classify it
(directly proportional)
Adaboost
• Some classifiers may be better at classifying some
“hard” tuples than others
• We finally have a series of classifiers that complement
each other!
• Error rate of model Mi:
  error(M_i) = \sum_{j=1}^{d} w_j \cdot err(X_j)
  where err(X_j) is the misclassification indicator for X_j (1 if misclassified, 0 otherwise)


• If classifier error exceeds 0.5, we abandon it
• Try again with a new Di and a new Mi derived from it
Adaboost
• error(M_i) affects how the weights of training tuples are updated
• If a tuple is correctly classified in round i, its weight is multiplied by
  error(M_i) / (1 - error(M_i))
• The weights of all correctly classified tuples are adjusted this way
• The weights of all tuples (including the misclassified tuples) are then normalized
• Normalization factor = sum_of_old_weights / sum_of_new_weights
• The weight of classifier M_i's vote is log[(1 - error(M_i)) / error(M_i)]
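
A tiny worked example of these update rules (the 0.3 error rate and the natural-log base are assumptions chosen purely for illustration):

import math

err = 0.3                                   # assumed error rate of classifier M_i
w_multiplier = err / (1 - err)              # ≈ 0.429: correctly classified tuples get lighter
vote_weight = math.log((1 - err) / err)     # ≈ 0.847: this classifier's say in the final vote
print(w_multiplier, vote_weight)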
Adaboost
• The lower a classifier's error rate, the more accurate it is, and therefore
the higher its weight for voting should be
• Weight of classifier M_i's vote:
  log[(1 - error(M_i)) / error(M_i)]
• For each class c, sum the weights of every classifier that assigned class c
to the unseen tuple X
• The class with the highest sum is the WINNER!
Example: AdaBoost
• Base classifiers: C1, C2, …, CT

• Error rate:
  \epsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, I\big(C_i(x_j) \neq y_j\big)

• Importance of a classifier:
  \alpha_i = \frac{1}{2} \ln\left(\frac{1 - \epsilon_i}{\epsilon_i}\right)
Example: AdaBoost
• Weight update:
  w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}
  where Z_j is the normalization factor

• Final classification:
  C^*(x) = \arg\max_y \sum_{j=1}^{T} \alpha_j \, I\big(C_j(x) = y\big)

• If any intermediate round produces an error rate higher than 50%, the
weights are reverted back to 1/n and the resampling procedure is repeated
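
A compact AdaBoost sketch following the formulas above, with labels y encoded as -1/+1; scikit-learn decision stumps are assumed as the weak learners, and weights are applied via sample_weight rather than by explicit resampling:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=20):
    """y must be encoded as -1/+1."""
    n = len(X)
    w = np.full(n, 1.0 / n)                          # start with equal weights 1/n
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)  # single-level tree (decision stump)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = np.clip(np.sum(w * (pred != y)), 1e-10, None)   # weighted error rate
        if eps >= 0.5:                               # too weak: revert weights and retry
            w = np.full(n, 1.0 / n)
            continue
        alpha = 0.5 * np.log((1 - eps) / eps)        # importance of this classifier
        w *= np.exp(-alpha * y * pred)               # shrink correct, grow misclassified weights
        w /= w.sum()                                 # divide by the normalization factor Z_j
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X_new):
    """Sign of the alpha-weighted vote over all weak learners."""
    agg = sum(a * s.predict(X_new) for s, a in zip(stumps, alphas))
    return np.sign(agg)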
Illustrating AdaBoost
[Figure: ten training data points on a line, each with initial weight 0.1; original labels + + + - - - - - + +]

Round          Sample weights (three regions)    Predictions             α
B1 (Round 1)   0.0094 / 0.0094 / 0.4623          + + + - - - - - - -     1.9459
B2 (Round 2)   0.3037 / 0.0009 / 0.0422          - - - - - - - - + +     2.9323
B3 (Round 3)   0.0276 / 0.1819 / 0.0038          + + + + + + + + + +     3.8744
Overall        (weighted vote)                   + + + - - - - - + +
• Example:

  X: 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
  y:  1    1    1   -1   -1   -1   -1    1    1    1   (actual class labels)

Example taken from Tan et al. book “Introduction to Data Mining”
Random Forests
• Ensemble method specifically designed for decision tree classifiers
• A Random Forest grows many classification trees (hence the name!)
• Combines predictions made by many decision trees.
• Each tree is generated based on a bootstrap sample and the
values of an independent set of random vectors.
• The random vectors are generated from a fixed probability
distribution.
• Ensemble of unpruned decision trees
• Each base classifier classifies a “new” vector
• Forest chooses the classification having the most votes (over all
the trees in the forest)
Random Forests
• Introduce two sources of randomness: “Bagging” and
“Random input vectors”
– Each tree is grown using a bootstrap sample of training data
– At each node, best split is chosen from random sample of
mtry variables instead of all variables
Random Forests
Randomizing Random Forests
• Each decision tree uses a random vector that is
generated from some fixed probability distribution.
This randomness can be incorporated in many ways:
1. Randomly select F input features to split at each
node (Forest-RI).
2. Create linear combinations of the input features to
split at each node (Forest-RC).
3. Randomly select one of the F best splits at each
node.
Why use Random Vectors?

• Recall that an underlying assumption of the ensemble process is that the
base learners are independent.
• As the trees become more correlated (less
independent), the generalization error bound tends
to increase.
• Randomization helps to reduce the correlation
among decision trees.
Random Forests Error Bounds
• When the number of trees is sufficiently large, the generalization error of
a random forest is bounded above by \bar{\rho}\,(1 - s^2)/s^2, where \bar{\rho}
is the average correlation among the trees and s is the strength of the
individual trees
Using Random Features

• Random split selection does better than bagging; introduction of random
noise into the outputs also does better.
• To improve accuracy, the randomness injected has
to minimize the correlation while maintaining
strength.
Random Forest Algorithm
• Given M input variables, a number m << M is specified such that at each
node, m variables are selected at random out of the M and the best split
on these m is used to split the node
• m is held constant during forest growing
• Each tree is grown to the largest extent possible
• There is no pruning
• Bagging using decision trees is a special case of random forests
when m=M
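
A minimal Forest-RI style sketch of the algorithm above; the per-node sampling of m candidate variables is delegated to scikit-learn's max_features option (an assumption about tooling, not part of the slides), and trees are grown unpruned by default:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)

def random_forest_fit(X, y, n_trees=100, m=None):
    M = X.shape[1]
    m = m or max(1, int(np.sqrt(M)))                   # m << M; sqrt(M) is a common default
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))     # bootstrap sample of the training data
        tree = DecisionTreeClassifier(max_features=m)  # m random candidates at each split, no pruning
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def random_forest_predict(trees, X_new):
    votes = np.array([t.predict(X_new) for t in trees])
    return np.array([np.bincount(col).argmax() for col in votes.T])   # majority vote over all trees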
Random Forest Algorithm
• Out-of-bag (OOB) error
• Good accuracy without over-fitting
• Fast algorithm (can be faster than growing/pruning a single
tree); easily parallelized
• Handle high dimensional data without much problem
• Only one tuning parameter, mtry (commonly set to √p for p input variables); results are usually not sensitive to it
Out of Bag Error
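
A short sketch of out-of-bag error estimation using scikit-learn's built-in RandomForestClassifier; the synthetic dataset is generated here purely for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)      # estimated on the ~36.8% left out of each bootstrap
print("OOB error:   ", 1 - rf.oob_score_)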
Features and Advantages
The advantages of random forest are:
• It is one of the most accurate learning algorithms available. For
many data sets, it produces a highly accurate classifier.
• It runs efficiently on large databases.
• Simple and easily parallelized.
• It can handle thousands of input variables without variable
deletion.
• It gives estimates of what variables are important in the
classification.
• It generates an internal unbiased estimate of the generalization
error as the forest building progresses.
• It has an effective method for estimating missing data and
maintains accuracy when a large proportion of the data are
missing.
Disadvantages

• Random forests have been observed to overfit for some datasets with noisy
classification/regression tasks.
• For data including categorical variables with different numbers of levels,
random forests are biased in favor of attributes with more levels. Therefore,
the variable importance scores from random forests are not reliable for this
type of data.
