
Ensemble Learning

Danna Gurari
University of Texas at Austin
Spring 2021

https://www.ischool.utexas.edu/~dannag/Courses/IntroToMachineLearning/CourseContent.html
Review
• Last week:
• Evaluating Machine Learning Models Using Cross-Validation
• Naïve Bayes
• Support Vector Machines

• Assignments (Canvas):
• Problem set 4 due tonight
• Lab assignment 2 due next week
• Project pre-proposal due in two weeks (find a partner; brainstorm ideas)

• Questions?
Today’s Topics

• One-vs-all multiclass classification

• Classifier confidence

• Evaluation: ROC and PR-curves

• Ensemble learning
Recall: Binary vs Multiclass Classification
Binary: distinguish 2 classes Multiclass: distinguish 3+ classes

Figure Source: http://mlwiki.org/index.php/One-vs-All_Classification


Recall: Binary vs Multiclass Classification
Binary: distinguish 2 classes
• Perceptron
• Adaline
• Support Vector Machine

Multiclass: distinguish 3+ classes
• Nearest Neighbor
• Decision Tree
• Naïve Bayes
One-vs-All (aka, One-vs-Rest): Applying Binary
Classification Methods for Multiclass Classification
• Given ‘N’ classes, train ‘N’ different classifiers: a single classifier
trained per class, with the samples of that class as positive samples
and all other samples as negatives; e.g.,

Figure Source: http://mlwiki.org/index.php/One-vs-All_Classification
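To make this concrete, here is a minimal sketch (not from the slides) using scikit-learn's OneVsRestClassifier; the dataset and base classifier are illustrative choices:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # N = 3 classes

# Trains N binary classifiers, one per class: that class's samples
# are the positives and all other samples are the negatives.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(ovr.predict(X[:5]))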


One-vs-All (aka, One-vs-Rest): Limitation

• Often leads to unbalanced distributions during learning; i.e., when the set of negatives is much larger than the set of positives

Figure Source: http://mlwiki.org/index.php/One-vs-All_Classification


One-vs-All (aka, One-vs-Rest): Class Assignment

• (Imperfect) Approach: among the N classifiers, choose the most confident match; this requires each classifier to output a real-valued confidence score for its decision

Figure Source: http://mlwiki.org/index.php/One-vs-All_Classification
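A minimal sketch of this class-assignment rule, reusing the illustrative one-vs-rest setup from above:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# decision_function returns one real-valued confidence score per class;
# assign each sample to the class whose binary classifier is most confident.
scores = ovr.decision_function(X)
print(np.argmax(scores, axis=1)[:5])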


Today’s Topics

• One-vs-all multiclass classification

• Classifier confidence

• Evaluation: ROC and PR-curves

• Ensemble learning
Classifier Confidence: Beyond Classification

• Indicate both the predicted class and uncertainty about the choice

• When and why might you want to know about the uncertainty?
• e.g., weather forecast: 25% chance it will rain today
• e.g., medical treatment: when unconfident, start a patient on a drug at a
lower dose and decide later whether to change the medication or dose
Classifier Confidence: How to Measure for
K-Nearest Neighbors?
• Proportion of neighbors with label y; e.g.,
When K=3:

https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
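A hedged sketch of this with scikit-learn (the dataset choice is illustrative):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
# With K=3, each probability is the fraction of the 3 nearest neighbors
# voting for that class: 0, 1/3, 2/3, or 1.
print(knn.predict_proba(X_test[:5]))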
Classifier Confidence: How to Measure for
Decision Trees?
• Proportion of training samples with label y in the leaf reached by the test sample; e.g., a leaf with 14 "yes" and 6 "no" training samples predicts "yes" with confidence 14/20 = 0.7

[Tree diagram with leaves: 14 yes / 6 no; 100 no; 120 yes; 30 yes / 70 no]


Classifier Confidence: How to Measure for
Naïve Bayes?

• Conditional probability P (Y|X) for the most probable class
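For reference, the standard naïve Bayes posterior (not shown on the slide) is the normalized product of the prior and the per-feature likelihoods:

$$P(y \mid \mathbf{x}) = \frac{P(y) \prod_j P(x_j \mid y)}{\sum_{y'} P(y') \prod_j P(x_j \mid y')}$$

The classifier's confidence is this posterior evaluated at the most probable class.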


Classifier Confidence: How to Measure for
Support Vector Machines?
• Distance to the hyperplane: e.g.,

http://chem-eng.utoronto.ca/~datamining/dmc/support_vector_machine.htm
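A minimal sketch of this idea, assuming a linear SVM in scikit-learn (decision_function returns the signed value w·x + b; dividing by ||w|| gives the geometric distance):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear").fit(X, y)

# Signed distance of each sample to the separating hyperplane;
# larger magnitude = farther from the boundary = more confident.
distances = svm.decision_function(X) / np.linalg.norm(svm.coef_)
print(distances[:5])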
Classifier Confidence vs Probability

• Classifiers can make mistakes in estimating their confidence level

• External calibration procedures can address this issue (e.g., using calibration curves/reliability diagrams)
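A minimal sketch of one such procedure, assuming scikit-learn's CalibratedClassifierCV (Platt/sigmoid scaling is one illustrative choice):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Fits a sigmoid mapping from the SVM's scores to probabilities
# via cross-validation, so predict_proba is better calibrated.
clf = CalibratedClassifierCV(SVC(), method="sigmoid", cv=5).fit(X, y)
print(clf.predict_proba(X[:3]))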
Today’s Topics

• One-vs-all multiclass classification

• Classifier confidence

• Evaluation: ROC and PR-curves

• Ensemble learning
Classification from a Classifier’s Confidence

• Observation: a threshold must be chosen to define the point at which the example belongs to a class or not

• Motivation: how to choose the threshold?
• Default is 0.5
• Yet, it can be tuned to avoid different types of errors
Review: Confusion Matrix for Binary Classification

Python Machine Learning; Raschka & Mirjalili


Receiver Operating Characteristic (ROC) curve
Summarizes performance based on the positive class:
- A positive prediction is either correct (TP) or not (FP)
- TPR = TP / (TP + FN); FPR = FP / (FP + TN)
Receiver Operating Characteristic (ROC) curve
Summarizes performance based on the positive class:
- A positive prediction is either correct (TP) or not (FP)

To create, vary the prediction threshold and compute TPR and FPR for each threshold.

[Plot: TPR on the y-axis (0 to 1) vs. FPR on the x-axis (0 to 1)]
Receiver Operating Characteristic (ROC) curve
What is the coordinate for a perfect predictor?

Summarizes performance based on the positive class:
- A positive prediction is either correct (TP) or not (FP)

[Plot: TPR on the y-axis (0 to 1) vs. FPR on the x-axis (0 to 1)]
ROC Curve: Area Under Curve (AUC)
Which of the first three methods performs best overall?

Summarizes performance based on the positive class:
- A positive prediction is either correct (TP) or not (FP)

Python Machine Learning; Raschka & Mirjalili
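A minimal sketch of this threshold sweep with scikit-learn (the scores are made-up toy values):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])

# roc_curve sweeps the decision threshold over the scores and
# returns the (FPR, TPR) pair at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr)
print("AUC:", roc_auc_score(y_true, y_score))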


ROC Curve: Multiclass Classification
• Plot curve per class:

https://stackoverflow.com/questions/56090541/how-to-plot-precision-and-recall-of-multiclass-classifier
Precision-Recall (PR) Curve

Summarizes performance based only on the positive class (ignores true negatives):
- Precision = TP / (TP + FP); Recall = TP / (TP + FN)
Precision-Recall (PR) Curve
Summarizes performance based only on the positive class (ignores true negatives).

To create, vary the prediction threshold and compute precision and recall for each threshold.

[Plot: precision on the y-axis (0 to 1) vs. recall on the x-axis (0 to 1)]
Precision-Recall (PR) Curve
What is the coordinate for a perfect predictor?

Summarizes performance based only on the positive class (ignores true negatives).

[Plot: precision on the y-axis (0 to 1) vs. recall on the x-axis (0 to 1)]
PR Curve: Area Under Curve (AUC)

• Which classifier is the best?
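Similarly, a minimal sketch of computing a PR curve and its area (average precision) with scikit-learn, on the same made-up toy scores:

import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])

# precision_recall_curve sweeps the threshold and returns the
# (precision, recall) pair at each threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print("Average precision:", average_precision_score(y_true, y_score))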


PR Curve: Multiclass Classification
• Plot curve per class:

https://stackoverflow.com/questions/56090541/how-to-plot-precision-and-recall-of-multiclass-classifier
Group Discussion: Evaluation Curves
1. Assume you are building a classifier for these applications:
• Detecting offensive content online
• Medical diagnoses
• Detecting shoplifters
• Deciding whether a person is guilty of a crime
What classifier threshold would you choose for each application and why?

2. When would you choose to evaluate with a PR curve versus a ROC curve?

3. What is the area under the ROC and PR curves for a perfect classifier?

[Plot: PR curve with precision on the y-axis (0 to 1) and recall on the x-axis (0 to 1); assume thresholds of 0, 0.25, 0.5, 0.75, and 1 were used to create the curve]
Today’s Topics

• One-vs-all multiclass classification

• Classifier confidence

• Evaluation: ROC and PR-curves

• Ensemble learning
Idea: How Many Predictors to Use?

More than 1: Ensemble


Why Choose Ensemble Instead of an Algorithm?
• Reduces the probability of making a wrong prediction, assuming:
• Classifiers are independent (not true in practice!)
• Suppose:
• n classifiers for binary classification task
• Each classifier has same error rate
• Probability mass function indicates the probability of error from an ensemble:

$$P(\text{error}) = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \, \varepsilon^{k} (1-\varepsilon)^{n-k}$$

where n is the number of classifiers, ε is each classifier's error rate, and the binomial coefficient counts the ways to choose k subsets from a set of size n.
• e.g., n = 11, ε = 0.25: summing from k = 6 (a majority) gives an error probability of ~0.034, which is much lower than the probability of error from a single algorithm (0.25)
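A short sketch that evaluates this formula directly (requires Python 3.8+ for math.comb):

from math import comb

def ensemble_error(n, eps):
    # Probability that a majority of n independent classifiers,
    # each with error rate eps, are simultaneously wrong.
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(ensemble_error(11, 0.25))  # ~0.034, vs 0.25 for a single classifier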
Why Choose Ensemble Instead of an Algorithm?

How to get diverse classifiers?
1. Use different algorithms
2. Use different features
3. Use different training data
How to Predict with an Ensemble?
• Majority Voting
• Return most popular prediction from multiple prediction algorithms

• Bootstrap Aggregation, aka Bagging


• Resample data to train algorithm on different random subsets

• Boosting
• Reweight data to train algorithms to specialize on different “hard” examples

• Stacking
• Train a model that learns how to aggregate classifiers’ predictions
Historical Context of ML Models

[Timeline: 1613 human "computers"; early 1800s linear regression; 1945 first programmable machine; 1950 Turing Test; k-nearest neighbors; 1956 AI; 1957 perceptron; 1959 machine learning; naïve Bayes; 1962 decision trees; 1st AI winter (1974-1980); 2nd AI winter (1987-1993); SVM; rise of ensemble learning]
How to Predict with an Ensemble of Algorithms?
• Majority Voting
• Return most popular prediction from multiple prediction algorithms

• Bootstrap Aggregation, aka Bagging


• Train algorithm repeatedly on different random subsets of the training set

• Boosting
• Train algorithms that each specialize on different “hard” training examples

• Stacking
• Train a model that learns how to aggregate classifiers’ predictions
Majority Voting

Figure Credit: Raschka & Mirjalili, Python Machine Learning.


Majority Voting

Prediction Model Prediction Model Prediction Model

Prediction Prediction Prediction

Majority Vote
Majority Voting: Binary Task
e.g., “Is it sunny today?”

Prediction Model Prediction Model Prediction Model Prediction Model

“Yes” “No” “Yes” “Yes”

“Yes”
Majority Voting: “Soft” (not “Hard”)

Prediction Model Prediction Model Prediction Model

Probability Probability Probability

Majority Vote
Majority Voting: Soft Voting on Binary Task
e.g., “Is it sunny today?”

Prediction Model Prediction Model Prediction Model Prediction Model

90% “Yes” 20% Yes 55% “Yes” 45% “Yes”

“Yes” (210/4 = 52.5% Yes)
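A minimal sketch of hard vs. soft voting with scikit-learn's VotingClassifier (the base models are illustrative choices, not from the slides):

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# voting="soft" averages the models' predicted probabilities;
# voting="hard" takes the majority of their predicted labels.
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier())],
    voting="soft",
).fit(X, y)
print(ensemble.predict(X[:5]))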


Plurality Voting: Non-Binary Task
e.g., “What object is in the image?”

Prediction Model Prediction Model Prediction Model Prediction Model

“Cat” “Dog” “Pig” “Cat”

“Cat”
Majority Voting: Regression
e.g., “Is it sunny today?”

Prediction Model Prediction Model Prediction Model Prediction Model

90% “Yes” 20% Yes 55% “Yes” 45% “Yes”

52.5% (average prediction)


Majority Voting: Example of Decision Boundary

Figure Credit: Raschka & Mirjalili, Python Machine Learning.


How to Predict with an Ensemble of Algorithms?
• Majority Voting
• Return most popular prediction from multiple prediction algorithms

• Bootstrap Aggregation, aka Bagging


• Train algorithm repeatedly on different random subsets of the training set

• Boosting
• Train algorithms that each specialize on different “hard” training examples

• Stacking
• Train a model that learns how to aggregate classifiers’ predictions
Bagging

Figure Credit: Raschka & Mirjalili, Python Machine Learning.


Bagging: Training
• Build ensemble from “bootstrap samples” drawn with replacement
• e.g.,
[Figure: bootstrap sampling rounds. Duplicate data can occur within a round's training set; some examples are missing from a round's training data (e.g., round 1). Each classifier is trained on a different subset of the data.]

Breiman, Bagging Predictors, 1994.
Ho, Random Decision Forests, 1995.
Figure Credit: Raschka & Mirjalili, Python Machine Learning.
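A minimal sketch of one bootstrap round with NumPy (illustrative, not the slides' figure):

import numpy as np

rng = np.random.default_rng(0)
indices = np.arange(10)  # indices of 10 training examples

# Sample n indices *with replacement*: duplicates can occur, and
# the unsampled ("out-of-bag") examples are missing from this round.
sample = rng.choice(indices, size=indices.size, replace=True)
out_of_bag = np.setdiff1d(indices, sample)
print(sample, out_of_bag)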
Bagging: Training
• Build ensemble from “bootstrap samples” drawn with replacement
• e.g.,

Class demo: pick a number from the bag

Breiman, Bagging Predictors, 1994.
Ho, Random Decision Forests, 1995.
Figure Credit: Raschka & Mirjalili, Python Machine Learning.
Bagging: Predicting

Prediction Model Prediction Model Prediction Model Prediction Model

• Predict as done for “majority voting”


• e.g., “hard” voting
• e.g., “soft” voting
• e.g., averaging values for regression
Bagging: Random Forest
• Build ensemble from “bootstrap samples” drawn with replacement
• e.g.,

Fit decision trees by also selecting random feature subsets

Breiman, Bagging Predictors, 1994.


Ho, Random Decision Forests, 1995.
Figure Credit: Raschka & Mirjalili, Python Machine Learning.
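A minimal sketch with scikit-learn's RandomForestClassifier; max_features controls the random feature subset considered at each split:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)

# 100 trees, each fit on a bootstrap sample of the training data
# and restricted to sqrt(n_features) candidate features per split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X, y)
print(forest.score(X, y))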
Bagging: Intuition (Train an 8 detector)

Goodfellow et al., Deep Learning (chapter 7), 2016.


Bagging: Intuition (Train an 8 detector)

Goodfellow et al., Deep Learning (chapter 7), 2016.


Bagging: Intuition (Train an 8 detector)

Goodfellow et al., Deep Learning (chapter 7), 2016.


How to Predict with an Ensemble of Algorithms?
• Majority Voting
• Return most popular prediction from multiple prediction algorithms

• Bootstrap Aggregation, aka Bagging


• Train algorithm repeatedly on different random subsets of the training set

• Boosting
• Train algorithms that each specialize on different “hard” training examples

• Stacking
• Train a model that learns how to aggregate classifiers’ predictions
Boosting
• Key idea: sequentially train predictors that each try to correctly
predict examples that were hard for previous predictors

• Original Algorithm:
• Train classifier 1: use random subset of examples without replacement
• Train classifier 2: use a second random subset of examples without
replacement and add 50% of examples misclassified by classifier 1
• Train classifier 3: use examples that classifiers 1 and 2 disagree on
• Predict using majority vote from 3 classifiers
Boosting – AdaBoost (Adaptive Boosting)

1. Assign equal weights to all examples
2. Assign larger weights to previous misclassifications; assign smaller weights to previous correct classifications
3. Assign larger weights to training samples C1 and C2 disagree on; assign smaller weights to previous correct classifications
4. Predict with weighted majority vote

Freund and Schapire, Experiments with a New Boosting Algorithm, 1996. Raschka and Mirjalili; Python Machine Learning
Boosting – AdaBoost (Adaptive Boosting)

[Figure: 1D dataset example. Round 1: training data, weights, and predictions. Round 2: updated weights.]

Raschka and Mirjalili; Python Machine Learning
Boosting – AdaBoost (Adaptive Boosting)
e.g., 1d dataset (the standard AdaBoost updates, stated for labels in {-1, +1}):

1. Compute error rate (sum of the misclassified examples' weights): $\varepsilon = \sum_i w_i \, \mathbb{1}(\hat{y}_i \neq y_i)$

2. Compute the coefficient used to update weights and make the weighted majority vote prediction: $\alpha = \frac{1}{2} \ln \frac{1 - \varepsilon}{\varepsilon}$

3. Update the weight vector: $w_i := w_i \times e^{-\alpha \, \hat{y}_i y_i}$
• Correct predictions decrease an example's weight and vice versa

4. Normalize the weights to sum to 1: $w_i := w_i / \sum_j w_j$

Raschka and Mirjalili; Python Machine Learning
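A minimal NumPy sketch of one round of these updates, on made-up toy labels:

import numpy as np

y_true = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
y_pred = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])  # 3 mistakes
w = np.full(y_true.size, 0.1)                             # equal initial weights

eps = w[y_pred != y_true].sum()           # 1. weighted error rate = 0.3
alpha = 0.5 * np.log((1 - eps) / eps)     # 2. classifier coefficient
w = w * np.exp(-alpha * y_true * y_pred)  # 3. raise weights of mistakes
w = w / w.sum()                           # 4. normalize to sum to 1
print(eps, alpha, w)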
Boosting – AdaBoost (Adaptive Boosting)

To predict, use the $\alpha$ calculated for each classifier as its weight when voting with all trained classifiers.

Idea: weight each classifier's prediction based on the accuracy it had on the training dataset.

Raschka and Mirjalili; Python Machine Learning


How to Predict with an Ensemble of Algorithms?
• Majority Voting
• Return most popular prediction from multiple prediction algorithms

• Bootstrap Aggregation, aka Bagging


• Train algorithm repeatedly on different random subsets of the training set

• Boosting
• Train algorithms that each specialize on different “hard” training examples

• Stacking
• Train a model that learns how to aggregate classifiers’ predictions
Stacked Generalization, aka Stacking
• Train meta-learner to learn the optimal weighting of each classifiers’
predictions for making the final prediction
• Algorithm:
1. Split the dataset into three disjoint sets.
2. Train several base learners on the first partition.
3. Test the base learners on the second and third partitions.
4. Train the meta-learner on the second partition, using the classifiers' predictions as features.
5. Evaluate the meta-learner on the third partition, using the classifiers' predictions as features.

David H. Wolpert, Stacked Generalization, 1992.


Tutorial: http://blog.kaggle.com/2017/06/15/stacking-made-easy-an-introduction-to-stacknet-by-competitions-grandmaster-marios-michailidis-kazanova/
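scikit-learn's StackingClassifier is a related variant that replaces the fixed three-way split above with internal cross-validation; a minimal sketch with illustrative base learners:

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# The base learners' cross-validated predictions become the
# features on which the logistic-regression meta-learner is trained.
stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(),
).fit(X, y)
print(stack.score(X, y))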
Ensemble Learner Won Netflix Prize “Challenge”
• In 2009 challenge, winning team won $1 million using ensemble approach:
• https://www.netflixprize.com/assets/GrandPrize2009_BPC_BigChaos.pdf
• Dataset: 5-star ratings on 17770 movies from 480189 “anonymous” users collected
by Netflix over ~7 years. In total, the number of ratings is 100,480,507.

• Netflix did not use the ensemble recommendation system. Why?


• “We evaluated some of the new methods offline but the additional accuracy gains
that we measured did not seem to justify the engineering effort needed to bring
them into a production environment” - https://medium.com/netflix-techblog/netflix-
recommendations-beyond-the-5-stars-part-1-55838468f429
• Computationally slow and complex due to the "sequential" training of learners
Yehuda Koren, The BellKor Solution to the Netflix Grand Prize, 2009.
Today’s Topics

• One-vs-all multiclass classification

• Classifier confidence

• Evaluation: ROC and PR-curves

• Ensemble learning
