Supervised Learning With Scikit-Learn: How Good Is Your Model?

This document discusses supervised machine learning techniques using scikit-learn including classification metrics, class imbalance, confusion matrices, logistic regression, ROC curves, and area under the ROC curve. It also covers hyperparameter tuning using grid search cross-validation and evaluating models on a hold-out test set not used for training or validation.

Uploaded by

Victor Ng

SUPERVISED LEARNING WITH SCIKIT-LEARN

How good is your model?

Classification metrics
● Measuring model performance with accuracy:
● Fraction of correctly classified samples
● Not always a useful metric

Class imbalance example: Emails

● Spam classification
● 99% of emails are real; 1% of emails are spam
● Could build a classifier that predicts ALL emails as real
● 99% accurate!
● But horrible at actually classifying spam
● Fails at its original purpose
● Need more nuanced metrics
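
The spam example above can be checked with a few lines of arithmetic. This is a minimal sketch on made-up data (1,000 emails, 10 of them spam, chosen only to match the 99%/1% split on the slide), not output from a real classifier:

```python
# Toy data mirroring the slide: 1% spam (label 1), 99% real (label 0).
# The counts are an assumption chosen only to match those proportions.
y_true = [1] * 10 + [0] * 990

# A "classifier" that ignores its input and predicts every email as real.
y_pred = [0] * len(y_true)

# Accuracy: fraction of correctly classified samples.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the spam class: spam emails caught / all spam emails.
spam_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
spam_recall = spam_caught / sum(y_true)

print(f"accuracy:    {accuracy:.2f}")
print(f"spam recall: {spam_recall:.2f}")
```

Accuracy comes out at 99% while not a single spam email is caught, which is why accuracy alone says little on imbalanced classes.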

Diagnosing classification predictions

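● Confusion matrix (spam is the positive class)

                   Predicted: spam    Predicted: real
  Actual: spam     True positive      False negative
  Actual: real     False positive     True negative

● Accuracy: (true positives + true negatives) / (all samples)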

Metrics from the confusion matrix

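● Precision: true positives / (true positives + false positives)

● Recall: true positives / (true positives + false negatives)

● F1 score: 2 * (precision * recall) / (precision + recall)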

● High precision: Not many real emails predicted as spam

● High recall: Predicted most spam emails correctly

Confusion matrix in scikit-learn

In [1]: from sklearn.metrics import classification_report, confusion_matrix

In [2]: from sklearn.model_selection import train_test_split

In [3]: from sklearn.neighbors import KNeighborsClassifier

In [4]: knn = KNeighborsClassifier(n_neighbors=8)

In [5]: X_train, X_test, y_train, y_test = train_test_split(X, y,
   ...:     test_size=0.4, random_state=42)

In [6]: knn.fit(X_train, y_train)

In [7]: y_pred = knn.predict(X_test)

In [8]: print(confusion_matrix(y_test, y_pred))
[[ 52   7]
 [  3 112]]

In [9]: print(classification_report(y_test, y_pred))
             precision    recall  f1-score   support

          0       0.95      0.88      0.91        59
          1       0.94      0.97      0.96       115

avg / total       0.94      0.94      0.94       174
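
The per-class figures in the report can be recomputed by hand from the confusion matrix printed above. The sketch below uses plain Python rather than scikit-learn so the arithmetic is visible; the helper name `class_metrics` is made up for illustration:

```python
# Confusion matrix from the k-NN example: rows are true labels,
# columns are predicted labels.
cm = [[52, 7],
      [3, 112]]

def class_metrics(cm, k):
    """Precision, recall and F1 for class k (hypothetical helper)."""
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp  # predicted k, actually another class
    fn = sum(cm[k]) - tp                 # actually k, predicted another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

for k in (0, 1):
    p, r, f = class_metrics(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f} on {total} samples")
```

The rounded values agree with the 0.95/0.88/0.91 and 0.94/0.97/0.96 rows of the report.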


Let’s practice!
Logistic regression and the ROC curve

Logistic regression for binary classification

● Logistic regression outputs probabilities
● If the probability ‘p’ is greater than 0.5:
● The data is labeled ‘1’
● If the probability ‘p’ is less than 0.5:
● The data is labeled ‘0’
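
As a rough illustration of the rule above, the sketch below turns a linear score into a probability with the logistic (sigmoid) function and then applies the 0.5 cut-off. The weights, intercept, and sample points are invented for the example; a fitted model would learn its own coefficients:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a two-feature model (not from the slides).
weights = [1.5, -2.0]
intercept = 0.25

def predict_proba_one(x):
    """Probability that sample x belongs to class 1."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict_one(x, threshold=0.5):
    """Label 1 if the probability exceeds the threshold, else 0."""
    return 1 if predict_proba_one(x) > threshold else 0

for x in ([2.0, 0.5], [0.0, 1.0]):
    print(x, round(predict_proba_one(x), 3), predict_one(x))
```

Because sigmoid(0) is exactly 0.5, the p = 0.5 cut-off corresponds to the linear score being zero, which is why the decision boundary is a straight line.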

Linear decision boundary

Source: Andreas Müller & Sarah Guido, Introduction to Machine Learning with Python

Logistic regression in scikit-learn

In [1]: from sklearn.linear_model import LogisticRegression

In [2]: from sklearn.model_selection import train_test_split

In [3]: logreg = LogisticRegression()

In [4]: X_train, X_test, y_train, y_test = train_test_split(X, y,
   ...:     test_size=0.4, random_state=42)

In [5]: logreg.fit(X_train, y_train)

In [6]: y_pred = logreg.predict(X_test)


Probability thresholds
● By default, logistic regression threshold = 0.5
● Not specific to logistic regression
● k-NN classifiers also have thresholds
● What happens if we vary the threshold?
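
One way to see what varying the threshold does is to recompute the true positive rate and false positive rate at a few cut-offs. The labels and predicted probabilities below are invented toy values, and `rates` is a hypothetical helper written for this sketch:

```python
# Made-up true labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.30, 0.35, 0.45, 0.60, 0.70, 0.80, 0.95]

def rates(y_true, y_prob, threshold):
    """Return (false positive rate, true positive rate) at a threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn), tp / (tp + fn)

for threshold in (0.0, 0.5, 1.0):
    fpr, tpr = rates(y_true, y_prob, threshold)
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

At a threshold of 0 everything is labeled 1, so both rates are 1; at a threshold of 1 nothing in this toy data is labeled 1, so both rates are 0. The points in between trace out the ROC curve.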

The ROC curve

[Figure: ROC curve (true positive rate against false positive rate), with points marked for the thresholds p = 0, p = 0.5, and p = 1]

Plotting the ROC curve

In [1]: import matplotlib.pyplot as plt

In [2]: from sklearn.metrics import roc_curve

In [3]: y_pred_prob = logreg.predict_proba(X_test)[:,1]

In [4]: fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

In [5]: plt.plot([0, 1], [0, 1], 'k--')

In [6]: plt.plot(fpr, tpr, label='Logistic Regression')

In [7]: plt.xlabel('False Positive Rate')

In [8]: plt.ylabel('True Positive Rate')

In [9]: plt.title('Logistic Regression ROC Curve')

In [10]: plt.show()

Plotting the ROC curve

● logreg.predict_proba(X_test)[:,1]
● predict_proba returns one column of probabilities per class; [:,1] selects the second column, the predicted probability of label 1

Let’s practice!
Area under the ROC curve

Area under the ROC curve (AUC)

● Larger area under the ROC curve = better model
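
A useful way to read the AUC is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way on invented toy scores; `auc_by_ranking` is a hypothetical helper for illustration, not the scikit-learn implementation:

```python
def auc_by_ranking(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and three sets of scores of differing quality.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
decent = [0.10, 0.30, 0.35, 0.45, 0.60, 0.70, 0.80, 0.95]
perfect = [0.10, 0.20, 0.90, 0.30, 0.80, 0.70, 0.40, 0.95]
constant = [0.5] * len(y_true)

print("decent:  ", auc_by_ranking(y_true, decent))
print("perfect: ", auc_by_ranking(y_true, perfect))
print("constant:", auc_by_ranking(y_true, constant))
```

A model that ranks every positive above every negative scores 1.0, while one that cannot separate the classes at all sits at 0.5, the area under the diagonal.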

AUC in scikit-learn
In [1]: from sklearn.metrics import roc_auc_score

In [2]: logreg = LogisticRegression()

In [3]: X_train, X_test, y_train, y_test = train_test_split(X, y,
   ...:     test_size=0.4, random_state=42)

In [4]: logreg.fit(X_train, y_train)

In [5]: y_pred_prob = logreg.predict_proba(X_test)[:,1]

In [6]: roc_auc_score(y_test, y_pred_prob)

Out[6]: 0.997466216216

AUC using cross-validation

In [7]: from sklearn.model_selection import cross_val_score

In [8]: cv_scores = cross_val_score(logreg, X, y, cv=5,
   ...:     scoring='roc_auc')

In [9]: print(cv_scores)
[ 0.99673203 0.99183007 0.99583796 1. 0.96140652]
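
The five fold scores above are usually summarised by their mean and spread. The sketch below does that, and also shows how a dataset can be cut into five folds. The splitter is a simplified, unshuffled and unstratified stand-in written for illustration; it is not how `cross_val_score` chooses its folds for classifiers:

```python
import statistics

# AUC scores printed by the 5-fold cross-validation above.
cv_scores = [0.99673203, 0.99183007, 0.99583796, 1.0, 0.96140652]
mean_auc = statistics.mean(cv_scores)
std_auc = statistics.pstdev(cv_scores)
print(f"mean AUC = {mean_auc:.3f}, std = {std_auc:.3f}")

def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds (simplified sketch)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Each sample lands in the test fold exactly once.
folds = list(kfold_indices(10, 5))
for train_idx, test_idx in folds:
    print("test:", test_idx, "train:", train_idx)
```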

Let’s practice!
Hyperparameter tuning

Hyperparameter tuning
● Linear regression: Choosing parameters
● Ridge/lasso regression: Choosing alpha
● k-Nearest Neighbors: Choosing n_neighbors
● Parameters like alpha and k: Hyperparameters
● Hyperparameters cannot be learned by fitting the model

Choosing the correct hyperparameter

● Try a bunch of different hyperparameter values
● Fit all of them separately
● See how well each performs
● Choose the best performing one
● It is essential to use cross-validation

Grid search cross-validation

Cross-validation score for each (C, Alpha) pair:

C \ Alpha     0.1      0.2      0.3      0.4
0.5          0.701    0.703    0.697    0.696
0.4          0.699    0.702    0.698    0.702
0.3          0.721    0.726    0.713    0.703
0.2          0.706    0.705    0.704    0.701
0.1          0.698    0.692    0.688    0.675
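
Picking the winner from such a grid is just a search for the largest cell. A minimal sketch using the scores shown on this slide (the variable names are chosen for the example):

```python
# Grid from the slide: one row per value of C, one column per value of Alpha.
alphas = [0.1, 0.2, 0.3, 0.4]
scores = {
    0.5: [0.701, 0.703, 0.697, 0.696],
    0.4: [0.699, 0.702, 0.698, 0.702],
    0.3: [0.721, 0.726, 0.713, 0.703],
    0.2: [0.706, 0.705, 0.704, 0.701],
    0.1: [0.698, 0.692, 0.688, 0.675],
}

# Flatten to (score, C, alpha) triples and keep the one with the top score.
best_score, best_c, best_alpha = max(
    (score, c, alpha)
    for c, row in scores.items()
    for alpha, score in zip(alphas, row)
)

print(f"best score {best_score} at C={best_c}, Alpha={best_alpha}")
```

Here the best combination is C = 0.3 with Alpha = 0.2, the cell holding 0.726.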

GridSearchCV in scikit-learn
In [1]: import numpy as np

In [2]: from sklearn.model_selection import GridSearchCV

In [3]: from sklearn.neighbors import KNeighborsClassifier

In [4]: param_grid = {'n_neighbors': np.arange(1, 50)}

In [5]: knn = KNeighborsClassifier()

In [6]: knn_cv = GridSearchCV(knn, param_grid, cv=5)

In [7]: knn_cv.fit(X, y)

In [8]: knn_cv.best_params_
Out[8]: {'n_neighbors': 12}

In [9]: knn_cv.best_score_
Out[9]: 0.933216168717

Let’s practice!
Hold-out set for final evaluation

Hold-out set reasoning

● How well can the model perform on never before seen data?
● Using ALL data for cross-validation is not ideal
● Split data into training and hold-out set at the beginning
● Perform grid search cross-validation on training set
● Choose best hyperparameters and evaluate on hold-out set
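
The workflow in the bullets above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the data is synthetic, the classifier is a hand-written one-feature k-NN, and the folds are simple interleaved slices. With scikit-learn the same steps would typically use `train_test_split` and `GridSearchCV`:

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k):
    """Fraction of test points whose label the k-NN vote gets right."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def cv_accuracy(data, k, n_folds=5):
    """Mean accuracy over n_folds interleaved folds of `data`."""
    scores = []
    for i in range(n_folds):
        val = [pt for j, pt in enumerate(data) if j % n_folds == i]
        train = [pt for j, pt in enumerate(data) if j % n_folds != i]
        scores.append(accuracy(train, val, k))
    return sum(scores) / n_folds

# Synthetic, made-up data: class 1 tends to have larger x than class 0.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(100)]
rng.shuffle(data)

# 1. Split off the hold-out set at the very beginning (40% here).
holdout, training = data[:80], data[80:]

# 2. Grid search with cross-validation on the training set only.
candidate_ks = [1, 3, 5, 7, 9, 11]
cv_results = {k: cv_accuracy(training, k) for k in candidate_ks}
best_k = max(cv_results, key=cv_results.get)

# 3. One final evaluation on data the search never saw.
holdout_accuracy = accuracy(training, holdout, best_k)
print(f"best k = {best_k}, hold-out accuracy = {holdout_accuracy:.3f}")
```

The hold-out points play no part in choosing `best_k`, so the final number is an estimate of performance on unseen data rather than a score the search was allowed to optimise.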

Let’s practice!
