Slides 0411

The document provides an overview of machine learning, including its definition, key assumptions, and various problems such as classification, clustering, and regression. It discusses the evolution of machine learning, connections to other subjects, and evaluation methodologies, including performance metrics like error rate and AUC. Additionally, it covers the importance of tuning parameters and the use of confusion matrices in assessing model performance.


Introduction

Formalism and terminology

Tuning and evaluation

Data Science Lecture Notes 07

Donghui Yan

University of Massachusetts Dartmouth

Donghui Yan UMassD Lecture Notes


Outline

• Introduction
• Formalism and terminology
• Evaluation methodology


Machine learning in real life

• Search engine design
  ▸ Maximize the chance that users find what they are searching for in the top K entries
• Computational advertising
  ▸ Placement of ads to maximize profit
• Design of e-commerce websites
  ▸ Selection of items for sale to maximize click-through rate (or profit)
• Selection of headline news
  ▸ e.g., which story to run as the headline on a news portal such as Yahoo or CNN
• Object recognition
  ▸ OCR handwritten digit recognition by USPS
• House (pool) cleaning robots.


State of the art in AI

• AlphaGo
  ▸ Beat Ke Jie (ranked #1 in the world) 3:0 in 2017
  ▸ Major milestone in AI research
• Self-driving cars
• Conversational robots.


What is machine learning (ML)?

Definition. Machine learning refers to the applications, methodology, and theory relevant to the automatic learning of patterns or regularities from data.


Two important assumptions in machine learning

• The future is related to the past
  ▸ The phenomenon is stationary, i.e., the past and the future are drawn from the same probability distribution
• Knowledge about the problem under study
  ▸ Generalization is only possible when knowledge is encoded
  ▸ Features are the most elementary form of encoded knowledge.


Problems in machine learning

• Classification
  ▸ $Y \in C = \{c_1, c_2, \ldots, c_k\}$, called labels
• Clustering
  ▸ $Y$ not given (often called unsupervised learning)
• Regression
  ▸ $Y \in \mathbb{R}$, called the response
• Ranking
• And many more topics that have emerged in recent years
  ▸ Topic models (e.g., what is the topic of a blog article)
  ▸ Manifold (topological) learning
  ▸ Salient sentence extraction
  ▸ Graph learning, etc.


A brief history of the evolution of machine learning

• Early days
  ▸ The AI approach
    – The 1956 Dartmouth conference marks the start of AI
    – Perceptron (Rosenblatt, 1957)
    – Decline of neural network research in the late 1960s
    – Various induction machines, expert systems, fuzzy systems, etc.
    – PAC learning (Valiant, 1984)
  ▸ The statistical approach
    – Statistical learning theory (Vapnik and Chervonenkis, 1964-1974)
    – Fisher’s LDA, logistic regression, k-means, mixture analysis, etc.
    – Early nonparametric statistics (e.g., kNN)
• The revival of neural networks in the mid-1980s
• SVM, boosting, and Random Forests from the early 1990s onward
  ▸ The statistical approach gains popularity
• Neural networks return under the guise of deep learning (2008-).

Connections to other subjects

• Aspects of machine learning
  ▸ Machine
    – Computer science (algorithms to realize machine learning)
  ▸ Learning
    – Mathematics, probability, and statistics (analysis and theory)
  ▸ Applications
    – Provide the motivation and the ultimate testbed for learning algorithms.

• What’s the connection between (nonparametric) statistics and ML?
  ▸ Both learn from data
  ▸ Nonparametric statistics ⊆ ML (by my definition)
  ▸ In practice, however, ML focuses more on discrete problems (e.g., classification), while (nonparametric) statistics focuses more on the continuous world (e.g., regression).


Predictive learning or classification

• Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$, we wish to learn the relationship $f : X \mapsto Y$ such that future predictions are the best possible, e.g., smallest error rate (or best precision/recall, AUC, etc.)
  ▸ $X_i$ are called features, $Y_i \in \{1, 2, \ldots, J\}$ are called labels or classes
  ▸ $(X_1, Y_1), \ldots, (X_n, Y_n)$ is called a training sample
  ▸ $f$ is called the trained or fitted model

♠ The best possible decision rule (Bayes rule)
  ▸ The rule one would use if the distribution of $(X, Y)$ were known (it is often unknown).


The classification problem

• What does the solution to a classification problem really do?
– Identifying the decision (or class) boundary.


Loss function
The loss function depends on the application. Typical loss functions:
• The 0-1 loss
  $$\text{cost} = \begin{cases} 1, & \text{if } f(X) \neq Y \\ 0, & \text{otherwise.} \end{cases}$$
  ▸ A loss function of special interest and the most commonly used
• Cost-sensitive loss functions, i.e., a cost matrix with $a \neq b$,
  $$\begin{pmatrix} 0 & b \\ a & 0 \end{pmatrix}$$
  ▸ Suitable when errors in different classes have different consequences
    – e.g., fraud detection, with cost $a$ when mistaking fraud for normal and cost $b$ when mistaking normal for fraud.
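
The two kinds of loss can be compared on a small example. The sketch below is only illustrative: the labels, the predictions, and the cost values a = 10 and b = 1 are made up, and the cost table is keyed by (true label, predicted label), which is an assumed convention rather than one fixed by the slides.

```python
def zero_one_loss(y_true, y_pred):
    """Average 0-1 loss: fraction of predictions that differ from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def cost_sensitive_loss(y_true, y_pred, cost):
    """Average cost, where cost[(true, predicted)] is the penalty for that pair.

    Pairs missing from `cost` (the correct predictions) cost 0.
    """
    return sum(cost.get((t, p), 0.0) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical fraud-detection labels: 1 = fraud, 0 = normal.
y_true = [1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 1, 0, 0]

# Assumed costs: a for fraud predicted as normal, b for normal predicted as fraud.
a, b = 10.0, 1.0
cost = {(1, 0): a, (0, 1): b}

error_rate = zero_one_loss(y_true, y_pred)            # 2 mistakes out of 6
avg_cost = cost_sensitive_loss(y_true, y_pred, cost)  # (10 + 1) / 6

print(f"0-1 loss: {error_rate:.4f}, cost-sensitive loss: {avg_cost:.4f}")
```

Both mistakes count equally under the 0-1 loss, while the cost-sensitive loss is dominated by the single missed fraud case.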


Function class

The function class $\mathcal{F} = \{f\}$ determines the type of classifier

• Linear classifiers
  ▸ Logistic regression: $\operatorname{logit}(P(Y \mid X)) = X\beta$
  ▸ SVM: $f(X) = \sum_{i=1}^{n} w_i K(X_i, X) + w_0$
• Boosting
  ▸ $f(X) = \sum_{i=1}^{T} a_i h^{(n)}(X_1, \ldots, X_n, X)$, with $h$ from some data-dependent basis library
• Tree-based classifiers
  ▸ C4.5, CART
  ▸ Random Forests and its variants.


Evaluation of a machine learning algorithm

• Train on training data and evaluate on test data
  ▸ Most common
• Cross-validation
  ▸ Split the data into J partitions
  ▸ Use each of the J partitions in turn as the test set and the rest for training
  ▸ Average the results over all J tests
• Bootstrap with out-of-bag estimates
  ▸ Train on a sample drawn with replacement from all observations in the data
  ▸ Test on the remaining (out-of-bag) observations
  ▸ Repeat many times and average the results.
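
The cross-validation and bootstrap procedures can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 1-nearest-neighbour classifier, the synthetic one-dimensional data, and all function names are hypothetical choices, not part of the lecture.

```python
import random


def predict_1nn(train, x):
    """Predict the label of the training point closest to x (1-nearest neighbour)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def error_rate(train, test):
    """Fraction of test points whose predicted label differs from the true label."""
    return sum(predict_1nn(train, x) != y for x, y in test) / len(test)


def cross_validation_error(data, J, rng):
    """J-fold cross-validation: each fold serves once as the test set."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[j::J] for j in range(J)]
    errors = []
    for j in range(J):
        train = [pair for k, fold in enumerate(folds) if k != j for pair in fold]
        errors.append(error_rate(train, folds[j]))
    return sum(errors) / J


def bootstrap_oob_error(data, rounds, rng):
    """Train on a bootstrap sample, test on the out-of-bag points, and average."""
    n = len(data)
    errors = []
    for _ in range(rounds):
        drawn = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(drawn)
        train = [data[i] for i in drawn]
        out_of_bag = [data[i] for i in range(n) if i not in in_bag]
        if out_of_bag:  # skip the (very unlikely) round with no out-of-bag points
            errors.append(error_rate(train, out_of_bag))
    return sum(errors) / len(errors)


rng = random.Random(0)
# Synthetic data (an assumption): class 0 centred at 0, class 1 centred at 3.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(20)] + [(rng.gauss(3.0, 1.0), 1) for _ in range(20)]

cv_error = cross_validation_error(data, J=5, rng=rng)
oob_error = bootstrap_oob_error(data, rounds=200, rng=rng)
print(f"5-fold CV error: {cv_error:.3f}, bootstrap OOB error: {oob_error:.3f}")
```

Both estimates reuse the same data for training and testing in different rounds, which is why they are preferred over a single split when the sample is small.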


Evaluation methodology

• Have separate training and test sets
• Fit a model (e.g., a logistic model) on the training set
• Evaluate the trained model on the test set
  ▸ A classification is correct when the predicted label matches the true label (0-1 loss)
  ▸ Or use a more advanced metric such as AUC.


Evaluation illustrated


Selection of tuning parameters

• Treat the training set as the entire data
• Split the training data into a training set and a tuning set
• Treat the tuning set as the test set and proceed as in the usual evaluation
  ▸ Calculate a performance metric, e.g., accuracy
  ▸ Select the parameters that lead to the best performance
  ▸ Use the selected parameters for the final performance evaluation.
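
The tuning procedure can be illustrated with a kNN classifier whose tuning parameter is k. Everything in this sketch is an assumption made for illustration: the synthetic data, the candidate values of k, the split sizes, and the choice to refit on the full training data before the final evaluation.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, evaluation, k):
    """Fraction of evaluation points that kNN (fitted on train) labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in evaluation) / len(evaluation)


rng = random.Random(0)
# Synthetic 1-D data (an assumption): class 0 centred at 0, class 1 centred at 2.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(60)] + [(rng.gauss(2.0, 1.0), 1) for _ in range(60)]
rng.shuffle(data)

# The test set is held out and never used while choosing k.
train_full, test = data[:90], data[90:]
# Split the training data again into a training set and a tuning set.
train, tuning = train_full[:60], train_full[60:]

candidates = [1, 3, 5, 7, 9]
tuning_accuracy = {k: accuracy(train, tuning, k) for k in candidates}
best_k = max(candidates, key=lambda k: tuning_accuracy[k])

# Final evaluation with the selected k, refitting on all of the training data.
final_accuracy = accuracy(train_full, test, best_k)
print(f"selected k = {best_k}, test accuracy = {final_accuracy:.3f}")
```

Keeping the test set out of the selection step matters: if k were chosen on the test set, the final accuracy would be an optimistic estimate.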


Performance metrics

Some popular performance metrics

• Error rate
  – Most commonly used in statistics and machine learning
• Kappa statistic
  – Commonly used in remote sensing and medical assessment
• Area under the curve (AUC)
  – Used when the detection and false alarm rates matter, e.g., biomarker discovery, anomaly/fraud detection.


Kappa statistic

• The idea is to measure how far the observed agreement departs from the agreement that arises purely by chance
  ▸ κ is calculated from the confusion matrix
  ▸ κ takes into account the sample sizes of the different classes
  ▸ Controversial (many modifications have been proposed).


The confusion matrix

Label   1     ...   j     ...   C     Total
1       n11   ...   n1j   ...   n1C   n1.
...     ...   ...   ...   ...   ...   ...
i       ni1   ...   nij   ...   niC   ni.
...     ...   ...   ...   ...   ...   ...
C       nC1   ...   nCj   ...   nCC   nC.
Total   n.1   ...   n.j   ...   n.C   n

  ▸ C = number of classes
  ▸ nij = number of points from class i that are classified as j
  ▸ n = size of the sample.
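
A confusion matrix of this form can be tabulated directly from the true and predicted labels. The sketch below uses made-up labels for three classes; the function name and the data are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return counts[i][j] = number of points from class labels[i] classified as labels[j]."""
    index = {label: pos for pos, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts


# Hypothetical true and predicted labels for C = 3 classes.
y_true = [1, 1, 1, 2, 2, 3, 3, 3]
y_pred = [1, 2, 1, 2, 2, 3, 1, 3]

matrix = confusion_matrix(y_true, y_pred, labels=[1, 2, 3])
row_totals = [sum(row) for row in matrix]        # n_i. : points truly in class i
col_totals = [sum(col) for col in zip(*matrix)]  # n_.j : points classified as j
n = sum(row_totals)                              # sample size

for row, total in zip(matrix, row_totals):
    print(row, total)
print(col_totals, n)
```

Rows correspond to the true class and columns to the predicted class, matching the layout of the table.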


Calculation of κ

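Definition. The κ coefficient is calculated as

$$\kappa = \frac{P_{\text{observed}} - P_{\text{expected}}}{1 - P_{\text{expected}}}$$

where

$$P_{\text{expected}} = \sum_{i=1}^{C} \frac{n_{i\cdot}}{n} \cdot \frac{n_{\cdot i}}{n}, \qquad P_{\text{observed}} = \frac{1}{n} \sum_{i=1}^{C} n_{ii}.$$

  ▸ $P_{\text{expected}}$ measures the chance that the observed (predicted) and true labels agree by chance alone
  ▸ $P_{\text{observed}}$ measures the proportion of observations labeled correctly.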


Example κ statistic
Confusion matrix of Logit on the South Africa Heart data

True/Predicted   1     2     Total
1                130   41    171
2                22    38    60
Total            152   79    231

  ▸ $n_{1\cdot} = 171$, $n_{2\cdot} = 60$, $n_{\cdot 1} = 152$, $n_{\cdot 2} = 79$, $n = 231$
  ▸ $P_{\text{expected}} = n_{1\cdot} n_{\cdot 1} / n^2 + n_{2\cdot} n_{\cdot 2} / n^2 = 171 \cdot 152 / 231^2 + 60 \cdot 79 / 231^2 = 0.5759$
  ▸ $P_{\text{observed}} = (n_{11} + n_{22}) / n = (130 + 38) / 231 = 0.7273$
  ▸ $\kappa = (P_{\text{observed}} - P_{\text{expected}}) / (1 - P_{\text{expected}}) = (0.7273 - 0.5759) / (1 - 0.5759) = 0.36$.
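
The arithmetic of this example can be checked with a short function that computes κ from any square confusion matrix. The function name is hypothetical; the matrix entries are the ones from the South Africa Heart table.

```python
def kappa(matrix):
    """Kappa coefficient from a square confusion matrix (rows = true, columns = predicted)."""
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Proportion of observations on the diagonal (labeled correctly).
    p_observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Agreement expected by chance, from the row and column totals.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


heart = [[130, 41],
         [22, 38]]
k = kappa(heart)
print(round(k, 2))  # 0.36
```

A value of 0 would mean agreement no better than chance, and 1 would mean perfect agreement.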


Area under curve

• The ROC curve is a graphical plot of the true positive rate (TPR) vs. the false positive rate (FPR) as the discrimination threshold varies
  ▸ TPR = % of true positives out of the positives
  ▸ FPR = % of false positives out of the negatives
• Example (assume class 1 = positive, 2 = negative)

True/Predicted   1                 2                 Total
1                130 (true pos)    41 (false neg)    171 (pos)
2                22 (false pos)    38 (true neg)     60 (neg)
Total            152               79                231
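
For this table, the two rates that define a single point on the ROC curve can be computed as follows. The function name is hypothetical; the counts are taken from the table.

```python
def tpr_fpr(tp, fn, fp, tn):
    """True positive rate and false positive rate from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)  # true positives out of all positives
    fpr = fp / (fp + tn)  # false positives out of all negatives
    return tpr, fpr


tpr, fpr = tpr_fpr(tp=130, fn=41, fp=22, tn=38)
print(f"TPR = {tpr:.4f}, FPR = {fpr:.4f}")  # TPR = 0.7602, FPR = 0.3667
```

Each choice of discrimination threshold yields one such (FPR, TPR) pair, and the ROC curve traces these pairs as the threshold varies.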


Area under curve (AUC)

• Another measure for assessing a machine learning algorithm
• Often used when the cost of misclassification is asymmetric
  ▸ e.g., intrusion classified as normal vs. normal classified as intrusion in cybersecurity
• R package “AUC”.
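
As a rough illustration of how the AUC is obtained, the sketch below sweeps a threshold over classifier scores, collects the resulting (FPR, TPR) points, and integrates them with the trapezoidal rule. It is written in plain Python rather than with the R package; the scores, labels, and function names are made up for the example.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from sweeping the threshold over the distinct scores.

    A point is predicted positive when its score is >= the threshold;
    labels are 1 for positive and 0 for negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as points sorted by increasing x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores (higher = more likely positive) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = area_under_curve(roc_points(scores, labels))
print(auc)  # 0.8125
```

The result equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, here 13 out of 16.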
