Overview: This document provides an introduction to machine learning concepts, focusing on polynomial curve fitting, probability theory, and decision theory. It discusses various polynomial orders, regularization techniques, and the implications of overfitting, as well as key probability rules and Bayes' theorem. It also covers parameter estimation methods, cross-validation, and concepts in information theory such as entropy and mutual information.

1 Machine Learning
Chapter 1: Introduction

孫民 (Min Sun)
清華大學 (National Tsing Hua University)
Credit: 林嘉文 (Chia-Wen Lin)
3 Polynomial Curve Fitting

Data Set Size: 𝑁 = 10
4 Sum-of-Squares Error Function

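The error function this slide refers to (stated again on slide 37) is the standard sum-of-squares error; the explicit order-M polynomial form is the usual one assumed throughout this deck:

\[
E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{\,y(x_n,\mathbf{w}) - t_n\,\}^2,
\qquad
y(x,\mathbf{w}) = \sum_{j=0}^{M} w_j x^j
\]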
5 0th Order Polynomial

6 1st Order Polynomial

7 3rd Order Polynomial

8 9th Order Polynomial
9 Over-fitting

Root-Mean-Square (RMS) Error: E_RMS = sqrt( 2 E(𝐰*) / N )
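The following is a minimal sketch, not code from the slides, of how the fits and RMS errors of slides 3-9 can be reproduced. It assumes the usual Bishop-style toy data (sin(2πx) plus Gaussian noise with an illustrative noise level of 0.3); `fit_polynomial` and `rms_error` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: targets are sin(2*pi*x) plus Gaussian noise.
N = 10
x_train = np.linspace(0.0, 1.0, N)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=N)
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

def fit_polynomial(x, t, M):
    """Least-squares fit of an order-M polynomial, i.e. minimize
    E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    Phi = np.vander(x, M + 1, increasing=True)   # design matrix [1, x, ..., x^M]
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def rms_error(x, t, w):
    """E_RMS = sqrt(2 * E(w) / N), i.e. the root mean squared residual."""
    M = len(w) - 1
    y = np.vander(x, M + 1, increasing=True) @ w
    return np.sqrt(np.mean((y - t) ** 2))

for M in (0, 1, 3, 9):
    w = fit_polynomial(x_train, t_train, M)
    print(f"M={M}: train E_RMS={rms_error(x_train, t_train, w):.3f}, "
          f"test E_RMS={rms_error(x_test, t_test, w):.3f}")
```

For M = 9 the training error drops to (nearly) zero while the test error grows, which is the over-fitting behaviour the slides illustrate.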
10 Polynomial Coefficients

11 Data Set Size: 𝑁 = 15
9th Order Polynomial

12 Data Set Size: 𝑁 = 100
9th Order Polynomial
13 Regularization

• Penalize large coefficient values
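The regularized error function being referred to (Eq. (1.4), cited again on slide 39) is:

\[
\tilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{\,y(x_n,\mathbf{w}) - t_n\,\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2
\]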
14 Regularization: ln 𝜆 = −18

15 Regularization: ln 𝜆 = 0

16 Regularization: 𝐸_RMS vs. ln 𝜆

17 Polynomial Coefficients
18 Probability Theory
Apples and Oranges

(B)ox is (b)lue or (r)ed

(F)ruit is (a)pple or (o)range
19 Probability Theory – two random variables

• Marginal Probability

• Conditional Probability

• Joint Probability
20 Probability Theory

• Sum Rule

• Product Rule
21 The Rules of Probability

• Sum Rule

• Product Rule
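The two rules themselves were lost in extraction; in the notation used on the following slides they read:

\[
\text{Sum rule: } p(X) = \sum_{Y} p(X,Y)
\qquad
\text{Product rule: } p(X,Y) = p(Y\mid X)\,p(X)
\]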
22 Bayes’ Theorem
P(X, Y) = P(Y|X) P(X)   (Product Rule)
Since P(X, Y) = P(Y, X), and
P(Y, X) = P(X|Y) P(Y),
hence P(Y|X) P(X) = P(X|Y) P(Y), i.e.,

P(Y|X) = P(X|Y) P(Y) / P(X)

Posterior = Likelihood × Prior / Evidence

Posterior ∝ Likelihood × Prior

23 Probability Theory
Apples and Oranges

Given:
p(B=r) = 4/10 = 2/5
p(B=b) = 6/10 = 3/5
p(F=a|B=r) = 2/8 = 1/4
p(F=a|B=b) = 3/4

Question: overall probability of picking an apple, p(F=a)?

Use Sum Rule:
p(F=a) = p(F=a,B=r) + p(F=a,B=b)
Use Product Rule:
p(F=a,B=r) = p(F=a|B=r) p(B=r)
p(F=a,B=b) = p(F=a|B=b) p(B=b)
Hence,
p(F=a) = p(F=a|B=r) p(B=r) + p(F=a|B=b) p(B=b)
       = 1/4 × 2/5 + 3/4 × 3/5 = 2/20 + 9/20 = 11/20
p(F=o) = 1 − p(F=a) = 9/20
24 Probability Theory
Apples and Oranges

Given:
p(B=r) = 4/10 = 2/5
p(B=b) = 6/10 = 3/5
p(F=a|B=r) = 2/8 = 1/4
p(F=a|B=b) = 3/4

Question: if the fruit is an orange, which box did it come from, p(B|F=o)?

Use Bayes' Rule:
p(B|F=o) = p(F=o|B) p(B) / p(F=o)
p(B=r|F=o) = p(F=o|B=r) p(B=r) / p(F=o) = (3/4 × 2/5) / (9/20) = 6/9 = 2/3
p(B=b|F=o) = 1 − p(B=r|F=o) = 1/3
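A small sketch (not part of the slides) that reproduces the computations of slides 23-24 numerically from the probabilities given above:

```python
# Probabilities given on the slides.
p_B = {"r": 2 / 5, "b": 3 / 5}                  # box prior
p_a_given_B = {"r": 1 / 4, "b": 3 / 4}          # p(F=apple | B)

# Sum and product rules: p(F=a) = sum_b p(F=a | B=b) p(B=b)
p_apple = sum(p_a_given_B[b] * p_B[b] for b in p_B)
p_orange = 1 - p_apple
print(p_apple, p_orange)                        # 0.55 (=11/20), 0.45 (=9/20)

# Bayes' rule: p(B=r | F=o) = p(F=o | B=r) p(B=r) / p(F=o)
p_r_given_orange = (1 - p_a_given_B["r"]) * p_B["r"] / p_orange
print(p_r_given_orange, 1 - p_r_given_orange)   # 0.666... (=2/3), 0.333... (=1/3)
```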
25 Probability Densities (continuous variable)
[Figure: probability density p(x) and its cumulative distribution function P(x)]
26 Transformed Densities

x = g(y)
dx/dy = d g(y)/ dy = g’(y)

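With x = g(y) as above, the change-of-variables rule that this slide illustrates is:

\[
p_y(y) = p_x(x)\left|\frac{dx}{dy}\right| = p_x(g(y))\,|g'(y)|
\]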
27 Expectations

Approximate Expectation
(discrete and continuous)

Conditional Expectation
(discrete)
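The definitions these captions annotate did not survive extraction; the standard forms are:

\[
\mathbb{E}[f] = \sum_x p(x)f(x) \quad\text{or}\quad \int p(x)f(x)\,dx,
\qquad
\mathbb{E}[f] \simeq \frac{1}{N}\sum_{n=1}^{N} f(x_n),
\qquad
\mathbb{E}_x[f\mid y] = \sum_x p(x\mid y)\,f(x)
\]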
28 Variances and Covariances

When x, y are independent,
E[xy] = E[x]E[y]. Hence,
Cov[x,y] = 0
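Spelling out the definitions (lost in extraction) shows why independence implies zero covariance:

\[
\operatorname{var}[x] = \mathbb{E}[x^2]-\mathbb{E}[x]^2,
\qquad
\operatorname{cov}[x,y] = \mathbb{E}[xy]-\mathbb{E}[x]\,\mathbb{E}[y]
\]

so if E[xy] = E[x]E[y], the covariance is zero.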
29 The Gaussian Distribution

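The density and its first two moments, which this slide and the next display, are:

\[
\mathcal{N}(x\mid\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\},
\qquad
\mathbb{E}[x]=\mu, \quad \operatorname{var}[x]=\sigma^2
\]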
30 Gaussian Mean and Variance

31 The Multivariate Gaussian
Σ is the covariance matrix; its diagonal entries are the variances σᵢ².
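For reference, the multivariate Gaussian density in D dimensions is:

\[
\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu},\Sigma) =
\frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}
\exp\!\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}
\]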
32 Gaussian Parameter Estimation

Likelihood function

𝐱 = (x_1, x_2, ..., x_N), where the x_n are i.i.d.
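For i.i.d. data the likelihood function factorizes:

\[
p(\mathbf{x}\mid\mu,\sigma^2) = \prod_{n=1}^{N}\mathcal{N}(x_n\mid\mu,\sigma^2)
\]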
33 Two Principles for Estimating Parameters

• Maximum likelihood estimation (ML)

  Choose θ that maximizes the probability (likelihood) of the observed data:
  θ_ML = argmax_θ P(D|θ)

• Maximum a posteriori estimation (MAP)

  Choose θ that is most probable given the prior probability and the data:
  θ_MAP = argmax_θ P(θ|D) = argmax_θ P(D|θ) P(θ) / P(D)
34 Maximum (Log) Likelihood
𝐱 = (x_1, x_2, ..., x_N), where the x_n are i.i.d.
θ_ML = argmax_θ p(𝐱|θ)?

Since ln is monotonic, maximizing the likelihood is equivalent to maximizing the log-likelihood:
θ_ML = argmax_θ ln p(𝐱|θ)

For the Gaussian, the ML solutions are the sample mean and the sample variance.
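For the Gaussian case, the resulting ML estimates are the sample mean and sample variance:

\[
\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}x_n,
\qquad
\sigma^2_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{\mathrm{ML}})^2
\]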
35 Properties of 𝜇_ML and 𝜎²_ML

(unbiased)

(biased)

(unbiased)
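The statements these three annotations refer to (standard results for the Gaussian ML estimators) are:

\[
\mathbb{E}[\mu_{\mathrm{ML}}]=\mu \ \text{(unbiased)},
\qquad
\mathbb{E}[\sigma^2_{\mathrm{ML}}]=\frac{N-1}{N}\sigma^2 \ \text{(biased)},
\qquad
\tilde{\sigma}^2=\frac{N}{N-1}\sigma^2_{\mathrm{ML}} \ \text{(unbiased)}
\]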
36 Curve Fitting Re-visited

𝛽: inverse variance (precision)

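The Gaussian noise model behind the re-visited curve fitting (it appears again on slide 40) is:

\[
p(t\mid x,\mathbf{w},\beta) = \mathcal{N}\!\left(t\mid y(x,\mathbf{w}),\,\beta^{-1}\right)
\]

where β is the inverse variance (precision) mentioned above.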
37 Maximum Likelihood

Determine 𝐰_ML by minimizing the sum-of-squares error 𝐸(𝐰):

𝐰_ML = argmin_𝐰 (1/2) Σ_{n=1}^{N} ( y(x_n, 𝐰) − t_n )²
38 ML Curve Fitting

Green: actual model
Red: predicted model
39 MAP: A Step towards Bayes

Posterior ∝ Likelihood × Prior

Determine 𝐰_MAP by minimizing the regularized sum-of-squares error, 𝐸̃(𝐰) (Eq. (1.4)).
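For reference, in Bishop's formulation (assumed here), with a Gaussian prior p(𝐰|α) the MAP estimate minimizes

\[
\frac{\beta}{2}\sum_{n=1}^{N}\{\,y(x_n,\mathbf{w})-t_n\,\}^2 + \frac{\alpha}{2}\mathbf{w}^{\mathsf T}\mathbf{w}
\]

so that taking λ = α/β recovers the regularized sum-of-squares error of Eq. (1.4).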


40 Bayesian Curve Fitting

Predictive Distribution
(w for both μ and β)

p(t | x, 𝐰, β) = 𝒩( t | y(x, 𝐰), β⁻¹ )

(Refer to Sec. 3.3 for a detailed derivation.)
41 Bayesian Predictive Distribution

[Figure: ML curve fitting vs. Bayesian curve fitting (predictive distribution)]
42 Cross Validation for Model Selection

• 5-fold cross-validation -> split the training data into 5 equal folds
• Use 4 of them for training and 1 for validation, rotating the held-out fold, as sketched below
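A minimal sketch (an illustration, not code from the slides) of 5-fold cross-validation for selecting the polynomial order M; `fit_polynomial` and `rms_error` are the hypothetical helpers from the earlier sketch.

```python
import numpy as np

def cross_validate(x, t, M, n_folds=5):
    """Average validation RMS error of an order-M polynomial: each fold is held
    out once for validation while the remaining folds are used for training."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(x))            # shuffle so folds are not contiguous in x
    folds = np.array_split(idx, n_folds)
    errors = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = fit_polynomial(x[train], t[train], M)   # helper from the earlier sketch
        errors.append(rms_error(x[val], t[val], w))
    return np.mean(errors)

# Usage: pick the order with the lowest average validation error, e.g.
# best_M = min(range(10), key=lambda M: cross_validate(x_train, t_train, M))
```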
43 Cross Validation

44 Curse of Dimensionality

45 Curse of Dimensionality

Polynomial curve fitting with M = 3 in D input dimensions:
the number of coefficients grows as D³.

Volume of a sphere with radius r in D dimensions:
V_D(r) = K_D r^D
[V_D(1) − V_D(1−ε)] / V_D(1) = 1 − (1−ε)^D
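A numerical illustration (not in the original slides) of the shell-volume formula: with shell thickness ε = 0.1,

\[
1-(1-\epsilon)^D = 1-0.9^D \approx 0.10\ (D{=}1),\quad 0.41\ (D{=}5),\quad 0.88\ (D{=}20)
\]

so in high dimensions almost all of the sphere's volume lies in a thin shell near its surface.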
46 Decision Theory

Given training pairs (x, t), predict t for a new x.

• Inference step
  • Determine either p(x, t) or p(t|x).

• Decision step
  • For a given x, determine the optimal t, or the decision/action based on t.
47 Minimum Misclassification Rate

Assume t is class C1 or C2.

[Figure: misclassification regions. Moving the decision boundary from x̂ to x₀ leaves the blue and green areas fixed but reduces the red area.]
48 Minimum Expected Loss
• Example: classify medical images as 'cancer' or 'normal'.

[Loss matrix: rows = truth, columns = decision.]

When a cancer patient is classified as normal -> 1000 loss
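The loss matrix itself did not survive extraction; in Bishop's version of this example (which these slides appear to follow) it is:

\[
L = \begin{pmatrix} 0 & 1000 \\ 1 & 0 \end{pmatrix},
\qquad \text{rows: true class (cancer, normal); columns: decision (cancer, normal)}
\]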
49 Minimum Expected Loss

Loss L_kj: the true class is C_k, but we assign class C_j.

Decision regions are chosen to minimize the expected loss.

Eliminate the common factor p(x).
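Written out (a reconstruction in the standard notation, since the equations were lost), the expected loss is

\[
\mathbb{E}[L] = \sum_k\sum_j\int_{\mathcal{R}_j} L_{kj}\,p(\mathbf{x},\mathcal{C}_k)\,d\mathbf{x}
\]

which, after eliminating the common factor p(x), is minimized by assigning each x to the class j that minimizes

\[
\sum_k L_{kj}\,p(\mathcal{C}_k\mid\mathbf{x})
\]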
50 Reject Option – avoid making a decision
51 Generative vs Discriminative

• Generative approach:
  Model p(x|C_k) and p(C_k) (or the joint p(x, C_k));
  use Bayes' theorem to obtain p(C_k|x).

• Discriminative approach:
  Model p(C_k|x) directly.
52 Why Separate Inference and Decision?

• Minimizing risk (loss matrix may change over time)


• Reject option
• Unbalanced class priors
• Combining models

53 Decision Theory for Regression

• Inference step
  • Determine p(x, t).

• Decision step
  • For a given x, make the optimal prediction, y(x), for t.

• Loss function: L(t, y(x))
54 The Squared Loss Function

𝔼[𝐿] is minimized when:
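The loss and its minimizer that the truncated sentence above refers to are the standard squared-loss results:

\[
\mathbb{E}[L] = \iint \{y(\mathbf{x})-t\}^2\,p(\mathbf{x},t)\,d\mathbf{x}\,dt,
\qquad
y(\mathbf{x}) = \mathbb{E}_t[t\mid\mathbf{x}] = \int t\,p(t\mid\mathbf{x})\,dt
\]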
55 Information Theory
56 Entropy

h(x) is a monotonic function of p(x) and expresses the information content (≥ 0):

h(x) = −log₂ p(x)

If x, y are independent, p(x,y) = p(x) p(y), so
h(x,y) = −log₂ p(x) − log₂ p(y) = h(x) + h(y)

H(x) is the expectation of h(x).
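Taking that expectation gives the entropy:

\[
H[x] = \mathbb{E}[h(x)] = -\sum_x p(x)\log_2 p(x)
\]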
57 Entropy

Important quantity in
• coding theory
• statistical physics
• machine learning

58 Entropy

59 Entropy - coding theory

• Coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x?
• All states equally likely.

Code: 000, 001, 010, 011, 100, 101, 110, 111
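Working this out confirms that the entropy equals the code length:

\[
H[x] = -8\times\frac{1}{8}\log_2\frac{1}{8} = 3\ \text{bits}
\]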
60 Entropy

61 Entropy - statistical physics
In how many ways can N identical objects be allocated to M bins?
Let n_i be the number of objects in the i-th bin.
The number of ways to allocate them is the multiplicity W.

p_i is the probability that an object is assigned to the i-th bin.

Entropy is maximized when:
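The expressions these lines refer to (the standard statistical-physics derivation of entropy, assumed here) are:

\[
W = \frac{N!}{\prod_i n_i!},
\qquad
H = \frac{1}{N}\ln W \;\simeq\; -\sum_i p_i\ln p_i
\quad (p_i = n_i/N,\ \text{via Stirling's approximation as } N\to\infty)
\]

with the entropy maximized when all p_i = 1/M, giving H = ln M.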
64 Differential Entropy – continuous x
Put bins of width Δ along the real line.

Differential entropy is maximized (for fixed σ² and μ) when p(x) is Gaussian,

in which case the entropy depends only on σ.
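The maximizing density and the resulting maximum (the parts lost in extraction) are:

\[
p(x) = \mathcal{N}(x\mid\mu,\sigma^2),
\qquad
H[x] = \tfrac{1}{2}\{1+\ln(2\pi\sigma^2)\}
\]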
65 Conditional Entropy

h(y|x) = −log₂ p(y|x)
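Taking the expectation of h(y|x) under p(x, y) gives the conditional entropy, which satisfies a chain rule with H[x]:

\[
H[y\mid x] = -\sum_x\sum_y p(x,y)\log_2 p(y\mid x),
\qquad
H[x,y] = H[y\mid x] + H[x]
\]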
66 The Kullback-Leibler Divergence (Relative Entropy)
An unknown p(x) is modeled by an approximating distribution q(x); the KL divergence measures the additional information required, on average, to specify x when using q(x) in place of the true p(x).
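The definition this slide refers to, and its key property:

\[
\mathrm{KL}(p\,\|\,q) = -\int p(\mathbf{x})\ln\frac{q(\mathbf{x})}{p(\mathbf{x})}\,d\mathbf{x} \;\ge\; 0,
\quad \text{with equality if and only if } p = q
\]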
67 Mutual Information
h(x) = −log₂ p(x)

If x, y are independent, p(x,y) = p(x) p(y), so
h(x,y) = −log₂ p(x) − log₂ p(y) = h(x) + h(y)

If x, y are not independent, how far is p(x,y) from p(x) p(y)?
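Mutual information quantifies this: it is the KL divergence between the joint distribution and the product of marginals, and it relates to the entropies above:

\[
I[x,y] = \mathrm{KL}\big(p(x,y)\,\|\,p(x)\,p(y)\big)
= -\iint p(x,y)\ln\frac{p(x)\,p(y)}{p(x,y)}\,dx\,dy
\]

\[
I[x,y] = H[x]-H[x\mid y] = H[y]-H[y\mid x]
\]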
