Machine Learning 04 - Bayes

The document discusses Bayesian classifiers based on decision theory, focusing on the computation of a-posteriori probabilities and the Bayes classification rule for both two and multiple classes. It explains the minimization of classification error and average risk through the use of loss matrices and discriminant functions. Additionally, it covers the application of Bayesian classifiers to normal distributions and decision hyperplanes, including examples of minimum distance classifiers and their calculations.


CLASSIFIERS BASED ON BAYES DECISION THEORY

• Statistical nature of the feature vectors:
$$x = [x_1, x_2, \ldots, x_\ell]^T$$

• Assign the pattern represented by the feature vector $x$ to the most probable of the available classes $\omega_1, \omega_2, \ldots, \omega_M$.

  That is, $x \rightarrow \omega_i$ such that $P(\omega_i \mid x)$ is maximum.
• Computation of a posteriori probabilities

• Assume known:
  - the a priori probabilities $P(\omega_1), P(\omega_2), \ldots, P(\omega_M)$
  - the class-conditional densities $p(x \mid \omega_i),\ i = 1, 2, \ldots, M$, also known as the likelihood of $x$ with respect to $\omega_i$.
• The Bayes rule (M = 2):
$$p(x)\,P(\omega_i \mid x) = p(x \mid \omega_i)\,P(\omega_i) \;\Rightarrow\; P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\,P(\omega_i)}{p(x)}$$
  where
$$p(x) = \sum_{i=1}^{2} p(x \mid \omega_i)\,P(\omega_i)$$
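As an illustration of the rule above, here is a minimal sketch (not from the original slides) that evaluates the posteriors $P(\omega_i \mid x)$ for two hypothetical one-dimensional Gaussian class likelihoods; the function name `posteriors` and the chosen means are assumptions made only for the example.

```python
import numpy as np
from scipy.stats import norm

def posteriors(x, likelihoods, priors):
    """Bayes rule: P(w_i | x) = p(x | w_i) P(w_i) / p(x)."""
    joint = np.array([lik(x) for lik in likelihoods]) * np.array(priors)
    return joint / joint.sum()   # divide by p(x) = sum_i p(x | w_i) P(w_i)

# Hypothetical 1-D example: two Gaussian likelihoods, equal priors.
likelihoods = [norm(loc=0.0, scale=1.0).pdf, norm(loc=1.0, scale=1.0).pdf]
print(posteriors(0.2, likelihoods, priors=[0.5, 0.5]))   # the two posteriors sum to 1
```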
• The Bayes classification rule (for two classes, M = 2):
  Given $x$, classify it according to the rule:
$$\text{If } P(\omega_1 \mid x) > P(\omega_2 \mid x) \Rightarrow x \in \omega_1$$
$$\text{If } P(\omega_2 \mid x) > P(\omega_1 \mid x) \Rightarrow x \in \omega_2$$

• Equivalently: classify $x$ according to the rule
$$p(x \mid \omega_1)\,P(\omega_1) \gtrless p(x \mid \omega_2)\,P(\omega_2)$$

• For equiprobable classes the test becomes
$$p(x \mid \omega_1) \gtrless p(x \mid \omega_2)$$
  [Figure: the two class densities and the decision regions $R_1$ (class $\omega_1$) and $R_2$ (class $\omega_2$), separated at the threshold $x_0$.]
• Equivalently, in words: divide the space into two regions:
$$\text{If } x \in R_1 \Rightarrow x \in \omega_1$$
$$\text{If } x \in R_2 \Rightarrow x \in \omega_2$$

• Probability of error for equiprobable classes (the total shaded area in the figure):
$$P_e = \frac{1}{2}\int_{-\infty}^{x_0} p(x \mid \omega_2)\,dx + \frac{1}{2}\int_{x_0}^{+\infty} p(x \mid \omega_1)\,dx$$

• The Bayesian classifier is OPTIMAL with respect to minimising the classification error probability.

• Indeed: moving the threshold away from $x_0$, the total shaded area INCREASES by the extra "grey" area.
• The Bayes classification rule for many (M > 2) classes:
  Given $x$, classify it to $\omega_i$ if:
$$P(\omega_i \mid x) > P(\omega_j \mid x) \quad \forall j \neq i$$

• Such a choice also minimizes the classification error probability.

• Minimizing the average risk:
  For each wrong decision, a penalty term is assigned, since some decisions are more sensitive (costly) than others.
• For M = 2:
  - Define the loss matrix
$$L = \begin{pmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \end{pmatrix}$$
  - $\lambda_{12}$ is the penalty term for deciding class $\omega_2$ although the pattern belongs to $\omega_1$, etc.

• Risk with respect to $\omega_1$:
$$r_1 = \lambda_{11}\int_{R_1} p(x \mid \omega_1)\,dx + \lambda_{12}\int_{R_2} p(x \mid \omega_1)\,dx$$
2
 Risk with respect to

r2 21 p ( x 2 )d x  22 p ( x 2 )d x
R1 R2


 Probabilities of wrong
decisions, weighted by the
penalty terms
 Average risk

r r1 P (1 )  r2 P ( 2 )
10
• Choose $R_1$ and $R_2$ so that $r$ is minimized.

• Then assign $x$ to $\omega_1$ if
$$\ell_1 \equiv \lambda_{11}\,p(x \mid \omega_1)P(\omega_1) + \lambda_{21}\,p(x \mid \omega_2)P(\omega_2) \;<\; \ell_2 \equiv \lambda_{12}\,p(x \mid \omega_1)P(\omega_1) + \lambda_{22}\,p(x \mid \omega_2)P(\omega_2)$$

• Equivalently: assign $x$ to $\omega_1$ ($\omega_2$) if
$$\ell_{12} \equiv \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;>\;(<)\; \frac{P(\omega_2)}{P(\omega_1)}\cdot\frac{\lambda_{21} - \lambda_{22}}{\lambda_{12} - \lambda_{11}}$$
  $\ell_{12}$ is the likelihood ratio.

• If $P(\omega_1) = P(\omega_2) = \frac{1}{2}$ and $\lambda_{11} = \lambda_{22} = 0$:
$$x \rightarrow \omega_1 \ \text{if} \ p(x \mid \omega_1) > \frac{\lambda_{21}}{\lambda_{12}}\,p(x \mid \omega_2)$$
$$x \rightarrow \omega_2 \ \text{if} \ p(x \mid \omega_2) > \frac{\lambda_{12}}{\lambda_{21}}\,p(x \mid \omega_1)$$
  If $\lambda_{21} = \lambda_{12}$, this gives the minimum classification error probability.
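The minimum-risk test above is straightforward to code. Below is a minimal sketch (not part of the original slides) of the two-class likelihood-ratio rule with a loss matrix; the function name and the sample numbers are illustrative assumptions.

```python
def min_risk_decision(px_w1, px_w2, P1, P2, L):
    """Two-class minimum-risk rule via the likelihood ratio test.
    L is the loss matrix [[l11, l12], [l21, l22]]."""
    l12 = px_w1 / px_w2                                        # likelihood ratio
    threshold = (P2 * (L[1][0] - L[1][1])) / (P1 * (L[0][1] - L[0][0]))
    return 1 if l12 > threshold else 2                         # decided class

# Illustrative call, using the loss matrix of the example that follows.
print(min_risk_decision(px_w1=0.3, px_w2=0.2, P1=0.5, P2=0.5,
                        L=[[0.0, 0.5], [1.0, 0.0]]))
# -> 2, since the ratio 1.5 is below the threshold 2
```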
• An example:
$$p(x \mid \omega_1) = \frac{1}{\sqrt{\pi}}\exp(-x^2), \qquad p(x \mid \omega_2) = \frac{1}{\sqrt{\pi}}\exp\!\big(-(x-1)^2\big)$$
$$P(\omega_1) = P(\omega_2) = \frac{1}{2}, \qquad L = \begin{pmatrix} 0 & 0.5 \\ 1.0 & 0 \end{pmatrix}$$

• Then the threshold $x_0$ for minimum $P_e$ is:
$$x_0:\ \exp(-x^2) = \exp\!\big(-(x-1)^2\big) \;\Rightarrow\; x_0 = \frac{1}{2}$$

• The threshold $\hat{x}_0$ for minimum $r$:
$$\hat{x}_0:\ \exp(-x^2) = 2\exp\!\big(-(x-1)^2\big) \;\Rightarrow\; \hat{x}_0 = \frac{1-\ln 2}{2} < \frac{1}{2}$$

• Thus $\hat{x}_0$ moves to the left of $x_0 = \frac{1}{2}$ (WHY?)
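A quick numerical check of the two thresholds in this example; this snippet is not from the slides, and the bracketing interval passed to `brentq` is an assumption.

```python
import numpy as np
from scipy.optimize import brentq

# Class-conditional densities of the example above.
p1 = lambda x: np.exp(-x**2) / np.sqrt(np.pi)
p2 = lambda x: np.exp(-(x - 1)**2) / np.sqrt(np.pi)

# Minimum-Pe threshold: p(x|w1) = p(x|w2).
x0 = brentq(lambda x: p1(x) - p2(x), -5.0, 5.0)
# Minimum-risk threshold: p(x|w1) = 2 p(x|w2), since the ratio-test threshold is 2.
x0_hat = brentq(lambda x: p1(x) - 2.0 * p2(x), -5.0, 5.0)

print(x0)                            # 0.5
print(x0_hat, (1 - np.log(2)) / 2)   # both ~0.1534, i.e. to the left of x0
```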
DISCRIMINANT FUNCTIONS AND DECISION SURFACES

• If $R_i, R_j$ are contiguous:
$$g(x) \equiv P(\omega_i \mid x) - P(\omega_j \mid x) = 0$$
$$R_i:\ P(\omega_i \mid x) > P(\omega_j \mid x) \quad (+)$$
$$R_j:\ P(\omega_j \mid x) > P(\omega_i \mid x) \quad (-)$$

• $g(x) = 0$ is the surface separating the regions. On one side it is positive (+), on the other negative (-). It is known as the decision surface.

• If $f(\cdot)$ is monotonic, the rule remains the same if we use:
$$x \rightarrow \omega_i \ \text{if:}\ f\big(P(\omega_i \mid x)\big) > f\big(P(\omega_j \mid x)\big) \quad \forall j \neq i$$

• $g_i(x) \equiv f\big(P(\omega_i \mid x)\big)$ is a discriminant function.

• In general, discriminant functions can be defined independently of the Bayesian rule. They lead to suboptimal solutions, yet, if chosen appropriately, they can be computationally more tractable.
BAYESIAN CLASSIFIER FOR NORMAL DISTRIBUTIONS

• Multivariate Gaussian pdf:
$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{\ell/2}\,|\Sigma_i|^{1/2}}\exp\!\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1}(x - \mu_i)\right)$$

• $\mu_i = E[x]$ is the mean vector of class $\omega_i$

• $\Sigma_i = E\!\left[(x - \mu_i)(x - \mu_i)^T\right]$ is called the covariance matrix of class $\omega_i$
 ln()is monotonic. Define:


g i ( x) ln( p ( x i ) P (i )) 
ln p ( x  i )  ln P (i )

1 T 1
g i ( x)  ( x   i )  i ( x   i )  ln P (i )  Ci
2
 1
Ci  ( ) ln 2  ( ) ln  i
 Example: 2 2

 2 0 
 i  
2
 0   19
1 1
 g i ( x)  2
1
2
(x  x ) 
2 ( i1 x1  i 2 x2 )
2 2
 2

1
 ( i21  i22 )  ln( Pi )  Ci
2 2

Thatg iis,
(x) is quadratic and the
surfaces g i ( x)  g j ( x) 0

quadrics, ellipsoids, parabolas,


hyperbolas,
pairs of lines.
For example:

20
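To make the discriminant concrete, here is a minimal sketch (not from the slides) of evaluating $g_i(x)$ for general Gaussian classes and picking the maximizing class; the function names are illustrative.

```python
import numpy as np

def g_i(x, mu_i, Sigma_i, prior_i):
    """g_i(x) = -1/2 (x - mu_i)^T Sigma_i^{-1} (x - mu_i) + ln P(w_i) + C_i."""
    d = x - mu_i
    ell = len(mu_i)
    _, logdet = np.linalg.slogdet(Sigma_i)
    C_i = -0.5 * ell * np.log(2 * np.pi) - 0.5 * logdet
    return -0.5 * d @ np.linalg.solve(Sigma_i, d) + np.log(prior_i) + C_i

def classify(x, mus, Sigmas, priors):
    """Assign x to the class with the largest discriminant value."""
    return int(np.argmax([g_i(x, m, S, P)
                          for m, S, P in zip(mus, Sigmas, priors)]))
```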
• Decision Hyperplanes

• The quadratic terms are $x^T \Sigma_i^{-1} x$. If ALL $\Sigma_i = \Sigma$ (the same), the quadratic terms are not of interest: they are not involved in the comparisons. Then, equivalently, we can write:
$$g_i(x) = w_i^T x + w_{i0}$$
$$w_i = \Sigma^{-1}\mu_i, \qquad w_{i0} = \ln P(\omega_i) - \frac{1}{2}\mu_i^T \Sigma^{-1}\mu_i$$
  The discriminant functions are LINEAR.

• Let, in addition, $\Sigma = \sigma^2 I$. Then:
$$g_i(x) = \frac{1}{\sigma^2}\mu_i^T x + w_{i0}$$
  - $g_{ij}(x) \equiv g_i(x) - g_j(x) = 0 \;\Rightarrow\; w^T(x - x_0) = 0$
  - $w = \mu_i - \mu_j$
  - $x_0 = \dfrac{1}{2}(\mu_i + \mu_j) - \sigma^2 \ln\!\left(\dfrac{P(\omega_i)}{P(\omega_j)}\right)\dfrac{\mu_i - \mu_j}{\lVert\mu_i - \mu_j\rVert^2}$
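A minimal sketch (not from the slides) of the equal-covariance linear discriminant and of the hyperplane parameters $w$, $x_0$ for the $\Sigma = \sigma^2 I$ case; the function names are illustrative.

```python
import numpy as np

def linear_discriminant(mu_i, Sigma, prior_i):
    """Equal covariances: g_i(x) = w_i^T x + w_i0."""
    w_i = np.linalg.solve(Sigma, mu_i)          # Sigma^{-1} mu_i
    w_i0 = np.log(prior_i) - 0.5 * mu_i @ w_i   # ln P(w_i) - 1/2 mu_i^T Sigma^{-1} mu_i
    return w_i, w_i0

def hyperplane_sigma2I(mu_i, mu_j, sigma2, P_i, P_j):
    """Decision hyperplane w^T (x - x0) = 0 when Sigma = sigma^2 I."""
    w = mu_i - mu_j
    x0 = 0.5 * (mu_i + mu_j) - sigma2 * np.log(P_i / P_j) * w / (w @ w)
    return w, x0
```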
• Non-diagonal $\Sigma$ ($\Sigma \neq \sigma^2 I$):
  - $g_{ij}(x) = w^T(x - x_0) = 0$
  - $w = \Sigma^{-1}(\mu_i - \mu_j)$
  - $x_0 = \dfrac{1}{2}(\mu_i + \mu_j) - \ln\!\left(\dfrac{P(\omega_i)}{P(\omega_j)}\right)\dfrac{\mu_i - \mu_j}{\lVert\mu_i - \mu_j\rVert_{\Sigma^{-1}}^2}$
  where $\lVert x \rVert_{\Sigma^{-1}} \equiv \big(x^T \Sigma^{-1} x\big)^{1/2}$

• The decision hyperplane is normal to $\Sigma^{-1}(\mu_i - \mu_j)$, not normal to $\mu_i - \mu_j$.
• Minimum Distance Classifiers

• For equiprobable classes, $P(\omega_i) = \dfrac{1}{M}$, so
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma^{-1}(x - \mu_i)$$

• Euclidean distance: if $\Sigma = \sigma^2 I$, assign $x \rightarrow \omega_i$ with the smaller
$$d_E = \lVert x - \mu_i \rVert$$

• Mahalanobis distance: if $\Sigma \neq \sigma^2 I$, assign $x \rightarrow \omega_i$ with the smaller
$$d_m = \big((x - \mu_i)^T \Sigma^{-1}(x - \mu_i)\big)^{1/2}$$
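A minimal sketch (not from the slides) of the two minimum-distance rules; the helper names are illustrative.

```python
import numpy as np

def euclidean(x, mu):
    return np.linalg.norm(x - mu)

def mahalanobis(x, mu, Sigma):
    d = x - mu
    return np.sqrt(d @ np.linalg.solve(Sigma, d))

def min_distance_classify(x, mus, Sigma=None):
    """Assign x to the class with the nearest mean (Mahalanobis if Sigma is given)."""
    if Sigma is None:
        dists = [euclidean(x, m) for m in mus]
    else:
        dists = [mahalanobis(x, m, Sigma) for m in mus]
    return int(np.argmin(dists))
```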
• Example: Given $\omega_1, \omega_2$ with $P(\omega_1) = P(\omega_2)$ and
$$p(x \mid \omega_1) = N(\mu_1, \Sigma), \qquad p(x \mid \omega_2) = N(\mu_2, \Sigma),$$
$$\mu_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \mu_2 = \begin{pmatrix} 3 \\ 3 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1.1 & 0.3 \\ 0.3 & 1.9 \end{pmatrix},$$
  classify the vector $x = \begin{pmatrix} 1.0 \\ 2.2 \end{pmatrix}$ using Bayesian classification.

• $$\Sigma^{-1} = \begin{pmatrix} 0.95 & -0.15 \\ -0.15 & 0.55 \end{pmatrix}$$

• Compute the Mahalanobis distances from $\mu_1, \mu_2$:
$$d_{m,1}^2 = (1.0,\ 2.2)\,\Sigma^{-1}\begin{pmatrix} 1.0 \\ 2.2 \end{pmatrix} = 2.952, \qquad d_{m,2}^2 = (-2.0,\ -0.8)\,\Sigma^{-1}\begin{pmatrix} -2.0 \\ -0.8 \end{pmatrix} = 3.672$$

• Classify $x \rightarrow \omega_1$. Observe that $d_{E,2} < d_{E,1}$: the Euclidean distance alone would have favored $\omega_2$.
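The numbers of this example can be checked with a few lines of NumPy (not part of the slides):

```python
import numpy as np

mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
Sigma = np.array([[1.1, 0.3],
                  [0.3, 1.9]])
x = np.array([1.0, 2.2])

def mahalanobis_sq(x, mu, Sigma):
    d = x - mu
    return d @ np.linalg.solve(Sigma, d)

print(mahalanobis_sq(x, mu1, Sigma))                      # 2.952
print(mahalanobis_sq(x, mu2, Sigma))                      # 3.672
print(np.linalg.norm(x - mu1), np.linalg.norm(x - mu2))   # 2.417 > 2.154
```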
BAYESIAN NETWORKS

• Bayes probability chain rule:
$$p(x_1, x_2, \ldots, x_\ell) = p(x_\ell \mid x_{\ell-1}, \ldots, x_1)\,p(x_{\ell-1} \mid x_{\ell-2}, \ldots, x_1)\cdots p(x_2 \mid x_1)\,p(x_1)$$

• Assume now that the conditional dependence of each $x_i$ is limited to a subset of the features appearing in each of the product terms. That is:
$$p(x_1, x_2, \ldots, x_\ell) = p(x_1)\prod_{i=2}^{\ell} p(x_i \mid A_i)$$
  where
$$A_i \subseteq \{x_{i-1}, x_{i-2}, \ldots, x_1\}$$
• For example, if ℓ = 6, then we could assume
$$p(x_6 \mid x_5, \ldots, x_1) = p(x_6 \mid x_5, x_4)$$
  Then
$$A_6 = \{x_5, x_4\} \subset \{x_5, \ldots, x_1\}$$

• The above is a generalization of the Naïve Bayes model. For Naïve Bayes the assumption is:
$$A_i = \emptyset, \quad \text{for } i = 1, 2, \ldots, \ell$$
• A graphical way to portray conditional dependencies is given in the figure below (a DAG over $x_1, \ldots, x_6$).

• According to this figure we have that:
  - $x_6$ is conditionally dependent on $x_4$, $x_5$
  - $x_5$ on $x_4$
  - $x_4$ on $x_1$, $x_2$
  - $x_3$ on $x_2$
  - $x_1$, $x_2$ are conditionally independent of the other variables
• For this case:
$$p(x_1, x_2, \ldots, x_6) = p(x_6 \mid x_5, x_4)\,p(x_5 \mid x_4)\,p(x_4 \mid x_2, x_1)\,p(x_3 \mid x_2)\,p(x_2)\,p(x_1)$$
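As a concrete illustration of this factorization, here is a minimal sketch with made-up conditional probability tables for binary $x_1, \ldots, x_6$ (the numbers are assumptions, not values from the slides); the final check confirms the factored joint sums to 1.

```python
from itertools import product

# Made-up CPTs: each entry gives P(variable = 1 | parent values).
P_x1, P_x2 = 0.3, 0.6                                   # root marginals
P_x3_x2   = {0: 0.2, 1: 0.7}                            # P(x3=1 | x2)
P_x4_x1x2 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9}
P_x5_x4   = {0: 0.3, 1: 0.8}
P_x6_x4x5 = {(0, 0): 0.05, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.95}

def bern(p, v):                                         # P(variable = v) for Bernoulli(p)
    return p if v == 1 else 1.0 - p

def joint(x1, x2, x3, x4, x5, x6):
    """p(x1..x6) = p(x6|x5,x4) p(x5|x4) p(x4|x2,x1) p(x3|x2) p(x2) p(x1)."""
    return (bern(P_x1, x1) * bern(P_x2, x2)
            * bern(P_x3_x2[x2], x3)
            * bern(P_x4_x1x2[(x1, x2)], x4)
            * bern(P_x5_x4[x4], x5)
            * bern(P_x6_x4x5[(x4, x5)], x6))

print(sum(joint(*cfg) for cfg in product([0, 1], repeat=6)))  # -> 1.0
```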
• Bayesian Networks

• Definition: A Bayesian Network is a directed acyclic graph (DAG) where the nodes correspond to random variables. Each node is associated with a set of conditional probabilities (densities), $p(x_i \mid A_i)$, where $x_i$ is the variable associated with the node and $A_i$ is the set of its parents in the graph.

• A Bayesian Network is specified by:
  - The marginal probabilities of its root nodes.
  - The conditional probabilities of the non-root nodes, given their parents, for ALL possible combinations.
• The figure below is an example of a Bayesian Network corresponding to a paradigm from the medical applications field.

• This Bayesian network models conditional dependencies for an example concerning smokers (S), tendencies to develop cancer (C) and heart disease (H), together with variables corresponding to heart (H1, H2) and cancer (C1, C2) medical tests.
• Once a DAG has been constructed, the joint probability can be obtained by multiplying the marginal (root nodes) and the conditional (non-root nodes) probabilities.

• Training: Once a topology is given, the probabilities are estimated from the training data set. There are also methods that learn the topology.

• Probability Inference: This is the most common task that Bayesian networks help us solve efficiently. Given the values of some of the variables in the graph, known as evidence, the goal is to compute the conditional probabilities of some of the other variables, given the evidence.
• Example: Consider the Bayesian network of the figure:

  a) If x is measured to be x = 1 (x1), compute P(w = 0 | x = 1) [P(w0 | x1)].

  b) If w is measured to be w = 1 (w1), compute P(x = 0 | w = 1) [P(x0 | w1)].
• For a), a set of calculations is required that propagates from node x to node w. It turns out that P(w0 | x1) = 0.63.

• For b), the propagation is reversed in direction. It turns out that P(x0 | w1) = 0.4.

• In general, the required inference information is computed via a combined process of "message passing" among the nodes of the DAG.

• Complexity: for singly connected graphs, message-passing algorithms have a complexity that is linear in the number of nodes.
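Since the conditional probability tables of the network in the figure are not reproduced here, the following minimal sketch uses a hypothetical two-node chain x → w with made-up tables, just to illustrate the forward and backward (Bayes-rule) inference of the kind asked in a) and b).

```python
# Hypothetical two-node network: x -> w, binary variables.
P_x = {0: 0.4, 1: 0.6}                       # marginal of the root node x
P_w_given_x = {0: {0: 0.7, 1: 0.3},          # P(w | x=0)
               1: {0: 0.2, 1: 0.8}}          # P(w | x=1)

# a) Forward: P(w=0 | x=1) is read directly from the conditional table.
print(P_w_given_x[1][0])                     # 0.2

# b) Backward: P(x=0 | w=1) via the Bayes rule,
#    P(x=0 | w=1) = P(w=1 | x=0) P(x=0) / sum_x P(w=1 | x) P(x).
num = P_w_given_x[0][1] * P_x[0]
den = sum(P_w_given_x[x][1] * P_x[x] for x in (0, 1))
print(num / den)                             # 0.12 / 0.60 = 0.2
```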
NAÏVE BAYES

$$p(x_1, x_2, \ldots, x_\ell) = p(x_1)\,p(x_2)\cdots p(x_\ell)$$
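A minimal sketch of this independence assumption (names and numbers are illustrative): the joint is just the product of the per-feature marginals.

```python
import numpy as np

def naive_joint(marginal_pmfs, values):
    """p(x1,...,xl) = p(x1) p(x2) ... p(xl) under the Naive Bayes assumption A_i = {}."""
    return np.prod([pmf[v] for pmf, v in zip(marginal_pmfs, values)])

# Three binary features with made-up marginal tables P(x_i = v).
pmfs = [{0: 0.7, 1: 0.3}, {0: 0.4, 1: 0.6}, {0: 0.9, 1: 0.1}]
print(naive_joint(pmfs, (1, 0, 1)))   # 0.3 * 0.4 * 0.1 = 0.012
```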
