
MIT OpenCourseWare

http://ocw.mit.edu

HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing


Spring 2007

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.


Harvard-MIT Division of Health Sciences and Technology


HST.582J: Biomedical Signal and Image Processing, Spring 2007
Course Director: Dr. Julie Greenberg

Automated Decision Making Systems

Probability, Classification, Model Estimation

Information and Statistics


On the use of statistics:
"There are three kinds of lies: lies, damned lies, and statistics."
- Benjamin Disraeli (popularized by Mark Twain)

On the value of information:


"And when we were finished renovating our house, we had $24.00 left in the bank, only because the plumber didn't know about it."
- Mark Twain (from a speech paraphrasing one of his books)


Cite as: John Fisher. Course materials for HST.582J / 6.555J / 16.456J, Biomedical Signal and Image Processing,
Spring 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded
on [DD Month YYYY].


Elements of Decision Making Systems

1. Probability
   A quantitative way of modeling uncertainty.
2. Statistical Classification
   Application of probability models to inference; incorporates a notion of optimality.
3. Model Estimation
   We rarely (OK, never) know the model beforehand. Can we estimate the model from labeled observations?

Problem Setup


Concepts

In many experiments there is some element of randomness that we are unable to explain. Probability and statistics are mathematical tools for reasoning in the face of such uncertainty.
They allow us to answer questions quantitatively, such as:
  Is the signal present or not? (binary: YES or NO)
  How certain am I? (continuous: degree of confidence)
We can design systems for which single-use performance has an element of uncertainty, yet average-case performance is predictable.


Anomalous behavior (example)

How do we quantify our belief that these are anomalies?
How might we detect them automatically?


Detection of signals in noise

[Figure: four plots; one shows the signal alone, and three are labeled "signal(?) + noise".]

In which of these plots is the signal present?
Why are we more certain in some cases than others?

Coin Flipping

A fairly simple probability modeling problem: binary hypothesis testing.
Many decision systems come down to making a decision on the basis of a biased coin flip (or an N-sided die).


Bayes Rule

Bayes rule plays an important role in classification, inference, and estimation.
A useful thing to remember is that conditional probability relationships can be derived from a Venn diagram; Bayes rule then arises from straightforward algebraic manipulation.
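
With events A and B on a Venn diagram, the manipulation is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)} \;\;\Longrightarrow\;\; P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.$$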

Heads/Tails Conditioning Example

Sample space for two flips:

              2nd flip
               H    T
  1st flip H  HH   HT
           T  TH   TT

If I flip two coins and tell you at least one of them is heads, what is the probability that at least one of them is tails?
The events of interest are the set of outcomes where at least one of the results is a head.
The point of this example is two-fold:
  Keep track of your sample space and events of interest.
  Bayes rule tells us how to incorporate information in order to adjust probability.


Heads/Tails Conditioning Example (continued)

The probability that at least one of the results is heads is 3/4, by simple counting over the sample space {HH, HT, TH, TT}.
The probability that both of the coins are heads is 1/4.
Conditioning on "at least one head" leaves the probability that both are heads at (1/4)/(3/4) = 1/3.
The chance of winning (betting that both are heads) is 1 in 3; equivalently, the odds of winning are 1 to 2, so the probability that at least one of them is tails is 2/3.
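
A quick sanity check of the conditioning argument by simulation (a minimal sketch; the variable names are ours, not the slides'):

```python
import random

random.seed(0)
trials = 100_000
at_least_one_head = 0
both_heads = 0
for _ in range(trials):
    flips = [random.choice("HT") for _ in range(2)]  # two fair coin flips
    if "H" in flips:                  # condition: at least one head
        at_least_one_head += 1
        if flips.count("H") == 2:     # event: both heads
            both_heads += 1

# Fraction should approach 1/3; "at least one tail" is the complement, 2/3.
print(both_heads / at_least_one_head)
```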

Defining Probability (Frequentist vs. Axiomatic)

The probability of an event is the number of times we expect a specific outcome relative to the number of times we conduct the experiment.

Define:
  N : the number of trials
  NA, NB : the number of times events A and B are observed
  Events A and B are mutually exclusive (i.e., observing one precludes observing the other).

Empirical definition: probability is defined as a limit over observations.
Axiomatic definition: probability is derived from its properties.
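
Written out, for the mutually exclusive events A and B defined above (the axioms shown are the standard Kolmogorov axioms):

$$P(A) = \lim_{N \to \infty} \frac{N_A}{N} \quad \text{(empirical)}$$

$$P(A) \ge 0, \qquad P(\Omega) = 1, \qquad P(A \cup B) = P(A) + P(B) \quad \text{(axiomatic, for mutually exclusive } A, B\text{)}$$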


Estimating the Bias of a Coin (Bernoulli Process)
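
The slide's equations are not reproduced here; as a sketch of the standard result, for N independent flips with k heads the Bernoulli likelihood and its maximizer are:

$$\mathcal{L}(p) = p^k (1 - p)^{N - k}, \qquad \frac{d}{dp} \log \mathcal{L}(p) = \frac{k}{p} - \frac{N - k}{1 - p} = 0 \;\;\Longrightarrow\;\; \hat{p}_{ML} = \frac{k}{N}.$$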


4 out of 5 Dentists
What does this statement mean?
How can we attach meaning/significance to the claim?
This is an example of the frequentist vs. Bayesian viewpoint. The difference (in this case) lies in:
  the assumption regarding how the data is generated;
  the way in which we can express certainty about our answer.
Asymptotically (as we get more observations) they both converge to the same answer (but at different rates), as the sketch below illustrates.
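
A minimal sketch of the two viewpoints on "4 out of 5" (assuming 5 Bernoulli trials with 4 successes and, for the Bayesian side, a uniform Beta(1,1) prior; both modeling choices are ours):

```python
# 4-of-5 data
k, n = 4, 5

# Frequentist: the maximum-likelihood estimate is the relative frequency.
p_ml = k / n                   # 0.8

# Bayesian with a uniform Beta(1,1) prior: posterior is Beta(k+1, n-k+1).
post_mean = (k + 1) / (n + 2)  # ~0.714 (Laplace's rule of succession)
post_mode = k / n              # mode of Beta(5, 2) coincides with the MLE

print(f"ML: {p_ml:.3f}, posterior mean: {post_mean:.3f}, posterior mode: {post_mode:.3f}")
```

With more observations the posterior concentrates and both estimates approach the same value, which is the convergence the slide refers to.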


Sample without Replacement, Order Matters

Start with N empty boxes; choose one from N choices, another from the N-1 remaining choices, and so on, until the k-th is chosen from the N-k+1 remaining choices.
Each term represents the number of different choices we have at each stage; the product can be re-written and then simplified to

$$N (N - 1) \cdots (N - k + 1) = \frac{N!}{(N - k)!}.$$

In the figure, color indicates the order in which we filled the boxes. Any sample which fills the same boxes, but has a different color in any box (there will be at least 2), is considered a different sample.

Sample without Replacement, Order Doesn't Matter

The sampling procedure is the same as the previous one (start with N empty boxes, choose one from N choices, then from N-1 remaining, and so on, to the k-th from N-k+1 remaining), except that we don't keep track of the colors.
The number of sample draws with the same filled boxes is equal to the number of ways we can re-order (permute) the colors, which is k!.
The result is to reduce the total number of draws by that factor:

$$\frac{N!}{k! (N - k)!} = \binom{N}{k}.$$
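
Both counting results can be checked directly with the standard library (N = 5, k = 3 are arbitrary example values):

```python
import math

N, k = 5, 3

# Ordered draws without replacement: N * (N-1) * ... * (N-k+1) = N! / (N-k)!
ordered = math.perm(N, k)                                    # 60
assert ordered == math.factorial(N) // math.factorial(N - k)

# Unordered draws: divide out the k! re-orderings of the chosen items.
unordered = math.comb(N, k)                                  # 10
assert unordered == ordered // math.factorial(k)

print(ordered, unordered)
```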


Cumulative Distribution Functions (CDFs)

The cumulative distribution function (CDF) divides a continuous sample space into two events, {X ≤ x} and its complement.
It has the following properties:
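
In standard form (definition plus the usual properties):

$$F_X(x) = P(X \le x)$$
$$F_X(-\infty) = 0, \qquad F_X(\infty) = 1, \qquad F_X \text{ is non-decreasing}, \qquad P(a < X \le b) = F_X(b) - F_X(a).$$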


Probability Density Functions (PDFs)

The probability density function (PDF) is defined in terms of the CDF.
Some properties which follow are:
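
In standard form:

$$p_X(x) = \frac{d}{dx} F_X(x)$$
$$p_X(x) \ge 0, \qquad \int_{-\infty}^{\infty} p_X(x)\, dx = 1, \qquad P(a < X \le b) = \int_a^b p_X(x)\, dx.$$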


Expectation

Given a function of a random variable (i.e., g(X)), we define its expected value as shown below, with the mean, variance, and entropy as continuous examples.
Expectation is linear (see the variance example once we've defined the joint density function and statistical independence).
Expectation is taken with regard to ALL random variables within the arguments. This is important for multi-dimensional and joint random variables.
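
The definition and the three continuous examples, in standard form:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, p_X(x)\, dx$$
$$\mu_X = E[X], \qquad \sigma_X^2 = E[(X - \mu_X)^2], \qquad h(X) = E[-\log p_X(X)] = -\int p_X(x) \log p_X(x)\, dx.$$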


Multiple Random Variables (Joint Densities)

[Figure: a joint density p_XY(u, v) over the (u, v) plane.]

We can define a density over multiple random variables in a similar fashion as we did for a single random variable:
1. We define the probability of the event {X ≤ x AND Y ≤ y} as a function of x and y.
2. The density is the function we integrate to compute the probability.
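
In symbols, with u and v as the integration variables from the figure:

$$F_{XY}(x, y) = P(X \le x \text{ AND } Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} p_{XY}(u, v)\, dv\, du.$$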


Conditional Density

[Figure: joint density p_XY(x, y) with the horizontal slice at y = y_o highlighted.]

Given a joint density or mass function over two random variables, we can define the conditional density similarly to conditional probability from Venn diagrams.
That is, it is not of practical use until we condition on Y being equal to a value (rather than letting it remain a variable), which creates an actual density.
p_XY(x, y_o) is the slice of p_XY(x, y) along the line y = y_o; normalizing it gives p_{X|Y}(x | y_o) = p_XY(x, y_o) / p_Y(y_o).
We also get the following relationship, the chain rule for densities:
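
$$p_{XY}(x, y) = p_{X|Y}(x \mid y)\, p_Y(y) = p_{Y|X}(y \mid x)\, p_X(x).$$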


Bayes Rule

For continuous random variables, Bayes rule is essentially the same (again, just an algebraic manipulation of the definition of a conditional density):

$$p_{X|Y}(x \mid y) = \frac{p_{Y|X}(y \mid x)\, p_X(x)}{p_Y(y)}$$

This relationship will be very useful when we start looking at classification and detection.


Binary Hypothesis Testing (Neyman-Pearson)
(and a simplification of the notation)

2-class problems are equivalent to the binary hypothesis testing problem.
The goal is to estimate which hypothesis is true (i.e., from which class our sample came).
A minor change in notation will make the following discussion a little simpler: we write probability density models for the measurement x depending on which hypothesis is in effect.
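
In this shorthand (the slide's exact equation is not preserved; this is the standard form):

$$H_i:\; x \sim p_i(x), \qquad p_i(x) \triangleq p_{X|H}(x \mid H_i), \quad i = 0, 1.$$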


Decision Rules

[Figure: two overlapping densities p0(x) and p1(x) along the x-axis.]

Decision rules are functions which map measurements to choices.
In the binary case we can write it as:
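
The slide's formula is not preserved; a standard way to write a binary decision rule over disjoint regions covering the measurement space is:

$$\delta(x) = \begin{cases} H_1 & x \in R_1 \\ H_0 & x \in R_0 \end{cases}$$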


Error Types

[Figure: p0(x) and p1(x) with decision regions R0, R1, R1 marked on the x-axis.]

There are 2 types of errors:
  a miss: deciding H0 when H1 is true;
  a false alarm: deciding H1 when H0 is true.
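
In terms of the decision regions:

$$P_{\text{miss}} = P(x \in R_0 \mid H_1) = \int_{R_0} p_1(x)\, dx, \qquad P_{\text{FA}} = P(x \in R_1 \mid H_0) = \int_{R_1} p_0(x)\, dx.$$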


Binary Hypothesis Testing (Bayesian)

2-class problems are equivalent to the binary hypothesis testing problem.
The goal is to estimate which hypothesis is true (i.e., from which class our sample came).
A minor change in notation will make the following discussion a little simpler. The ingredients are:
  the prior probabilities of each class,
  the class-conditional probability density models for the measurement x,
  the marginal density of X, and
  the conditional probability of the hypothesis Hi given X.
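
Collecting the four ingredients in standard notation:

$$P_i = P(H_i), \qquad p_i(x) = p_{X|H}(x \mid H_i), \qquad p_X(x) = \sum_i P_i\, p_i(x), \qquad P(H_i \mid x) = \frac{P_i\, p_i(x)}{p_X(x)}.$$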

A Notional 1-Dimensional Classification Example

[Figure: class-conditional densities p0(x), p1(x) and the marginal p_X(x).]

So given observations of x, how should we select our best guess of Hi?
Specifically, what is a good criterion for making that assignment?
Which Hi should we select before we observe x?

Bayes Classifier

[Figure: class-conditional densities p0(x), p1(x) and the marginal p_X(x).]

A reasonable criterion for guessing values of H given observations of X is to minimize the probability of error.
The classifier which achieves this minimization is the Bayes classifier.
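
Concretely, the Bayes classifier selects the hypothesis with the largest posterior probability:

$$\hat{H}(x) = \arg\max_i P(H_i \mid x) = \arg\max_i P_i\, p_i(x).$$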

Probability of Misclassification

[Figure: p0(x) and p1(x) with the x-axis partitioned into decision regions R0, R1, R1.]

Before we derive the Bayes classifier, consider the probability of misclassification for an arbitrary classifier (i.e., decision rule).
The first step is to assign regions of X to each class.
An error occurs if a sample of x falls in Ri and we assume hypothesis Hj (j ≠ i).


Probability of Misclassification (continued)

An error is comprised of two events: {x ∈ R1 and H0 true} and {x ∈ R0 and H1 true}.
These are mutually exclusive events, so the probability of an error is the sum of their individual probabilities.
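
Writing out the two error events:

$$P_E = P(x \in R_1, H_0) + P(x \in R_0, H_1) = P_0 \int_{R_1} p_0(x)\, dx + P_1 \int_{R_0} p_1(x)\, dx.$$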


Minimum Probability of Misclassification

So now let's choose regions to minimize the probability of error.
In the second step we just change the region over which we integrate for one of the terms (these are complementary events).
In the third step we collect terms and note that all underbraced terms in the integrand are non-negative.
If we want to choose regions (remember, choosing region 1 effectively chooses region 0) to minimize PE, then we should set region 1 to be where the integrand is negative.
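
Written out in the notation above (a reconstruction of the slide's three steps):

$$
\begin{aligned}
P_E &= P_0 \int_{R_1} p_0(x)\, dx + P_1 \int_{R_0} p_1(x)\, dx \\
    &= P_0 \int_{R_1} p_0(x)\, dx + P_1 \Big(1 - \int_{R_1} p_1(x)\, dx\Big) \\
    &= P_1 + \int_{R_1} \big(\underbrace{P_0\, p_0(x)}_{\ge 0} - \underbrace{P_1\, p_1(x)}_{\ge 0}\big)\, dx,
\end{aligned}
$$

which is minimized by choosing $R_1 = \{x : P_1\, p_1(x) > P_0\, p_0(x)\}$, where the integrand is negative.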


Minimum Probability of Misclassification (continued)

Consequently, for minimum probability of misclassification (which is the Bayes error), R1 is defined as

$$R_1 = \{x : P_1\, p_1(x) \ge P_0\, p_0(x)\},$$

R0 is the complement, and the boundary is where we have equality.
Equivalently, we can write the condition as: declare H1 when the likelihood ratio for H1 vs. H0 exceeds the PRIOR odds of H0 vs. H1,

$$\frac{p_1(x)}{p_0(x)} \ge \frac{P_0}{P_1}.$$


Risk Adjusted Classifiers

Suppose that making one type of error is more of a concern than making another; for example, it is worse to declare H1 when H0 is true than vice versa.
This is captured by the notion of cost. In the binary case this leads to a cost matrix, with Cij the cost of declaring Hi when Hj is true.

Derivation:
  We'll simplify by assuming that C00 = C11 = 0 (there is zero cost to being correct) and that all other costs are positive.
  Think of cost as a piecewise constant function of X.
  If we divide X into decision regions, we can compute the expected cost as the cost of being wrong times the probability of a sample falling into that region.
The Risk Adjusted Classifier tries to minimize the expected cost.


Risk Adjusted Classifiers (continued)

The expected cost is then

$$E[C] = C_{10}\, P_0 \int_{R_1} p_0(x)\, dx + C_{01}\, P_1 \int_{R_0} p_1(x)\, dx.$$

As in the minimum probability of error classifier, we note that all terms in the integral are non-negative, so to minimize expected cost choose R1 to be:

$$R_1 = \{x : C_{01}\, P_1\, p_1(x) \ge C_{10}\, P_0\, p_0(x)\}.$$

If C10 = C01, then the risk adjusted classifier is equivalent to the minimum probability of error classifier.
Another interpretation of costs is as an adjustment to the prior probabilities. Alternatively, define

$$P_1^{adj} = \frac{C_{01} P_1}{C_{01} P_1 + C_{10} P_0}, \qquad P_0^{adj} = \frac{C_{10} P_0}{C_{01} P_1 + C_{10} P_0}.$$

Then the risk adjusted classifier is equivalent to the minimum probability of error classifier with prior probabilities equal to P1adj and P0adj, respectively.
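
To make the thresholding concrete, here is a minimal sketch (hypothetical Gaussian class-conditionals; the means, priors, and costs are made-up values, not from the slides):

```python
import math

def gauss(x, mu, sigma):
    # Gaussian density, used here as an assumed class-conditional model.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

P0, P1 = 0.5, 0.5      # prior probabilities (assumed)
C10, C01 = 1.0, 5.0    # C10: cost of a false alarm; C01: cost of a miss (assumed)

def decide(x):
    # Declare H1 when the cost-weighted evidence for H1 wins;
    # with C10 == C01 this reduces to the minimum-error (Bayes) rule.
    return 1 if C01 * P1 * gauss(x, 2.0, 1.0) >= C10 * P0 * gauss(x, 0.0, 1.0) else 0

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, decide(x))
```

Raising C01 relative to C10 enlarges R1, trading more false alarms for fewer misses.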

Error Probability as an Expectation

[Figure: p0(x) and p1(x) with decision regions R0, R1, R1.]

Equivalently, we can compute error probability as the expectation of a function of X and H:
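
With δ(X) the decision rule and 1{·} the indicator function (standard notation; the slide's equation is not preserved):

$$P_E = E_{X,H}\big[\mathbb{1}\{\delta(X) \neq H\}\big].$$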


Bayes Classifier vs Risk Adjusted Classifier

[Figure: densities p1(x) and p2(x) over x, with the axis partitioned into decision regions R1, R2, R1.]


Okay, so what?
All of this is great: we now know what to do in a few classic cases, if some nice person hands us all of the probability models.
In general we aren't given the models. What do we do? Density estimation to the rescue.
While we may not have the models, often we do have a collection of labeled measurements, that is, a set of {x, Hj}. From these we can estimate the class-conditional densities.
Important issues will be:
  How close will the estimate be to the true model?
  How does closeness impact classification performance?
  What types of estimators are appropriate (parametric vs. nonparametric)?
  Can we avoid density estimation and estimate the decision rule directly? (generative vs. discriminative approaches)
