Estimation and Detection Lec 01
Introduction
We are interested in processing information-bearing signals to extract the information they carry. Two types of problems are of fundamental interest.
1. Detection: decide among a finite number of possible situations (hypotheses).
2. Estimation: produce a value as close as possible to the true, continuously valued situation.
We look at three examples of interest.
Example 1.1 (Communication System). We need to estimate an unknown analog signal from the received signal, which is distorted and corrupted by the channel.
[Figure 1: Block diagram of a communication system: Source, Transducer, Transmitter, Channel, Receiver, Output Transducer.]
[Figure 2: Block diagram for automatic control, with signals x(t), e(t), u(t), y(t) and block F.]
Other applications of estimation and detection theory include seismology, radio astronomy, sonar, speech, signal, and image processing, biomedical signal processing, optical communications, etc.
Probability Review
We denote the observation space by $\Omega$, equipped with a $\sigma$-algebra $\mathcal{G}$, that is, a collection of measurable sets. Further, for all elements $A \in \mathcal{G}$, we have a non-negative set function $P : \mathcal{G} \to [0, 1]$ that satisfies the following axioms of probability:
1. $P(\Omega) = 1$,
2. for any disjoint countable collection of sets $\{A_n : n \in \mathbb{N}\}$, we have $P(\cup_n A_n) = \sum_n P(A_n)$.
Example 2.1 (Finite Observations). When the observation space $\Omega$ has finitely many elements, we can take $\mathcal{G} = \mathcal{P}(\Omega)$, the power set of $\Omega$. Further, specifying $P(\{\omega\})$ for all $\omega \in \Omega$ completely specifies the probability set function.
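As a quick illustration of Example 2.1, here is a minimal Python sketch (the three-element outcome space and its weights are made up for illustration): specifying $P(\{\omega\})$ on a finite observation space determines $P(A)$ for every event $A$ by countable additivity.

# Sketch for Example 2.1: a finite observation space.
# The outcomes and their probabilities below are illustrative only.
p = {"a": 0.5, "b": 0.3, "c": 0.2}   # P({omega}) for each omega in Omega

def prob(event):
    """P(A) for any A in the power set of Omega, by additivity."""
    return sum(p[w] for w in event)

print(prob({"a", "b"}))   # 0.8
print(prob(set(p)))       # 1.0, consistent with P(Omega) = 1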
Example 2.2 (Euclidean Space). For the case when the observation space $\Omega = \mathbb{R}^n$, we take $\mathcal{G} = \mathcal{B}^n$, the Borel $\sigma$-algebra on $\mathbb{R}^n$. For this case, it suffices to specify the set function $P(A)$ for sets $A \in \mathcal{G}$ of the form $\{\omega \in \Omega : \omega_i \le x_i, i \in [n]\}$.
Definition 2.3 (Expectation). For a real-valued function $g : \Omega \to \mathbb{R}$, we denote its expectation by $E[g(Y)]$ and define it as
$$E[g(Y)] = \int_{y \in \Omega} g(y)\, dP(y).$$
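The following small Python sketch (the discrete distribution and the function $g$ are arbitrary illustrative choices, not from the notes) evaluates the expectation of Definition 2.3 two ways: exactly, as a sum against a discrete $P$, and approximately, by Monte Carlo averaging over samples from $P$.

import random

# Sketch for Definition 2.3: E[g(Y)] as an integral against P.
# The distribution p and the function g below are illustrative choices.
p = {0: 0.25, 1: 0.75}          # P({y}) on a finite observation space
g = lambda y: (y - 0.5) ** 2    # any real-valued function of the observation

# Exact expectation: the integral reduces to a sum for a discrete P.
exact = sum(g(y) * p[y] for y in p)

# Monte Carlo approximation: average g over samples drawn from P.
samples = random.choices(list(p), weights=list(p.values()), k=100_000)
approx = sum(g(y) for y in samples) / len(samples)

print(exact, approx)   # the two numbers should be close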
Hypothesis Testing
The conditional risk of a decision rule $\delta$ with decision regions $(\Gamma_0, \Gamma_1)$, when hypothesis $H_j$ is true, is
$$R_j(\delta) = C_{0j} P_j(\Gamma_0) + C_{1j} P_j(\Gamma_1).$$
Our objective is to design a decision rule that minimizes this risk. Usually, the cost of correctly identifying the true hypothesis is low, and the cost of incorrect identification is higher. Hence, minimizing the risk under any hypothesis amounts to ensuring that the probability $P_j(\Gamma_i)$ is low for $i \neq j$. One cannot simultaneously shrink all decision regions $\{\Gamma_i\}$, since they form a partition of the observation space $\Omega$. The sketch after this paragraph illustrates the resulting trade-off.
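To make the trade-off concrete, here is a small Python sketch (not from the notes; the two Gaussian observation models and the simple threshold-on-$y$ rules are illustrative assumptions). As the region $\Gamma_1 = \{y \ge t\}$ shrinks (larger $t$), $P_0(\Gamma_1)$ falls while $P_1(\Gamma_0)$ grows.

from math import erf, sqrt

# Illustrative models (assumed): Y ~ N(0,1) under H0, Y ~ N(1,1) under H1.
def Phi(x):                      # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

for t in [0.0, 0.5, 1.0, 1.5]:   # Gamma_1 = {y >= t}; larger t shrinks Gamma_1
    false_alarm = 1.0 - Phi(t)   # P_0(Gamma_1)
    miss = Phi(t - 1.0)          # P_1(Gamma_0)
    print(f"t={t:3.1f}  P0(Gamma_1)={false_alarm:.3f}  P1(Gamma_0)={miss:.3f}")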
3.1 Bayesian Hypothesis Testing
Definition 3.8. For two distributions $P_1$ and $P_0$, we can define the likelihood ratio as $L(y) = \frac{dP_1}{dP_0}(y)$. When the two distributions admit densities, it is the ratio of their densities at $y$. For discrete distributions, it is the ratio of their probability mass functions at $y$.
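A minimal Python sketch of Definition 3.8, using an assumed Gaussian pair $P_0 = N(0,1)$ and $P_1 = N(1,1)$ (these densities are illustrative, not from the notes): since both distributions admit densities, the likelihood ratio is simply the ratio of the two densities at the observed $y$.

from math import exp, pi, sqrt

# Assumed densities: p0 = N(0,1), p1 = N(1,1).
def normal_pdf(y, mean, var=1.0):
    return exp(-(y - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)

def likelihood_ratio(y):
    """L(y) = dP1/dP0 (y) = p1(y) / p0(y) when both densities exist."""
    return normal_pdf(y, 1.0) / normal_pdf(y, 0.0)

print(likelihood_ratio(0.5))   # = 1 at the midpoint between the two means
print(likelihood_ratio(2.0))   # > 1: the observation favours H1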
Theorem 3.9. For a Bayesian hypothesis testing problem, the optimal decision rule that minimizes the unconditional risk for a prior distribution $(\pi_0, \pi_1)$ and costs $\{C_{ij}\}$ is a threshold-based rule called the likelihood ratio test. That is,
$$\delta_B(y) = \mathbb{1}\{L(y) \ge \tau\},$$
where the likelihood ratio $L(y) = \frac{dP_1}{dP_0}(y)$ and the threshold
$$\tau = \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}.$$
Proof. Writing the unconditional risk $r(\delta) = \pi_0 R_0(\delta) + \pi_1 R_1(\delta)$ as an integral over the observation space, the only terms that depend on the choice of $\Gamma_1$ are minimized by including in $\Gamma_1$ exactly those $y$ for which $\pi_1 (C_{01} - C_{11})\, dP_1(y) \ge \pi_0 (C_{10} - C_{00})\, dP_0(y)$. Dividing both sides by $\pi_1 (C_{01} - C_{11})\, dP_0(y)$ gives the likelihood ratio test with the threshold $\tau$ defined in the theorem statement.
Remark 1. For uniform cost, the threshold is $\tau = \frac{\pi_0}{\pi_1}$, and the unconditional risk
$$r(\delta) = \pi_0 P_0(\Gamma_1) + \pi_1 P_1(\Gamma_0)$$
is the probability of error in detection.
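Putting Theorem 3.9 and Remark 1 together, here is a brief Python sketch (the Gaussian observation models, the priors, and the uniform costs are all assumptions chosen for illustration): it applies the likelihood ratio test with threshold $\tau = \pi_0(C_{10}-C_{00})/\pi_1(C_{01}-C_{11})$ and estimates the probability of error by simulation.

import random
from math import exp, pi, sqrt

# Assumed setup: P0 = N(0,1), P1 = N(1,1), priors and uniform costs below.
pi0, pi1 = 0.6, 0.4
C = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): 1.0, (1, 0): 1.0}   # C[i, j], uniform cost

def pdf(y, mean):
    return exp(-(y - mean) ** 2 / 2.0) / sqrt(2.0 * pi)

tau = (pi0 * (C[1, 0] - C[0, 0])) / (pi1 * (C[0, 1] - C[1, 1]))

def delta_B(y):
    """Likelihood ratio test: decide H1 iff L(y) >= tau."""
    return 1 if pdf(y, 1.0) / pdf(y, 0.0) >= tau else 0

# Monte Carlo estimate of the probability of error r(delta_B).
errors, trials = 0, 100_000
for _ in range(trials):
    h = 0 if random.random() < pi0 else 1    # draw the true hypothesis
    y = random.gauss(float(h), 1.0)          # draw Y from P_h
    errors += (delta_B(y) != h)
print(errors / trials)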
Definition 3.10. We can define the posterior probability of hypothesis $H_j$ being true, conditioned on the observation being $y$, as
$$\pi_j(y) = \Pr\{H_j \text{ is true} \mid Y = y\} = \frac{\pi_j\, dP_j(y)}{\pi_0\, dP_0(y) + \pi_1\, dP_1(y)}.$$
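A short Python sketch of Definition 3.10 (again using assumed Gaussian densities and priors, not anything specified in the notes): when the distributions have densities, $dP_j(y)$ is replaced by the density $p_j(y)$ in the formula.

from math import exp, pi, sqrt

# Assumed priors and densities: pi0, pi1 and p0 = N(0,1), p1 = N(1,1).
pi0, pi1 = 0.6, 0.4

def pdf(y, mean):
    return exp(-(y - mean) ** 2 / 2.0) / sqrt(2.0 * pi)

def posteriors(y):
    """(pi_0(y), pi_1(y)) from Definition 3.10, with densities in place of dP_j."""
    num0, num1 = pi0 * pdf(y, 0.0), pi1 * pdf(y, 1.0)
    z = num0 + num1
    return num0 / z, num1 / z

print(posteriors(0.0))   # an observation near 0 favours H0
print(posteriors(1.5))   # an observation near 1 (or above) favours H1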
Remark 2. The posterior cost of deciding hypothesis $H_k$ when the observation is $y$ is $R_k(y) = \sum_{j} C_{k,j}\, \pi_j(y)$.
Remark 3. Alternatively, one can also write the rejection region in terms of posterior costs as
$$\Gamma_1 = \{y : R_0(y) \ge R_1(y)\}.$$
Therefore, the Bayes decision rule can be interpreted as the one that minimizes the posterior cost of choosing a hypothesis when the observation is $y$. That is,
$$\delta_B(y) = \mathbb{1}\left\{\frac{R_0(y)}{R_1(y)} \ge 1\right\}.$$
Remark 4. For uniform cost, the Bayes decision rule is a likelihood ratio test on the posterior probabilities with threshold unity. That is,
$$\delta_B(y) = \mathbb{1}\{L_{\pi}(y) \ge 1\}, \quad \text{where } L_{\pi}(y) = \frac{\pi_1(y)}{\pi_0(y)}$$
is the ratio of posterior probabilities. This is equivalent to maximizing the posterior probability of the underlying hypothesis given the observation, and is therefore called the MAP (maximum a posteriori) decision rule for binary hypothesis testing.
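To close the loop, here is a small Python check (using the same assumed Gaussian models and priors as above, which are illustrative only) that the MAP rule of Remark 4 agrees with the likelihood ratio test of Theorem 3.9 with threshold $\pi_0/\pi_1$.

from math import exp, pi, sqrt

# Assumed setup: priors pi0, pi1 and densities p0 = N(0,1), p1 = N(1,1).
pi0, pi1 = 0.6, 0.4

def pdf(y, mean):
    return exp(-(y - mean) ** 2 / 2.0) / sqrt(2.0 * pi)

def map_rule(y):
    """Decide the hypothesis with the larger posterior probability."""
    return 1 if pi1 * pdf(y, 1.0) >= pi0 * pdf(y, 0.0) else 0

def lrt_rule(y):
    """Likelihood ratio test with the uniform-cost threshold pi0/pi1."""
    return 1 if pdf(y, 1.0) / pdf(y, 0.0) >= pi0 / pi1 else 0

ys = [k / 10.0 for k in range(-30, 41)]
print(all(map_rule(y) == lrt_rule(y) for y in ys))   # True: the two rules agree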