Sample Space: How Many Different Equally Likely Possibilities Are There?
Summary
How many different equally likely possibilities are there? Then the probability is:

p = (# of possibilities satisfying my conditions) / (# of equally likely possibilities) (1)

For example, for a fair coin flip and a single roll of a fair die:

p(H) = 1/2 (2)

p(2) = 1/6 (3)

p(2 or 1) = 2/6 (4)

p(2 and 1) = 0, since these are mutually exclusive events: they cannot HAPPEN at the same time. (5)
The definition in Eq. (1) is valid when the condition of equally probable outcomes is satisfied.
Another example: what is the probability that I get heads on the first flip and then heads on the second flip, p(HH)?
Fig. 1. Probability tree for flipping a coin twice: each branch (H or T) has probability 1/2, so p(HH) = 1/2 × 1/2 = 1/4. A probability tree works well as long as the numbers stay small; for 100 trials it quickly turns into a messy calculation.
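As a quick cross-check of the tree (a minimal Python sketch, not part of the original notes), we can list the equally likely outcomes of two flips and apply Eq. (1) directly:

# Enumerate the equally likely outcomes of two coin flips and apply Eq. (1).
from itertools import product

outcomes = list(product("HT", repeat=2))             # ('H','H'), ('H','T'), ('T','H'), ('T','T')
favourable = [o for o in outcomes if o == ("H", "H")]
p_HH = len(favourable) / len(outcomes)
print(p_HH)                                          # 0.25, matching the tree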
II. SET OPERATIONS
We can combine events using set operations to obtain other events, and we can express complicated events as combinations of simple events. A set is simply a collection of elements (outcomes of the experiment).
[Venn diagrams: the union A ∪ B and the intersection A ∩ B of two events A and B.]
The union of two events A and B is denoted by A ∪ B and is defined as the set of outcomes that are in A, in B, or in both. The intersection of two events A and B is denoted by A ∩ B and is defined as the set of outcomes that are in both A and B. Two events are said to be mutually exclusive if their intersection is the null event, A ∩ B = Ø.
[Venn diagrams: the complement of an intersection, (A ∩ B)^c, and the inclusion A ⊂ B.]
Commutative properties:
A ∪ B = B ∪ A   and   A ∩ B = B ∩ A (6)
Associative properties:
A ∪ (B ∪ C) = (A ∪ B) ∪ C   and   A ∩ (B ∩ C) = (A ∩ B) ∩ C (7)
Distributive properties:
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)   and   A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (8)
DeMorgan's rule:
(A ∩ B)^c = A^c ∪ B^c   and   (A ∪ B)^c = A^c ∩ B^c (9)
A direct consequence for probabilities: if we partition the sample space into two mutually exclusive events, A and A^c, then the probabilities of these two events add up to one, so that
p(A^c) = 1 − p(A).
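To make the set identities concrete, here is a small Python sketch (the sets are made-up examples, not from the notes) that checks DeMorgan's rule, Eq. (9), on finite sets, with S playing the role of the sample space:

# Check DeMorgan's rule on concrete finite sets.
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
complement = lambda E: S - E                          # E^c relative to S
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)
print("DeMorgan's rule holds for this example")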
A. Conditional probability
The conditional probability, p(A|B), of event A given that event B has occurred is defined by:
p(A|B) = p(A ∩ B) / p(B)   for p(B) > 0. (10)
Bayes' rule: Let B1, B2, · · · , Bn be a partition of a sample space S. Suppose that event A occurs; what is the probability of event Bj? By the definition of conditional probability we have
p(Bj|A) = p(A ∩ Bj) / p(A) = p(A|Bj) p(Bj) / Σ_{k=1}^{n} p(A|Bk) p(Bk). (11)
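As an illustration, Eq. (11) can be evaluated directly for a two-event partition; the numbers below are hypothetical (a fair coin versus a two-headed coin, with A the event that a flip shows heads):

# Hypothetical two-hypothesis example of Bayes' rule, Eq. (11).
priors      = {"fair": 0.9, "two_headed": 0.1}        # p(B_k)
likelihoods = {"fair": 0.5, "two_headed": 1.0}        # p(A | B_k), with A = "heads"

p_A = sum(likelihoods[k] * priors[k] for k in priors)           # denominator of Eq. (11)
posteriors = {k: likelihoods[k] * priors[k] / p_A for k in priors}
print(posteriors)                                     # {'fair': 0.818..., 'two_headed': 0.181...}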
B. Independence of events
If knowledge of the occurrence of an event B does not alter the probability of some other event A,
then it would be natural to say that event A is independent of B. In terms of probabilities this situation
occurs when
p(A) = p(A|B) = p(A ∩ B) / p(B). (12)
The above equation is problematic when p(B) = 0, so independence is instead defined by:
p(A ∩ B) = p(A) p(B) (13)
yielding
p(A|B) = p(A) and p(B|A) = p(B). (14)
C. Sequential experiments
Example: A die is tossed and the number N1 of dots facing up is counted and noted; an integer
N2 is then selected at random from the range 1 to N1
• Find the set of outcomes corresponding to the event: die shows four dots facing up
• Find the set of outcomes corresponding to the event N2 = 3
• Find the set of outcomes corresponding to the event N2 = 6
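A minimal Python sketch (my own illustration, not part of the notes) that enumerates the sample space of this sequential experiment and picks out the three events:

# Enumerate all (N1, N2) pairs and the requested events.
sample_space = [(n1, n2) for n1 in range(1, 7) for n2 in range(1, n1 + 1)]

event_four_dots = [(n1, n2) for (n1, n2) in sample_space if n1 == 4]   # [(4,1), (4,2), (4,3), (4,4)]
event_n2_is_3   = [(n1, n2) for (n1, n2) in sample_space if n2 == 3]   # [(3,3), (4,3), (5,3), (6,3)]
event_n2_is_6   = [(n1, n2) for (n1, n2) in sample_space if n2 == 6]   # [(6,6)]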
In each of the previous examples, a measurement assigns a numerical value to the outcome of the random experiment. To do this we make use of RANDOM VARIABLES (RV). A RV is not a variable in the traditional sense of the word; it is more of a function that maps us from the world of a random process to a number:
Random Process ⇒ A Number   (this mapping is the RV)
A random variable X is a function that assigns a real number X(ζ) to each outcome ζ in the sample
space of a random experiment.
X = 1 if heads, 0 if tails ⇒ the outcomes of the events heads and tails are quantified by using a RV. (15)
Example: Suppose that a coin is tossed three times and the sequence of heads and tails is noted. The sample space for this experiment is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Now let X be the number of heads in three coin tosses. X assigns each outcome ζ in S a number from the set SX = {0, 1, 2, 3}. The table below lists the eight outcomes of S and the corresponding values of X.

ζ:    HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(ζ):  3    2    2    2    1    1    1    0     (16)

X is then a RANDOM VARIABLE taking on values in the set SX = {0, 1, 2, 3}.
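A short sketch (my own, not from the notes) that treats the RV X literally as a function from outcomes to numbers and reproduces table (16):

# X maps each outcome (a string of H/T) to its number of heads.
from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]     # the 8 outcomes
X = lambda outcome: outcome.count("H")                # X(ζ) = number of heads
print({zeta: X(zeta) for zeta in S})                  # e.g. 'HHH' -> 3, 'TTT' -> 0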
Another example: let X be the exact amount of rain tomorrow (in inches).
Fig. 2. The x axis is the amount of rain (the RV) and the curve is its probability density. We are looking for the exact probability of 2 inches of rain, not 2.01, 1.999999, or 2.00002.
The probability that the rain amount is close to 2 inches, say p(1.9 ≤ X ≤ 2.1), is equal to the area under the curve f(x) from 1.9 to 2.1; the probability of exactly 2 inches is zero. Here it is important to realize that when a RV can take an infinite number of values, it is a continuous RV. Note that the probability of all of the events that might occur cannot be more than 100% = 1. Let's do a bit of terminology then.
Probability Density Function (PDF): the PDF is a non-negative function fX(x) that specifies the probabilities of events of the form "X falls in a small interval of width dx about the point x". The probabilities of events involving X are then expressed in terms of the PDF by adding the probabilities of such intervals of width dx; as the widths of the intervals approach zero, we obtain an integral in terms of the PDF:
• The probability of an interval is therefore the area under fX (x) in that interval.
• A continuous RV is defined as a RV whose CDF FX(x) is continuous everywhere and, in addition, is sufficiently smooth that it can be written as the integral of some non-negative function f(x). The Cumulative Distribution Function (CDF) of X can be obtained by integrating the PDF:
FX(x) = ∫_{−∞}^{x} fX(t) dt. (18)
(For a discrete RV, pX(xk) = p(X = xk) gives the magnitude of the jumps in its CDF.)
• The PDF completely specifies the behavior of continuous random variables.
• Normalization condition for PDFs:
∫_{−∞}^{∞} fX(t) dt = 1 ⇒ the probability of all of the events that might occur cannot be more than 100% = 1. (20)
• A valid PDF can be FORMED from any nonnegative, piecewise continuous function g(x) that has a finite integral:
∫_{−∞}^{∞} g(x) dx = c < ∞ ⇒ by letting fX(x) = g(x)/c the result is NORMALIZED!! (21)
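The sketch below (my own illustration; the function g is an arbitrary made-up example) applies Eq. (21) numerically: it normalizes a nonnegative function by its integral and then confirms the normalization condition (20) with a simple numerical integral:

# Normalize a nonnegative function into a PDF, Eq. (21), using numpy.
import numpy as np

g = lambda x: np.exp(-np.abs(x)) * (1 + np.cos(x) ** 2)   # any nonnegative, integrable g
x = np.linspace(-20, 20, 200001)
c = np.trapz(g(x), x)                                      # c = ∫ g(x) dx < ∞
f = g(x) / c                                               # values of the normalized PDF
print(np.trapz(f, x))                                      # ≈ 1.0, i.e. Eq. (20)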
For example, the uniform RV on the interval [a, b] has the CDF
FX(x) = 0 for x < a,   (x − a)/(b − a) for a ≤ x ≤ b,   1 for x > b. (23)
Example: The PDF of the samples of the amplitude of speech waveforms is found to decay
exponentially at a rate α, so the following PDF is proposed:
fX(x) = c e^{−α|x|},   −∞ < x < ∞. (24)
Find the constant c, and then find the probability p(|X| < µ).
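One way to carry out this computation (a sketch of the solution, not spelled out in the original notes): the normalization condition (20) gives
∫_{−∞}^{∞} c e^{−α|x|} dx = 2c ∫_{0}^{∞} e^{−αx} dx = 2c/α = 1 ⇒ c = α/2,
and then, using the symmetry of the PDF about x = 0,
p(|X| < µ) = ∫_{−µ}^{µ} (α/2) e^{−α|x|} dx = α ∫_{0}^{µ} e^{−αx} dx = 1 − e^{−αµ}.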
We now know what a probability distribution is. Now let’s study a couple of the more common ones.
Binomial Distribution: The binomial RV arises in applications where there are two types of objects (e.g. heads/tails, correct/erroneous bits, ...) and we are interested in the number of type 1 objects in a randomly selected batch of size n, where the type of each object is independent of the types of the other objects in the batch.
Suppose that a random experiment is repeated n independent times. Let X be the number of times a certain event A occurs in these n trials; for example, X could be the number of heads in n tosses of a coin. If we let Ij be the indicator function for the event A in the j-th trial, then
X = I1 + I2 + ... + In (25)
that is, X is the sum of the Bernoulli RVs associated with each of the n independent trials. X has the following PMF (Probability Mass Function):
p(X = k) = C(n, k) p^k (1 − p)^{n−k}   for k = 0, 1, 2, ..., n, (26)
where C(n, k) = n!/(k! (n − k)!) is the binomial coefficient.
X is called the binomial RV.
Example: Let's assume that I am a basketball player and, based on my previous games, I am 30% successful at making a basket. I am going to take 6 shots in the game. So my random variable is:
X = # of shots I make ∈ {0, 1, 2, 3, 4, 5, 6} (27)
p(X = 0) = C(6, 0) × 0.3^0 × 0.7^{6−0} ⇒ I make no shots at all
p(X = 1) = C(6, 1) × 0.3^1 × 0.7^{6−1}
p(X = 2) = C(6, 2) × 0.3^2 × 0.7^{6−2}
p(X = 3) = C(6, 3) × 0.3^3 × 0.7^{6−3}
p(X = 4) = C(6, 4) × 0.3^4 × 0.7^{6−4}
p(X = 5) = C(6, 5) × 0.3^5 × 0.7^{6−5}
p(X = 6) = C(6, 6) × 0.3^6 × 0.7^{6−6} ⇒ in order to make all 6 shots I have to miss 0 shots (28)
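A quick numerical check of Eq. (28) (a sketch, not part of the notes):

# Binomial PMF for n = 6 shots with success probability p = 0.3.
from math import comb

n, p = 6, 0.3
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
print(pmf)                    # e.g. p(X = 0) ≈ 0.1176, p(X = 2) ≈ 0.3241
print(sum(pmf.values()))      # ≈ 1.0, as any PMF must sum to one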
IV. FUNCTIONS OF RANDOM VARIABLES (RV)
Let X be a random variable and let g(x) be a real-valued function defined on the real line. Define
Y = g(X), that is, Y is determined by evaluating the function g(x) at the value assumed by the RV X.
Then Y is also a RV. The probabilities, and hence the cumulative distribution function, of Y depend on g(x).
Example: consider the linear function Y = aX + b of the RV X. The event {Y ≤ y} occurs when A = {aX + b ≤ y} occurs. If a > 0, then A = {X ≤ (y − b)/a}, and thus
FY(y) = p(X ≤ (y − b)/a) = FX((y − b)/a),   a > 0. (29)
On the other hand, if a < 0, then A = {X ≥ (y − b)/a}:
FY(y) = p(X ≥ (y − b)/a) = 1 − FX((y − b)/a),   a < 0. (30)
We can obtain the pdf of Y by differentiating with respect to y. To do this we need to use the chain
rule for derivatives:
dF/dy = (dF/du)(du/dy), (31)
where u = (y − b)/a. Then the PDF is
fY(y) = (1/a) fX((y − b)/a),   a > 0, (32)
and
fY(y) = −(1/a) fX((y − b)/a),   a < 0. (33)
The two cases can be combined into
fY(y) = (1/|a|) fX((y − b)/a). (34)
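The result in Eq. (34) can also be checked by simulation; the sketch below (my own illustration, with X taken to be standard Gaussian and arbitrary values of a and b) compares a histogram of Y = aX + b against fX((y − b)/a)/|a|:

# Monte Carlo check of Eq. (34) for a linear transformation of a Gaussian RV.
import numpy as np

rng = np.random.default_rng(0)
a, b = -2.0, 1.0
x = rng.standard_normal(200_000)
y = a * x + b

f_X = lambda t: np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)    # PDF of X
hist, edges = np.histogram(y, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
f_Y = f_X((centers - b) / a) / abs(a)                        # Eq. (34) at the bin centers
print(np.max(np.abs(hist - f_Y)))                            # small value ⇒ histogram matches the formula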
Example: The amplitude A of a radar signal has a Rayleigh distribution:
pA(A) = (2A/σ) e^{−A²/σ},   A ≥ 0. (35)
If the intensity is defined as I = A², what is its PDF?
Further examples could be given; the key point is that if X is a RV, then its function Y = g(X) is a RV as well, and the PDF, and hence the other functions, of Y can be found by the change-of-variable rule.
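As a sketch of how the change-of-variable rule applies to the radar example above (my own working, not part of the original notes): with I = A² we have A = √I and dA/dI = 1/(2√I), so
pI(I) = pA(√I) |dA/dI| = (2√I/σ) e^{−I/σ} · 1/(2√I) = (1/σ) e^{−I/σ},   I ≥ 0,
i.e. the intensity is exponentially distributed with mean σ.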
The variance of the RV X is defined as the mean-squared variation E[D²], where D = X − E[X] is the deviation from the mean:
VAR[X] = E[(X − E[X])²]
       = E[X² − 2E[X]X + E[X]²]     (the expectation of a constant equals the constant itself)
       = E[X²] − 2E[X]E[X] + E[X]²
VAR[X] = E[X²] − E[X]². (39)
• The standard deviation of the RV X is defined by STD[X] = √(VAR[X]).
It is easy to note that the mean and the variance are functions of the first two moments.
Example: The PDF of the Gaussian random variable X is given by
pX(x) = (1/(√(2π) σ)) e^{−(x−m)²/(2σ²)},   −∞ < x < ∞,   m ∈ ℜ, σ > 0, (42)
and X has mean m and standard deviation σ. Prove it. Then plot three different PDFs with different m and σ values and interpret them as in Fig. 3.
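A possible starting point for this exercise (a sketch under my own choices of m and σ, not prescribed by the notes) that plots several Gaussian PDFs and checks the mean and standard deviation numerically:

# Plot Gaussian PDFs for several (m, sigma) pairs and verify mean/std by numerical integration.
import numpy as np
import matplotlib.pyplot as plt

def gaussian_pdf(x, m, sigma):
    return np.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-10, 10, 4001)
for m, sigma in [(-1, 1.0), (1, 1.0), (2, 1.0), (0, 0.75), (0, 2.0)]:
    f = gaussian_pdf(x, m, sigma)
    mean = np.trapz(x * f, x)                        # ≈ m
    std = np.sqrt(np.trapz((x - mean) ** 2 * f, x))  # ≈ sigma
    plt.plot(x, f, label=f"m={m}, sigma={sigma} (numerical: {mean:.2f}, {std:.2f})")
plt.legend()
plt.show()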
V. ENTROPY
Let X be a discrete Random Variable with SX = {1, 2, · · · , K} and PMF pk = p(X = k). We are interested in quantifying the uncertainty of the event Ak = {X = k}. Clearly, the uncertainty of Ak is low if the probability of Ak is close to one, and it is high if the probability of Ak is small. The following measure of uncertainty satisfies these two properties:
Fig. 3. (a) The Gaussian PDF with mean m = {−1, 1, 2} and fixed variance. (b) The Gaussian PDF with σ = {0.75, 1, 2} and a mean of 0.
I(X = k) = ln(1/p(X = k)) = −ln p(X = k). (43)
Note from Fig. 4 that I(X = k) increases as p(X = k) decreases.
Fig. 4. The bound ln(1/x) ≥ 1 − x. Here it is easy to recognize that I(X = k) increases with decreasing p(X = k).
The entropy of a RV X is defined as the expected value of the uncertainty of its outcomes:
HX = − Σ_{k=1}^{K} p(X = k) ln p(X = k)    for a discrete RV, (45)
HX = − ∫_{−∞}^{∞} fX(x) ln fX(x) dx    for a continuous RV. (46)
Fig. 5. Plot of −(1 − ρ) log(1 − ρ) − ρ log(ρ) as a function of ρ ∈ [0, 1]. Here, the base of the logarithm is 2; note that changing the base of the logarithm is equivalent to multiplying the entropy by a constant. The blue curve shows −ρ log ρ, and the red one shows −(1 − ρ) log(1 − ρ). The uncertainties of the events vary together in complementary fashion.
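The curve in Fig. 5 can be reproduced from the discrete definition (45); the sketch below (mine, using base-2 logarithms as in the figure) computes the entropy of the two-outcome PMF (ρ, 1 − ρ):

# Entropy of a discrete PMF in bits, Eq. (45) with log base 2.
import numpy as np

def entropy_bits(pmf):
    pmf = np.asarray(pmf, dtype=float)
    nz = pmf[pmf > 0]                    # the convention 0 * log 0 = 0
    return -np.sum(nz * np.log2(nz))

for rho in (0.1, 0.3, 0.5, 0.9):
    print(rho, entropy_bits([rho, 1 - rho]))   # maximum of 1 bit at rho = 0.5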