Handout 4-Conditional Probability and Independent Events
STAT 1003A
Wits University
E-mail: Herbert.Hove@wits.ac.za
LECTURE SERIES
Prepared by Dr. Herbert Hove
3.7. Conditional Probability
We have discussed ways of assigning probabilities to events. Next we consider a way to modify
these probabilities when information relevant to the outcome of the experiment becomes
available. This is the subject of conditional probability.
Example 3.7.1: In a game of chance, Adam and Betty toss a fair coin “best of three” to decide
who wins. Adam wins if there are more heads than tails, otherwise Betty wins.
Denote by A and B the events “Adam wins” and “Betty wins” respectively.
𝐴 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝑇𝐻𝐻} ⇒ 𝑃(𝐴) = 𝑛(𝐴)/𝑛(𝑆) = 4/8 = 0.5 (50% chance)
𝐵 = {𝑇𝑇𝐻, 𝑇𝐻𝑇, 𝐻𝑇𝑇, 𝑇𝑇𝑇} ⇒ 𝑃(𝐵) = 𝑛(𝐵)/𝑛(𝑆) = 4/8 = 0.5 (50% chance)
The events A and B can be represented in a Venn diagram as follows:
Suppose now that the result of the 1st toss is “heads”. Will Adam and Betty still have a 50%
chance of winning?
The information relevant to the outcome of the experiment that has become available is that
the result of the 1st toss is “heads”. Denote by C the event “heads on first toss”.
Then:
𝐶 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇} ⇒ 𝑃(𝐶) = 𝑛(𝐶)/𝑛(𝑆) = 4/8 = 0.5
With the additional event C, the Venn diagram now looks as follows:
Since it is known that event C occurred (shaded region), outcomes that will result in Adam
winning are now those in which C occurred and A also occurred. Thus, the event Adam wins
is now conditional on event C having occurred. Probability questions of this kind are
considered in the framework of conditional probability.
𝑃(𝐴|𝐶) = 𝑛(𝐴 ∩ 𝐶)/𝑛(𝐶) = (𝑛(𝐴 ∩ 𝐶)/𝑛(𝑆)) / (𝑛(𝐶)/𝑛(𝑆)) = 𝑃(𝐴 ∩ 𝐶)/𝑃(𝐶) = 3/4 = 0.75 (75% chance)
Similarly, the probability that Betty wins is now a conditional probability:
𝑃(𝐵|𝐶) = 𝑛(𝐵 ∩ 𝐶)/𝑛(𝐶) = (𝑛(𝐵 ∩ 𝐶)/𝑛(𝑆)) / (𝑛(𝐶)/𝑛(𝑆)) = 𝑃(𝐵 ∩ 𝐶)/𝑃(𝐶) = 1/4 = 0.25 (25% chance)
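These conditional probabilities can be verified by brute-force enumeration of the eight equally likely outcomes. The short Python sketch below does exactly that; the helper names `p` and `p_given` are illustrative, not part of the handout:

```python
from itertools import product

# Sample space for three tosses of a fair coin: 8 equally likely outcomes.
S = list(product("HT", repeat=3))

A = [s for s in S if s.count("H") > s.count("T")]   # Adam wins: more heads than tails
B = [s for s in S if s.count("T") > s.count("H")]   # Betty wins: more tails than heads
C = [s for s in S if s[0] == "H"]                   # heads on the first toss

def p(event):
    """Unconditional probability under equally likely outcomes."""
    return len(event) / len(S)

def p_given(event, given):
    """P(event | given) = n(event ∩ given) / n(given)."""
    inter = [s for s in event if s in given]
    return len(inter) / len(given)

print(p(A), p(B))                     # → 0.5 0.5
print(p_given(A, C), p_given(B, C))   # → 0.75 0.25
```

Conditioning on C shrinks the sample space from 8 outcomes to the 4 outcomes in C, which is why Adam's chance rises to 3/4.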
Note that 𝑃(𝐴|𝐶) and 𝑃(𝐵|𝐶) make sense only when 𝑃(𝐶) > 0, otherwise they will be
regarded as undefined. In line with the expressions for 𝑃(𝐴|𝐶) and 𝑃(𝐵|𝐶), conditional
probability is formally defined as follows:
Definition 3.7.1: Let A and B be two events in Ƒ of the given probability space (S, Ƒ, 𝑃[⋅])
such that 𝑃(𝐵) > 0. The conditional probability of event A given that event B has occurred,
denoted by 𝑃(𝐴|𝐵), is defined by:
𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵)
For fixed B with 𝑃(𝐵) > 0, 𝑃[⋅ |𝐵] satisfies the axioms of probability:
a) 𝑃(𝐴|𝐵) ≥ 0
PROOF:
𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) by definition. Since 𝑃(𝐴 ∩ 𝐵) ≥ 0 and 𝑃(𝐵) > 0, it follows
that 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) ≥ 0. Hence 𝑃(𝐴|𝐵) ≥ 0.
b) 𝑃(𝑆|𝐵) = 1
Given that B has occurred, outcomes not in B are no longer possible. Hence, B plays the
role of a new sample space.
PROOF:
𝑃(𝑆|𝐵) = 𝑃(𝑆 ∩ 𝐵)/𝑃(𝐵) = 𝑃(𝐵)/𝑃(𝐵) = 1
c) If 𝐴1, 𝐴2, …, 𝐴𝑘 is a sequence of mutually exclusive events in Ƒ and ⋃𝑘𝑖=1 𝐴𝑖 ∈ Ƒ, then
𝑃(𝐴1 ∪ 𝐴2 ∪ … ∪ 𝐴𝑘|𝐵) = ∑𝑘𝑖=1 𝑃(𝐴𝑖|𝐵)
PROOF:
𝑃(𝐴1 ∪ 𝐴2 ∪ … ∪ 𝐴𝑘|𝐵) = 𝑃[(𝐴1 ∪ 𝐴2 ∪ … ∪ 𝐴𝑘) ∩ 𝐵]/𝑃(𝐵)
= 𝑃[(𝐴1 ∩ 𝐵) ∪ (𝐴2 ∩ 𝐵) ∪ … ∪ (𝐴𝑘 ∩ 𝐵)]/𝑃(𝐵)
= [𝑃(𝐴1 ∩ 𝐵) + 𝑃(𝐴2 ∩ 𝐵) + … + 𝑃(𝐴𝑘 ∩ 𝐵)]/𝑃(𝐵)
= 𝑃(𝐴1|𝐵) + 𝑃(𝐴2|𝐵) + … + 𝑃(𝐴𝑘|𝐵)
= ∑𝑘𝑖=1 𝑃(𝐴𝑖|𝐵)
Thus, for given B such that 𝑃(𝐵) > 0, 𝑃[⋅ |𝐵] is a probability function. Hence, 𝑃[⋅ |𝐵] has the
same properties as an unconditional probability function.
Exercise:
A random experiment involves tossing two fair 6-sided dice. Given that the sum of the
outcomes is 7, find the probability that at least one of the dice exhibits a 3. Answer: 1/3
SOLUTION:
Let: A be the event “sum of the outcomes is 7”
B be the event “at least one of the dice exhibits a 3”
𝑃(𝐵|𝐴) = 𝑃(𝐵 ∩ 𝐴)/𝑃(𝐴) = (𝑛(𝐵 ∩ 𝐴)/𝑛(𝑆)) / (𝑛(𝐴)/𝑛(𝑆)) = (2/36)/(6/36) = 2/6 = 1/3
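The same answer falls out of enumerating all 36 outcomes; a minimal Python check, with A and B defined as in the solution:

```python
from itertools import product

# All 36 equally likely outcomes of rolling two fair 6-sided dice.
S = list(product(range(1, 7), repeat=2))

A = [s for s in S if sum(s) == 7]    # sum of the outcomes is 7
B = [s for s in S if 3 in s]         # at least one die exhibits a 3

B_and_A = [s for s in A if s in B]   # B ∩ A = {(3,4), (4,3)}
p_B_given_A = len(B_and_A) / len(A)  # n(B ∩ A) / n(A) = 2/6
print(p_B_given_A)                   # → 0.3333333333333333
```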
3.8. Multiplication Rule
A simple rearrangement of the conditional probability formula yields the multiplication rule:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵)𝑃(𝐴|𝐵) = 𝑃(𝐴)𝑃(𝐵|𝐴) if both 𝑃(𝐴) and 𝑃(𝐵) are nonzero. The
multiplication rule relates 𝑃(𝐴|𝐵) to 𝑃(𝐵|𝐴) in terms of the unconditional probabilities 𝑃(𝐴)
and 𝑃(𝐵). It is useful when calculating probabilities in random experiments that have a
sequential structure as shown in the next example.
Example 3.8.1: Two cards are dealt from a deck of 52 cards one at a time without replacement.
What is the probability that they are both aces? Answer: (4⁄52)(3⁄51) = 1⁄221
SOLUTION:
Let 𝐴𝑖 denote the event that the ith card dealt is an ace and 𝐴̄𝑖 the event that it is not an ace.
Then 𝑃(𝐴1) = 4⁄52 and, given that the first card dealt is an ace, 𝑃(𝐴2|𝐴1) = 3⁄51.
By the multiplication rule:
𝑃(𝐴1 ∩ 𝐴2) = 𝑃(𝐴1)𝑃(𝐴2|𝐴1) = (4⁄52)(3⁄51) = 1⁄221
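The multiplication-rule product can be confirmed by enumerating every ordered pair of distinct cards; the sketch below abstracts the deck to ace/non-ace labels:

```python
from fractions import Fraction
from itertools import permutations

# Deck abstracted to 4 aces and 48 other cards.
deck = ["A"] * 4 + ["x"] * 48

# Every ordered deal of two distinct cards: 52 * 51 = 2652 equally likely pairs.
pairs = list(permutations(range(52), 2))
both_aces = sum(1 for i, j in pairs if deck[i] == "A" and deck[j] == "A")

p = Fraction(both_aces, len(pairs))
print(p)                                          # → 1/221
assert p == Fraction(4, 52) * Fraction(3, 51)     # matches the multiplication rule
```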
The multiplication rule generalises to more than two events:
❖ For three events: 𝑃(𝐴 ∩ 𝐵 ∩ 𝐶) = 𝑃(𝐴)𝑃(𝐵|𝐴)𝑃(𝐶|𝐴 ∩ 𝐵)
Exercise:
A class has 20 pupils: 6 girls and 14 boys. Three pupils are selected at random one after another.
What is the probability that all three are boys? Answer: (14⁄20)(13⁄19)(12⁄18)
SOLUTION:
Let 𝐴𝑖 denote the event that the ith pupil selected is a boy. By the multiplication rule:
𝑃(𝐴1 ∩ 𝐴2 ∩ 𝐴3) = 𝑃(𝐴1)𝑃(𝐴2|𝐴1)𝑃(𝐴3|𝐴1 ∩ 𝐴2) = (14⁄20)(13⁄19)(12⁄18)
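The chain of conditional probabilities can be checked against a brute-force count over all ordered selections of three distinct pupils:

```python
from fractions import Fraction
from itertools import permutations

# 20 pupils: 14 boys ('b') and 6 girls ('g').
pupils = ["b"] * 14 + ["g"] * 6

# Multiplication rule: P(A1) * P(A2|A1) * P(A3|A1 ∩ A2)
p_chain = Fraction(14, 20) * Fraction(13, 19) * Fraction(12, 18)

# Brute-force check: all 20*19*18 ordered selections of three distinct pupils.
triples = list(permutations(range(20), 3))
all_boys = sum(1 for t in triples if all(pupils[i] == "b" for i in t))
assert Fraction(all_boys, len(triples)) == p_chain

print(p_chain)   # → 91/285
```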
3.9. Bayes’ Rule
This section explores some applications of conditional probability. In particular, interest is in
a classical result known as Bayes’ Theorem, used to find the conditional probability 𝑃(𝐴|𝐵)
when the “reverse” conditional probability 𝑃(𝐵|𝐴) is the one that is known.
EXAMPLE:
Three machines 𝐴1 , 𝐴2 and 𝐴3 produce respectively, 50%, 30% and 20% of the total number
of items in a factory. The respective percentages of defective items are 3%, 4% and 5%.
The probabilities 𝑃(𝐴1 ), 𝑃(𝐴2 ) and 𝑃(𝐴3 ) are often referred to as prior probabilities since
they are probabilities of events 𝐴1 , 𝐴2 and 𝐴3 that are known prior to obtaining any additional
information.
The Quality Control Manager (QCM), say, would be responsible for investigating the source of
any defectives found, for intervention purposes. Specifically, interest would be in answering the
following question:
If a randomly selected item from the factory’s produce is found to be defective, what is the
probability that it was produced by machine 𝐴1, 𝐴2 or 𝐴3 respectively?
That is, the probabilities that are of interest to the QCM are:
1) 𝑃(𝐴1|𝐷) = 𝑃(𝐴1 ∩ 𝐷)/𝑃(𝐷)
2) 𝑃(𝐴2|𝐷) = 𝑃(𝐴2 ∩ 𝐷)/𝑃(𝐷)
3) 𝑃(𝐴3|𝐷) = 𝑃(𝐴3 ∩ 𝐷)/𝑃(𝐷)
where 𝐷 denotes the event that the selected item is defective.
These conditional probabilities are often referred to as posterior probabilities because they
are probabilities of the events after additional information has been obtained. In calculating
these probabilities, what is known is effectively turned upside down to find what is of interest.
That is, what is required is 𝑃(𝐴𝑖 |𝐷), 𝑖 = 1,2,3 but what is known is 𝑃(𝐷|𝐴𝑖 ), 𝑖 = 1,2,3. It is
for this reason that one may say interest is in finding “reverse conditional probabilities” when
answering these kinds of problems.
Continuing with the machine items example, answer the following questions:
a) What is the probability that a randomly selected item from the factory’s produce is
defective? Answer: 0.037
b) Suppose a randomly selected item is found to be defective. What is the probability that
the item was produced by machine 𝐴1 ? Answer: 0.405
Solutions:
The events 𝐴1, 𝐴2 and 𝐴3 partition the sample space, since every item is produced by exactly
one machine.
a) Denote by 𝐷 the event that a randomly selected item is defective. Then:
𝑃(𝐷) = 𝑃(𝐴1 ∩ 𝐷) + 𝑃(𝐴2 ∩ 𝐷) + 𝑃(𝐴3 ∩ 𝐷)
= ∑3𝑖=1 𝑃(𝐴𝑖)𝑃(𝐷|𝐴𝑖)
= (0.5)(0.03) + (0.3)(0.04) + (0.2)(0.05)
= 0.037
This is referred to as “Total Probability” and is generalised as follows. Observe from the Venn
diagram that 𝐷 can be decomposed into the disjoint union of its intersections with the sets
𝐴𝑖, that is, 𝐷 = ⋃𝑖(𝐴𝑖 ∩ 𝐷). Hence:
𝑃(𝐷) = ∑𝑖 𝑃(𝐴𝑖)𝑃(𝐷|𝐴𝑖)
b) 𝑃(𝐴1|𝐷) = 𝑃(𝐴1 ∩ 𝐷)/𝑃(𝐷) = 𝑃(𝐴1)𝑃(𝐷|𝐴1)/𝑃(𝐷) = (0.5)(0.03)/0.037 ≈ 0.405
Theorem 3.9.1.1 Total Probability: For a given probability space (S, Ƒ, 𝑃[⋅]), if
𝐴1, 𝐴2, …, 𝐴𝑛 is a collection of mutually disjoint events in Ƒ such that 𝑆 = ⋃𝑛𝑖=1 𝐴𝑖 and
𝑃(𝐴𝑖) > 0 for 𝑖 = 1, 2, …, 𝑛, then for every 𝐵 ∈ Ƒ:
𝑃(𝐵) = ∑𝑛𝑖=1 𝑃(𝐴𝑖)𝑃(𝐵|𝐴𝑖)
PROOF:
Note that
𝐵 = ⋃𝑛𝑖=1(𝐵 ∩ 𝐴𝑖)
and that the events (𝐵 ∩ 𝐴𝑖), 𝑖 = 1, 2, …, 𝑛 are mutually disjoint. Hence,
𝑃(𝐵) = 𝑃(⋃𝑛𝑖=1(𝐵 ∩ 𝐴𝑖)) = ∑𝑛𝑖=1 𝑃(𝐵 ∩ 𝐴𝑖) = ∑𝑛𝑖=1 𝑃(𝐴𝑖)𝑃(𝐵|𝐴𝑖)
where the last step uses the multiplication rule.
Theorem 3.9.1.2 Bayes’ Theorem: For a given probability space (S, Ƒ, 𝑃[⋅]), if 𝐴1, 𝐴2, …, 𝐴𝑛
is a collection of mutually disjoint events in Ƒ satisfying 𝑆 = ⋃𝑛𝑖=1 𝐴𝑖 and 𝑃(𝐴𝑖) > 0 for
𝑖 = 1, 2, …, 𝑛, then for every 𝐵 ∈ Ƒ such that 𝑃(𝐵) > 0:
𝑃(𝐴𝑖|𝐵) = 𝑃(𝐴𝑖)𝑃(𝐵|𝐴𝑖) / ∑𝑛𝑗=1 𝑃(𝐴𝑗)𝑃(𝐵|𝐴𝑗)
PROOF:
𝑃(𝐴𝑖|𝐵) = 𝑃(𝐴𝑖 ∩ 𝐵)/𝑃(𝐵) (definition of conditional probability)
= 𝑃(𝐴𝑖)𝑃(𝐵|𝐴𝑖)/𝑃(𝐵) (multiplication rule)
= 𝑃(𝐴𝑖)𝑃(𝐵|𝐴𝑖) / ∑𝑛𝑗=1 𝑃(𝐴𝑗)𝑃(𝐵|𝐴𝑗) (total probability)
3.10. Independence
Recall that the conditional probability 𝑃(𝐴|𝐵) captures the partial information that event
B provides about event A. A special case arises when knowing that event B has occurred does
not alter the probability of event A occurring. That is:
𝑃(𝐴|𝐵) = 𝑃(𝐴)
Definition 3.10.1 Independent Events: For a given probability space (S, Ƒ, 𝑃[⋅]), let A and
B be two events in Ƒ. Events A and B are defined to be independent (or statistically/
stochastically independent) if and only if one of the following conditions is satisfied:
i) 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵)
ii) 𝑃(𝐴|𝐵) = 𝑃(𝐴) if 𝑃(𝐵) > 0
iii) 𝑃(𝐵|𝐴) = 𝑃(𝐵) if 𝑃(𝐴) > 0
The equivalence of the three conditions above can be argued by showing that i) implies ii), ii)
implies iii) and iii) implies i).
First, if 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵) and 𝑃(𝐵) > 0, then
𝑃(𝐴|𝐵) = 𝑃(𝐴 ∩ 𝐵)/𝑃(𝐵) = 𝑃(𝐴)𝑃(𝐵)/𝑃(𝐵) = 𝑃(𝐴),
so i) implies ii). Next, if 𝑃(𝐴|𝐵) = 𝑃(𝐴), then:
𝑃(𝐵|𝐴) = 𝑃(𝐵 ∩ 𝐴)/𝑃(𝐴) = 𝑃(𝐴|𝐵)𝑃(𝐵)/𝑃(𝐴) = 𝑃(𝐴)𝑃(𝐵)/𝑃(𝐴) = 𝑃(𝐵)
for 𝑃(𝐴) > 0 and 𝑃(𝐵) > 0. Consequently, ii) implies iii).
Finally, if 𝑃(𝐵|𝐴) = 𝑃(𝐵), then 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵|𝐴)𝑃(𝐴) = 𝑃(𝐵)𝑃(𝐴) for 𝑃(𝐴) > 0. Hence, iii)
implies i).
Exercises:
Consider the experiment involving two successive rolls of a fair 4-sided die. Define events
A, B, C and D as follows:
A: the first roll is a 2; B: the sum of the two rolls is 5;
C: the maximum of the two rolls is 3; D: the minimum of the two rolls is 3.
Which of the following pairs of events are independent: a) A and B; b) C and D; c) A and 𝐵̄;
d) 𝐴̄ and 𝐵̄; e) C and 𝐷̄?
SOLUTION:
The 16 equally likely outcomes (row = 1st roll, column = 2nd roll) are:
(1,1) (1,2) (1,3) (1,4)
(2,1) (2,2) (2,3) (2,4)
(3,1) (3,2) (3,3) (3,4)
(4,1) (4,2) (4,3) (4,4)
a)
𝐴 = {(2,1), (2,2), (2,3), (2,4)} ⇒ 𝑃(𝐴) = 4/16
𝐵 = {(1,4), (2,3), (3,2), (4,1)} ⇒ 𝑃(𝐵) = 4/16
𝐴 ∩ 𝐵 = {(2,3)} ⇒ 𝑃(𝐴 ∩ 𝐵) = 1/16
Now, 𝑃(𝐴)𝑃(𝐵) = (4/16) × (4/16) = 1/16 = 𝑃(𝐴 ∩ 𝐵). Hence A and B are independent.
b)
𝐶 = {(1,3), (2,3), (3,1), (3,2), (3,3)} ⇒ 𝑃(𝐶) = 5/16
𝐷 = {(3,3), (3,4), (4,3)} ⇒ 𝑃(𝐷) = 3/16
𝐶 ∩ 𝐷 = {(3,3)} ⇒ 𝑃(𝐶 ∩ 𝐷) = 1/16
Now, 𝑃(𝐶)𝑃(𝐷) = (5/16) × (3/16) = 15/256 ≠ 𝑃(𝐶 ∩ 𝐷). Hence C and D are not independent.
c)
𝑃(𝐴) = 4/16; 𝑃(𝐵̄) = 12/16
𝐴 ∩ 𝐵̄ = {(2,1), (2,2), (2,4)} ⇒ 𝑃(𝐴 ∩ 𝐵̄) = 3/16
Now, 𝑃(𝐴)𝑃(𝐵̄) = (4/16) × (12/16) = 3/16 = 𝑃(𝐴 ∩ 𝐵̄). Hence A and 𝐵̄ are independent.
This leads to the following proposition: If A and B are independent events, then A and 𝐵̄ are
also independent events.
PROOF:
𝐴 = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐵̄) where (𝐴 ∩ 𝐵) ∩ (𝐴 ∩ 𝐵̄) = ∅
Now,
𝑃(𝐴) = 𝑃(𝐴 ∩ 𝐵) + 𝑃(𝐴 ∩ 𝐵̄)
⇒ 𝑃(𝐴 ∩ 𝐵̄) = 𝑃(𝐴) − 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) − 𝑃(𝐴)𝑃(𝐵) = 𝑃(𝐴)[1 − 𝑃(𝐵)] = 𝑃(𝐴)𝑃(𝐵̄)
d)
𝑃(𝐴̄) = 12/16; 𝑃(𝐵̄) = 12/16
𝐴̄ ∩ 𝐵̄ = {(1,1), (1,2), (1,3), (3,1), (3,3), (3,4), (4,2), (4,3), (4,4)} ⇒ 𝑃(𝐴̄ ∩ 𝐵̄) = 9/16
Now, 𝑃(𝐴̄)𝑃(𝐵̄) = (12/16) × (12/16) = 9/16 = 𝑃(𝐴̄ ∩ 𝐵̄). Hence 𝐴̄ and 𝐵̄ are independent.
This also leads to the following proposition: If A and B are independent events, then 𝐴̄ and 𝐵̄
are also independent events.
PROOF:
𝑃(𝐴̄ ∩ 𝐵̄) = 1 − 𝑃(𝐴 ∪ 𝐵) De Morgan’s
= 1 − [𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)] Addition Rule
= 1 − 𝑃(𝐴) − 𝑃(𝐵) + 𝑃(𝐴)𝑃(𝐵) Independence
= [1 − 𝑃(𝐴)] − 𝑃(𝐵)[1 − 𝑃(𝐴)]
= (1 − 𝑃(𝐴))(1 − 𝑃(𝐵))
= 𝑃(𝐴̄)𝑃(𝐵̄)
e)
𝑃(𝐶) = 5/16; 𝑃(𝐷̄) = 13/16
𝐶 ∩ 𝐷̄ = {(1,3), (2,3), (3,1), (3,2)} ⇒ 𝑃(𝐶 ∩ 𝐷̄) = 4/16
Now, 𝑃(𝐶)𝑃(𝐷̄) = (5/16) × (13/16) = 65/256 ≠ 𝑃(𝐶 ∩ 𝐷̄). Hence C and 𝐷̄ are not independent.