Probability Solution Manual
Chapter 1 Probability and Counting
1.1 Counting
1.2 Story Proofs
1.3 Naive Definition Of Probability
1.4 Axioms Of Probability
1.5 Inclusion Exclusion
1.6 Mixed Practice
1.1 Counting
1.1.1 problem 1
There are \binom{11}{4} ways to select 4 positions for I, \binom{7}{4} ways to select 4 of the remaining positions for S, \binom{3}{2} ways to select 2 of the remaining positions for P, and \binom{1}{1} way to place M. By the multiplication rule, there are

\binom{11}{4} \binom{7}{4} \binom{3}{2} \binom{1}{1} = \frac{11!}{4!\,4!\,2!\,1!}

permutations.
1.1.2 problem 2
(a) If the first digit can’t be 0 or 1, we have eight choices for the first
digit. The remaining six digits can be anything from 0 to 9. Hence,
the solution is
8 × 10^6
(b) We can subtract the number of phone numbers that start with 911 from
the total number of phone numbers we found in the previous part.
If a phone number starts with 911, it has ten choices for each of the
remaining four digits.
8 × 10^6 − 10^4
1.1.3 problem 3
(a) Fred has 10 choices for Monday, 9 choices for Tuesday, 8 choices for
Wednesday, 7 choices for Thursday and 6 choices for Friday.
10 × 9 × 8 × 7 × 6
(b) For the first restaurant, Fred has 10 choices. For all subsequent days,
Fred has 9 choices, since the only restriction is that he doesn’t want to
eat at the restaurant he ate at the previous day.
10 × 9^4
1.1.4 problem 4
(a) There are \binom{n}{2} matches, and each match has two possible outcomes. Using the multiplication rule to count the total number of possible outcomes gives

2^{\binom{n}{2}}

(b) Since every player plays every other player exactly once, the number of games is the number of ways to pair up n people,

\binom{n}{2}
1.1.5 problem 5
(a) By the end of each round, half of the players participating in the round are eliminated. So, the problem reduces to finding out how many times the number of players can be halved before a single player is left. With N players, this can be done \log_2 N times, which is the number of rounds in the tournament. Summing the number of games played in each round, the total number of games is

f(N) = \frac{N}{2} + \frac{N}{4} + \frac{N}{8} + \cdots + \frac{N}{2^{\log_2 N}}
     = N \sum_{i=1}^{\log_2 N} \frac{1}{2^i}
     = N \times \frac{N − 1}{N}
     = N − 1
1.1.6 problem 6
Line up the 20 players in some order then say the first two are a pair, the
next two are a pair, etc. This overcounts by a factor of 10! because we don’t
care about the order of the games. So in total we have
\frac{20!}{10!}

ways for them to play. This correctly accounts for whether a given player plays white or black; if we did not care about colors, we would need to divide by a further 2^{10}.
Another way to look at it is to choose the 10 players who will play white
then let each of them choose their opponent from the other 10 players. This
gives a total of
\binom{20}{10} × 10!
possibilities of how they are matched up. We don’t care about the order
of the players who play white but once we’ve chosen them the order of the
players who play black matters since different orders mean different pairings.
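As a quick numerical check (not part of the original solution), the two counts can be compared in R:

# Two ways of counting the pairings with colors assigned; they should agree.
factorial(20) / factorial(10)     # line up 20 players, forget the order of the 10 games
choose(20, 10) * factorial(10)    # choose the 10 white players, then match opponents
# Both evaluate to 670442572800.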
1.1.7 problem 7
(a) There are \binom{7}{3} ways to assign three wins to player A. For a specific combination of three games won by A, there are \binom{4}{2} ways to assign two draws to A. There is only one way to assign two losses to A from the remaining two games, namely, A loses both games.

\binom{7}{3} × \binom{4}{2} × \binom{2}{2}
(b) If A were to draw every game, there would need to be at least 8 games for A to obtain 4 points, so A has to win at least 1 game. Similarly, if A wins more than 4 games, they will have more than 4 points.

\binom{7}{1} + \binom{7}{2}\binom{5}{4} + \binom{7}{3}\binom{4}{2} + \binom{7}{4}
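The count in part (b) can be verified by brute force in R (not part of the original solution): each game is a win (1 point), draw (0.5), or loss (0) for A, and we count outcome sequences worth exactly 4 points.

# Enumerate all 3^7 outcome sequences for A's 7 games.
outcomes <- expand.grid(rep(list(c(1, 0.5, 0)), 7))
sum(rowSums(outcomes) == 4)
# Formula from the solution:
choose(7, 1) + choose(7, 2) * choose(5, 4) + choose(7, 3) * choose(4, 2) + choose(7, 4)
# Both give 357.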
(c) If B were to win the last game, that would mean that A had already
obtained 4 points prior to the last game, so the last game would not
be played at all. Hence, B could not have won the last game. The last
game must have ended in either A winning (case 1) or a draw (case 2).
Case 1: A wins the last game. This means A had 3 points after 6
games.
There are four possibilities for A to earn 3 points in 6 games:
1.1. 6 draws
1.2. 3 wins and 3 losses
1.3. 2 wins, 2 draws, and 2 losses
1.4. 1 win, 4 draws, and 1 loss.
1.1. There is only one way to assign 6 draws to 6 games: the number of possibilities is 1.
1.2. There are \binom{6}{3} ways to assign 3 wins to A out of the first 6 games. The remaining 3 games are losses for A. The number of possibilities is \binom{6}{3}.
1.3. There are \binom{6}{2} ways to assign 2 wins to A out of the first 6 games. There are \binom{4}{2} ways to assign 2 draws out of the remaining 4 games. The remaining 2 games are losses for A. The number of possibilities is \binom{6}{2}\binom{4}{2}.
1.4. There are \binom{6}{1} ways to assign 1 win to A out of the first 6 games. There are \binom{5}{4} ways to assign 4 draws out of the remaining 5 games. The remaining game is a loss for A. The number of possibilities is \binom{6}{1}\binom{5}{4}.
Case 2: The last game ends in a draw. This means A had 3.5 points after 6 games.
There are three possibilities for A to earn 3.5 points in 6 games: 3 wins, 1 draw, and 2 losses (\binom{6}{3}\binom{3}{1} possibilities); 2 wins, 3 draws, and 1 loss (\binom{6}{2}\binom{4}{3} possibilities); or 1 win and 5 draws (\binom{6}{1} possibilities).
Adding up the two cases, the total number of possibilities is

1 + \binom{6}{3} + \binom{6}{2}\binom{4}{2} + \binom{6}{1}\binom{5}{4} + \binom{6}{3}\binom{3}{1} + \binom{6}{2}\binom{4}{3} + \binom{6}{1}
1.1.8 problem 10
(a) Case 1: The student takes exactly one statistics course. There are \binom{5}{1} choices for the statistics course and \binom{15}{6} choices for the 6 non-statistics courses.
Case 2: The student takes exactly two statistics courses. There are \binom{5}{2} choices for the two statistics courses and \binom{15}{5} choices for the 5 non-statistics courses.
Case 3: The student takes exactly three statistics courses. There are \binom{5}{3} choices for the three statistics courses and \binom{15}{4} choices for the 4 non-statistics courses.
Case 4: The student takes exactly four statistics courses. There are \binom{5}{4} choices for the four statistics courses and \binom{15}{3} choices for the 3 non-statistics courses.
Case 5: The student takes all five statistics courses. There are \binom{15}{2} choices for the 2 non-statistics courses.
So the total number of choices is

\binom{5}{1}\binom{15}{6} + \binom{5}{2}\binom{15}{5} + \binom{5}{3}\binom{15}{4} + \binom{5}{4}\binom{15}{3} + \binom{5}{5}\binom{15}{2}

An Alternative Approach
An Alternative Approach
There are \binom{20}{7} ways of selecting 7 courses, which is the total number of choices if there were no restriction of choosing at least one statistics course, and \binom{15}{7} is the number of choices without any statistics course.
So the total number of choices with at least one statistics course is

\binom{20}{7} − \binom{15}{7}
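A quick R check (not part of the original solution) that the case-by-case count agrees with the complement count:

sum(choose(5, 1:5) * choose(15, 6:2))   # case-by-case sum
choose(20, 7) - choose(15, 7)           # complement count
# Both are 71085.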
(b) It is true that there are \binom{5}{1} ways to select a statistics course, and \binom{19}{6} ways to select 6 more courses from the remaining 19 courses, but this procedure results in overcounting.
For example, consider the following two choices.
(a) STAT110, STAT134, History 124, English 101, Calculus 102, Physics 101, Art 121
(b) STAT134, STAT110, History 124, English 101, Calculus 102, Physics 101, Art 121
The procedure counts these separately even though they are the same set of 7 courses.
1.1.9 problem 11
(a) Each of the n inputs has m choices for an output, resulting in

m^n

possible functions.
1.1.10 problem 12
(a)

\binom{52}{13}

(b)

\binom{52}{13}\binom{39}{13}\binom{26}{13}\binom{13}{13}

(c) The key is to notice that the sampling is done without replacement. \binom{52}{13}^4 assumes that all four players have \binom{52}{13} choices of hands available to them. This would be true only if sampling were done with replacement.
1.1.11 problem 13
The problem amounts to sampling with replacement where order does not
matter, since having 10 copies of each card amounts to replacing the card.
This is done using the Bose-Einstein method.
Thus, the answer is
\binom{52 + 10 − 1}{10} = \binom{61}{10}
1.1.12 problem 14
There are 4 choices for sizes and 8 choices for toppings, of which any combi-
nation (including no toppings) can be selected.
The total number of possible choices of toppings is \sum_{i=0}^{8} \binom{8}{i} = 2^8 = 256. Hence, there are 4 × 256 = 1024 possible pizzas.
1.2 Story Proofs

There are \binom{n}{1}\binom{n}{n−1} ways to sample 1 object from the first set and n − 1 objects from the second set, \binom{n}{2}\binom{n}{n−2} ways to sample 2 objects from the first set and n − 2 objects from the second set, and so on.
1.2.2 problem 18
Consider the right hand side of the equation. Since a committee chair can
only be selected from the first group, there are n ways
to choose them.
Then, for each choice of a committee chair, there are \binom{2n−1}{n−1} ways to choose the remaining members. Hence, the total number of committees is n\binom{2n−1}{n−1}.
Now consider the left side of the equation. Suppose we pick k people
from the first group and n − k people from the second group, then there are
k ways to assign a chair from the members of the first group we have picked.
k can range from 1 to n, giving us a total of \sum_{k=1}^{n} k\binom{n}{k}\binom{n}{n−k} = \sum_{k=1}^{n} k\binom{n}{k}^2 possible committees.
Since, both sides of the equation count the same thing, they are equal.
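A small numerical check of the identity in R (not part of the original solution):

# sum_k k * C(n,k)^2 should equal n * C(2n-1, n-1).
n <- 1:10
lhs <- sapply(n, function(m) sum((1:m) * choose(m, 1:m)^2))
rhs <- n * choose(2 * n - 1, n - 1)
all(lhs == rhs)   # TRUE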
1.2.3 problem 21
(a) Case 1: If Tony is in a group by himself, then we have to break the
remaining n people into k − 1 groups. This can be done in
\left\{ {n \atop k−1} \right\}

ways.
Case 2: If Tony is not in a group by himself, then we first break up the
remaining n people into k groups. Then, Tony can join any of them.
The number of possible groups then is
k \left\{ {n \atop k} \right\}
(b) Say Tony wants to have m in his group. That is to say he does not
want n − m people. These n − m people must then be broken into k
groups.
The number of people Tony wants to join his group can range from 0
to n − k. The reason for the upper bound is that at least k people are
required to make up the remaining k groups.
Taking the sum over the number of people in Tony’s group we get
\sum_{j=0}^{n−k} \binom{n}{j} \left\{ {n−j \atop k} \right\}
Now, instead of taking the sum over the number of people Tony wants
in his group, we can equivalently take the sum over the number of
people Tony does not want in his group. Hence,
\sum_{j=0}^{n−k} \binom{n}{j} \left\{ {n−j \atop k} \right\} = \sum_{i=k}^{n} \binom{n}{i} \left\{ {i \atop k} \right\}
Since the sum counts all possible ways to group n + 1 people into k + 1
groups, we have
\sum_{i=k}^{n} \binom{n}{i} \left\{ {i \atop k} \right\} = \left\{ {n+1 \atop k+1} \right\}
as desired.
1.2.4 problem 22
(a) Let us count the number of games in a round-robin tournament with
n + 1 participants in two ways.
Method 1: Since every player plays against all other players exactly
once, the problem reduces to finding the number of ways to pair up n + 1 people. There are \binom{n+1}{2} ways to do so.
ways to do so.
Method 2: The first player participates in n games. The second one
also participates in n games, but we have already counted the game
against the first player, so we only care about n − 1 games. The third
player also participates in n games, but we have already counted the
games against the first and second players, so we only care about n − 2
games.
In general, player i will participate in n + 1 − i games that we care
about. Taking the sum over i we get
n + (n − 1) + (n − 2) + · · · + 2 + 1
Since both methods count the same thing, they are equal.
An Alternative Approach The RHS expression counts number of
ways to pair two people from a group of n + 1 people of different ages.
Let’s say the eldest person in the subgroup of two is also the eldest
person in the whole group. Then, we would have n people to choose
from as the second person of the sub group. If subgroup’s eldest is
the second-eldest in the whole group, we’d have n − 1 people to choose
from, and so on all the way to 1. By adding these cases, we get the LHS expression, which covers all possibilities of pairing two people from a group of n + 1, and hence is equivalent to the RHS.
(b) LHS: If n is chosen first, then the subsequent 3 numbers can be any of
0, 1, . . . , n − 1. These 3 numbers are chosen with replacement resulting
in n^3 possibilities. Summing over possible values of n we get 1^3 + 2^3 + · · · + n^3 total possibilities.
RHS: We can count the number of permutations of the 3 numbers
chosen with replacement from a different perspective. The 3 numbers
can either all be distinct, or all be the same, or differ in exactly 1 value.
Case 1: All 3 numbers are distinct.
Selecting 4 distinct numbers (don't forget the very first, largest selected number) can be done in \binom{n+1}{4} ways. The 3 smaller numbers are free to permute amongst themselves. This gives us a total of 6\binom{n+1}{4} possibilities.
Case 2: All 3 numbers are the same.
In this case, we have to select 2 digits. The smaller digit will be sampled 3 times, and there is no way to permute identical numbers, so the number of possibilities is \binom{n+1}{2}.
Case 3: Exactly 2 of the 3 smaller numbers are the same. We select 3 distinct numbers in \binom{n+1}{3} ways, choose which of the 2 smaller values is repeated (2 ways), and arrange the 3 smaller numbers (3 ways), for 6\binom{n+1}{3} possibilities.
Adding up the cases gives

1^3 + 2^3 + · · · + n^3 = 6\binom{n+1}{4} + 6\binom{n+1}{3} + \binom{n+1}{2}
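As a quick numerical check of this identity in R (not part of the original solution; the middle term corresponds to the case with exactly two equal values):

n <- 1:20
lhs <- sapply(n, function(m) sum((1:m)^3))
rhs <- 6 * choose(n + 1, 4) + 6 * choose(n + 1, 3) + choose(n + 1, 2)
all(lhs == rhs)   # TRUE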
1.3 Naive Definition Of Probability

The possible sets of 3 consecutive floors are (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9, 10).
For each of these possibilities, there are 3 choices of button for the first person, 2 for the second and 1 for the third (3! in total by the multiplication rule). So the number of favorable combinations is 7 × 3!.
In general, each person has 9 floors to choose from, so for 3 people there are 9^3 combinations by the multiplication rule.
Hence, the probability that the buttons for 3 consecutive floors are pressed is

\frac{7 × 3!}{9^3}
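A Monte Carlo check (not part of the original solution), assuming each person presses a button uniformly at random among floors 2 through 10:

set.seed(1)
consecutive <- replicate(1e5, {
  floors <- sample(2:10, 3, replace = TRUE)
  length(unique(floors)) == 3 && max(floors) - min(floors) == 2   # three distinct, consecutive floors
})
mean(consecutive)          # close to
7 * factorial(3) / 9^3     # about 0.0576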
1.3.2 problem 26
(a) The problem is isomorphic (having same structure) to the birthday
problem. When sampling with replacement, each person corresponds
to a date in the birthday problem, and the size of sample corresponds
to the number of people in birthday problem. Hence, taking a random
sample of 1000 from a population of a million corresponds to asking a
thousand people their birth date where there are a total of a million
dates. The number of ways to take such a sample is K^{1000}, where K is the size of the population. Similarly, the number of ways to take the sample without replacement corresponds to the number of ways of having no birthday match in that situation: K(K − 1) · · · (K − 1000 + 1).

(b)

P(A) = 1 − P(A^c) = 1 − \frac{K(K − 1) · · · (K − 1000 + 1)}{K^{1000}}

where K = 1000000.
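A numerical evaluation in R (not part of the original solution):

K <- 1e6
n <- 1000
1 - exp(sum(log(seq(K, K - n + 1) / K)))   # about 0.393
# The same quantity via R's generalized birthday-problem function:
pbirthday(n, classes = K)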
1.3.3 problem 27
For each of the k names, we sample a memory location from 1 to n with
equal probability, with replacement. This is exactly the setup of the birthday
problem. Hence, the probability that at least one memory location has more
than 1 value is
P(A) = 1 − P(A^c) = 1 − \frac{n(n − 1) · · · (n − k + 1)}{n^k}
Also, P (A) = 1 if n < k.
1.3.4 problem 30
Suppose the word consists of 7 letters. Once we choose the first letter, the
seventh one has to be the same. Once we choose the second letter, the sixth
one has to be the same. In general, we are free to choose 4 letters. Hence,
the probability that a 7 letter word is a palindrome is
\frac{26^4}{26^7} = \frac{1}{26^3}

If the word consists of 8 letters, then there are 26^8 possible words, but for a palindrome, the number of letters we are free to choose is still 4. Hence, the probability is

\frac{26^4}{26^8} = \frac{1}{26^4}
1.3.5 problem 32
Call the two black cards B1 , B2 and the two red cards R1 , R2 . Since every
configuration of the 4 cards is equally likely, each outcome has a probability
of 1/24 of occurrence.
Case 1: j = 0.
If both guesses are incorrect, then both of them are black cards. There
are two choices for the configuration of the black cards and for each, there are
two choices for the configuration of the red cards for a total of 4 possibilities.
P(j = 0) = \frac{4}{24} = \frac{1}{6}
Case 2: j = 4
Notice that to guess all the cards correctly, we only need to guess correctly
the two red cards, which, by symmetry, is as likely as guessing both of them
wrong.
Hence,
P(j = 4) = P(j = 0) = \frac{1}{6}
Case 3: j = 2
One of the guesses is red the other is black. Like before, there are two
choices for the red and two choices for the black cards. This undercounts the
possibilities by a factor of 2, since we can switch the places of the red and
the black cards. Hence,
P(j = 2) = \frac{2}{6} + \frac{2}{6} = \frac{2}{3}
Notice that getting both red guesses right, neither right, or exactly one right are the only possible outcomes. Hence,
P(j = 1) = P(j = 3) = 0
1.3.6 problem 35
We can generate a random hand of 13 cards with the desired property by the
following process:
1.3.7 problem 36
We can think of the problem as sampling with replacement where order
matters.
There are 6^{30} possible sequences of outcomes. We are interested in the cases where each face of the die is rolled exactly 5 times. Since each sequence is equally likely, we can use the naive definition of probability.
There are \binom{30}{5} ways to select the dice that fall on a 1. Then, there are \binom{25}{5} ways to select the dice falling on a 2, \binom{20}{5} falling on a 3, \binom{15}{5} falling on a 4, \binom{10}{5} falling on a 5 and finally, \binom{5}{5} falling on a 6.
Thus, the desired probability is

\frac{\binom{30}{5}\binom{25}{5}\binom{20}{5}\binom{15}{5}\binom{10}{5}\binom{5}{5}}{6^{30}}

Alternatively, imagining the sample space to be a 30 digit long sequence of 1, 2, . . . , 6, we want the cases in which each of the numbers 1, 2, . . . , 6 appears exactly five times. There are \frac{30!}{(5!)^6} ways to arrange such a sequence. Hence, the probability is

\frac{30!}{(5!)^6 \, 6^{30}}
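The value can be computed in R (not part of the original solution), either directly or with the multinomial density:

factorial(30) / (factorial(5)^6 * 6^30)
dmultinom(rep(5, 6), size = 30, prob = rep(1/6, 6))
# Both are about 0.000402.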
1.3.8 problem 37
(a) Ignore all the cards except J, Q, K, A. There are 16 of those, 4 of which are aces. Each card has an equal chance of being first in the list, so the answer is 1/4.
Source: https://math.stackexchange.com/a/3726869/649082
(b) Ignore all the cards except J, Q, K, A. There are 4 choices for a king, 4 choices for a queen and 4 choices for a jack, with 3! permutations of these cards. Then, there are 4 choices for an ace. The remaining 12 cards can be permuted in 12! ways, so the answer is

\frac{4^3 × 3! × 4 × 12!}{16!}
1.3.9 problem 38
(a) There are 12 choices of seats for Tyron and Cersei so that they sit next to each other (11 cases where they take positions i − 1 and i, and 1 case where they take positions 12 and 1, because the table is round). Tyron can sit to the left or to the right of Cersei. The remaining 10 people can be ordered in 10! ways, so the answer is

\frac{24 × 10!}{12!} = \frac{2}{11}

(b) There are \binom{12}{2} choices of seats to be assigned to Tyron and Cersei, but only 12 choices where they sit next to each other. Since every assignment of seats is equally likely, the answer is

\frac{12}{\binom{12}{2}} = \frac{2}{11}
1.3.10 problem 39
There are a total of \binom{2N}{K} possible committees of K people. There are \binom{N}{j} ways to select j couples for the committee. K − 2j people need to be selected from the remaining N − j couples such that only one person is selected from a couple. First, we select K − 2j couples from the remaining N − j couples. Then, for each of the selected couples, there are 2 choices for committee membership.

\frac{\binom{N}{j}\binom{N−j}{K−2j}\, 2^{K−2j}}{\binom{2N}{K}}
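A quick R check (not part of the original solution; N = 6 and K = 5 are example values) that this PMF sums to 1 over j:

N <- 6; K <- 5
j <- 0:floor(K / 2)
sum(choose(N, j) * choose(N - j, K - 2 * j) * 2^(K - 2 * j) / choose(2 * N, K))   # 1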
1.3.11 problem 40
(a) Counting strictly increasing sequences of k numbers amounts to count-
ing the number of ways to select k elements out of the n, since for
any such selection, there is exactly one increasing ordering. Thus, the
answer is

\frac{\binom{n}{k}}{n^k}
(b) The problem can be thought of sampling with replacement where order
doesn’t matter, since there is only one non decreasing ordering of a
given sequence of k numbers. Thus, the answer is

\frac{\binom{n−1+k}{k}}{n^k}
1.3.12 problem 41
We can treat this problem as sampling numbers 1 to n with replacement
with each number being equally likely. There are n^n possible sequences. To
count the number of sequences with exactly one of the numbers missing, we
first select the missing number. There are n ways to do this. The rest of the
numbers have to be sampled at least once with one number being sampled
exactly twice. There are n − 1 choices for the number that will be sampled twice. Finally, we have n sampled numbers which can be ordered in any of n!/2 ways, since one of the numbers is repeated. Thus, the answer is

\frac{n(n − 1)\, n!/2}{n^n}
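A brute-force check of the count in R for a small case (not part of the original solution):

# For n = 4, count length-n sequences over 1..n in which exactly one value never appears.
n <- 4
seqs <- expand.grid(rep(list(1:n), n))
sum(apply(seqs, 1, function(s) length(unique(s)) == n - 1))   # 144
n * (n - 1) * factorial(n) / 2                                # 144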
1.4 Axioms Of Probability

P(A ∪ B) = P(A) + P(B) − P(A ∩ B), and P(S) = 1.
(c) The inequality follows directly from the first property of probabilities, with equality if and only if P(A ∩ B) = 0.
1.4.2 problem 44
Since B = (B − A) ∪ A, P (B) = P (A) + P (B − A) by the second axiom of
probability. Rearranging terms,
P (B − A) = P (B) − P (A)
1.4.3 problem 45
B △ A = (A ∪ B) − (A ∩ B). By problem 44,
P (B △ A) = P (A ∪ B) − P (A ∩ B)
= P (A) + P (B) − P (A ∩ B) − P (A ∩ B)
= P (A) + P (B) − 2P (A ∩ B)
1.4.4 problem 46
Bk = Ck − Ck+1 . Since Ck+1 ⊆ Ck , P (Bk ) = P (Ck ) − P (Ck+1 ).
1.4.5 problem 47
(a) Consider the experiment of flipping a fair coin twice. The sample space S is {HH, HT, TH, TT}. Let A be the event that the first flip lands Heads and B be the event that the second flip lands Heads. P(A ∩ B) = 1/4, since A ∩ B corresponds to the outcome HH.
On the other hand, A corresponds to the outcomes {HH, HT} and B corresponds to the outcomes {HH, TH}. Thus, P(A) = P(B) = 1/2.
Since P(A ∩ B) = P(A)P(B), A and B are independent events.
(b) A1 and B1 should intersect such that the ratio of the area of A1 ∩ B1
to the area of A1 equals the ratio of the area of B1 to the area of R.
As a simple, extreme case, if A1 = B1 , then A and B are dependent,
since the condition above is violated.
(c)
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= P (A) + P (B) − P (A)P (B)
= P (A)(1 − P (B)) + P (B)
= P (A)P (B c ) + P (B)
= P (A)P (B c ) + 1 − P (B c )
= 1 + P (B c )(P (A) − 1)
= 1 − P (B c )P (Ac )
1.5 Inclusion Exclusion

1.5.2 problem 52
Let Ai be the event that the i-th student takes the same seat on both days.
The desired probability then is 1 − P(\bigcup_{i=1}^{20} A_i). By the inclusion-exclusion principle,

P(\bigcup_{i=1}^{20} A_i) = \sum_i P(A_i) − \sum_{i<j} P(A_i ∩ A_j) + \sum_{i<j<k} P(A_i ∩ A_j ∩ A_k) − · · · + (−1)^{21} P(A_1 ∩ · · · ∩ A_{20})
= \sum_{i=1}^{20} \frac{1}{20} − \sum_{1≤i<j≤20} \frac{1}{20 · 19} + \sum_{1≤i<j<k≤20} \frac{1}{20 · 19 · 18} − · · · + (−1)^{21} \frac{1}{20!}
= 1 − \binom{20}{2}\frac{1}{20 · 19} + \binom{20}{3}\frac{1}{20 · 19 · 18} − · · · − \frac{1}{20!}
= 1 − \frac{1}{2!} + \frac{1}{3!} − · · · − \frac{1}{20!}
≈ 1 − e^{−1}
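A numerical check of the series against its limit in R (not part of the original solution):

k <- 1:20
p_match <- sum((-1)^(k + 1) / factorial(k))   # 1 - 1/2! + 1/3! - ... - 1/20!
c(p_match, 1 - exp(-1))                        # agree to many decimal places
# The probability that no student sits in their original seat is 1 - p_match, about 1/e.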
1.5.3 problem 53
(a) 62^8 − 36^8
(c) 62^8 − 36^8 − 36^8 − 10^8 + 2(62^8 − 36^8 − 52^8 + 26^8) + 62^8 − 36^8 − 36^8 + 10^8
1.5.4 problem 55
(a) \frac{\binom{15}{3}\binom{22}{2}}{\binom{37}{5}}

(b) \frac{\binom{37}{5} − \binom{27}{5} − \binom{25}{5} − \binom{22}{5} + \binom{15}{5} + \binom{10}{5} + \binom{12}{5}}{\binom{37}{5}}
1.6 Mixed Practice

(b) <
(c) =
We are interested in two outcomes of the same sample space, S = {(a_1, a_2, a_3) : a_i ∈ {1, 2, 3, . . . , 365}}. The first outcome is (1, 1, 1), and the second outcome is (1, 2, 3). The answer follows, since every outcome of the sample space is equally likely.
(d) <
If the first toss is T , Martin can never win, since as soon as H is seen
on any subsequent toss, the game stops, and Gale is awarded the win.
If the first toss is H, then if the second toss is also H, Martin wins. Oth-
erwise, if the second toss is T , Gale wins, since as soon as a subsequent
toss shows H, Gale is awarded a win.
Thus, Martin loses 3/4 of the time.
1.6.2 problem 57
The desired event can be expressed as \bigcup_{i=1}^{10^{22}} A_i, where A_i is the event that the i-th molecule in my breath is shared with Caesar. We can compute the desired probability using inclusion-exclusion.
Since every molecule in the universe is equally likely to be shared with Caesar, and we assume our breath samples molecules with replacement,

P(\bigcap_{i=1}^{n} A_i) = \left(\frac{1}{10^{22}}\right)^n.

Thus,

P(\bigcup_{i=1}^{10^{22}} A_i) = \sum_{i=1}^{10^{22}} (−1)^{i+1} \binom{10^{22}}{i} \left(\frac{1}{10^{22}}\right)^i
1.6.3 problem 58
Explanation: https://math.stackexchange.com/questions/1936525/inclusion-exclusion-problem
1.6.4 problem 59
(a) \binom{15+9}{9}
(b) \binom{5+9}{9}
1.6.5 problem 60
(a) n^n
(b) \binom{2n−1}{n−1}
1.6.6 problem 62
(a) 1 − k!\, e_k(\vec{p})
(b) Consider the extreme case where p_1 = 1 and p_i = 0 for i ≠ 1. Then, the probability that there is at least one birthday match is 1. In general, if p_i > 1/365 for a particular i, then a birthday match is more likely, since that particular day is more likely to be sampled multiple times. Thus, it makes intuitive sense that the probability of at least one birthday match is minimized when p_i = 1/365.
(c) First, consider ek (x1 , ..., xn ). We can break up this sum into the sum
of three disjoint cases.
(a) Sum of terms that contain both x1 and x2 . This sum is given by
x1 x2 ek−2 (x3 , ..., xn )
(b) Sum of terms that contain either x1 or x2 but not both. This sum
is given by (x1 + x2 ) ek−1 (x3 , ..., xn )
(c) Sum of terms that don't contain either x_1 or x_2. This sum is given by e_k(x_3, ..., x_n).
Thus,
ek (x1 , ..., xn ) = x1 x2 ek−2 (x3 , ..., xn )+(x1 +x2 )ek−1 (x3 , ..., xn )+ek (x3 , ..., xn )
Chapter 2 Conditional Probability
P(S) = 0.216 and P(C|S) = 23\,P(C|S^c), i.e. P(C|S^c) = \frac{1}{23} P(C|S).

P(S|C) = \frac{P(S)P(C|S)}{P(C)}
= \frac{P(S)P(C|S)}{P(S)P(C|S) + P(S^c)P(C|S^c)}
= \frac{P(S) · 23P(C|S^c)}{P(S) · 23P(C|S^c) + P(S^c)P(C|S^c)}
= \frac{23\,P(S)}{23\,P(S) + P(S^c)}
= \frac{23 · 0.216}{23 · 0.216 + 0.784}
≈ 0.864
2.1.2 problem 4
(a)

P(K|R) = \frac{P(K)P(R|K)}{P(R)} = \frac{P(K)P(R|K)}{P(K)P(R|K) + P(K^c)P(R|K^c)} = \frac{p}{p + (1 − p)\frac{1}{n}}
2.1.3 problem 5
By symmetry, all 50 of the remaining cards are equally likely. Thus, the probability that the third card is an ace is 3/50.
We can reach the same answer using the definition of conditional probability. Let A be the event that the first card is the Ace of Spades, B be the event that the second card is the 8 of Clubs, and C be the event that the third card is an ace. Then

P(C|A, B) = \frac{P(C, A, B)}{P(A, B)} = \frac{3 · 49!/52!}{50!/52!} = \frac{3}{50}
2.1.4 problem 6
Let H be the event that 7 tosses of a coin land Heads. Let A be the event
that a randomly selected coin is double-headed.
P(A|H) = \frac{P(A)P(H|A)}{P(A)P(H|A) + P(A^c)P(H|A^c)} = \frac{\frac{1}{100}}{\frac{1}{100} + \frac{99}{100}\left(\frac{1}{2}\right)^7}
2.1.5 problem 7
(a)

P(D|H) = \frac{P(D)P(H|D)}{P(H)}
= \frac{P(D)P(H|D)}{P(D)P(H|D) + P(D^c)P(H|D^c)}
= \frac{\frac{1}{2}\left(\frac{1}{100} + \frac{99}{100}\left(\frac{1}{2}\right)^7\right)}{\frac{1}{2}\left(\frac{1}{100} + \frac{99}{100}\left(\frac{1}{2}\right)^7\right) + \frac{1}{2}\left(\frac{1}{2}\right)^7}
≈ 0.69
2.1.6 problem 8
Let A_1 be the event that the screen is produced by company A, B_1 be the event that the screen is produced by company B, and C_1 be the event that the screen is produced by company C. Let D be the event that the screen is defective.

P(A_1|D) = \frac{P(A_1)P(D|A_1)}{P(A_1)P(D|A_1) + P(A_1^c)P(D|A_1^c)}
= \frac{P(A_1)P(D|A_1)}{P(A_1)P(D|A_1) + P(A_1^c)\left(P(B_1|A_1^c)P(D|B_1, A_1^c) + P(C_1|A_1^c)P(D|C_1, A_1^c)\right)}
= \frac{0.5 · 0.01}{0.5 · 0.01 + 0.5 · (0.6 · 0.02 + 0.4 · 0.03)}
≈ 0.29
2.1.7 problem 9
(a) P(A_1|B) = \frac{P(A_1)P(B|A_1)}{P(B)} = \frac{P(A_1)}{P(B)} = \frac{P(A_2)}{P(B)} = \frac{P(A_2)P(B|A_2)}{P(B)} = P(A_2|B).
2.1.8 problem 10
(a)
P (A3 |A1 ) = P (A2 |A1 )P (A3 |A2 , A1 ) + P (Ac2 |A1 )P (A3 |Ac2 , A1 )
= 0.8 ∗ 0.8 + 0.2 ∗ 0.3 = 0.7
(b)
P (A3 |Ac1 ) = P (A2 |Ac1 )P (A3 |A2 , Ac1 ) + P (Ac2 |Ac1 )P (A3 |Ac2 , Ac1 )
= 0.3 ∗ 0.8 + 0.7 ∗ 0.3 = 0.45
2.1.9 problem 11
Using the odds form of Bayes' theorem,
P(A) = 0.39
2.1.10 problem 12
(a) Let A_i be the event that Alice sends bit i. Let B_j be the event that Bob receives bit j.
(b) Let B_{j,k,l} be the event that Bob receives the bit tuple (j, k, l).
2.1.11 problem 13
(a) Let B be the event that the test done by company B is successful. Let A be the event that the test done by company A is successful. Let D be the event that a random person has the disease.
(b) Since the disease is so rare, most people don't have it. Company B diagnoses them correctly every time. However, in the rare cases when a person has the disease, company B fails to diagnose them correctly. Company A, however, shows a very good probability of an accurate diagnosis for afflicted patients.
(c) If the test conducted by company A has equal specificity and sensitivity, then its accuracy surpasses that of company B's test if the specificity and the sensitivity are larger than 0.99. If company A manages to achieve a specificity of 1, then any positive sensitivity will result in a more accurate test. If company A achieves a sensitivity of 1, it still requires a specificity larger than 0.98, since positive cases are so rare.
2.1.12 problem 14
(a) Intuitively, P (A|B) > P (A|B c ), since Peter will be in a rush to install
his alarm if he knows that his house will be burglarized before the end
of next year.
(b) Intuitively P (B|Ac ) > P (B|A), since Peter is more likely to be robbed
if he doesn’t have an alarm by the end of the year.
2.1.13 problem 15
Given the inequalities and the fact that P(A ∩ B) = P(A) + P(B) − P(A ∪ B), to maximize P(A ∩ B) we maximize the smallest of the three expressions, namely P(A). Thus, we would like to know that event A occurred.
2.1.14 problem 16
P (A) = P (B)P (A|B) + P (B c )P (A|B c ).
Given P (A|B) ≤ P (A), if P (A|B c ) < P (A), then the right hand side
of the equation above is strictly less than the left hand side, and we have a
contradiction.
We can intuitively think of this problem as asking "How likely is X to be elected as president?" and hearing "It depends" in response. The implication is that there exists some latent event (major states vote against X) that reduces the chances of X getting elected, and if we know that event does not occur, the chances of X getting elected improve.
2.1.15 problem 17
(a) P(B|A) = \frac{P(B)P(A|B)}{P(B)P(A|B) + P(B^c)P(A|B^c)} = 1 \implies P(B^c)P(A|B^c) = 0.
Since P(B^c) ≠ 0 by assumption, P(A|B^c) = 0 \implies P(A^c|B^c) = 1.
For example, consider a deck of 52 cards, where all but one of the cards
are the Queen of Spades. Let A be the event that the first turned card
is a Queen of Spades, and let B be the event that the second turned
card is a Queen of Spades, where sampling is done with replacement.
Then, P (A) = P (B) ≈ 1. Then, by independence, P (A|B c ) ≈ 1 =⇒
P (Ac |B c ) ≈ 0.
2.1.16 problem 18
P (B) = P (A ∩ B) + P (Ac ∩ B).
P (Ac ∩ B) = P (Ac )P (B|Ac ) = 0, since P (Ac ) = 0.
Thus, P (B) = P (A ∩ B) = P (B)P (A|B) =⇒ P (A|B) = 1.
2.1.17 problem 19
See https://math.stackexchange.com/q/3292400/649082
2.1.18 problem 20
(a) Since the second card is equally likely to be any of the remaining 3 cards, the probability that both cards are queens is 1/3.
(b) Our sample space now consists of all ordered pairs of the two queens and the two jacks where at least one card is a queen. Since all the outcomes are equally likely, the answer is 2/10 = 1/5.
(c) Now, the sample space consists of all ordered pairs of the two queens and the two jacks where one of the cards is the Queen of Hearts. Thus, the answer is 2/6 = 1/3.
2.1.19 problem 21
(a) The sample space is {(H, H, H), (H, H, T), (H, T, H), (T, H, H)}. Since each outcome is equally likely, the answer is 1/4.
(b) Since the last throw is independent of the first two, the probability that all three throws landed Heads given that the first two landed Heads equals the probability that the third throw landed Heads, which is 1/2.
2.1.20 problem 27
Let G be the event that the suspect is guilty. Let T be the event that one of
the criminals has blood type 1 and the other has blood type 2.
Thus,
P(G|T) = \frac{P(G)P(T|G)}{P(G)P(T|G) + P(G^c)P(T|G^c)} = \frac{p p_2}{p p_2 + (1 − p) 2 p_1 p_2} = \frac{p}{p + 2p_1(1 − p)}
For P(G|T) to be larger than p, p_1 has to be smaller than 1/2. This result makes sense, since if p_1 = 1/2, then half of the population has blood type 1,
and finding it at the crime scene gives us no information as to whether the
suspect is guilty.
2.1.21 problem 28
(a) \frac{P(D|T)}{P(D^c|T)} = \frac{P(D)}{P(D^c)} · \frac{P(T|D)}{P(T|D^c)}.
(b) Suppose our population consists of 10000 people, and only one percent
of them is afflicted with the disease. So, 100 people have the disease
and 9900 people don’t. Suppose the specificity and sensitivity of our
test are 95 percent. Then, out of the 100 people who have the disease,
95 test positive and 5 test negative, and out of the 9900 people who do
not have the disease, 9405 test negative and 495 test positive.
Thus, P(D|T) = \frac{95}{95 + 495}.
Here, we can see why specificity matters more than sensitivity. Since,
the disease is rare, most people do not have it. Since specificity is
measured as a percentage of the population that doesn’t have the dis-
ease, small changes in specificity equate to much larger changes in the
number of people than in the case of sensitivity.
2.1.22 problem 29
Let Gi be the event that the i-th child is a girl. Let Ci be the event that the
i-th child has property C.
P(G_1 ∩ G_2 | (G_1 ∩ C_1) ∪ (G_2 ∩ C_2)) = \frac{0.25(2p − p^2)}{0.5p + 0.5p − 0.25p^2} = \frac{0.5(2 − p)}{2 − 0.5p} = \frac{2 − p}{4 − p}.
This result confirms the idea that the more rare characteristic C is, the
closer we get to specifying which child we mean when we say that at least
one of the children has C.
2.2.2 problem 34
(a) A and B are not independent, since knowing that A occurred makes G^c more likely, which in turn makes B more likely.
(b) P(G|A^c) = \frac{P(G)P(A^c|G)}{P(G)P(A^c|G) + P(G^c)P(A^c|G^c)} = \frac{g(1 − p_1)}{g(1 − p_1) + (1 − g)(1 − p_2)}
2.2.3 problem 36
(a) Since any applicant who is good at baseball is accepted to the college,
the proportion of admitted students good at baseball is higher than the
proportion of applicants good at baseball, because applicants include
people who aren’t good at either math or baseball.
2.2.4 problem 37
See https://math.stackexchange.com/a/3789043/649082
2.2.5 problem 38
Let S be the event that an email is spam. Let L be the event that words 23, 64, and 65 appear in the email and none of the other words do, i.e.
L = W_1^c ∩ · · · ∩ W_{22}^c ∩ W_{23} ∩ W_{24}^c ∩ · · · ∩ W_{63}^c ∩ W_{64} ∩ W_{65} ∩ W_{66}^c ∩ · · · ∩ W_{100}^c.
Let q = \prod_j (1 − p_j), where the product runs over 1 ≤ j ≤ 100 with j ∉ {23, 64, 65}.
P(G_1)P(D_2, G_2|G_1) = \frac{2}{3} P(G_2|G_1)P(D_2|G_1, G_2)
= \frac{2}{3} · \frac{1}{2}\left(p + (1 − p)\frac{1}{2}\right)
= \frac{2}{3}\left(\frac{1}{2}p + \frac{1}{4} − \frac{1}{4}p\right)
= \frac{1}{6}(p + 1).

Thus,

P(G_1|D_2, G_2) = \frac{\frac{1}{6}(p + 1)}{\frac{1}{6}(p + 1) + \frac{1}{3} × \frac{1}{2}} = \frac{p + 1}{p + 2}.

Note that when p = 1, the result matches that of the basic Monty Hall problem.
2.3.2 problem 42
Let Gi be the event that the i-th door contains a goat, and let Di be the
event that Monty opens door i.
Let S be the event of success under the specified strategy.
(a)
Note that when p = 1, the problem reduces to the basic Monty Hall problem, and we get the correct solution 2/3. In the case when p =
0, Monty never gives the contestant a chance to switch their initial,
incorrect choice to the correct one, resulting in a definite failure under
the specified strategy.
(b)
Note that if p = 1, the problem reduces to the basic Monty Hall prob-
lem, and the solution matches that of the basic, conditional Monty Hall
problem. If p = 0 on the other hand, then the reason Monty has opened
a door is because the contestant’s initial guess (Door 1) is correct. By
choosing the strategy to switch, the contestant always loses.
2.3.3 problem 43
Let Ci be the event that Door i contains the car. Let Di be the event that
Monty opens Door i. Let Oi be the event that Door i contains the computer,
and let Gi be the event that Door i contains the goat.
(a) The desired probability is

\frac{\frac{1}{3} · \frac{1}{2}}{\frac{1}{3} · \frac{1}{2} + \frac{2}{3} P(G_2|C_3^c)P(D_2|G_2, C_3^c)}
= \frac{\frac{1}{3} · \frac{1}{2}}{\frac{1}{3} · \frac{1}{2} + \frac{2}{3} · \frac{1}{4}}
= \frac{\frac{1}{6}}{\frac{1}{6} + \frac{1}{6}}
= \frac{1}{2}.
(b)

\frac{\frac{1}{3} P(O_2|C_3)P(D_2|O_2, C_3)}{\frac{1}{3} P(O_2|C_3)P(D_2|O_2, C_3) + P(C_3^c)P(D_2, O_2|C_3^c)}
= \frac{\frac{1}{3} · \frac{1}{2} · p}{\frac{1}{3} · \frac{1}{2} · p + \frac{2}{3} · \frac{1}{4} · q}
= \frac{\frac{1}{6} p}{\frac{1}{6} p + \frac{1}{6} q}
= \frac{p}{p + (1 − p)}
= p.
2.3.4 problem 44
Let Gi be the event that the i-th door contains a goat, and let Di be the
event that Monty opens door i. Let S be the event that the contestant is
successful under his strategy.
(a) There are two scenarios which result in the contestant selecting door
3 and Monty opening door 2. Either the car is behind door 3 and
Monty randomly opens door 2, or doors 3 and 2 contain goats, and
Monty opens door 2. Only the latter scenario results in a win for the
contestant.
Thus,
P(S|D_2, G_2) = \frac{(p_1 + p_2)\frac{p_1}{p_1 + p_2}}{p_3 \frac{1}{2} + (p_1 + p_2)\frac{p_1}{p_1 + p_2}} = \frac{p_1}{p_1 + \frac{1}{2} p_3}.
(b) We can slightly modify the scenario in part (a) where doors 3 and 2 contain goats by multiplying the probability of the scenario by 1/2, to accommodate the chance that Monty might open the door with the car behind it.

P(S|D_2, G_2) = \frac{\frac{1}{2}(p_1 + p_2)\frac{p_1}{p_1 + p_2}}{p_3 \frac{1}{2} + \frac{1}{2}(p_1 + p_2)\frac{p_1}{p_1 + p_2}} = \frac{\frac{1}{2} p_1}{\frac{1}{2} p_1 + \frac{1}{2} p_3} = \frac{p_1}{p_1 + p_3}.
(c)

P(S|D_2, G_2) = \frac{p_3}{p_3 + \frac{1}{2} p_1}.
(d)

P(S|D_2, G_2) = \frac{p_3}{p_3 + p_1}.
2.3.5 problem 45
(a) Since the prizes are independent for each door, and since the strategy
is switch doors every time, what is behind Door 1 is irrelevant.
Possible outcomes for doors 2 and 3 are Goat and Car with probability 2pq, in which case the contestant wins; Car and Car with probability p^2, in which case the contestant again wins; and Goat and Goat with probability q^2, in which case the contestant loses.
Thus,
P(S) = \frac{p^2 + 2pq}{p^2 + 2pq + q^2} = \frac{p^2 + 2pq}{(p + q)^2} = p^2 + 2pq.
(b) There are two scenarios in which Monty opens Door 2. Either Door 3 contains the Car and Door 2 contains a Goat, which happens with probability pq, or both doors contain Goats and Monty randomly chooses to open Door 2, which happens with probability \frac{1}{2}q^2. The contestant wins in the first case and loses in the second case.
Thus,

P(S|D_2, G_2) = \frac{pq}{pq + \frac{1}{2}q^2}.
2.3.6 problem 46
Let S be the event of successfully getting the Car under the specified strategy.
Let Ci be the event that Door i contains the Car. Let A be the event that
Monty reveals the Apple, and let Ai be the event that Door i contains the
Apple.
(a)

P(S) = P(S ∩ C_1) + P(S ∩ C_2) + P(S ∩ C_3) + P(S ∩ C_4)
= P(C_1)P(S|C_1) + P(C_2)P(S|C_2) + P(C_3)P(S|C_3) + P(C_4)P(S|C_4)
= \frac{1}{4} · 0 + 3 · \frac{1}{4} · \frac{1}{2}(p + q)
= 3 · \frac{1}{4} · \frac{1}{2}
= \frac{3}{8}
(b)

P(A) = P(A ∩ G_1) + P(A ∩ A_1) + P(A ∩ B_1) + P(A ∩ C_1)
= P(G_1)P(A|G_1) + P(A_1)P(A|A_1) + P(B_1)P(A|B_1) + P(C_1)P(A|C_1)
= \frac{1}{4}p + 0 + \frac{1}{4}q + \frac{1}{4}q
= \frac{1}{4}(1 + q)
(c)

P(S|A) = \frac{P(S ∩ A)}{P(A)} = \frac{\frac{1}{4} · p · \frac{1}{2} + \frac{1}{4} · q · \frac{1}{2}}{\frac{1}{4}(1 + q)} = \frac{\frac{1}{8}}{\frac{1}{4}(1 + q)} = \frac{1}{2(1 + q)}
2.3.7 problem 47
(a) Contestant wins under the "stay-stay" strategy if and only if the Car is
behind Door 1.
P(S) = \frac{1}{4}
(b) If the Car is not behind Door 1, Monty opens one of the two doors
revealing a Goat. Contestant stays. Then, Monty opens the other
door with a Goat behind it. Finally, contestant switches to the Door
concealing the Car.
P(S) = 0 + \frac{3}{4} · 1 = \frac{3}{4}
(c) Under the "switch-stay" strategy, if the Car is behind Door 1 the con-
testant loses. Given that the Car is not behind Door 1, Monty opens
one of the Doors containing a Goat. The contestant will win if they
switch to the Door containing the Car and will lose if they switch to
the Door containing the last remaining Goat.
Thus,
P(S) = P(C_1)P(S|C_1) + P(C_1^c)P(S|C_1^c) = 0 + \frac{3}{4} · \frac{1}{2} = \frac{3}{8}
(d) Under the "switch-switch" strategy, if the car is behind Door 1, then
Monty opens a door with a Goat behind it. The contestant switches
to a door with a Goat behind it. Monty then opens the last door
containing a Goat, at which point the contestant switches back to the
door containing the Car.
If Door 1 contains a Goat, Monty opens another Door containing a Goat
and presents the contestant with a choice. If the contestant switches
to the remaining door containing a Goat, then Monty is forced to open
Door 1, revealing the final Goat. The contestant switches to the one
remaining Door which contains the Car. If, on the other hand, the con-
testant switches to the door containing the Car, then on the subsequent
switch they lose the game.
Thus,
P(S) = \frac{1}{4} · 1 + \frac{3}{4} · \frac{1}{2} = \frac{5}{8}
(e) "Stay-Switch" is the best strategy.
P(A_2) = p_1 p_2 + q_1 q_2
= (1 − q_1)(1 − q_2) + q_1 q_2
= \left(\frac{1}{2} − b_1\right)\left(\frac{1}{2} − b_2\right) + \left(\frac{1}{2} + b_1\right)\left(\frac{1}{2} + b_2\right)
= \frac{1}{2} + 2 b_1 b_2,

where b_i = \frac{1}{2} − p_i.
Suppose the statement holds for all n ≤ k − 1. Let Si be the event that
the i-th trial is a success.
(c) If p_i = 1/2 for some i, then b_i = 0 and P(A_n) = 1/2.
If p_i = 0 for all i, then b_i = 1/2 for all i. Hence, the term 2^{k−1} b_1 b_2 · · · b_k equals 1/2. Thus, P(A_n) = 1. This makes sense since the number of successes will be 0, which is an even number.
If p_i = 1 for all i, then b_i = −1/2 for all i. Hence, the term 2^{k−1} b_1 b_2 · · · b_k will equal either 1/2 or −1/2 depending on the parity of the number of trials. Thus, P(A_n) is either 0 or 1 depending on the parity of the number of trials.
This makes sense since, if every trial is a success, the number of suc-
cesses will be even if the number of trials is even. The number of
successes will be odd otherwise.
2.4.2 problem 52
The problem is equivalent to betting 1 increments and having A start with
ki dollars, while B starts with k(N − i) dollars.
Thus, for p < 1/2,

p_i = \frac{1 − \left(\frac{q}{p}\right)^{ki}}{1 − \left(\frac{q}{p}\right)^{kN}}.

Note that

\lim_{k→∞} \frac{1 − \left(\frac{q}{p}\right)^{ki}}{1 − \left(\frac{q}{p}\right)^{kN}} = \lim_{k→∞} \frac{−ki\left(\frac{q}{p}\right)^{ki−1}}{−kN\left(\frac{q}{p}\right)^{kN−1}} = \lim_{k→∞} \frac{i}{N} \frac{1}{\left(\frac{q}{p}\right)^{k(N−i)}} = 0.
This result makes sense, since p < 21 implies that A should lose a game
with high degree of certainty over the long run.
2.4.3 problem 53
See https://math.stackexchange.com/a/2706032/649082
2.4.4 problem 54
(a) p_k = p\,p_{k−1} + q\,p_{k+1}, with boundary condition p_0 = 1.
(b) Let A_j be the event that the drunk reaches k before reaching −j. Then A_j ⊆ A_{j+1}, since to reach −(j + 1) the drunk needs to pass −j. Note that \bigcup_{j=1}^{∞} A_j is equivalent to the event that the drunk ever reaches k, since the complement of this event, namely the event that the drunk reaches −j before reaching k for every j, implies that the drunk never reaches k.
∞
By assumption, P ( Aj ) = limn→+∞ P (An ). P (An ) can be found as a
S
j=1
result of a gambler’s ruin problem.
If p = 1/2,

P(A_n) = \frac{n}{n + k} → 1.

If p > 1/2,

P(A_n) = \frac{1 − \left(\frac{q}{p}\right)^n}{1 − \left(\frac{q}{p}\right)^{n+k}} → 1.

If p < 1/2,

P(A_n) = \frac{1 − \left(\frac{q}{p}\right)^n}{1 − \left(\frac{q}{p}\right)^{n+k}} → \left(\frac{p}{q}\right)^k.
which is 13/20.
(b) We can imagine that it is much more difficult to get a green gummi
bear out of a jar with subscript 1 than it is out of a jar with subscript
2. C jars have a lower overall success rate, because most of their green
gummi bears are in C1 , which is harder to sample from compared to
the jars with subscript 2.
Let A be the event that a sampled gummi bear is green. Let B be the
event that the jar being sampled from is an M jar. Let C be the event
that the jar being sampled from has subscript 1.
Then, by Simpson’s Paradox, P (A|B, C) < P (A|B c , C), P (A|B, C c ) <
P (A|B c , C c ), however, P (A|B) > P (A|B c ).
2.5.2 problem 58
(a) If A and B are independent, then
Since P (A|B, C) > P (A|B c , C) and P (A|B, C c ) > P (A|B c , C c ), P (A|B) >
P (A|B c ), so Simpson’s Paradox does not hold.
(a)

P(D|T) = \frac{P(D)P(T|D)}{P(T)}
= \frac{p\left(P(A|D)P(T|D, A) + P(B|D)P(T|D, B)\right)}{P(A)P(T|A) + P(B)P(T|B)}
= \frac{p\left(\frac{1}{2}a_1 + \frac{1}{2}b_1\right)}{\frac{1}{2}\left(P(D|A)P(T|D, A) + P(D^c|A)P(T|D^c, A)\right) + \frac{1}{2}\left(P(D|B)P(T|D, B) + P(D^c|B)P(T|D^c, B)\right)}
= \frac{\frac{1}{2}p(a_1 + b_1)}{\frac{1}{2}\left(pa_1 + (1 − p)(1 − a_2)\right) + \frac{1}{2}\left(pb_1 + (1 − p)(1 − b_2)\right)}

(b)

P(A|T) = \frac{P(A)P(T|A)}{P(T)}
= \frac{\frac{1}{2}\left(pa_1 + (1 − p)(1 − a_2)\right)}{P(A)P(T|A) + P(B)P(T|B)}
= \frac{\frac{1}{2}\left(pa_1 + (1 − p)(1 − a_2)\right)}{\frac{1}{2}\left(pa_1 + (1 − p)(1 − a_2)\right) + \frac{1}{2}\left(pb_1 + (1 − p)(1 − b_2)\right)}
2.6.2 problem 61
(a)

P\left(D \mid \bigcap_{i=1}^{n} T_i\right) = \frac{P(D)P\left(\bigcap_{i=1}^{n} T_i \mid D\right)}{P\left(\bigcap_{i=1}^{n} T_i\right)} = \frac{p\prod_{i=1}^{n} a}{p\prod_{i=1}^{n} a + q\prod_{i=1}^{n} b} = \frac{p a^n}{p a^n + q b^n}

(b)

P\left(D \mid \bigcap_{i=1}^{n} T_i\right) = \frac{P(D)P\left(\bigcap_{i=1}^{n} T_i \mid D\right)}{P\left(\bigcap_{i=1}^{n} T_i\right)}
= \frac{p\left(P(G)P\left(\bigcap_i T_i \mid D, G\right) + P(G^c)P\left(\bigcap_i T_i \mid D, G^c\right)\right)}{P(G)P\left(\bigcap_i T_i \mid G\right) + P(G^c)P\left(\bigcap_i T_i \mid G^c\right)}
= \frac{p\left(\frac{1}{2} + \frac{1}{2}a_0^n\right)}{\frac{1}{2} + \frac{1}{2}\left(p a_0^n + (1 − p) b_0^n\right)}
= \frac{p(1 + a_0^n)}{1 + p a_0^n + (1 − p) b_0^n}
2.6.3 problem 62
Let D be the event that the mother has the disease. Let Ci be the event that
the i-th child has the disease.
(a)

P(C_1^c ∩ C_2^c) = P(D)P(C_1^c ∩ C_2^c|D) + P(D^c)P(C_1^c ∩ C_2^c|D^c) = \frac{1}{3} · \frac{1}{4} + \frac{2}{3} = \frac{9}{12}
(b) The two events are not independent. If the elder child has the disease, then the mother has the disease, which means the younger child has probability 1/2 of having the disease. Unconditionally, the younger child has probability 1/6 of having the disease.
(c)
2.6.4 problem 63
This problem is similar to the variations on example 2.2.5 (Two Children) in
the textbook.
It is true that conditioned on a specific two of the three coins matching, the probability of the third coin matching is 1/2, but the way the problem statement is phrased, at least two of the coins match. As in the Two Children problem, the result is no longer 1/2. In fact, the probability of all the coins matching given that at least two match is 1/4.
2.6.5 problem 64
Let Ri , Gi , and Bi be the events that the i-th drawn ball is red, green or
blue respectively. Let A be the event that a green ball is drawn before a blue
ball.
(a) Note that if a red ball is drawn, it is placed back, as if the experiment
never happened. Draws continue until a green or a blue ball is drawn.
The red balls are irrelevant in the experiment. Thus, the problem
reduces to removing all the red balls, and finding the probability of the
first, randomly drawn ball being green.
Thus,
P(A) = \frac{g}{1 − r} = \frac{g}{g + b}.
(b) We are interested in draws in which the first ball is green. Each completed sequence of g + b + r draws is equally likely. Since the red balls are once again irrelevant, we focus on the g + b draws of green or blue balls.
Thus,

P(A) = \frac{\binom{g+b−1}{g−1}}{\binom{g+b}{g}} = \frac{g}{g + b}.
(c) Let Ai,j be the event that type i occurs before type j. Generalizing
part a, we get
P(A_{i,j}) = \frac{p_i}{p_i + p_j}.
2.6.6 problem 65
(a) All (n+1)! permutations of the balls are equally likely, so the probability that we draw the defective ball is 1/(n+1), irrespective of when we choose to draw.
(b) Consider the extreme case of the defective ball being super massive (v ≫ nw). Then it is more likely that a person draws the defective ball rather than a non-defective ball, so we want to draw last. On the other hand, if v is much smaller than nw, then, at any stage of the experiment, drawing the defective ball is less likely than not, but after each draw of a non-defective ball, the probability of it being drawn increases, since there are fewer balls left in the urn. Thus, we want to be one of the first ones to draw.
So the answer depends on the relationship of w and v.
2.6.7 problem 66
Let S_{i,k} be the event that the sum after i rolls of the die is k. Let l denote the roll on which the sum first reaches at least 100. Let X_i be the event that the die lands on i. For m = 0, 1, . . . , 5,

P(S_{l,100+m}) = \sum_{i=94+m}^{99} P(S_{l−1,i}) P(X_{100+m−i}|S_{l−1,i}) = \frac{1}{6} \sum_{i=94+m}^{99} P(S_{l−1,i}).

As m increases, the sum contains fewer (nonnegative) terms, so P(S_{l,100}) ≥ P(S_{l,101}) ≥ · · · ≥ P(S_{l,105}). Thus, S_{l,100} is the most likely.
2.6.8 problem 67
(a) Unconditionally, each of the c + g + j donuts is equally likely to be the last one. Thus, the probability that the last donut is a chocolate donut is \frac{c}{c + g + j}.
(b) We are interested in the event that the last donut is chocolate and the last donut that is either glazed or jelly is jelly. The probability that the last donut is chocolate is \frac{c}{c + g + j}. Since any ordering of the glazed and jelly donuts is equally likely, the probability that the last of these is a jelly donut is \frac{j}{g + j}. Thus, the probability of the desired event is \frac{c}{c + g + j} · \frac{j}{g + j}.
2.6.9 problem 68
(a)

OR = \frac{P(D|C)}{P(D^c|C)} · \frac{P(D^c|C^c)}{P(D|C^c)}

Since the disease is rare among both exposed and not exposed groups, P(D^c|C) ≈ 1 and P(D^c|C^c) ≈ 1. Thus,

OR ≈ \frac{P(D|C)}{P(D|C^c)} = RR

(b)

\frac{P(C, D)P(C^c, D^c)}{P(C, D^c)P(C^c, D)} = \frac{P(C)P(D|C)\,P(C^c)P(D^c|C^c)}{P(C)P(D^c|C)\,P(C^c)P(D|C^c)} = OR
(c) Since P (C, D) also equals P (D)P (C|D), reversing the roles of C and
D in part b gives the result.
2.6.10 problem 69
(a)
y = dp + (1 − d)(1 − p)
(b) The worst choice for p is 1/2, because then the fraction of "yes" responses is 1/2 irrespective of the fraction of drug users. In other words, the number of "yes" responses tells us nothing.
A person who has not used drugs says "yes" only in the case that they
get a "I was born in winter" slip and they were, in fact, born in winter.
y = d\left(p + \frac{1}{4}(1 − p)\right) + \frac{1}{4}(1 − d)(1 − p)

Thus,

d = \frac{4y + p − 1}{4p}
2.6.11 problem 70
Let F be the event that the coin is fair, and let H_i be the event that the i-th toss lands Heads.
(a) Both Fred and his friend are correct. Fred is correct in that the probability of there being no Tails in the entire sequence is very small. For example, there are \binom{92}{45} sequences with 45 Heads and 47 Tails, but only 1 sequence of all Heads.
On the other hand, Fred's friend is correct in his assessment that any particular sequence has the same likelihood of occurrence as any other sequence.
(b)

P(F|H_{1≤i≤92}) = \frac{P(F)P(H_{1≤i≤92}|F)}{P(F)P(H_{1≤i≤92}|F) + P(F^c)P(H_{1≤i≤92}|F^c)} = \frac{p\left(\frac{1}{2}\right)^{92}}{p\left(\frac{1}{2}\right)^{92} + (1 − p)}

(c) For P(F|H_{1≤i≤92}) to be larger than 1/2, p must be greater than \frac{2^{92}}{2^{92} + 1}, which is approximately equal to 1, whereas for P(F|H_{1≤i≤92}) to be less than 1/20, p must be less than \frac{2^{92}}{2^{92} + 19}, which is also approximately equal to 1. In other words, unless we know for a fact that the coin is fair, 92 Heads in a row will convince us otherwise.
2.6.12 problem 71
(a) To have j toy types after sampling i toys, we either have j −1 toy types
after sampling i−1 toys, and the i-th toy is of a previously unseen type,
or, we have j toy types after sampling i − 1 toys, and the i-th toy has
an already seen type.
Thus,
p_{i,j} = \frac{n − j + 1}{n} p_{i−1,j−1} + \frac{j}{n} p_{i−1,j}
(b) Note that p1,0 = 0, p1,1 = 1 and pi,j = 0 for j > i. Using strong
induction, a proof of the recursion in part a follows.
2.6.13 problem 72
(a)

p_n = a_n a + (1 − a_n)b = (a − b)a_n + b
a_{n+1} = a_n a + (1 − a_n)(1 − b) = a_n(a + b − 1) + 1 − b

(b)

p_{n+1} = (a − b)a_{n+1} + b
= (a − b)\left((a + b − 1)a_n + 1 − b\right) + b
= (a − b)\left((a + b − 1)\frac{p_n − b}{a − b} + 1 − b\right) + b
= (a + b − 1)p_n + a + b − 2ab

(c) Let p = \lim_{n→∞} p_n. Taking the limit of both sides of the result of part (b), we get

p = (a + b − 1)p + a + b − 2ab
p = \frac{a + b − 2ab}{2 − (a + b)}
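A quick numerical check of the limit in R (not part of the original solution; a = 0.7 and b = 0.4 are example values):

a <- 0.7; b <- 0.4
p <- 0.5                      # any starting value in [0, 1]; the limit does not depend on it
for (i in 1:200) p <- (a + b - 1) * p + a + b - 2 * a * b
c(p, (a + b - 2 * a * b) / (2 - (a + b)))   # both about 0.6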
2.6.14 problem 74
a. See the first paragraph of part d.
b.

P(B|C_j) = P(B ∧ C_j)/P(C_j) = \frac{\binom{48}{j−1}(j − 1)! · 4 · 3 · (52 − j − 1)!}{\binom{48}{j−1}(j − 1)! · 4 · (52 − j)!} = \frac{3}{52 − j}

The first equality comes from Bayes' theorem. The \binom{48}{j−1}(j − 1)! terms come from the ways to have the first j − 1 cards be non-aces. The 4 · 3 in the numerator counts the ordered choices of 2 adjacent aces. The trailing factorials in the numerator and denominator come from ordering the rest of the cards.
c. We have

P(C_j) = \frac{\binom{48}{j−1}(j − 1)! · 4 · (52 − j)!}{52!}

With the LOTP, part b, and the power of R, we can compute

P(B) = \sum_{j=1}^{49} P(B|C_j)\,P(C_j)
d. Argument by symmetry: Consider the events "the first card after the
first ace is an ace" and "the last card after the first ace is an ace". The second
event is equivalent to the last card in the deck being an ace. In addition,
the two events must have the same probability, as every card drawn after the
first ace is equally likely to be an ace. Therefore, the probability of the first
event is 1/13.
Consider the ace of hearts and the ace of spades. The probability that
the ace of hearts is the first ace to appear followed immediately by the ace
of spades (call this event A) is the probability that they appear adjacent to
each other in that order (call this event B) and that those two aces appear
before the other two aces (call this event C).
We have
P (B ∧ C) = P (C|B) ∗ P (B)
Now, to compute P (C|B) consider that if the aces of hearts and of spades
appear adjacent to each other in that order, we can consider them as a card
"glued together". There are 3!=6 possible orderings of the glued together
card and the other 2 aces - in 2 of them, the glued together card is first. So
P (C|B) = 1/3.
Now, instead of the aces of hearts and spades specifically, consider that there are 12 possible ordered pairs of aces that can be adjacent to each other. For a specific ordered pair, P(B) = 51!/52! = 1/52. Then the total probability that the card after the first ace is another ace is 12 · \frac{1}{52} · \frac{1}{3} = \frac{1}{13}, in agreement with the symmetry argument.
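A simulation of part (d) in R (not part of the original solution):

set.seed(7)
is_ace <- c(rep(TRUE, 4), rep(FALSE, 48))
next_is_ace <- replicate(1e5, {
  deck <- sample(is_ace)     # a random shuffle of the 52 cards
  first <- which(deck)[1]    # position of the first ace (always at most 49)
  deck[first + 1]            # is the following card an ace?
})
mean(next_is_ace)            # about 1/13, i.e. roughly 0.077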
Chapter 3 Random Variables and Their Distributions

3.1.2 problem 2
(a) Since the trials are independent, the probability that the first k − 1 trials fail is (1/2)^{k−1}, and the probability that the k-th trial is successful is 1/2. Thus, for k ≥ 1,

P(X = k) = \left(\frac{1}{2}\right)^{k−1} · \frac{1}{2}.

(b) This problem reduces to part (a) once a trial is performed. Whatever its outcome, we label it a failure and proceed to perform more trials until the opposite outcome is observed. Thus, for k ≥ 2,

P(X = k) = \left(\frac{1}{2}\right)^{k−2} · \frac{1}{2}.
3.1.3 problem 3
P(Y ≤ k) = P\left(X ≤ \frac{k − µ}{σ}\right) = F\left(\frac{k − µ}{σ}\right).
3.1.4 problem 4
To show that F (x) is a CDF, we need to show that F is increasing, right-
continuous, and converges to 0 and 1 in the limits.
The first condition is true since ⌊x⌋ is increasing.
Since limx→a+ F (x) = F (a) when a ∈ N by the definition of F (x), the
second condition is satisfied.
limx→∞ F (x) = 1 by the definition of F (x), and also, by definition,
limx→−∞ F (x) = 0. Thus, the third condition is satisfied, and F (x) is a
CDF.
The PMF that F corresponds to is

P(X = k) = \frac{1}{n}

for 1 ≤ k ≤ n, and 0 everywhere else.
3.1.5 problem 5
(a) p(n) is clearly non-negative. Also,

\sum_{n=0}^{∞} p(n) = \frac{1}{2}\sum_{n=0}^{∞} \frac{1}{2^n} = \frac{1}{2} · \frac{1}{1 − \frac{1}{2}} = 1.

(b)

F(x) = \sum_{n=0}^{⌊x⌋} p(n) = \frac{1}{2}\sum_{n=0}^{⌊x⌋} \frac{1}{2^n} = \frac{1}{2} · \frac{1 − \left(\frac{1}{2}\right)^{⌊x⌋+1}}{1 − \frac{1}{2}} = 1 − \frac{1}{2^{⌊x⌋+1}}

for x ≥ 0, and 0 for x < 0.
3.1.6 problem 7
To find the probability mass function (PMF) of X, we need to determine the
probabilities of Bob reaching each level from 1 to 7.
P (X = 1) is simply the probability of Bob reaching level 1, which is 1
since he starts at level 1.
For 2 ≤ j ≤ 6, P (X = j) is the probability of reaching level j but not
reaching level j + 1. This can be calculated as
P (X = j) = p1 p2 · · · pj−1 (1 − pj ).
Since the game has only 7 levels,
P (X = 7) = p1 p2 p3 p4 p5 p6 .
3.1.7 problem 8
P(X = k) = \frac{\binom{k−1}{4}}{\binom{100}{5}} for k ≥ 5, and P(X = k) = 0 for k < 5.
3.1.8 problem 9
(a) F (x) = pF1 (x) + (1 − p)F2 (x).
Let x_1 < x_2. Then

F(x_1) = pF_1(x_1) + (1 − p)F_2(x_1) ≤ pF_1(x_2) + (1 − p)F_2(x_2) = F(x_2).

Similarly, \lim_{x→−∞} F(x) = 0 and \lim_{x→∞} F(x) = 1.
(b) Let X be an r.v. created as described. Let H be the event that coin
lands heads, and T be the event that the coin lands tails.
Then, F (X = k) = P (H)F1 (k) + P (T )F2 (k) = pF1 (k) + (1 − p)F2 (k).
Note that this is the same CDF as in part a.
3.1.9 problem 10
(a) Let P(n) = k/n for n ∈ ℕ. By the axioms of probability, \sum_{n∈ℕ} P(n) must equal 1. But

\sum_{n∈ℕ} P(n) = k \sum_{n∈ℕ} \frac{1}{n},

and the harmonic series diverges, so no valid choice of k exists.
3.1.10 problem 12
(a) https://drive.google.com/file/d/1vAAxLU7hvihAHOEcHx8Nc-9xapGlzc-I/
view?usp=sharing
(b) Let I ⊂ X be the subset of the support where P1 (x) < P2 (x). Then
3.1.11 problem 13
P(X = a) = \sum_{z} P(Z = z)P(X = a|Z = z) = \sum_{z} P(Z = z)P(Y = a|Z = z) = P(Y = a).
3.1.12 problem 14
(a)

1 − P(X = 0) = 1 − e^{−λ}
P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = (1 − e^{−λ}) − e^{−λ}λ

(b)

P(X = k|X > 0) = \frac{P(X = k)}{P(X > 0)} = \frac{λ^k}{(e^{λ} − 1)k!}
where ⌊x⌋ equals the largest integer that is less than or equal to x .
3.2.2 problem 16
P(X = k|X ∈ B) = \frac{1/|C|}{|B|/|C|} = \frac{1}{|B|}
3.2.3 problem 17
P(X ≤ 100) = \sum_{i=0}^{100} \binom{110}{i} (0.9)^i (0.1)^{110−i}
3.2.4 problem 19
The PMF of the number of games ending in a draw is P(X = k) = \binom{n}{k}(0.6)^k(0.4)^{n−k} for 0 ≤ k ≤ n.
Let X be the number of games that end in draws. The number of players whose games end in draws is Y = 2X.
3.2.5 problem 20
(a) P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k} for 0 ≤ k ≤ 3.

P(X > 0) = \sum_{i=1}^{3} P(I_{X_i} = 1) − 3p^2 + P\left(\bigcap_{i=1}^{3} \{I_{X_i} = 1\}\right) = 3p − 3p^2 + p^3.
3.2.6 problem 22
(a) Let Ci be the event that i-th type of coin is chosen. Let Hk be the
event that k out of the n flips land heads.
P(X = k) = P(C_1)P(H_k|C_1) + P(C_2)P(H_k|C_2) = \frac{1}{2}\binom{n}{k} p_1^k (1 − p_1)^{n−k} + \frac{1}{2}\binom{n}{k} p_2^k (1 − p_2)^{n−k}
(c) If p1 ̸= p2 , then the Bernoulli trials are not independent. If, for instance,
p1 is small and p2 is large, and after the first million flips we see two
heads, this increases the likelihood that we are using the coin with
probability p1 of landing heads, which in turn tells us that subsequent
flips are unlikely to land heads.
3.2.7 problem 23
Let Ii be the indicator of the i-th person voting for Kodos. Then, P (Ii =
1) = p1 p2 p3 . Since the voters make their decisions independently, we have
3.2.8 problem 24
(a) Since tosses are independent, we expect information about two of the
tosses to not provide any information about the remaining tosses. In
other words, we expect the required probability to be
\binom{8}{k}(0.5)^k(0.5)^{8−k} = \binom{8}{k}(0.5)^8

for 0 ≤ k ≤ 8.
To prove this, let X be the number of Heads out of the 10 tosses, and
let X1,2 be the number of Heads out of the first two tosses.
P(X = k|X_{1,2} = 2) = \frac{P(X = k ∩ X_{1,2} = 2)}{P(X_{1,2} = 2)}
= \frac{(0.5)^2 \binom{8}{k−2}(0.5)^{k−2}(0.5)^{8−k+2}}{(0.5)^2}
= \binom{8}{k−2}(0.5)^{k−2}(0.5)^{8−k+2}
= \binom{8}{k−2}(0.5)^8

for 2 ≤ k ≤ 10, which is equivalent to \binom{8}{k}(0.5)^8 for 0 ≤ k ≤ 8.
(b) Let X≥2 be the event that at least two tosses land Heads.
P(X = k|X_{≥2}) = \frac{P(X = k ∩ X_{≥2})}{P(X_{≥2})} = \frac{\binom{10}{k}(0.5)^k(0.5)^{10−k}}{1 − (0.5^{10} + 10 · 0.5^{10})}

for 2 ≤ k ≤ 10.
To see that this answer makes sense, notice that if we sum over all values of k from 2 to 10, the numerator sums to exactly the denominator, which means the conditional PMF sums to 1.
3.2.9 problem 26
If X ∼ HGeom(w, b, n), then n − X ∼ HGeom(b, w, n).
If X counts the number of items sampled from the set of w items in a
sample of size n, then n − X counts the number of items from the set of b
items in the same sample.
To see this, notice that
P(n − X = k) = P(X = n − k) = \binom{w}{n−k}\binom{b}{k} / \binom{w+b}{n}.
3.2.10 problem 27
X is not Binomial because the outcome of a card is not independent of the previous cards' outcomes. For instance, if the first n − 1 cards match, then the probability of the last card matching is 1.
The Hypergeometric story requires sampling from two finite sets, but the set of matching cards isn't of a predetermined size, so the story doesn't fit.
P(X = k) = \binom{n}{k} · !(n − k) / n!
where !(n − k) is a subfactorial (the number of derangements of n − k items).
3.2.11 problem 30
(a) The distribution is hypergeometric. We select a sample of t employees
and count the number of women in the sample.
P(X = k) = \binom{n}{k}\binom{m}{t−k} / \binom{n+m}{t}
(c) Once the total number of promotions is fixed, they are no longer inde-
pendent. For instance, if the first t people are promoted, the probability
of the t + 1-st person being promoted is 0.
The story fits that of the hypergeometric distribution. t promoted
employees are picked and we count the number of women among them.
P(X = k | T = t) = [\binom{n}{k} p^k (1−p)^{n−k}] [\binom{m}{t−k} p^{t−k} (1−p)^{m−t+k}] / [\binom{n+m}{t} p^t (1−p)^{n+m−t}] = \binom{n}{k}\binom{m}{t−k} / \binom{n+m}{t}
3.2.12 problem 31
(a) Note that the distribution is not Binomial, since the guesses are not
independent of each other. If, for instance, the woman guesses the first
three cups to be milk-first, and she is correct, then the probability of
her guessing milk-first on subsequent guesses is 0, since it is known in
advance that there are only 3 milk-first cups.
The Hypergeometric story fits. Let X be the number of milk-first cups the lady identifies correctly. Then
P(X = i) = \binom{3}{i}\binom{3}{3−i} / \binom{6}{3}.
(b) Let M be the event that the cup is milk first, and let T be the event
that the lady claims the cup is milk first. Then,
P(M | T) / P(Mᶜ | T) = [P(M)/P(Mᶜ)] · [p1/(1 − p2)] = p1/(1 − p2),
since P(M) = P(Mᶜ).
3.2.13 problem 32
(a) The problem fits the story of Hypergeometric distributions.
P(X = k) = \binom{s}{k}\binom{100−s}{10−k} / \binom{100}{10}
for 0 ≤ k ≤ min(s, 10).
(b) > x = 75
> y_interval <- sum(dhyper(7:10, x, 100-x, 10))
> y_interval
[1] 0.7853844
3.2.14 problem 33
(a) The probability of a typo being caught is p1 + p2 − p1 p2 . Then,
P(X = k) = \binom{n}{k} (p1 + p2 − p1p2)^k (1 − (p1 + p2 − p1p2))^{n−k}
(b) When we know the total number of caught typos in advance, the typos
caught by the first proofreader are no longer independent. For example,
if we know that first proofreader has caught the first t typos, and the
total number of caught typos is t, then the probability of the first
3.2.15 problem 34
(a) Let Y be the number of Statistics students in the sample of size m.
P(Y = k) = Σ_{i=k}^{n} P(X = i)P(Y = k | X = i) = Σ_{i=k}^{n} \binom{n}{i} p^i (1−p)^{n−i} · \binom{i}{k}\binom{n−i}{m−k} / \binom{n}{m}
3.2.16 problem 36
(a)
P(X = n/2) = \binom{n}{n/2} (1/2)^n
Thus, using the approximation \binom{n}{n/2} ≈ 2^n \sqrt{2/(πn)},
P(X = n/2) ≈ 2^n \sqrt{2/(πn)} · (1/2)^n = \sqrt{2/(πn)}.
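A quick check of the approximation in R, taking n = 100 as an illustrative value:

n <- 100
dbinom(n/2, n, 0.5)   # exact P(X = n/2), about 0.0796
sqrt(2 / (pi * n))    # approximation, about 0.0798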
(b) Let X be the value of a roll of a six-sided die with faces 1 to 6. Let Y be the value of a roll of a six-sided die with faces 7 to 12. The two dice are rolled independently, but P(X < Y) = 1.
3.3.2 problem 39
Let X have a discrete uniform distribution over values 1, 2, 3, ...10. Let Y =
11 − X. Then Y is also discrete uniform over the same sample space, but
P (X = Y ) = 0.
If X and Y were independent, then P(X = Y) = Σ_{i∈S} P(X = i)P(Y = i) > 0.
3.3.3 problem 40
(a) Suppose, toward a contradiction, that X and Y do not have the same
PMF. Then there is at least one k in the support of X such that
P (X = k) and P (Y = k) are not equal.
Note that if P (X = Y ) = 1, then P (X = Y |X = k) = P (X = Y |Y =
k) = P (X = k|Y = k) = P (Y = k|X = k) = 1, as an event with
probability 1 will still have probability 1 conditioned on any non-zero
event.
Using the above and examining Bayes' rule, we have P(X = k | Y = k) = P(Y = k | X = k) P(X = k)/P(Y = k), which simplifies to 1 = P(X = k)/P(Y = k), as the conditional probabilities equal 1 as previously shown. However, this equality is impossible if P(X = k) ≠ P(Y = k). This contradicts the assumption that P(X = Y) = 1; therefore, X and Y must have the same PMF if they are always equal.
3.3.4 problem 41
Let X be the event that Tom woke up at 8 in the morning. Let Y be the
event that Tom has blue eyes. Let Z be the event that Tom made it to his
7 a.m. class.
Clearly Tom’s eye color is independent of the time he woke up and whether
he made it to his early morning class or not. However, if Tom woke up at 8,
then he definitely did not make it to his 7 am class.
3.3.5 problem 43
(a) Let X be uniform on the residues {0, 1, ..., b − 1} and let Y ≡ X + 1 (mod b). Then lim_{b→∞} P(X < Y) = 1.
For finite random variables X and Y with the same distribution, P(X < Y) = 1 is not possible, since then Y could never take the smallest value in the support of X, contradicting the assumption that X and Y have the same distribution.
(b) If X and Y are independent random variables with the same distribution, then P(X < Y) ≤ 1/2.
3.3.6 problem 44
(a) X ⊕ Y ∼ Bern(1/2): XOR-ing with an independent fair bit leaves the probability of the result being 1 at 1/2. Thus, YJ ∼ Bern(1/2) for every nonempty J.
First, note that for all J, J′ that are disjoint, YJ and YJ′ are independent; this follows from the independence of the Xi.
Now, writing A = J ∩ J′, B = J \ J′, and C = J′ \ J, we have
P(YJ = a, YJ′ = b) = P(YJ = a, YJ′ = b | YA = 1)P(YA = 1) + P(YJ = a, YJ′ = b | YA = 0)P(YA = 0)
= P(YB = 1 − a, YC = 1 − b | YA = 1)P(YA = 1) + P(YB = a, YC = b | YA = 0)P(YA = 0)
= P(YB = 1 − a)P(YC = 1 − b)P(YA = 1) + P(YB = a)P(YC = b)P(YA = 0)
= 1/4 = P(YJ = a)P(YJ′ = b),
so any two of the YJ are independent.
To prove that the YJ are not all independent, consider the subsets S = {1}, S′ = {2}, S′′ = {1, 2}. It is clear that if YS = 1 and YS′ = 1, then YS′′ = YS ⊕ YS′ = 0. However, this implies that
P(∩_{J⊆{1..n}} {YJ = 1}) = 0, while ∏_{J⊆{1..n}} P(YJ = 1) = (1/2)^{2^n − 1} ≠ 0.
(a) If we think of a Bernoulli trial success as a win for player A and a failure as a loss for player A, then having more than twice as many failures as successes is analogous to A losing the Gambler's Ruin game starting with 1 dollar. For instance, if A wins the first gamble, then A has 3 dollars, and B needs 2·1 + 1 gamble wins for A to lose the entire game.
Thus, we need to find p1.
3.4.2 problem 47
(a) Consider the simple case of m < n/2. Then the trays don't have enough pages to print n copies, so the desired probability is 0.
On the other hand, if m ≥ n, then the desired probability is 1, since each tray individually has enough pages.
Now, consider the more interesting case n/2 ≤ m < n. Associate the n pages being taken from the trays with n independent Bernoulli trials: sample from the first tray on a success and from the second tray on a failure. Thus, the number of pages taken from the first tray can be modeled as a Binomial random variable, X ∼ Bin(n, p). As long as neither tray is asked for more than m pages, i.e. n − m ≤ X ≤ m, there are enough pages; taking X ≤ n − m − 1 from the first tray is too few, because then n − m − 1 + m < n.
Hence,
P = 0 for m < n/2,
P = pbinom(m, n, p) − pbinom(n − m − 1, n, p) for n/2 ≤ m < n,
P = 1 for m ≥ n.
(b) Typing out the hinted program in the R language, we get that the
smallest number of papers in each tray needed to have 95 percent con-
fidence that there will be enough papers to make 100 copies is 60.
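A sketch of the hinted computation in R, taking n = 100 copies and p = 1/2:

n <- 100; p <- 0.5
prob_enough <- function(m) {
  if (m < n/2) return(0)
  if (m >= n)  return(1)
  pbinom(m, n, p) - pbinom(n - m - 1, n, p)
}
m <- 50:100
min(m[sapply(m, prob_enough) >= 0.95])   # smallest m with at least 95% probability; gives 60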
Chapter 4
Expectation
4.1.2 problem 2
Let N be the number of days in a randomly chosen year.
E(N) = (3/4)·365 + (1/4)·366 = 365.25
Var(N) = E(N²) − (E(N))² = (3/4)·365² + (1/4)·366² − 365.25² = 0.1875
4.1.3 problem 3
(a) Let D be the value of the die roll.
E(D) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5
(b) Let T4 be the total sum of the four die rolls, and let Di be the value of the i-th roll. Note that T4 = D1 + D2 + D3 + D4. Then, by linearity of expectation, E(T4) = 4 E(D1) = 14.
4.1.4 problem 4
Let's start by defining some convenient r.v.s for this problem: let D1, D2, D3 be the values of the three rolls.
The optimal strategy is to stop if the value of the last roll is greater than the expected winnings from continuing. In other words, keep rolling only if doing so brings winnings that are, on average, greater than the last roll.
Let w2 be the winnings if the first two rolls are rejected and the third roll is kept:
E(w2) = E(D3) = Σ_{x=1}^{6} x P(D3 = x) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5,
so after the second roll we should stop if D2 ≥ 4 and roll again if D2 ≤ 3.
Let w1 be the winnings from applying this strategy once the first roll has been rejected:
P(w1 = x) = P(D2 < 4, D3 = x) = (3/6)(1/6) = 3/36 for x = 1, 2, 3,
P(w1 = x) = P(D2 = x) + P(D2 < 4, D3 = x) = 1/6 + (3/6)(1/6) = 9/36 for x = 4, 5, 6.
Using those probabilities to calculate the expected winnings w1 from the definition of expectation:
E(w1) = Σ_{x=1}^{6} x P(w1 = x) = 4.25 dollars,
so after the first roll we should stop if D1 ≥ 5 and continue otherwise.
Now we can fully describe the optimal strategy, which maximizes the expected winnings: stop on the first roll if it shows 5 or 6; otherwise stop on the second roll if it shows 4, 5, or 6; otherwise take the third roll. Under this strategy, the final winnings W* satisfy
P(W* = x) = P(D1 < 5, D2 < 4, D3 = x) = (4/6)(3/6)(1/6) = 12/216 for x = 1, 2, 3,
P(W* = 4) = P(D1 < 5, D2 = 4) + P(D1 < 5, D2 < 4, D3 = 4) = 24/216 + 12/216 = 36/216,
P(W* = x) = P(D1 = x) + P(D1 < 5, D2 = x) + P(D1 < 5, D2 < 4, D3 = x) = 36/216 + 24/216 + 12/216 = 72/216 for x = 5, 6.
Hence
E(W*) = Σ_{x=1}^{6} x P(W* = x) = 14/3 ≈ 4.67 dollars.
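The optimal strategy can also be checked by simulation in R (a sketch, not part of the original solution):

set.seed(42)
play <- function() {
  d <- sample(6, 3, replace = TRUE)   # three die rolls
  if (d[1] >= 5) return(d[1])         # stop on 5 or 6
  if (d[2] >= 4) return(d[2])         # stop on 4, 5, or 6
  d[3]                                # otherwise take the third roll
}
mean(replicate(1e5, play()))          # close to 14/3 = 4.67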
4.1.5 problem 5
Let X ∼ DUniform(n).
E(X) = (1/n) Σ_{i=1}^{n} i = (n + 1)/2
Var(X) = E(X²) − (E(X))² = (1/n) Σ_{i=1}^{n} i² − ((n + 1)/2)² = (n + 1)(2n + 1)/6 − (n + 1)²/4
4.1.6 problem 6
Let N be the number of games played. Then the probability that N = i is the probability of exactly 3 wins in the first i − 1 games and the last game being a win:
P(N = i) = 2 \binom{i−1}{3} (1/2)³ (1/2)^{i−1−3} (1/2) = 2 \binom{i−1}{3} (1/2)^i.
The factor of 2 in P(N = i) accounts for either of the two players winning after i games.
Then,
E(N) = Σ_{i=4}^{7} i · 2 \binom{i−1}{3} (1/2)^i ≈ 5.81.
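Evaluating the sum in R:

i <- 4:7
sum(i * 2 * choose(i - 1, 3) * 0.5^i)   # 5.8125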
4.1.7 problem 7
(a) Let R be the birth rank of the chosen child. Then,
P(R = 3) = (20/100)(1/3) = 4/60
P(R = 2) = (50/100)(1/2) + (20/100)(1/3) = 19/60
P(R = 1) = 30/100 + (50/100)(1/2) + (20/100)(1/3) = 37/60
E(R) = 1·(37/60) + 2·(19/60) + 3·(4/60) = 29/20
Var(R) = E(R²) − (E(R))² = 149/60 − 841/400 ≈ 0.38
4.1.8 problem 8
(a) Let Ci be the population of the i-th city, such that the first four cities
are in the Northern region, the next three cities are in the Eastern
region, the next two cities are in the Southern region, and the last city
is in the Western region.
Let C be the population of a randomly chosen city.
Then E(C) = (1/10) Σ_{i=1}^{10} Ci = 2 million.
(b) Var(C) = E(C 2 ) − (E(C))2 . E(C 2 ) can not be computed without the
knowledge of population sizes of individual cities.
(d) Since regions with smaller population have more cities, if a city is ran-
domly selected, it is more likely that the city belongs to a low popu-
lation region. On the other hand, if a region is selected uniformly at
random first, then a randomly selected city is as likely to belong to
a region with a large population as it is to belong to a region with a
smaller population.
4.1.9 problem 9
Let X be the amount of money Fred walks away with.
(a) E(X) = 16000. There is no variance under this scenario, since Fred's take-home amount is fixed.
Option b has a higher expected win than option c, but it also has a higher
variance.
4.1.10 problem 10
The probability that the game lasts n rounds is 1/2^n.
We know that Σ_{i=1}^{∞} x^i = x/(1 − x) for |x| < 1. Differentiating both sides with respect to x gives Σ_{i=1}^{∞} i x^{i−1} = 1/(1 − x)². Multiplying both sides by x gives Σ_{i=1}^{∞} i x^i = x/(1 − x)².
Differentiating both sides with respect to x again, and multiplying by x once more, gives Σ_{i=1}^{∞} i² x^i = (x + x²)/(1 − x)³. Plugging in x = 1/2 gives the answer of 6.
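A quick numerical check in R, truncating the infinite sum:

i <- 1:1000
sum(i^2 * 0.5^i)   # approximately 6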
4.1.11 problem 11
Note that 31 = 24 + 23 + 22 + 21 + 1. Thus, Martin can play at most 5 rounds.
For every possible win, Martin makes 1 dollar. If the game reaches the fifth
round, it is also possible that Martin loses and walks away with nothing.
Let X be Martin’s winnings.
Then,
E(X) = Σ_{i=1}^{5} (1/2^i)(1) + (1/2^5)(0) ≈ 0.97
4.1.12 problem 12
Since P(X = k) = P(X = −k), Σ_{i=1}^{n} (i P(X = i) + (−i) P(X = −i)) = 0. Hence, E(X) = 0.
4.1.13 problem 14
E(X) = c Σ_{k=1}^{∞} p^k = c (1/(1 − p) − 1) = −(1/log(1 − p)) · p/(1 − p),
so (E(X))² = (−(1/log(1 − p)) · p/(1 − p))².
4.1.14 problem 15
(a) Let X be the earnings by player B. Suppose B guesses a number j
with probability bj . Then,
E(X) = Σ_{j=1}^{100} j p_j b_j
To maximize E(X) then, B should set bj = 1 for the j for which jpj is
maximal. Since pj are known, this quantity is known.
If A chooses number j with probability p_j = c_A/j, then
E(X) = Σ_{k=1}^{100} k (c_A/k) b_k = c_A Σ_{k=1}^{100} b_k = c_A.
Thus, irrespective of what strategy B adopts, their expected earnings
are the same, so B has no incentive to change strategies. Similar argu-
ment can be made for A.
4.1.15 problem 16
(a) From the student's perspective, the average class size is E(X) = (200/360)·100 + (160/360)·10 = 60. From the dean's perspective, the average class size is E(X) = (16/18)·10 + (2/18)·100 = 20. The discrepancy comes from the fact that
when surveying the dean, there are only two data points with a large
number of students. However, when surveying students, there are two
hundred data points with a large number of students. In a sense, the
student’s perspective overcounts the classes.
In general, from the students' perspective the average class size is E(X) = Σ_{i=1}^{n} c_i · (c_i / Σ_{j=1}^{n} c_j). In the dean's perspective, all c_i are equally weighted, each with weight 1/n. However, in the students' perspective,
weights scale with the size of the class. Thus, the students’ perspective
will always be larger than the dean’s, unless all classes have the same
number of students.
4.1.16 problem 17
(a) The expected number of children in a randomly selected family during a particular era is E(X) = Σ_{k=0}^{∞} k n_k / Σ_{k=0}^{∞} n_k, where n_k is the number of families with exactly k children.
(c) The answer in part b is larger than the answer in part a. Since the average
in part a is taken over randomly selected families, families with fewer
children are weighted the same as families with more children. The
average in part b, on the other hand, is taken over individual children,
skewing the weights in favor of families with more children.
(b) Let X be the number of contestants who enter a tournament, and let
Y be the number of contestants who pass the first round. Clearly,
P (X ≥ Y ) = 1.
4.2.2 problem 24
One way to think about the problem is that the event X < r counts all
sequences of n independent Bernoulli trials, where the number of failures
is larger than n − r. If we extend the number of trials indefinitely, this
implies that more than n − r failures occurred before the r-th success, because
otherwise, we’d have X ≥ r. The probability of this event is P (Y > n − r).
Implication in the reverse direction can be shown analogously.
4.2.3 problem 26
(a) Let Z represent the number of flips until both Nick and Penny flip Heads. Then Z ∼ FS(p1p2), since Nick's and Penny's flips are independent, and E(Z) = 1/(p1p2).
(b) The logic is analogous to part a, but success probability is p1 +p2 −p1 p2 .
(c)
P(X1 = X2) = Σ_{k=1}^{∞} ((1 − p)²)^{k−1} p² = p/(2 − p)
(d) By symmetry,
P(X1 < X2) = (1 − p/(2 − p))/2 = (1 − p)/(2 − p)
4.2.4 problem 28
Let Ik be the indicator variable for the k-th location, so that Ik = 1 if k-th
location has a treasure and Ik = 0 otherwise.
P(X = k) = P(Ik = 1) P(Xk−1 = t − 1) = (t/n) · \binom{t−1}{t−1}\binom{n−t}{k−t} / \binom{n−1}{k−1} = (t/n) · \binom{n−t}{k−t} / \binom{n−1}{k−1}
E(X) = Σ_k k P(Ik = 1) P(Xk−1 = t − 1) = Σ_k k (t/n) \binom{n−t}{k−t} / \binom{n−1}{k−1} = (n + 1)t/(t + 1)
4.2.5 problem 29
Random variable f(X) takes values that are the probabilities of the values taken by X. Since X ∼ Geom(p), f(X) ∈ {(1 − p)^k p | k ∈ Z≥0}, and each value (1 − p)^k p of f(X) occurs with probability (1 − p)^k p. Thus,
E(f(X)) = Σ_{k=0}^{∞} ((1 − p)^k p)² = p²/(1 − (1 − p)²) = p/(2 − p),
which converges since |1 − p| < 1.
4.2.6 problem 30
(a)
E(X g(X)) = Σ_{x=0}^{∞} x g(x) e^{−λ} λ^x / x!
= Σ_{x=1}^{∞} x g(x) e^{−λ} λ^x / x!
= λ Σ_{x=1}^{∞} g(x) e^{−λ} λ^{x−1} / (x − 1)!
= λ Σ_{x=0}^{∞} g(x + 1) e^{−λ} λ^x / x! = λ E(g(X + 1))
(b)
E(X 3 ) = E(XX 2 )
= λE((X + 1)2 )
= λ(E(X 2 ) + E(2X) + 1)
= λ(λE(X + 1) + 2λ + 1) = λ(λ(λ + 1) + 2λ + 1)
= λ(λ2 + 3λ + 1)
4.2.7 problem 31
(a)
P(X = k) = p + (1 − p) e^{−λ} for k = 0,
P(X = k) = (1 − p) e^{−λ} λ^k / k! for k > 0.
4.2.8 problem 33
Suppose w = r = 1. The white ball is equally likely to be any of the w + b
balls. Also, note that the event k-th drawn ball is the white ball is equivalent
to the event k−1 black balls are drawn until the white ball is drawn. Thus, for
X ∼ NHGeom(1, n − 1, 1), P (X = k) = P (k + 1-th drawn ball is white) = n1
for 0 ≤ k ≤ n − 1.
P(X = k) = \binom{1+k−1}{1−1} \binom{1+n−1−1−k}{1−1} / \binom{1+n−1}{1} = 1/n
4.3.2 problem 39
Let Ij,1 and Ij,2 be the indicator random variables for the j-th person being
sampled by the first and second researchers respectively.
P(Ij,1 = 1) = \binom{N−1}{m−1}/\binom{N}{m} = m/N and P(Ij,2 = 1) = \binom{N−1}{n−1}/\binom{N}{n} = n/N. Since sampling is done independently by the two researchers, P(Ij,1 = 1, Ij,2 = 1) = \binom{N−1}{m−1}\binom{N−1}{n−1} / (\binom{N}{m}\binom{N}{n}).
Let X = Σ_{j=1}^{N} Ij,1 Ij,2 be the number of people sampled by both researchers. Then,
E(X) = E(Σ_{j=1}^{N} Ij,1 Ij,2) = Σ_{j=1}^{N} E(Ij,1 Ij,2) = N \binom{N−1}{m−1}\binom{N−1}{n−1} / (\binom{N}{m}\binom{N}{n}) = mn/N.
4.3.3 problem 40
Let Ij be the indicator random variable for the HTH pattern starting on the j-th toss. Since the tosses are independent, P(Ij = 1) = 1/8 for 1 ≤ j ≤ n − 2.
Let X = Σ_{j=1}^{n−2} Ij be the number of HTH patterns in n independent coin tosses. Then,
E(X) = E(Σ_{j=1}^{n−2} Ij) = Σ_{j=1}^{n−2} E(Ij) = Σ_{j=1}^{n−2} 1/8 = (n − 2)/8.
4.3.4 problem 41
Let Ij be the indicator variable for the j-th card being red. Let Rj = Ij Ij+1 be the indicator variable for the j-th and (j + 1)-st cards both being red. Let X = Σ_{j=1}^{51} Rj. Then E(X) = 51 P(Rj = 1) = 51 (26/52)(25/51) = 12.5.
4.3.5 problem 42
Let Ij be the indicator variable for the j-th toy being of a new type. The number of toy types after collecting t toys is X = Σ_{j=1}^{t} Ij, and P(Ij = 1) = ((n − 1)/n)^{j−1}. Thus,
E(X) = E(Σ_{j=1}^{t} Ij) = Σ_{j=1}^{t} E(Ij) = Σ_{j=1}^{t} ((n − 1)/n)^{j−1} = n − n((n − 1)/n)^t.
4.3.6 problem 43
(a) This problem is a special case of problem 42 with t = k and n − 1 floors. Thus, the expected number of stops is (n − 1) − (n − 1)((n − 2)/(n − 1))^k.
(b) Let Ij be the indicator variable for the j-th floor being selected, for 2 ≤ j ≤ n. Then the number of stops is X = Σ_{j=2}^{n} Ij. Thus,
E(X) = E(Σ_{j=2}^{n} Ij) = Σ_{j=2}^{n} E(Ij) = Σ_{j=2}^{n} (1 − (1 − pj)^k).
4.3.7 problem 45
Notice that
I(A1 ∩ A2 ∩ · · · ∩ An) ≥ Σ_{i=1}^{n} I(Ai) − n + 1,
because the left-hand side is either 0 or 1, so the question reduces to whether the left-hand side can be 0 while the right-hand side is 1. This is not possible, because if the left-hand side is 0, then I(Aj) = 0 for some j, so Σ_{i=1}^{n} I(Ai) < n and the right-hand side is less than 1.
Then,
I(∩_{i=1}^{n} Ai) ≥ Σ_{i=1}^{n} I(Ai) − n + 1 ⟹ E(I(∩_{i=1}^{n} Ai)) ≥ E(Σ_{i=1}^{n} I(Ai) − n + 1) ⟹ P(∩_{i=1}^{n} Ai) ≥ Σ_{i=1}^{n} P(Ai) − n + 1.
4.3.8 problem 46
Let X ∼ NHGeom(4, 48, 1) be the number of non-aces before the first ace. Then E(X) = rb/(w + 1) = 1·48/5 = 9.6.
Let Y ∼ NHGeom(4, 48, 2) be the number of non-aces before the second ace is drawn. Then E(Y) = rb/(w + 1) = 2·48/5 = 19.2.
Let Z = Y − X. Notice that Z represents the number of non-aces between the first and the second ace. E(Z) = E(Y) − E(X) = 19.2 − 9.6 = 9.6.
4.3.9 problem 47
(a) Let X = Σ_{i=1}^{52} Ii be the number of cards that are called correctly.
E(X) = Σ_{i=1}^{52} P(Ii = 1) = 52 · (1/52) = 1.
(b)
P(NYNNY) = (51/52)(1/51)(50/51)(49/50)(1/49) = (1/52)(1/51).
Notice that the second N in the sequence has probability 50/51, because the second card was guessed correctly: the only piece of information we have is that the third card is not the card that was correctly guessed, leaving a total of 51 possibilities. Generalizing, the probability of a string of length i with k Ys is (52 − k)!/52!. There are \binom{i−1}{k−1} strings of length i with k Ys that end in a Y, and since 1 ≤ k ≤ i,
P(Ii = 1) = Σ_{k=1}^{i} \binom{i−1}{k−1} (52 − k)!/52!.
Thus,
E(X) = Σ_{i=1}^{52} Σ_{k=1}^{i} \binom{i−1}{k−1} (52 − k)!/52! = Σ_{k=1}^{52} 1/k!.
Note that e^x = Σ_{i=0}^{∞} x^i/i!, so e ≈ 1 + E(X) with negligible truncation error. Thus,
E(X) ≈ e − 1.
(c) Since at any given time we know all the cards remaining in the deck, the probability of the i-th card being guessed correctly is 1/(52 − i + 1). Thus,
E(X) = Σ_{i=1}^{52} E(Ii) = Σ_{i=1}^{52} P(Ii = 1) = Σ_{i=1}^{52} 1/(52 − i + 1) = Σ_{k=1}^{52} 1/k ≈ 4.54.
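Both sums are easy to evaluate in R:

sum(1 / factorial(1:52))   # part (b): approximately e - 1 = 1.7183
sum(1 / (1:52))            # part (c): approximately 4.538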
4.3.10 problem 49
Let Ij be the indicator variable for the j-th prize being selected. The value received from the j-th prize is jIj, so the total value is X = Σ_{j=1}^{n} jIj. Then,
E(X) = Σ_{j=1}^{n} j P(Ij = 1) = Σ_{j=1}^{n} j \binom{n−1}{k−1}/\binom{n}{k} = Σ_{j=1}^{n} j (k/n) = (k/n) · n(n + 1)/2 = k(n + 1)/2.
4.3.11 problem 50
Let C1 be a random chord that spans a minor arc of length x on a circle of radius r. To generate a chord C2, with endpoints A and B, such that C2 intersects C1, either A is on the minor arc and B is on the major arc, or A is on the major arc and B is on the minor arc.
4.3.12 problem 52
Let Ij be the indicator variable for the j-th toss landing on an outcome different from the previous toss, for 2 ≤ j ≤ n. Then the total number of such tosses is X = Σ_{j=2}^{n} Ij, and the total number of runs is Y = X + 1.
4.3.13 problem 53
Let Ij be the indicator variable for tosses j and j + 1 both landing Heads, for 1 ≤ j ≤ 3. Then the expected number of such pairs is E(X) = Σ_{j=1}^{3} P(Ij = 1).
4.3.14 problem 54
(a) Since P(Wj = yk) = [\binom{N−1}{n−1}/\binom{N}{n}] · (1/n) = (n/N)(1/n) = 1/N for 1 ≤ k ≤ N,
E(Wj) = (1/N) Σ_{k=1}^{N} yk.
Thus,
E(W) = (1/n) Σ_{j=1}^{n} ((1/N) Σ_{k=1}^{N} yk) = (1/N) Σ_{k=1}^{N} yk = ȳ.
(b) Since
W = (1/n) Σ_{j=1}^{N} Ij yj,
where Ij is the indicator variable for the j-th person being in the sample, we have
E(W) = (1/n) Σ_{j=1}^{N} (n/N) yj = (1/N) Σ_{k=1}^{N} yk = ȳ.
4.3.15 problem 56
(a) Let Ij be the indicator variable for shots j through j + 6 all being successful. The total number of streaks of 7 consecutive successful shots is X = Σ_{j=1}^{n−6} Ij. Then,
E(X) = Σ_{j=1}^{n−6} E(Ij) = Σ_{j=1}^{n−6} P(Ij = 1) = Σ_{j=1}^{n−6} p^7 = (n − 6)p^7.
4.3.16 problem 59
(a) WLOG, let m1 > m be a second median of X. Then, by the definition of medians, P(X ≤ m) ≥ 1/2 and P(X ≥ m1) ≥ 1/2, so P(X ∈ (m, m1)) = 0. If m1 > m + 1, then there exists an m2 ∈ (m, m1) such that P(X = m2) = 0. This implies that m2 = 1, since that is the only value of X with probability 0. However, then m < 1, which precludes m from being a median. Thus, m1 must be m + 1. Since we know 23 to be a median of X, we need to check whether 22 or 24 are medians of X. Computation via the CDF of X shows that neither 22 nor 24 is a median. Hence, 23 is the only median of X.
(b) Let Ij be the indicator variable for the event X ≥ j. Notice that the event X = k (the first birthday match happens when there are k people) implies that Ij = 1 exactly for j ≤ k, and vice versa. Thus,
X = Σ_{j=1}^{366} Ij.
Then,
E(X) = Σ_{j=1}^{366} P(Ij = 1) = 1 + 1 + Σ_{j=3}^{366} P(Ij = 1) = 2 + Σ_{j=3}^{366} pj.
(d) E(X²) = E(I1² + · · · + I366² + 2 Σ_{j=2}^{366} Σ_{i=1}^{j−1} Ii Ij). Note that Ii² = Ii and Ii Ij = Ij for i < j. Thus,
E(X²) = E(I1 + · · · + I366 + 2 Σ_{j=2}^{366} Σ_{i=1}^{j−1} Ij)
= 2 + Σ_{j=3}^{366} pj + 2 Σ_{j=2}^{366} (j − 1) E(Ij)
= 2 + Σ_{j=3}^{366} pj + 2 Σ_{j=2}^{366} (j − 1) pj
≈ 754.61659
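These values can be checked numerically in R, using pj = P(X ≥ j), the probability of no birthday match among the first j − 1 people:

p <- c(1, cumprod(1 - (0:364)/365))   # p[j] = P(X >= j) for j = 1, ..., 366
sum(p)                                # E(X), about 24.62
sum((2 * (1:366) - 1) * p)            # E(X^2), about 754.6, matching the value above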
4.3.17 problem 60
(a) By the story of the problem, X ∼ NHGeom(n, N − n, m). Then,
Y = X + m.
(b) According to part a, E(Y) = E(X) + m = m(N − n)/(n + 1) + m. The implied indicator variables are the same as in the proof of the expectation of Negative Hypergeometric random variables.
0 =⇒ (n+1)N
N −n
> 0 =⇒ n < N for positive n and N , E(Z) < m.
4.4 LOTUS
4.4.1 problem 62
E(2^X) = Σ_{k=0}^{∞} 2^k P(X = k) = e^{−λ} Σ_{k=0}^{∞} (2λ)^k / k! = e^{−λ} e^{2λ} = e^{λ}.
4.4.2 problem 63
E(2^X) = Σ_{k=0}^{∞} 2^k (1 − p)^k p = p Σ_{k=0}^{∞} (2 − 2p)^k = p/(2p − 1), which is finite when 2 − 2p < 1, i.e. p > 1/2.
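Both answers can be spot-checked by Monte Carlo in R; the parameter values below are arbitrary, chosen only for illustration:

set.seed(1)
lambda <- 1.3; p <- 0.7
mean(2^rpois(1e6, lambda)); exp(lambda)   # problem 62: both about 3.67
mean(2^rgeom(1e6, p)); p / (2 * p - 1)    # problem 63: both about 1.75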
Therefore, the number of pairs in the sample that are the same person can
be approximated by Pois(1/2).
Then the probability that there is at least one pair in the sample that are
the same person is 1 − e−0.5 = 0.393. This can be verified as a close approx-
imation in R - the probability that every individual in the sample is unique
is the last value resulting from the command cumprod(1-(0:999)/1000000),
which is .6067. 1 minus this value gives .3933, the actual probability some two
sampled individuals are the same person, which is very close to our Poisson
approximation.
4.5.2 problem 71
Let Ij be the indicator random variable for pair j having the aforementioned
property. P(Ij = 1) = 1/365², under the assumption that the probability
4.5.3 problem 72
(a) Suppose the population consists of n people (excluding me). Let Ij be the indicator variable for the j-th person having the same birthday as me. Then the expected number of people with the same birthday as me is E(X) = Σ_{j=1}^{n} P(Ij = 1) = n/365. Approximating the number of such people by Pois(n/365), the probability that at least one person shares my birthday is about 1 − e^{−n/365}, and
1 − e^{−n/365} ≥ 0.5 ⟹ n/365 ≥ −ln(0.5) ⟹ n ≥ 253.
(b) By similar logic to part a, E(X) = \binom{n}{2}/(365 × 24), and the probability of at least one match is approximately 1 − P(Z = 0) = 1 − e^{−\binom{n}{2}/(365×24)} for Z ∼ Pois(\binom{n}{2}/(365 × 24)). Then
1 − e^{−\binom{n}{2}/(365×24)} ≥ 0.5 ⟹ \binom{n}{2}/(365 × 24) ≥ −ln(0.5) ⟹ n ≥ 111.
(c) Since the Poisson approximation is completely determined by the expectation of the underlying random variable, we need to increase the population size so that the expected number of pairs with the desired property equals the expected number of pairs with the same birthday when the population size is 23. Since E(X) = (1/24)E(Y), where Y is the number of pairs of people that share a birthday, the population needs to have 24 times more pairs:
\binom{n}{2} = 24 \binom{23}{2} ⟹ n ≈ 110.
(d) Let X be the number of triplets with the same birthday, and let Ij be the indicator random variable for triplet j having the same birthday. Then E(X) = \binom{100}{3}(1/365)² ≈ 1.21, so X can be approximated by Z ∼ Pois(1.21), and P(at least one triplet with the same birthday) ≈ 1 − P(Z = 0) = 1 − e^{−1.21} ≈ 0.70.
Another way to approximate the desired probability is to let Ij be the indicator variable that at least three people are born on day j. Then
P(Ij = 1) = 1 − ((364/365)^{100} + 100·(1/365)(364/365)^{99} + \binom{100}{2}(1/365²)(364/365)^{98}) ≈ 0.0027,
so the expected number of days on which at least three people are born is approximately 365 × 0.0027 ≈ 0.99.
Then the probability that there is at least one triplet born on the same day can be approximated using Z ∼ Pois(0.99), the number of days for which there is a triplet born on that day. The desired probability is 1 − P(Z = 0) = 1 − e^{−0.99} ≈ 0.63.
Thus, the second method is a closer approximation to the desired probability.
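A simulation sketch in R (not part of the original solution) for comparing the two approximations:

set.seed(1)
triple <- replicate(1e4, max(tabulate(sample(365, 100, replace = TRUE), nbins = 365)) >= 3)
mean(triple)   # compare with the approximations 0.70 and 0.63 above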
4.5.4 problem 73
(a) Let X be the number of people that play the same opponent in both
rounds. Let Ij be the indicator variable that person j plays against the
same opponent twice. P(Ij = 1) = 1/99. Then, E(X) = Σ_{j=1}^{100} P(Ij = 1) = 100/99.
(c) Consider the 50 pairs that played each other in round one. Let Ij be the indicator variable for pair j playing each other again in the second round. P(Ij = 1) = 1/99. Then the expected number of pairs that play each other in both rounds is E(Z) = 50/99 ≈ 1/2.
We can approximate the number of pairs that play against one another in both rounds with Z ∼ Pois(1/2). Note that X = 2Z, so
P(X = 0) ≈ P(Z = 0) = e^{−1/2} ≈ 0.61,
P(X = 2) ≈ P(Z = 1) = (1/2) e^{−1/2} ≈ 0.30.
Note that the approximation in part (c) is more accurate: the events of distinct pairs playing each other again are much closer to independent than the events of distinct individuals playing the same opponent twice. Knowing that the players in Game 1 of round 2 played against each other in round 1 gives us very little information about whether the players in any other game also played against each other. Whereas knowing that Player 1 plays the same opponent in round 2 (say, player 71) guarantees that player 71 also plays the same opponent.
4.6.2 problem 80
(a) X ∼ FS((20 − m + 1)/20) ⟹ E(X) = 20/(20 − m + 1)
4.6.3 problem 86
(a) P(X = x, Y = y, Z = z) = \binom{nA}{x}\binom{nB}{y}\binom{nC}{z} / \binom{n}{m}
(b) Let Ij be the indicator variable for person j in the sample being a
member of party A. Then, X = m =⇒ E(X) = m nnA by
P
i=1 Ii
symmetry.
(c) Let's find E(X²). If we square the expression for the sum of X's constituent indicator r.v.s, we get
X² = Σ_{i=1}^{m} Ii + 2 Σ_{i<j} Ii Ij.
Additionally, for any pair i, j, the r.v. Ii Ij equals 1 only when both sampled people are members of party A, which occurs with probability nA(nA − 1)/(n(n − 1)). There are \binom{m}{2} pairs i, j. Therefore, the expression 2 Σ_{i<j, 1≤j≤m} E(Ii Ij) evaluates to m(m − 1)nA(nA − 1)/(n(n − 1)).
When m = 1, Var(X) = nA/n − (nA/n)² = (nA/n)(1 − nA/n) = (nA/n) · (nB + nC)/n.
When m = n, Var(X) = 0. This makes sense, as if the sample is the
entire population, we always get the same number of members of party
A in our sample (all of them), so there is no variation.
4.6.4 problem 87
(a) Let Ij be the indicator random variable for the j-th person in the sample being a Democrat. Let X be the total number of Democrats in the sample. Then E(X) = Σ_{j=1}^{c} E(Ij) = c · d/100.
(b) Let Ij be the indicator random variable for state j being represented by at least one person in the sample. Then P(Ij = 1) = 1 − \binom{98}{c}/\binom{100}{c}, so the expected number of states represented in the sample is E(X) = 50(1 − \binom{98}{c}/\binom{100}{c}).
(c) Similarly to part b, E(X) = 50 \binom{98}{c−2}/\binom{100}{c}.
(d) P(X = k) = \binom{50}{k}\binom{50}{20−k} / \binom{100}{20} for 0 ≤ k ≤ 20.
(e) Similar to part b, E(X) = 50 \binom{98}{18}/\binom{100}{20}.
4.6.5 problem 88
(a) X ∼ Geom(g/(g + b)) ⟹ E(X) = b/g
4.6.6 problem 89
(a) Since E(NC) = 115 pC, Var(NC) = Σ_{k=0}^{115} k² P(NC = k) − (115 pC)² = 115 pC(1 − pC).
(b) Let Ij be the indicator random variable that CATCAT starts at position
j. Then, the expected number of CATCAT is E(X) = 110(pC pA pT )2 .
(c) In a sequence of length 6, the desired options are CATxxx, xxxCAT.
Thus, P (at least one CAT) = 2(pC pA pT (1 − pC pA pT )) + (pC pA pT )2 .
4.6.7 problem 90
(a) Let Ij be the indicator variable that the j-th person in Bob's sample is also sampled by Alice. Then P(Ij = 1) = 1/10, so the expected number of people in both Bob's and Alice's samples is 20 · (1/10) = 2.
(b) |A ∪ B| = 100 + 20 − |A ∩ B| =⇒ E(|A ∪ B|) = 100 + 20 − E(|A ∩ B|) =
100 + 20 − 2 = 118.
(c) Let Ij be the indicator random variable for couple j being entirely in Bob's sample. Then P(Ij = 1) = \binom{998}{18}/\binom{1000}{20}. Thus, the expected number of couples in Bob's sample is E(X) = 500 \binom{998}{18}/\binom{1000}{20} ≈ 0.2.
4.6.8 problem 91
(a) If F = G, Then, Xj is equally likely to be in any of the m + n positions
in the ordered list.
E(R) = Σ_{j=1}^{m} E(Rj) = Σ_{j=1}^{m} [(m + n)(m + n + 1)/2] · [1/(m + n)] = m(m + n + 1)/2.
(b) Rj = Σ_{k=1}^{n} IYk + Σ_{k≠j} IXk + 1, where IYk is the indicator for Xj being larger than Yk and IXk is the indicator for Xj being larger than Xk. Note that E(IYk) = p for all k since the Ys are i.i.d., and E(IXk) = 1/2, since Xj and Xk are i.i.d. and never equal, so each is equally likely to be the larger. Then E(Rj) = np + (m − 1)/2 + 1, and thus E(R) = m(np + (m − 1)/2 + 1).
4.6.9 problem 92
(a) Let S be the sum of the ranks of the dishes we eat during both phases. S = (m − k + 1)X + Σ_{j=1}^{k−1} Rj, where Rj is the rank of the j-th dish from the exploration phase, excluding the highest-ranked dish. Each of those k − 1 dishes is equally likely to have any rank in {1, …, X − 1}, so E(Rj | X) = [(X − 1)X/2] · [1/(X − 1)] = X/2. Hence
E(S) = (m − k + 1)E(X) + (k − 1)E(X)/2 = (m − k)E(X) + (k + 1)E(X)/2.
(b) P(X = x) = \binom{x−1}{k−1} / \binom{n}{k}.
(c)
E(X) = (1/\binom{n}{k}) Σ_{i=k}^{n} i \binom{i−1}{k−1}
= (1/\binom{n}{k}) Σ_{i=k}^{n} k \binom{i}{k}
= (k/\binom{n}{k}) Σ_{i=k}^{n} \binom{i}{k}
= (k/\binom{n}{k}) \binom{n+1}{k+1}
= k(n + 1)/(k + 1)
Chapter 5
Continuous Random Variables
5.1.2 problem 2
Take X ∼ Unif(0, 1/2), with PDF f(x) = 2 for x ∈ (0, 1/2) and 0 otherwise. Then f(x) > 1 on (0, 1/2), yet ∫ f(x) dx = 1.
5.1.3 problem 3
(a) The new PDF is,
g(x) = 2F (x)f (x)
g(x) ≥ 0 in the same range as f
Z ∞ Z ∞
g(x)dx = 2F (x)f (x)dx (5.6)
−∞ −∞
Z ∞
= d(F 2 (x)) (5.7)
−∞
= x→∞
lim F 2 (x) − lim F 2 (x) (5.8)
x→−∞
=1−0 (5.9)
∫_{−∞}^{∞} g(x) dx = (1/2) ∫_{−∞}^{∞} f(x) dx + (1/2) ∫_{−∞}^{∞} f(x) dx (5.10)
= 1 (5.11)
5.1.4 problem 5
a. We have A = πR², so E(A) = πE(R²). We have E(R²) = ∫_0^1 x² · 1 dx = 1/3, since the PDF of R is 1 on (0, 1). Then E(A) = π/3.
b. CDF: P(A ≤ k) = P(πR² ≤ k) = P(R ≤ √(k/π)) = √(k/π) for 0 < k < π, using the CDF of Unif(0, 1). The CDF of A is 0 for k ≤ 0 and 1 for k ≥ π.
PDF: d/dk (√(k/π)) = 1/(2√(kπ)) for 0 < k < π, and 0 elsewhere.
5.1.5 problem 6
a. For X ∼ Unif(0, 1), F(k) = k for 0 < k < 1, E(X) = 1/2, Var(X) = 1/12, SD(X) = 1/(2√3).
Then the probability that X is within one standard deviation of its mean is
P((√3 − 1)/(2√3) < X < (√3 + 1)/(2√3)) = F((√3 + 1)/(2√3)) − F((√3 − 1)/(2√3)) = 1/√3.
5.1.6 problem 7
a. F(x) is continuous at its given endpoints, F(1) = (2/π) arcsin(1) = (2/π)(π/2) = 1 and F(0) = 0, and F(x) is differentiable between 0 and 1.
b. F′(x) = f(x) = (2/π) d/dx (arcsin(√x)) = (2/π)(1/√(1 − x))(1/(2√x)) = 1/(π√(x(1 − x)))
c. f(x) is a valid PDF even though it is unbounded: the integral of f(x) from 0 to 1 converges, and f(x) is always positive, for the same reason that 1/√x has a discontinuity at x = 0 but can be integrated from 0 to any positive real number.
5.2.2 problem 51
(a) We know that X² ≤ X with probability 1 (since 0 ≤ X ≤ 1), so E[X²] ≤ E[X].
But
0 ≤ (X − 1/2)² ≤ 1/4
with probability 1. To get E[(X − 1/2)²] = 1/4, we need (X − 1/2)² = 1/4 with probability 1. So,
X = 0 with probability p, and X = 1 with probability 1 − p. (5.18)
5.2.3 problem 52
E[X] = ∫_0^∞ x f(x) dx (5.19)
= ∫_0^∞ x² e^{−x²/2} dx (5.20)
= (1/2) ∫_{−∞}^{∞} x² e^{−x²/2} dx (5.21)
= √(2π)/2 = √(π/2) (5.22)
E[X²] = ∫_0^∞ x² f(x) dx (5.23)
= ∫_0^∞ x³ e^{−x²/2} dx (5.24)
= ∫_0^∞ 2u e^{−u} du = 2 (5.25)
where the last line uses the substitution u = x²/2.
5.2.4 problem 56
(a)
E[Z²Φ(Z)] = ∫_{−∞}^{∞} x² Φ(x) φ(x) dx (5.27)
Substituting x → −x,
E[Z²Φ(Z)] = ∫_{−∞}^{∞} x² Φ(−x) φ(x) dx (5.28, 5.29)
= ∫_{−∞}^{∞} x² (1 − Φ(x)) φ(x) dx (5.30)
So, adding the two expressions for E[Z²Φ(Z)],
2 E[Z²Φ(Z)] = ∫_{−∞}^{∞} x² φ(x) dx = E(Z²) = 1, (5.31, 5.32)
E[Z²Φ(Z)] = 1/2. (5.33)
(b)
P(Φ(Z) ≤ 2/3) = P(Z ≤ Φ^{−1}(2/3)) = Φ(Φ^{−1}(2/3)) = 2/3
(c)
1
3!
5.2.5 problem 57
(a)
5.2.6 problem 58
(a)
E[Y] = (1/2) · 0 + ∫_0^∞ x f(x) dx (5.39)
= ∫_0^∞ (1/√(2π)) x e^{−x²/2} dx (5.40)
= 1/√(2π)
5.2.7 problem 59
(a) This is length-biased sampling.
L1 + L2 + L3 = 2π and E[L1] = E[L2] = E[L3] = 2π/3.
But our point is more likely to fall in the longest arc. If there were a 1/3 chance of the point being in each of the three arcs, then we would have E[L] = 2π/3.
(b)
θ1, θ2, θ3 ∼ Unif(0, 2π), independently.
L1 = min(θ1, θ2, θ3)
CDF: F(y) = 1 − (1 − y/(2π))³ for 0 < y < 2π.
PDF:
f(y) = d/dy F(y) = (3/(2π))(1 − y/(2π))² (5.44, 5.45)
(c)
5.2.8 problem 61
(a)
Ik = 1 if guest k arrives when the party is fun, and Ik = 0 otherwise. (5.51)
Now I = I1 + · · · + In.
Out of the possible 4! orderings of Tyrion, Cersei, 1, and 2, the following orderings have both 1 and 2 arriving when the party is fun:
Tyrion 1 2 Cersei; Tyrion 2 1 Cersei; Cersei 1 2 Tyrion; Cersei 2 1 Tyrion.
So P(I1 I2 = 1) = 4/4! = 1/6.
5.2.9 problem 62
(a)
Ik = 1 if day k sets a low or high record, and Ik = 0 otherwise. (5.54)
P(I1 = 1) = 1, P(I2 = 1) = 1, P(I3 = 1) = 2/3, P(I4 = 1) = 2/4, and so on.
Now I = I1 + · · · + In, so
E[I] = 1 + 1 + 2/3 + 2/4 + · · · + 2/100.
(b)
Ik = 1 if k sets a low record followed by a high record, and Ik = 0 otherwise. (5.55)
P(Ik = 1) = 1/(k(k + 1)).
Now I = I1 + · · · + In, so
E[I] = 1/(1·2) + · · · + 1/(100·101) = 1 − 1/101.
(c)
(d)
E[N] = Σ_{i=1}^{∞} i P(N = i) = Σ_{i=1}^{∞} 1/(i + 1), (5.60, 5.61)
which is unbounded since the series diverges.
5.3 Exponential
5.3.1 problem 37
a. We need to find the value of t such that F(t) = 1/2; this will indicate that there is a 1/2 chance that the particle has decayed before time t. Solving 1 − e^{−λt} = 1/2 gives t = ln(2)/λ.
b. We need to compute P(t < T < t + ϵ | T > t) = P(t < T < t + ϵ)/P(T > t). This is [(1 − e^{−λ(t+ϵ)}) − (1 − e^{−λt})]/e^{−λt} = 1 − e^{−λϵ}. Using the approximation given in the hint and the assumption that ϵ is small enough that ϵλ ≈ 0, this is about 1 − (1 − ϵλ) = ϵλ.
c. P (L > t) = P (T1 > t)P (T2 > t)...P (Tn > t) = e−nλt , so L ∼
Expo(nλ). Therefore, if X ∼ Expo(1), we have L = X/nλ. Then since
E(X) = 1 and V ar(X) = 1, we can get E(L) = 1/(nλ) and V ar(L) =
1/(n2 λ2 )
5.3.2 problem 39
Let O1 , O2 , O3 be the offers received, all distributed as Expo(1/12000). We
want to find E(max(O1 , O2 , O3 )). Imagine ordering the offers as lowest,
middle, and highest price. Let D1 be the lowest price, let D2 be how much
more the middle price is than the lowest price, and D3 be how much more
the highest price is than the middle price. Then D1 is the minimum of 3 i.i.d. Expo(λ) offers, so D1 ∼ Expo(3λ), and by the memoryless property D2 ∼ Expo(2λ) and D3 ∼ Expo(λ).
Then E(D1 + D2 + D3) = 1/(3λ) + 1/(2λ) + 1/λ = 4000 + 6000 + 12000 = 22000.
5.3.3 problem 45
Let N be the number of emails received within the first 0.1 hours. Then
P(T > 0.1) = 1 − P(T < 0.1) = 1 − P(N ≥ 3) = 1 − (1 − P(N = 0) − P(N = 1) − P(N = 2)) = P(N = 0) + P(N = 1) + P(N = 2).
That is, to find the probability that it takes longer than 0.1 hours for
3 emails to arrive, we find the probability it takes less than 0.1 hours for 3
emails to arrive. To do this, we realize that this is equivalent to at least 3
emails arriving in the first 0.1 hours. And to find that probability, we realize
we can find the probabilities of exactly 0, 1, or 2 emails arriving in the first
0.1 hours.
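The equivalence can be illustrated in R; the arrival rate below is taken as 10 emails per hour purely for illustration (the problem's actual rate may differ):

lambda <- 10
ppois(2, lambda * 0.1)              # P(N <= 2), i.e. P(T > 0.1)
1 - pgamma(0.1, 3, rate = lambda)   # P(T > 0.1) directly; same value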
5.4 Normal
5.4.1 problem 26
a. Let Tw ∼ N(w, σ²) be the time it takes Walter to arrive and Tc ∼ N(c, 4σ²) the time it takes Carl to arrive. We have −Tw ∼ N(−w, σ²), since flipping the sign of an r.v. flips the sign of its expectation but does not change the variance; hence Tc − Tw ∼ N(c − w, 5σ²).
For Carl to arrive first, we require Tc − Tw < 0 (the time it takes Carl is less than the time it takes Walter). Let us find this probability:
P(Tc − Tw < 0) = P(Z < (w − c)/(σ√5)) = Φ((w − c)/(σ√5))
b. If Carl has a greater than 1/2 chance of arriving first, then Φ((w − c)/(σ√5)) > 1/2. Since Φ is an increasing function and equals 1/2 when its input is 0, this implies we need (w − c)/(σ√5) > 0, which in turn implies w > c. So, as long as Carl's car lets him be faster on average than Walter's walking, Carl has a better than 1/2 chance of arriving first.
P(Tc < w + 10) = P(2σZ + c < w + 10) = Φ((w + 10 − c)/(2σ))
P(Tw < w + 10) = P(σZ + w < w + 10) = Φ(10/σ)
Since Φ is an increasing function, if we want Carl to have a greater chance than Walter to make it on time, then we require (w + 10 − c)/(2σ) > 10/σ. This then implies that we need w > c + 10.
5.4.2 problem 35
Let g(Z) = max(Z − c, 0). We have g(Z) = 0 for Z < c and g(Z) = Z − c
for Z > c.
Then
Z ∞ Z ∞
E(g(Z)) = g(k)φ(k)dk = (k − c)φ(k)dk
−∞ c
since the expression inside the integral is 0 for k < c. Next we have
∫_c^∞ (k − c)φ(k) dk = ∫_c^∞ k e^{−k²/2}/√(2π) dk − ∫_c^∞ cφ(k) dk = e^{−c²/2}/√(2π) − c ∫_{−∞}^{−c} φ(k) dk = φ(c) − cΦ(−c),
with the first equality following from splitting the integral via subtraction
and expanding out φ for the left integral, the second equality comes from
observing that the antiderivative of kφ(k) is −φ(k) and that we can change
the limits of the right integral due to the symmetry of tail areas of the curve
of φ, and the last equality comes from applying the definitions of the PDF
and CDF of the standard normal distribution.
P(X/Y < k) = P((1 − Y)/Y < k) = P(Y > 1/(k + 1)) = 1 − (2/(k + 1) − 1) = 2k/(k + 1)
To find the PDF, we differentiate the CDF using the quotient rule:
d/dk (2k/(k + 1)) = 2(k + 1)^{−2}
b. Note that X/Y is minimized at 0 (when X = 0 and Y = 1) and maximized at 1 (when X = Y = 1/2). So, to find E(X/Y), we must compute ∫_0^1 2k(k + 1)^{−2} dk. This can be done with integration by parts:
∫_0^1 2k(k + 1)^{−2} dk = 2((−k/(k + 1))|_0^1 + ∫_0^1 (k + 1)^{−1} dk) = 2(−1/2 + ln 2) = 2 ln 2 − 1
Then, we need to evaluate the same integral as in part b to find E(Y/X), but now with the limits from 1 to infinity, since Y/X is minimized at 1 (when X = Y) and grows without bound as Y → 1 and X → 0:
∫_1^∞ 2k(k + 1)^{−2} dk = 2((−k/(k + 1))|_1^∞ + ∫_1^∞ (k + 1)^{−1} dk) = 2(−1/2 + ∞ − ln 2) = ∞
Chapter 6
Moments