0% found this document useful (0 votes)
11 views118 pages

Probability Solution Manual

The document is a solution manual for an introduction to probability, covering various topics such as counting, conditional probability, random variables, and expectation. It includes detailed solutions to problems related to these topics, providing step-by-step explanations and calculations. The content is organized into chapters, with each chapter focusing on a specific aspect of probability theory.

Uploaded by

Bijay Nag
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views118 pages

Probability Solution Manual

The document is a solution manual for an introduction to probability, covering various topics such as counting, conditional probability, random variables, and expectation. It includes detailed solutions to problems related to these topics, providing step-by-step explanations and calculations. The content is organized into chapters, with each chapter focusing on a specific aspect of probability theory.

Uploaded by

Bijay Nag
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 118

Introduction to Probability

Solution Manual

May 25, 2025


Contents

1 Probability and Counting 1

2 Conditional Probability 26

3 Random Variables and Their Distributions 56

4 Expectation 73

5 Continuous Random Variables 100

6 Moments 116

i
Chapter 1

Probability and Counting

1.1 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Story Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Naive Definition Of Probability . . . . . . . . . . . . . . . . . 14
1.4 Axioms Of Probability . . . . . . . . . . . . . . . . . . . . . . 19
1.5 Inclusion Exclusion . . . . . . . . . . . . . . . . . . . . . . . . 21
1.6 Mixed Practice . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.1 Counting
1.1.1 problem 1
There are
11
!

4
ways to select 4 positions for I,

7
!

ways to select 4 postions for S,

3
!

1
CHAPTER 1. PROBABILITY AND COUNTING 2

ways to selection 2 positions for P leaving us with a single choice of position


for M . In total, we get
11 7 3 1
! ! ! !

4 4 2 1
permutations.

1.1.2 problem 2
(a) If the first digit can’t be 0 or 1, we have eight choices for the first
digit. The remaining six digits can be anything from 0 to 9. Hence,
the solution is
8 × 106

(b) We can subtract the number of phone numbers that start with 911 from
the total number of phone numbers we found in the previous part.
If a phone number starts with 911, it has ten choices for each of the
remaining four digits.

8 × 106 − 104

1.1.3 problem 3
(a) Fred has 10 choices for Monday, 9 choices for Tuesday, 8 choices for
Wednesday, 7 choices for Thursday and 6 choices for Friday.

10 × 9 × 8 × 7 × 6

(b) For the first restaurant, Fred has 10 choices. For all subsequent days,
Fred has 9 choices, since the only restriction is that he doesn’t want to
eat at the restaurant he ate at the previous day.

10 × 94
CHAPTER 1. PROBABILITY AND COUNTING 3

1.1.4 problem 4
 
(a) There are n
2
matches.
For a given match, there are two outcomes. Each match has two pos-
sible outcomes. We can use the multiplication rule to count the total
possible outcomes.

2( 2 )
n

(b) Since every player plays every other player exactly once, the number of
games is the number of ways to pair up n people.
!
n
2

1.1.5 problem 5
(a) By the end of each round, half of the players participating in the round
are eliminated. So, the problem reduces to finding out how many times
the number of players can be halved before a single player is left.
The number of times 2N can be divided by two is log2 2N which means
the total amount of rounds in the tournament is

(b) The number of games in a given round is Nr


2
. We can sum up these
values for all the rounds.

N N N N
f (N ) = + + + · · · + log N
2 4 8 2 2
log2 N
X 1
=N
i=0 2 (1.1)
i

N −1
=N×
N
=N −1
CHAPTER 1. PROBABILITY AND COUNTING 4

(c) Tournament is over when a single player is left. Hece, N − 1 players


need to be eliminated. As a result of a match, exactly one player is
eliminated. Hence, the number of matches needed to eliminate N − 1
people is

N −1

1.1.6 problem 6
Line up the 20 players in some order then say the first two are a pair, the
next two are a pair, etc. This overcounts by a factor of 10! because we don’t
care about the order of the games. So in total we have
20!
10!
ways for them to play. This correctly counts for the whether player A plays
white or black. If we didn’t care we would need to divide by 210 .
Another way to look at it is to choose the 10 players who will play white
then let each of them choose their opponent from the other 10 players. This
gives a total of

20
!
× 10!
10
possibilities of how they are matched up. We don’t care about the order
of the players who play white but once we’ve chosen them the order of the
players who play black matters since different orders mean different pairings.

1.1.7 problem 7
 
(a) There are 7
3
ways to assign three wins to player A. For a specific
 
combination of three games won by A, there are 42 ways to assign two
draws to A. There is only one way to assign two losses to A from the
remaining two games, namely, A losses both games.

7 4 2
! ! !
× ×
3 2 2
CHAPTER 1. PROBABILITY AND COUNTING 5

(b) If A were to draw every game, there would need to be at least 8 games
for A to obtain 4 points, so A has to win at least 1 game. Similarly, if
A wins more than 4 games, they will have more than 4 points.

Case 1: A wins 1 game and draws 6.


This case amounts to selecting 1 out of 7 for A to win and assigning a
draw for the other 6 games. Hence, there are 7 possibilities.
Case 2: A wins 2 games and draws 4.
 
There are 7
2
ways to assign 2 wins to A. For each of them, there
 
are 5
4
ways to assign four draws to A out of the remaining 5 games.
Player B wins the remaining
  game. The total number of possibilities
for this case is 2 × 4 .
7 5

Case 3: A wins 3 games and draws 2.


 
There are 7
3
ways to assign 3 wins to A. For each of them, there are
 
4
2
ways to assign two draws to A out of the remaining 4 games. B
wins the
 remaining
  2 games. The total number of possibilities for this
case is 3 × 2 .
7 4

Case 4: A wins 4 games and loses 3.


 
There are 7
4
ways to assign 4 wins to A. B wins the remaining 3
 
games. The total number of possibilities for this case is 7
4
.
Summing up the number of possibilities in each of the cases we get

7 7 5 7 4 7
! ! ! ! ! !
+ × + × +
1 2 4 3 2 4

(c) If B were to win the last game, that would mean that A had already
obtained 4 points prior to the last game, so the last game would not
be played at all. Hence, B could not have won the last game. The last
game must have ended in either A winning (case 1) or a draw (case 2).

Case 1: A wins the last game. This means A had 3 points after 6
games.
There are four possibilities for A to earn 3 points in 6 games:
CHAPTER 1. PROBABILITY AND COUNTING 6

1.1. 6 draws
1.2. 3 wins and 3 losses
1.3. 2 wins, 2 draws, and 2 losses
1.4. 1 win, 4 draws, and 1 loss.

Let’s calculate the number of possibilities for each of these subcases.

1.1. There is only one way to assign 6 draws to 6 games: The number
of possibilities is 1.
 
1.2. There are 63 ways to assign 3 wins to A out of the first 6 games.
The remaining
  3 games are losses for A. The number of possibili-
ties is 3 .
6

 
1.3. There are 6
ways to assign 2 wins to A out of the first 6 games.
 2
There are ways to assign 2 draws out of the remaining 4 games.
4
2
The remaining
   2 games are losses for A. The number of possibili-
ties is 62 × 42 .
 
1.4. There are 6
ways to assign 1 win to A out of the first 6 games.
 1
There are 54 ways to assign 4 draws out of the remaining 5 games.
The
 remaining
  game is a loss for A. The number of possibilities
is 61 × 54 .

Case 2: The last game ends in a draw. This means A had 3.5 points
after 6 games.
There are three possibilities for A to earn 3.5 points in 6 games:

2.1. 3 wins, 1 draw, and 2 losses


2.2. 2 wins, 3 draws, and 1 loss
2.3. 1 win, 5 draws.

Let’s calculate the number of possibilities for each of these subcases.


 
2.1. There are 6
ways to assign 3 wins to A out of the first 6 games.
 3
There are ways to assign 1 draw out of the remaining 3 games.
3
1
The remaining
   2 games are losses for A. The number of possibili-
ties is 63 × 31 .
CHAPTER 1. PROBABILITY AND COUNTING 7
 
2.2. There are 6
ways to assign 2 wins to A out of the first 6 games.
 2
There are 43 ways to assign 3 draws out of the remaining 4 games.
The
 remaining
  game is a loss for A. The number of possibilities
is 62 × 43 .
 
2.3. There are 61 ways to assign 1 win to A out of the first 6 games.
The remaining
  5 games are losses for A. The number of possibili-
ties is 61 .

The total number of possibilities then is:

6 6 4 6 5 6 3 6 4 6
! ! ! ! ! ! ! ! ! !
1+ + × + × + × + × +
3 2 2 1 4 3 1 2 3 1

1.1.8 problem 10
(a) Case 1: Student takes exactly one statistics course.
 
There are 5 choices for the statistics course. There are 15
6
choices of
selecting 6 non-statistics courses.
Case 2: Student takes exactly two statistics courses.
   
There are 52 choices for the two statistics course. There are 15
5
choices of selecting 5 non-statistics courses.
Case 3: Student takes exactly three statistics courses.
   
There are 53 choices for the three statistics course. There are 15
4
choices of selecting 4 non-statistics courses.
Case 4: Student takes exactly four statistics courses.
   
There are 54 choices for the four statistics course. There are 15
3
choices of selecting 3 non-statistics courses.
Case 5: Student takes all the statistics courses.
 
There are 15
2
choices of selecting 2 non-statistics courses.
So the total number of choices is

5 15 5 15 5 15 5 15 5 15
! ! ! ! ! ! ! ! ! !
× + × + × + × + ×
1 6 2 5 3 4 4 3 5 2
CHAPTER 1. PROBABILITY AND COUNTING 8

An Alternative Approach
 
There are 20 7
choices of selecting 7 courses which is the maximum
number of choices if there were no restriction as choosing at least one
statistics course.
 
15
7
is the number of choices without any statistics course.
So the total number of choices with at least one statistics course is

20 15
! !

7 7
   
(b) It is true that there are 51 ways to select a statistics course, and 19
6
ways to select 6 more courses from the remaining 19 courses, but this
procedure results in overcounting.
For example, consider the following two choices.

(a) STAT110, STAT134, History 124, English 101, Calculus 102, Physics
101, Art 121
(b) STAT134, STAT110, History 124, English 101, Calculus 102, Physics
101, Art 121

Notice that both are selections of the same 7 courses.

1.1.9 problem 11
(a) Each of the n inputs has m choices for an output, resulting in

mn

possible functions.

(b) If n ≥ m, at least two inputs will be mapped to the same output, so


no one-to-one function is possible.
If n < m, the first input has m choices, the second input has m − 1
choices, and so on. The total number of one-to-one functions then is
m!
m(m − 1)(m − 2) . . . (m − n + 1) =
(m − n)!
CHAPTER 1. PROBABILITY AND COUNTING 9

1.1.10 problem 12
(a)
52
!

13

(b) The number of ways to break 52 cards into 4 groups of size 13 is


    
52 39 26 13
13 13 13 13
4!
.
The reason for dividision by 4! is that all permutations of specific 4
groups describe the same way to group 52 cards.
Since we do care about the order of the 4 groups, we should not divide
by 4!. The final answer then is

52 39 26 13
! ! ! !

13 13 13 13

(c) The key is to notice that the sampling is done without replacement.
 4  
52
13
assumes that all four players have 52
13
choices of hands available
to them. This would be true if sampling was done with replacement.

1.1.11 problem 13
The problem amounts to sampling with replacement where order does not
matter, since having 10 copies of each card amounts to replacing the card.
This is done using the Bose-Einstein method.
Thus, the answer is

52 + 10 − 1 61
! !
=
10 10

1.1.12 problem 14
There are 4 choices for sizes and 8 choices for toppings, of which any combi-
nation (including no toppings) can be selected.
CHAPTER 1. PROBABILITY AND COUNTING 10
 
The total number of possible choices of toppings is 8i=0 8i = 28 = 256.
P

Thus, the total number of possible size-topping combinations is 4 ∗ 256 =


1024.
We wish to sample two pizzas, with replacement,
  out of the 1024 possi-
bilities. By Einstein-Bose, there are a total of 2 choices.
1025

A common mistake is to use multiplication rule to get (28 ) ∗ (28 ) as total


possible combinations for two pizzas, and try to adjust for overcounting by
dividing the result with 2 (as order between pizzas doesn’t matter). This
fails because the possibilities with identical pizzas are counted only once.

1.2 Story Proofs


1.2.1 problem 17
 
2n
n
counts the number of ways to sample n objects from a set of 2n. Instead
of sampling from the whole set, we can break the set into two sets of size n
each. Then, we have to sample n objects in total from both sets.
We can sample all n objects from the first set, or 1 object from the first
set and n − 1 objects from the second set, or 2 objects from the first set and
n − 2 objects from
  the second set and so on.   
There are nn ways to sample all n objects from the first set, n1 n−1 n

waystosample
 1 object from the first set and n − 1 objects from the second
set, 2 n−2 ways to sample 2 objects from the first set and n − 2 objects
n n

from the second set. The pattern is clear


n
! ! n
!2
n n n
=
X X

k=0 k n−k k=0 k

1.2.2 problem 18
Consider the right hand side of the equation. Since a committe chair can
only be selected from the first group, there are n ways
 to choose them.
Then, for each choice of a committee chair, there are 2n−1
n−1
ways to choose
 
the remaining members. Hence, the total number of committees is n 2n−1n−1
.
Now consider the left side of the equation. Suppose we pick k people
from the first group and n − k people from the second group, then there are
k ways to assign a chair from the members of the first group we have picked.
CHAPTER 1. PROBABILITY AND COUNTING 11
    2
k can range from 1 to n giving us a total of nk=1 k nk n−k
n
= nk=1 k nk
P P

possible committees.
Since, both sides of the equation count the same thing, they are equal.

1.2.3 problem 21
(a) Case 1: If Tony is in a group by himself, then we have to break the
remaining n people into k − 1 groups. This can be done in
( )
n
k−1
ways.
Case 2: If Tony is not in a group by himself, then we first break up the
remaining n people into k groups. Then, Tony can join any of them.
The number of possible groups then is
( )
n
k
k

Case 1 and 2 together count the number of ways to break up n + 1


people into k non empty groups, which is precisely what the left side
of the equation counts.

(b) Say Tony wants to have m in his group. That is to say he does not
want n − m people. These n − m people must then be broken into k
groups.
The number of people Tony wants to join his group can range from 0
to n − k. The reason for the upper bound is that at least k people are
required to make up the remaining k groups.
Taking the sum over the number of people in Tony’s group we get
n−k
!( )
X n n−j
j=0 j k

Now, instead of taking the sum over the number of people Tony wants
in his group, we can equivalently take the sum over the number of
people Tony does not want in his group. Hence,
CHAPTER 1. PROBABILITY AND COUNTING 12

n−k k
!( ) !( )
n n−j n i
=
X X

j=0 j k i=n i k
Since the sum counts all possible ways to group n + 1 people into k + 1
groups, we have
k
n+1
!( ) ( )
n i
=
X

i=n i
k k+1
as desired.

1.2.4 problem 22
(a) Let us count the number of games in a round-robin tournament with
n + 1 participants in two ways.
Method 1: Since every player plays against all other players exactly
once, the problem reduces
 to finding the number of ways to pair up
n + 1 people. There are n+1
2
ways to do so.
Method 2: The first player participates in n games. The second one
also participates in n games, but we have already counted the game
against the first player, so we only care about n − 1 games. The third
player also participates in n games, but we have already counted the
games against the first and second players, so we only care about n − 2
games.
In general, player i will participate in n + 1 − i games that we care
about. Taking the sum over i we get

n + (n − 1) + (n − 2) + · · · + 2 + 1

Since both methods count the same thing, they are equal.
An Alternative Approach The RHS expression counts number of
ways to pair two people from a group of n + 1 people of different ages.
Let’s say the eldest person in the subgroup of two is also the eldest
person in the whole group. Then, we would have n people to choose
from as the second person of the sub group. If subgroup’s eldest is
the second-eldest in the whole group, we’d have n − 1 people to choose
from, and so on all the way to 1. By adding these cases, we get the
CHAPTER 1. PROBABILITY AND COUNTING 13

LHS expression which covers all possibilities of pairing two people from
group of n + 1, and hence is equivalent to RHS.

(b) LHS: If n is chosen first, then the subsequent 3 numbers can be any of
0, 1, . . . , n − 1. These 3 numbers are chosen with replacement resulting
in n3 possibilities. Summing over possible values of n we get 13 + 23 +
· · · + n3 total number of possibilities.
RHS: We can count the number of permutations of the 3 numbers
chosen with replacement from a different perspective. The 3 numbers
can either all be distinct, or all be the same, or differ in exactly 1 value.
Case 1: All 3 numbers are distinct.
Selecting 4 (don’t forget the very
 first, largest selected number) dis-
tinct numbers can be done in 4 ways. The 3 smaller numbers are
n+1
 
free to permute amongst themselves. This gives us a total of 6 n+1
4
possibilities.
Case 2: All 3 numbers are the same.
In this case, we have to select 2 digits. The smaller digit will be sampled
3 times and there are no ways  to permute identical numbers, so the
number of possiblities is 2 .n+1

Case 3: Two of the 3 numbers are distinct.


In this case, we have to select 3 digits in total. One of the smaller 2
digits will be sampled twice, giving us 3 permutations. Since, there
are 2 choices for which digit gets sampled twice, we get a total
 of 6
permutations. The total number of possibilities then is 6 3 .
n+1

Adding up the number of possibilities in each of the cases we get a total


of
n+1 n+1 n+1
! ! !
6 +6 +
4 3 2
possibilities.
Since the LHS and the RHS count the same set, they are equal.
CHAPTER 1. PROBABILITY AND COUNTING 14

1.3 Naive Definition Of Probability


1.3.1 problem 23
We are interested in the case of 3 consecutive floors. There are 7 equally
likely possibilities

(2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9, 10).

For each of this possibilities, there are 3 ways for 1 person to choose
button, 2 for second and 1 for third (3! in total by multiplication rule).
So number of favorable combinations is

7 ∗ 3!

Generally each person have 9 floors to choose from so for 3 people there
are 93 combinations by multiplication rule.
Hence, the probability that the buttons for 3 consecutive floors are pressed
is
7 ∗ 3!
93

1.3.2 problem 26
(a) The problem is isomorphic (having same structure) to the birthday
problem. When sampling with replacement, each person corresponds
to a date in the birthday problem, and the size of sample corresponds
to the number of people in birthday problem. Hence, taking a random
sample of 1000 from a population of a million corresponds to asking a
thousand people their birth date where there are a total of a million
dates. Number of ways to take such a sample is K 1000 where K is size
of population. Similarly, number of ways to take sample without re-
placement corresponds to number of ways of having no birthday match
in that situation: K(K − 1) . . . (K − 1000 + 1)

(b)
K(K − 1) . . . (K − 1000 + 1)
P (A) = 1 − P (Ac ) = 1 −
K 1000
where K = 1000000.
CHAPTER 1. PROBABILITY AND COUNTING 15

1.3.3 problem 27
For each of the k names, we sample a memory location from 1 to n with
equal probability, with replacement. This is exactly the setup of the birthday
problem. Hence, the probability that at least one memory location has more
than 1 value is
n(n − 1) . . . (n − k + 1)
P (A) = 1 − P (Ac ) = 1 −
nk
Also, P (A) = 1 if n < k.

1.3.4 problem 30
Suppose the word consists of 7 letters. Once we choose the first letter, the
seventh one has to be the same. Once we choose the second letter, the sixth
one has to be the same. In general, we are free to choose 4 letters. Hence,
the probability that a 7 letter word is a palindrome is
264 1
=
267 263
If the word consists of 8 letters, then there are 268 possible words, but for
a palindrome, the number of letters we are free to choose is still 4. Hence,
the probability is
264 1
= 4
26 8 26

1.3.5 problem 32
Call the two black cards B1 , B2 and the two red cards R1 , R2 . Since every
configuration of the 4 cards is equally likely, each outcome has a probability
of 24
1
of occurance.
Case 1: j = 0.
If both guesses are incorrect, then both of them are black cards. There
are two choices for the configuration of the black cards and for each, there are
two choices for the configuration of the red cards for a total of 4 possibilities.
4 1
P (j = 0) = =
24 6
Case 2: j = 4
CHAPTER 1. PROBABILITY AND COUNTING 16

Notice that to guess all the cards correctly, we only need to guess correctly
the two red cards, which, by symmetry, is as likely as guessing both of them
wrong.
Hence,
1
P (j = 4) = P (j = 0) =
6
Case 3: j = 2
One of the guesses is red the other is black. Like before, there are two
choices for the red and two choices for the black cards. This undercounts the
possibilities by a factor of 2, since we can switch the places of the red and
the black cards. Hence,
2 2 2
P (j = 2) = + =
6 6 3
Notice that getting both right, none right and one right are all the possible
outcomes. Hence,
P (j = 1) = P (j = 3) = 0

1.3.6 problem 35
We can generate a random hand of 13 cards with the desired property by the
following process:

1. Pick a suite to sample 4 cards from

2. Sample 3 cards for each one of the other suites


 
There are 4 suites and 13
4
ways to sample 4 cards for any of one of them.
 3
By the multiplication rule, there are 13
3
ways to sample 3 cards of every
one of the remaining 3 suits.   3
By the multiplication rule, the total number of possibilities is 4 13
4
13
3
.
 
The unconstrained number of 13-card hands is 52 13
.
Since each hand is equally likely, by the naive definition of probability,
the desired likelihood is
  3
4 13 13
4 3
 
52
13
CHAPTER 1. PROBABILITY AND COUNTING 17

1.3.7 problem 36
We can think of the problem as sampling with replacement where order
matters.
There are 630 possible sequences of outcomes. We are interested in the
cases where each face of the die is rolled exactly 5 times. Since each sequence
is equally likely,
 we
 can use the naive definition of probability.  
There are 5 ways to select the dice that fall on a 1. Then, 25
30
5
ways
     
to select the dice falling on a 2, 20
5
falling on a 3, 15
5
falling on a 4, 10
5
 
falling on a 5 and finally, 55 falling on a 6.
Thus, the desired probability is
      
30 25 20 15 10 5
5 5 5 5 5 5
630
Alternatively, imagining the sample space to be a 30 digit long sequence
of 1, 2 . . . 6, we want the cases in which each of 1, 2 . . . 6 numbers appear
exactly five times. There are (5!)
30!
6 ways to arrange such a sequence. Hence,

the probability is
30!
(5!)6 630

1.3.8 problem 37
(a) Ignore all the cards except J, Q, K, A. There are 16 of those, 4 of which
are aces. Each card has an equal chance of being first in the list, so the
answer is 41 .
Source: https://math.stackexchange.com/a/3726869/649082

(b) Ignore all the cards except J, Q, K, A. There are 4 choices for a king, 4
choices for a queen and 4 choices for a jack with 3! permutations of the
cards. Then, there are 4 choices for an ace. The remaining 12 cards
3
can be permuted in 12! ways, so the answer is 4 ×3!×4×12!
16!
.

1.3.9 problem 38
(a) There are 12 choices of seats for Tyron and Cersei so that they sit next
to each other (11 cases, where they take i−1 and i positions and 1 case,
CHAPTER 1. PROBABILITY AND COUNTING 18

where they take 1 and 12th position, because table is round). Tyron
can sit to the left or to the right of Cersei. The remaining 10 people
can be ordered in 10! ways, so the answer is
24 × 10! 2
=
12! 11
 
(b) There are 122
choices of seats to be assigned to Tyron and Cersei,
but only 12 choices where they sit next to each other. Since every
assignment of seats is equally likely the answer is
12 2
  =
12 11
2

1.3.10 problem 39
   
There are a total of 2N K
possible committees of K people. There are Nj
ways to select j couples for the committee. K − 2j people need to be selected
from the remaining N − j couples such that only one person is selected from
a couple. First, we select K − 2j couples from the remaining N − j couples.
Then, for each of the selected couples, there are 2 choices for committee
membership.
  
N
j
N −j
K−2j
2K−2j
 
2N
K

1.3.11 problem 40
(a) Counting strictly increasing sequences of k numbers amounts to count-
ing the number of ways to select k elements out of the n, since for
any such selection, there is exactly one increasing ordering. Thus, the
answer is  
n
k
nk
(b) The problem can be thought of sampling with replacement where order
doesn’t matter, since there is only one non decreasing ordering of a
given sequence of k numbers. Thus, the answer is
 
n−1+k
k
nk
CHAPTER 1. PROBABILITY AND COUNTING 19

1.3.12 problem 41
We can treat this problem as sampling numbers 1 to n with replacement
with each number being equally likely. There are nn possible sequences. To
count the number of sequences with exactly one of the numbers missing, we
first select the missing number. There are n ways to do this. The rest of the
numbers have to be sampled at least once with one number being sampled
exactly twice. There are n − 1 choice to select the number that will be
sampled twice. Finally, we have n sampled numbers which can be ordered in
any of n!2 ways, since one of the numbers is repeated. Thus, the answer is

n(n − 1) n!2 n(n − 1)n!


=
n n 2nn

1.4 Axioms Of Probability


1.4.1 problem 43
(a) Inequality can be demonstrated using the first property of probabilities,

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

and the first axiom of probabilities,

P (S) = 1.

P (A) + P (B) − P (A ∩ B) ≤ 1 =⇒ P (A) + P (B) − 1 ≤ P (A ∩ B).


Strict equality holds if and only if A ∪ B = S where S is the sample
space.

(b) Since A ∩ B ⊆ A ∪ B, P (A ∩ B) ≤ P (A ∪ B) by the second property


of probabilites.
Strict equality holds if and only if A = B.

(c) Inequality follows directly from the first property of probabilities with
strict equality if and only if P (A ∩ B) = 0.
CHAPTER 1. PROBABILITY AND COUNTING 20

1.4.2 problem 44
Since B = (B − A) ∪ A, P (B) = P (A) + P (B − A) by the second axiom of
probability. Rearranging terms,

P (B − A) = P (B) − P (A)

1.4.3 problem 45
B △ A = (A ∪ B) − (A ∩ B). By problem 44,

P (B △ A) = P (A ∪ B) − P (A ∩ B)
= P (A) + P (B) − P (A ∩ B) − P (A ∩ B)
= P (A) + P (B) − 2P (A ∩ B)

1.4.4 problem 46
Bk = Ck − Ck+1 . Since Ck+1 ⊆ Ck , P (Bk ) = P (Ck ) − P (Ck+1 ).

1.4.5 problem 47
(a) Consider the experiment of flipping a fair coin twice. The sample space
S is {HH, HT, T H, T T }. Let A be the event that the first flip lands
heads and B be the event that the second flip lands heads. P (A∩B) = 41
since A ∩ B corresponds to the outcome HH.
On the other hand, A corresponds to the outcomes {HH, HT } and B
corresponds to the outcomes {HH, T H}. Thus, P (A) = P (B) = 12 .
Since P (A ∩ B) = P (A)P (B), A and B are independent events.

(b) A1 and B1 should intersect such that the ratio of the area of A1 ∩ B1
to the area of A1 equals the ratio of the area of B1 to the area of R.
As a simple, extreme case, if A1 = B1 , then A and B are dependent,
since the condition above is violated.
CHAPTER 1. PROBABILITY AND COUNTING 21

(c)

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= P (A) + P (B) − P (A)P (B)
= P (A)(1 − P (B)) + P (B)
= P (A)P (B c ) + P (B)
= P (A)P (B c ) + 1 − P (B c )
= 1 + P (B c )(P (A) − 1)
= 1 − P (B c )P (Ac )

1.5 Inclusion Exclusion


1.5.1 problem 49
Let Ai be the event that i is never rolled for 1 ≤ i ≤ 6. The event of
6
interested then is
S
Ai .
i=1
6
By inclusion-exclusion, P ( Ai ) = P (Ai ) − P (Ai ∩ Aj ) +
S P6 P
i=1 1≤i<j≤6
i=1
6
P (Ai ∩ Aj ∩ Ak ) − · · · − P ( Ai ).
P T
1≤i<j<k≤6
i=1
Now,
n
P (Ai ) = 56n = ( 65 )n
n
P (Ai ∩ Aj ) = 64n = ( 64 )n
n
P (Ai ∩ Aj ∩ Ak ) = 63n = ( 63 )n
n
P (Ai ∩ Aj ∩ Ak ∩ Aw ) = 26n = ( 26 )n
n
P (Ai ∩ Aj ∩ Ak ∩ Aw ∩ Az ) = 61n = ( 16 )n
6
P( Ai ) = 0
T
i=1
6        
Thus, P ( Ai ) = 6( 56 )n − 6
( 46 )n + 6
( 63 )n − 6
( 26 )n + 6
( 16 )n
S
2 3 4 5
i=1

1.5.2 problem 52
Let Ai be the event that the i-th student takes the same seat on both days.
20
The desired probability then is 1−P ( Ai ). By inclusion exclusion principle,
S
i=1
CHAPTER 1. PROBABILITY AND COUNTING 22

20
P( Ai ) = P (Ai ) − P (Ai ∩ Aj ) + P (Ai ∩ Aj ∩ Ak ) − · · · +
S P P P
i i<j i<j<k
i=1
(−1) P (A1 ∩ · · · ∩ A20 ),
21

where P (Ai ) = 20!


19!
, P (Ai ∩ Aj ) = 18!
20!
and so on by naive definition of proba-
bility.
Hence,

20 20
1 1 1 1
P( Ai ) = + − ··· +
[ X X X

i=1 i=1 20 1≤i<j≤20 20 ∗ 19 1≤i<j<k≤20 20 ∗ 19 ∗ 18 20!
20 1 20 1 1
! !
=1− + − ··· +
2 20 ∗ 19 3 20 ∗ 19 ∗ 18 20!
1 1 1
= 1 − + − ··· +
2! 3! 20!
≈1−e −1

1.5.3 problem 53
(a) 628 − 368

(b) 628 − 368 − 368 + 108

(c) 628 − 368 − 368 − 108 + 2(628 − 368 − 528 + 268 ) + 628 − 368 − 368 + 108

1.5.4 problem 55
(153)(222)
(a)
(375)
(375)−(275)−(255)−(225)+(155)+(105)+(125)
(b)
(375)

1.6 Mixed Practice


1.6.1 problem 56
(a) >

(b) <
CHAPTER 1. PROBABILITY AND COUNTING 23

(c) =
We are interested in two outcomes of the samme sample space. This is,
S = {(a1 , a2 , a3 ) : ai ∈ {1, 2, 3, ..., 365}} The first outcome is (1, 1, 1),
and the second outcome is (1, 2, 3). The answer follows, since every
outcome of the sample space is equally likely.

(d) <
If the first toss is T , Martin can never win, since as soon as H is seen
on any subsequent toss, the game stops, and Gale is awarded the win.
If the first toss is H, then if the second toss is also H, Martin wins. Oth-
erwise, if the second toss is T , Gale wins, since as soon as a subsequent
toss shows H, Gale is awarded a win.
Thus, Martin loses 3
4
of the time.

1.6.2 problem 57
S22
10
The desired event can be expressed as Ai , where Ai is the event that
i=1
the i-th molecule in my breath is shared with Caesar. We can compute the
desired probability using inclusion exclusion.
Since every molecule in the universe is equally likely to be shared with
Caesar, and we assume our breath samples molecules with replacement,
n
P ( Ai ) = ( 10122 )n .
T
i=1
Thus,

22 22
10 10
1
 i
P( Ai ) = (−1)i+1
[ X

i=1 i=1 1022


1
 1022
= 1−
1022
≈ e−1

1.6.3 problem 58
Explanation: https://math.stackexchange.com/questions/1936525/inclusion-
exclusion-problem
CHAPTER 1. PROBABILITY AND COUNTING 24

(a) Let A be the event that at least 9 widgets need to be tested.


 
8
3
3!9!
P (A) = 1 − P (A ) = 1 −
c
12!

(b) Similar to part a,


 
9
3
3!9!
P (A) = 1 − P (A ) = 1 −
c
12!

1.6.4 problem 59
 
(a) 15+9
9
 
(b) 5+9
9

(c) Each of 15 bars can be given to any of 10 children, so by orderd sampling


with replcement formula we have 1015 combinations

(d) To count amount of suitable combinations, we can subtract amount of


combination, where at least one child doesn’t get any bars (is example
of inclusion-exclusion
 usage
 case) from total amount of combinations.
10 − i=1 (−1) i+1 10
(10 − i)15
15 P9
i

1.6.5 problem 60
(a) nn
 
(b) 2n−1
n−1

(c) The least likely bootstrap sample is one where a1 = a2 = · · · = an .


Such a sample occurs with probability n1n . The most likely bootstrap
sample is one where all the terms are different. Such a sample occurs
with probability nn!n . Thus, the ratio of the probabilities is n!

1.6.6 problem 62
(a) 1 − k!ek (→

p)
CHAPTER 1. PROBABILITY AND COUNTING 25

(b) Consider the extreme case where p1 = 1 and pi = 0 for i ̸= 1. Then, the
probability that there is at least one birthday match is 1. In general, if
1
pi > 365 for a particular i, then a birthday match is more likely, since
that particular day is more likely to be sampled multiple times. Thus,
it makes intuitive sense that the probability of at least one birthday
match is minimized when pi = 365 1
.
(c) First, consider ek (x1 , ..., xn ). We can break up this sum into the sum
of three disjoint cases.
(a) Sum of terms that contain both x1 and x2 . This sum is given by
x1 x2 ek−2 (x3 , ..., xn )
(b) Sum of terms that contain either x1 or x2 but not both. This sum
is given by (x1 + x2 ) ek−1 (x3 , ..., xn )
(c) Sum of terms that don’t contain either x1 or x2 . This sum is give
by ek (x3 , ..., xn )
Thus,

ek (x1 , ..., xn ) = x1 x2 ek−2 (x3 , ..., xn )+(x1 +x2 )ek−1 (x3 , ..., xn )+ek (x3 , ..., xn )

Next, compare ek (→−p ) and ek (→



r ). Expanding the elementary symmet-
ric polynomials, it is easy to see that the only difference between the
two are the terms that contain either the first, the second or both terms
from →−
p and →−r respectively.
Notice that because r1 = r2 = p1 +p 2
2
, the sum of the terms with only r1
and only r2 but not both is exactly equal to (p1 + p2 )ek−1 (x3 , ..., xn ).
Thus, the only difference between ek (→ −p ) and ek (→

r ) are the terms
p1 p2 ek−2 (x3 , ..., xn ) and r1 r2 ek−2 (x3 , ..., xn ).
By the arithmetic geometric mean inequailty, r1 r2 ek−2 (x3 , ..., xn ) ≥
p1 p2 ek−2 (x3 , ..., xn ). Hence, 1 − k!ek (→

p ) ≥ 1 − k!ek (→

r ).


In other words, given birthday probabilities p , we can potentially re-
duce the probability of having at least one birthday match by taking
any two birthday probabilities and replacing them with their average.
For a minimal probability of at least one birthday match then, all val-
ues pi in →

p must be equal, so that averaging any pi and pj does not
change anything.
Chapter 2

Conditional Probability

2.1 Conditioning On Evidence . . . . . . . . . . . . . . . . . . . . 26


2.2 Independence and Conditional Independence . . . . . . . . . . 35
2.3 Monty Hall . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4 First-step Analysis and Gambler’s Ruin . . . . . . . . . . . . . 42
2.5 Simpson’s Paradox . . . . . . . . . . . . . . . . . . . . . . . . 45
2.6 Mixed Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 46

2.1 Conditioning On Evidence


2.1.1 problem 3
Let S be event that a man in the US is a smoker and C be event man has
cancer. From problem conditions:

P (S) = 0.216
P (C|S) = 23P (C|S c )
1
P (C|S c ) = P (C|S)
23

26
CHAPTER 2. CONDITIONAL PROBABILITY 27

Lets use Bayes’ theorem

P (S)P (C|S)
P (S|C) =
P (C)
P (S)P (C|S)
=
P (S)P (C|S) + P (S c )P (C|S c )
P (S) · 23P (C|S c )
=
P (S) · 23P (C|S c ) + P (S c )P (C|S c )
P (S)
=
23P (S) + P (S c )
23 · 0.216
=
23 · 0.216 + 0.784
≈ 0.864

2.1.2 problem 4
(a)

P (K)P (R|K)
P (K|R) =
P (R)
P (K)P (R|K)
=
P (K)P (R|K) + P (K c )P (R|K c )
p
=
p + (1 − p) n1

(b) Since p + (1 − p) n1 ≤ 1, P (K|R) ≥ p with strict equality only when


p = 1. This result makes sense, since if Fred gets the answer right, it
is more likely that he knew the answer.

2.1.3 problem 5
By symmetry, all 50 of the remaining cards are equally likely. Thus, the
probability that the third card is an ace is 50
3
.
We can reach the same answer using the definition of conditional prob-
ability. Let A be the event that the first card is the Ace of Spades, B be
the event that the second card is the 8 of Clubs and C be the event that the
CHAPTER 2. CONDITIONAL PROBABILITY 28

third card is an ace. Then,

P (C, A, B) 3∗49!
3
P (C|A, B) = = 52!
=
P (A, B) 50!
52!
50

2.1.4 problem 6
Let H be the event that 7 tosses of a coin land Heads. Let A be the event
that a randomly selected coin is double-headed.

P (A)P (H|A) 1
P (A|H) = = 100
P (A)P (H|A) + P (Ac )P (H|Ac )
 7
1
100
+ 99
100
∗ 1
2

2.1.5 problem 7
(a)

P (D)P (H|D)
P (D|H) =
P (H)
P (D)P (H|D)
=
P (D)P (H|D) + P (Dc )P (H|Dc )
 7
(
1 1
2 100
+ 99
100 2
1
)
=  7  7
(
1 1
2 100
+ 99 1
100 2
) + 1 1
2 2

= 0.69

(b) Let C be the event that the chosen coin is double-headed.

P (C|H) = P (D|H)P (C|D, H) + P (Dc |H)P (C|Dc , H)


= 0.69 ∗ 0.56 + 0
= 0.39

2.1.6 problem 8
Let A1 be the event that the screen is produced by company A, B1 be the
event that the screen is produced by company B, and C1 be the event that
CHAPTER 2. CONDITIONAL PROBABILITY 29

the screen is produced by company C. Let D be the event that the screen is
defective.
P (A1 )P (D|A1 )
P (A1 |D) =
P (A1 )P (D|A1 ) + P (Ac1 )P (D|Ac1 )
P (A1 )P (D|A1 )
=
P (A1 )P (D|A1 ) + P (A1 )(P (B1 |Ac1 )P (D|B1 , Ac1 ) + P (C1 |Ac1 )P (D|C1 , Ac1 ))
c

0.5 ∗ 0.01
=
0.5 ∗ 0.01 + 0.5 ∗ (0.6 ∗ 0.02 + 0.4 ∗ 0.03)
= 0.29

2.1.7 problem 9
(a) P (A1 |B) = P (A1 )P (B|A1 )
P (B)
= P (A1 )
P (B)
= P (A2 ) P (A2 )P (B|A2 )
P (B) P (B)
= P (A2 |B).

(b) If B is implied by both A1 and A2 , knowing that B occured does not


tip the probability of occurence in favor of either A1 or A2 .
For example, let A1 be the event that the card in my hand is the Ace
of Spades. Let A2 be the event that the card in my hand is the Ace of
Hearts. Let B be the event that there are 3 aces left in the deck.
B is implied by both A1 and A2 , and P (A1 ) = P (A2 ). Knowing that B
occured does not give one any information on whether they are holding
the Ace of Spades or the Ace of Hearts, since B would have occured in
both cases. Thus, P (A1 |B) = P (A2 |B).

2.1.8 problem 10
(a)

P (A3 |A1 ) = P (A2 |A1 )P (A3 |A2 , A1 ) + P (Ac2 |A1 )P (A3 |Ac2 , A1 )
= 0.8 ∗ 0.8 + 0.2 ∗ 0.3 = 0.7

(b)

P (A3 |Ac1 ) = P (A2 |Ac1 )P (A3 |A2 , Ac1 ) + P (Ac2 |Ac1 )P (A3 |Ac2 , Ac1 )
= 0.3 ∗ 0.8 + 0.7 ∗ 0.3 = 0.45
CHAPTER 2. CONDITIONAL PROBABILITY 30

P (A3 ) = P (A1 )P (A3 |A1 ) + P (Ac1 )P (A3 |Ac1 )


= 0.75 ∗ 0.7 + 0.25 ∗ 0.45 = 0.64

2.1.9 problem 11
Using the odds form of Baye’s Theorem,

P (A|W ) P (A) P (W |A)


=
P (A |W )
c P (Ac ) P (W |Ac )
0.6 P (A) 0.7
=
0.4 P (Ac ) 0.3
P (A)
= 0.643
P (Ac )

P (A) = 0.39

2.1.10 problem 12
(a) Let Ai be the event that Alice sends bit i. Let Bj be the event that
Bob recieves bit j.

P (A1 )P (B1 |A1 )


P (A1 |B1 ) =
P (A1 )P (B1 |A1 ) + P (A0 )P (B1 |A0 )
0.5 ∗ 0.9
=
0.5 ∗ 0.9 + 0.5 ∗ 0.05
= 0.95

(b) Let Bj,k,l be the event that Bob recieves bit tuple j, k, l.

P (A1 )P (B110 |A1 )


P (A1 |B110 ) =
P (A1 )P (B110 |A1 ) + P (A0 )P (B110 |A0 )
0.5 ∗ 0.92 ∗ 0.1
=
0.5 ∗ 0.92 ∗ 0.1 + 0.5 ∗ 0.052 ∗ 0.95
= 0.97
CHAPTER 2. CONDITIONAL PROBABILITY 31

2.1.11 problem 13
(a) Let B be the event that the test done by company B is successfull. Let
A be the event that the test done by company A is successfull. Let D
be the event that a random person has the disease.

P (B) = P (D)P (B|D) + P (Dc )P (B|Dc )


= 0.01 ∗ 0 + 0.99 ∗ 1
= 0.99

P (A) = P (D)P (A|D) + P (Dc )P (A|Dc )


= 0.01 ∗ 0.95 + 0.99 ∗ 0.95
= 0.95

Thus, P (B) > P (A).

(b) Since the disease is so rare, most people don’t have it. Company B
diagnoses them correctly every time. However, in the rare cases when
a person has the disease, company B fails to diagnose them correctly.
Company A however shows a very good probability of an accurate
diagnoses for afflicted patients.

(c) If the test conducted by company A has equal specifity and sensitivity,
then it’s accuracy surpasses that of company B’s test if the specifity
and the sensitivity are larger than 0.99. If company A manages to
achieve a specifity of 1, then any positive sensitivity will result in a
more accurate test. If company A achieves a sensitivity of 1, it still
requires a specificity larger than 0.98, since positive cases are so rare.

2.1.12 problem 14
(a) Intuitively, P (A|B) > P (A|B c ), since Peter will be in a rush to install
his alarm if he knows that his house will be burglarized before the end
of next year.
CHAPTER 2. CONDITIONAL PROBABILITY 32

(b) Intuitively P (B|Ac ) > P (B|A), since Peter is more likely to be robbed
if he doesn’t have an alarm by the end of the year.

(c) See https://math.stackexchange.com/a/3761508/649082.

(d) An explanation might be that in part a, we assume Peter to be driven


to not let burglars rob him, but in part b we assume the burglars to
not necessarily be as driven, since we assume that if the burglers know
that Peter will install an alaram before the end of the next year they
might not rob him. If the burglers are driven, they might actually be
more inclined to rob Peter sooner, before he actually installs the alarm.

2.1.13 problem 15
Given the inequailities and the fact that P (A∩B) = P (A)+P (B)−P (A∪B),
to maximize P (A ∩ B) we maximize the smallest of the three expressions.
Namely, P (A). Thus, we would like to know that event A occured.

2.1.14 problem 16
P (A) = P (B)P (A|B) + P (B c )P (A|B c ).
Given P (A|B) ≤ P (A), if P (A|B c ) < P (A), then the right hand side
of the equation above is strictly less than the left hand side, and we have a
contradiction.
We can intuitively think of this problem as asking "How likely is X to be
elected as president?" and hearing "It depends" in response. The implication
is that there exists some latent event (major states vote against X) that
reduces the chances of X getting elected, and if we know that the former
does not occure, the chances of X getting elected improve.

2.1.15 problem 17
(a) P (B|A) = P (B)P (A|B)
P (B)P (A|B)+P (B c )P (A|B c )
= 1 =⇒ P (B c )P (A|B c ) = 0.
Since P (B c ) ̸= 0 by assumption, P (A|B c ) = 0 =⇒ P (Ac |B c ) = 1.

(b) Let A and B be independent events. Then, P (B|A) ≈ 1 =⇒ P (B) ≈


1. Thus, P (B c ) ≈ 0, and so the term P (A|B c ) in the denominator in
part a may be large, implying P (Ac |B c ) ≈ 0.
CHAPTER 2. CONDITIONAL PROBABILITY 33

For example, consider a deck of 52 cards, where all but one of the cards
are the Queen of Spades. Let A be the event that the first turned card
is a Queen of Spades, and let B be the event that the second turned
card is a Queen of Spades, where sampling is done with replacement.
Then, P (A) = P (B) ≈ 1. Then, by independence, P (A|B c ) ≈ 1 =⇒
P (Ac |B c ) ≈ 0.

2.1.16 problem 18
P (B) = P (A ∩ B) + P (Ac ∩ B).
P (Ac ∩ B) = P (Ac )P (B|Ac ) = 0, since P (Ac ) = 0.
Thus, P (B) = P (A ∩ B) = P (B)P (A|B) =⇒ P (A|B) = 1.

2.1.17 problem 19
See https://math.stackexchange.com/q/3292400/649082

2.1.18 problem 20
(a) Since the second card is equally likely to be any of the remaining 3
cards, the probability that both cards are queens is 13 .

(b) Our sample space now consists of all order pairs of the two queens and
the two jacks, where at least one card is a queen. Since all the outcomes
are equally likely, the answer is 10
2
= 15 .

(c) Now, the sample space consists of all order pairs of the two queens and
the two jacks, where one of the cards is the Queen of Hearts. Thus,
the answer is 26 = 13 .

2.1.19 problem 21
(a) The sample space is (H, H, H), (H, H, T ), (H, T, H), (T, H, H). Since
each outcome is equally likely, the answer is 41 .

(b) Since the last throw is independent of the first two, the probability that
all three throws landed heads given two of them landed heads equals
the probability that the third throw landed heads, which is 12 .
CHAPTER 2. CONDITIONAL PROBABILITY 34

2.1.20 problem 27
Let G be the event that the suspect is guilty. Let T be the event that one of
the criminals has blood type 1 and the other has blood type 2.
Thus,
P (G)P (T |G) pp2 p
P (G|T ) = = =
P (G)P (T |G) + P (G )P (T |G )
c c pp2 + (1 − p)2p1 p2 p + 2p1 (1 − p)
For P (G|T ) to be larger than p, p1 has to be smaller than 12 . This result
makes sense, since if p1 = 21 , then half of the population has blood type 1,
and finding it at the crime scene gives us no information as to whether the
suspect is guilty.

2.1.21 problem 28
P (D) P (T |D)
(a) P (D|T )
P (Dc |T )
= P (Dc ) P (T c |Dc )
.

(b) Suppose our population consists of 10000 people, and only one percent
of them is afflicted with the disease. So, 100 people have the disease
and 9900 people don’t. Suppose the specificity and sensitivity of our
test are 95 percent. Then, out of the 100 people who have the disease,
95 test positive and 5 test negative, and out of the 9900 people who do
not have the disease, 9405 test negative and 495 test positive.
Thus, P (D|T ) = 95
95+495
.
Here, we can see why specificity matters more than sensitivity. Since,
the disease is rare, most people do not have it. Since specificity is
measured as a percentage of the population that doesn’t have the dis-
ease, small changes in specificity equate to much larger changes in the
number of people than in the case of sensitivity.

2.1.22 problem 29
Let Gi be the event that the i-th child is a girl. Let Ci be the event that the
i-th child has property C.
0.25(2p−p2 )
P (G1 ∩ G2 |(G1 ∩ C1 ) ∪ (G2 ∩ C2 )) = 0.5p+0.5p−0.25p 2 = 2−0.5p = 4−p .
0.5(2−p) 2−p

This result confirms the idea that the more rare characteristic C is, the
closer we get to specifying which child we mean when we say that at least
one of the children has C.
CHAPTER 2. CONDITIONAL PROBABILITY 35

2.2 Independence and Conditional Indepen-


dence
2.2.1 problem 33
(a) 1
2|C|
 |A|
(b) 1
2

(c) Let p be a randomly selected person from C sampled without replace-


ment.
P (p ∈ A ∪ p ∈ B) = 1
2
+ 21 − 1
4
= 34 .
 |C|
P (A ∪ B = C) = (P (p ∈ A ∪ p ∈ B))|C| = 3
4
.

2.2.2 problem 34
(a) A and B are not independent, since knowing that A occured makes Gc
more likely, which in turn makes B makes more likely.
P (G)P (Ac |G)
(b) P (G|Ac ) = P (G)P (Ac |G)+P (Gc )P (Ac |Gc )
= g(1−p1 )
g(1−p1 )+(1−g)(1−p2 )

(c) P (B|Ac ) = P (G|Ac )P (B|G, Ac )+P (Gc |Ac )P (B|Gc , Ac ) = g(1−p1 )


p+
g(1−p1 )+(1−g)(1−p2 ) 1
(1 − g(1−p1 )
)p
g(1−p1 )+(1−g)(1−p2 ) 2

2.2.3 problem 36
(a) Since any applicant who is good at baseball is accepted to the college,
the proportion of admitted students good at baseball is higher than the
proportion of applicants good at baseball, because applicants include
people who aren’t good at either math or baseball.

(b) Let S denote the sample space. Then,


P (A|B, C) = P (A|B) = P (A) = P (A|S) < P (A|C).

2.2.4 problem 37
See https://math.stackexchange.com/a/3789043/649082
CHAPTER 2. CONDITIONAL PROBABILITY 36

2.2.5 problem 38
Let S be the event that an email is spam. Let L = W1c , ..., W22
c c
, W23 , W24 c
, ..., W64 c
, W64 , W65 , W66 , ..., W
Let q = j (1 − pj ) where 1 ≤ j ≤ 100 : j ∈ / 23, 64, 65.
Q

Let x = j (1 − rj ) where 1 ≤ j ≤ 100 : j ∈ / 23, 64, 65.


Q

P (S)P (L|S) pp23 p64 p65 q


P (S|L) = = .
P (L) pp23 p64 p65 q + (1 − p)r23 r64 r65 x

2.3 Monty Hall


2.3.1 problem 41
Let Gi be the event that the i-th door contains a goat, and let Di be the
event that Monty opens door i.

P (G1 )P (D2 , G2 |G1 )


P (G1 |D2 , G2 ) = .
P (D2 , G2 )

2
P (G1 )P (D2 , G2 |G1 ) = (P (G2 |G1 )P (D2 |G1 , G2 ))
3
2 1 1

= (p + (1 − p) )
3 2 2
2 1 1 1

= p+ − p
3 2 4 4
1
= (p + 1).
6
Thus,

(p
+ 1)
1
P (G1 |D2 , G2 ) = 6
1
6
(p
+ 1) + 13 × 1
2
p+1
= .
p+2
Note that when p = 1, the result matches that of the basic Monty Hall
problem.
CHAPTER 2. CONDITIONAL PROBABILITY 37

2.3.2 problem 42
Let Gi be the event that the i-th door contains a goat, and let Di be the
event that Monty opens door i.
Let S be the event of success under the specified strategy.

(a)

P (S) = P (G1 )P (S|G1 ) + P (Gc1 )P (S|Gc1 )


2
= p+0
3
2
= p.
3

Note that when p = 1, the problem reduces to the basic Monty Hall
problem, and we get the correct solution 23 . In the case when p =
0, Monty never gives the contestant a chance to switch their initial,
incorrect choice to the correct one, resulting in a definite failure under
the specified strategy.

(b)

P (G1 )P (D2 |G1 )


P (G1 |D2 ) =
P (D2 )
P (G1 )P (D2 |G1 )
=
P (G1 )P (D2 |G1 ) + P (Gc1 )P (D2 |Gc1 )
2
p
= 2 6 1
6
p+ 6
2p
= .
2p + 1

Note that if p = 1, the problem reduces to the basic Monty Hall prob-
lem, and the solution matches that of the basic, conditional Monty Hall
problem. If p = 0 on the other hand, then the reason Monty has opened
a door is because the contestant’s initial guess (Door 1) is correct. By
choosing the strategy to switch, the contestant always loses.
CHAPTER 2. CONDITIONAL PROBABILITY 38

2.3.3 problem 43
Let Ci be the event that Door i contains the car. Let Di be the event that
Monty opens Door i. Let Oi be the event that Door i contains the computer,
and let Gi be the event that Door i contains the goat.

(a)

P (C3 )P (D2 , G2 |C3 )


P (C3 |D2 , G2 ) =
P (D2 , G2 )
P (C3 )P (D2 , G2 |C3 )
=
P (C3 )P (D2 , G2 |C3 ) + P (C3c )P (D2 , G2 |C3c )
1
∗ 12
= 1 1 2 3

3
∗ 2 + 3 P (G2 |C3c )P (D2 |G2 , C3c )
1
∗ 12
= 3
1
3
∗ 1
2
+ 23 ∗ 1
4
1
= 6
1
6
+ 1
6
1
= .
2

(b)

P (C3 )P (D2 , O2 |C3 )


P (C3 |D2 , O2 ) =
P (C3 )P (D2 , O2 |C3 ) + P (C3c )P (D2 , O2 |C3c )
P (C3 )P (D2 , O2 |C3 )
=
P (C3 )P (D2 , O2 |C3 ) + P (C3c )P (D2 , O2 |C3c )
1
P (O2 |C3 )P (D2 |O2 , C3 )
= 1 3

3
P (O2 |C3 )P (D2 |O2 , C3 ) + P (C3c )P (D2 , O2 |C3c )
1
∗ 12 ∗ p
= 3
1
3
∗ 21 ∗ p + 23 ∗ 14 ∗ q
1
p
= 6
+ 16 q
1
6
p
p
=
p + (1 − p)
= p.
CHAPTER 2. CONDITIONAL PROBABILITY 39

2.3.4 problem 44
Let Gi be the event that the i-th door contains a goat, and let Di be the
event that Monty opens door i. Let S be the event that the contestant is
successful under his strategy.

(a) There are two scenarios which result in the contestant selecting door
3 and Monty opening door 2. Either the car is behind door 3 and
Monty randomly opens door 2, or doors 3 and 2 contain goats, and
Monty opens door 2. Only the latter scenario results in a win for the
contestant.
Thus,
(p1 + p2 ) p1p+p
1
p1
P (S|D2 , G2 ) = 2
= .
p3 12 + (p1 + p2 ) p1p+p
1
2
p1 + 12 p3

(b) We can slightly modify the scenario in part a where doors 3 and 2
contain goats by multiplying the probability of the scenario by 21 to
accomodate the chance that Monty might open the door with the car
behind it.

1
(p + p2 ) p1p+p
2 1
1 1
p
2 1 p1
P (S|D2 , G2 ) = 2
= = .
p3 2 + 2 (p1 + p2 ) p1p+p
1 1 1
2
p 1
p
2 1
+ 1
p
2 3
p1 + p 3

(c)
p3
P (S|D2 , G2 ) = .
p3 + 12 p1

(d)
p3
P (S|D2 , G2 ) = .
p3 + p 1

2.3.5 problem 45
(a) Since the prizes are independent for each door, and since the strategy
is switch doors every time, what is behind Door 1 is irrelevant.
Possible outcomes for doors 2 and 3 are Goat and Car with probability
2pq, in which case the contestant wins, Car and Car with probability
CHAPTER 2. CONDITIONAL PROBABILITY 40

p2 , in which case the contestant wins again, and Goat and Goat with
probability q 2 , in which case the contestant loses.
Thus,
p2 + 2pq p2 + 2pq
P (S) = = = p2 + 2pq.
p2 + 2pq + q 2 (p + q)2

(b) There are two scenarios in which Monty opens Door 2. Either Door 3
contains a Car and Door 2 contains a Goat, which happens with prob-
ability pq, or both doors contain Goats and Monty randomly chooses
to open Door 2, which happens with probability 21 q 2 . Contestant wins
in the first case and loses in the second case.
Thus,
pq
P (S|D2 , G2 ) = .
pq + 12 q 2

2.3.6 problem 46
Let S be the event of successfully getting the Car under the specified strategy.
Let Ci be the event that Door i contains the Car. Let A be the event that
Monty reveals the Apple, and let Ai be the event that Door i contains the
Apple.

(a)

P (S) = P (S ∩ C1 ) + P (S ∩ C2 ) + P (S ∩ C3 ) + P (S ∩ C4 )
= P (C1 )P (S|C1 ) + P (C2 )P (S|C2 ) + P (C3 )P (S|C3 ) + P (C4 )P (S|C4 )
1 1 1
= ∗ 0 + 3 ∗ ∗ (p + q)
4 4 2
1 1
=3∗ ∗
4 2
3
=
8
CHAPTER 2. CONDITIONAL PROBABILITY 41

(b)
P (A) = P (A ∩ G1 ) + P (A ∩ A1 ) + P (A ∩ B1 ) + P (A ∩ C1 )
= P (G1 )P (A|G1 ) + P (A1 )P (A|A1 ) + P (B1 )P (A|B1 ) + P (C1 )P (A|C1 )
1 1 1
= p+0+ q+ q
4 4 4
1
= (1 + q)
4
(c)
P (S ∩ A) 1
∗ p ∗ 12 + 41 ∗ q ∗ 1 1
P (S|A) = = 4 2
= 8
P (A) 1
4
(1 + q) 1
4
(1 + q)

2.3.7 problem 47
(a) Contestant wins under the "stay-stay" strategy if and only if the Car is
behind Door 1.

1
P (S) =
4
(b) If the Car is not behind Door 1, Monty opens one of the two doors
revealing a Goat. Contestant stays. Then, Monty opens the other
door with a Goat behind it. Finally, contestant switches to the Door
concealing the Car.

P (S) = P (C1 )P (S|C1 ) + P (C1c )P (S|C1c )

3 3
P (S) = 0 + ∗1=
4 4
(c) Under the "switch-stay" strategy, if the Car is behind Door 1 the con-
testant loses. Given that the Car is not behind Door 1, Monty opens
one of the Doors containing a Goat. The contestant will win if they
switch to the Door containing the Car and will lose if they switch to
the Door containing the last remaining Goat.
Thus,

3 1 3
P (S) = P (C1 )P (S|C1 ) + P (C1c )P (S|C1c ) = 0 + ∗ =
4 2 8
CHAPTER 2. CONDITIONAL PROBABILITY 42

(d) Under the "switch-switch" strategy, if the car is behind Door 1, then
Monty opens a door with a Goat behind it. The contestant switches
to a door with a Goat behind it. Monty then opens the last door
containing a Goat, at which point the contestant switches back to the
door containing the Car.
If Door 1 contains a Goat, Monty opens another Door containing a Goat
and presents the contestant with a choice. If the contestant switches
to the remaining door containing a Goat, then Monty is forced to open
Door 1, revealing the final Goat. The contestant switches to the one
remaining Door which contains the Car. If, on the other hand, the con-
testant switches to the door containing the Car, then on the subsequent
switch they lose the game.
Thus,

1 3 1 5
P (S) = ∗1+ ∗ =
4 4 2 8
(e) "Stay-Switch" is the best strategy.

2.4 First-step Analysis and Gambler’s Ruin


2.4.1 problem 49
(a)

P (A2 ) = p1 p2 + q1 q2
1 1
  
= (1 − q1 )(1 − q2 ) + b1 + b2 +
2 2
1 1 1 1
     
= b1 − b2 − + b1 + b2 +
2 2 2 2
1
= + 2b1 b2
2

(b) By strong induction,


1
P (An ) = + 2n−1 b1 b2 ...bn
2
for n ≤ 2.
CHAPTER 2. CONDITIONAL PROBABILITY 43

Suppose the statement holds for all n ≤ k − 1. Let Si be the event that
the i-th trial is a success.

P (Ak ) = pk P (Ack−1 |Sk ) + qk P (Ak−1 |Skc )


1 1
    
= pk 1 − + 2k−2 b1 b2 ...bk−1 + qk + 2k−2 b1 b2 ...bk−1
2 2
1 1
   
= pk − 2 b1 b2 ...bk−1 + qk
k−2
+ 2 b1 b2 ...bk−1
k−2
2 2
1
= + (qk − pk )2k−2 b1 b2 ...bk−1
2
1
= + 2bk 2k−2 b1 b2 ...bk−1
2
1
= + 2k−1 b1 b2 ...bk−1 bk
2

(c) if pi = 1
2
for some i, then bi = 0 and P (An ) = 12 .
if pi = 0 for all i, then bi = 12 for all i. Hence, the term 2k−1 b1 b2 ...bk−1 bk
equals 21 . Thus, P (An ) = 1. This makes sense since the number of
successes will be 0, which is an even number.
if pi = 1 for all i, then bi = − 21 for all i. Hence, the term 2k−1 b1 b2 ...bk−1 bk
will either equal to 12 or − 12 depending on the parity of the number of
trials. Thus, P (An ) is either 0 or 1 depending on the parity of the
number of trials.
This makes sense since, if every trial is a success, the number of suc-
cesses will be even if the number of trials is even. The number of
successes will be odd otherwise.

2.4.2 problem 52
The problem is equivalent to betting 1 increments and having A start with
ki dollars, while B starts with k(N − i) dollars.
Thus, p < 21 ,
 ki
1− q
p
pi =  kN .
1− q
p
CHAPTER 2. CONDITIONAL PROBABILITY 44

Note that,

 ki  ki−1
1− q
p
−ki q
p i 1
lim  kN = lim  kN −1 = lim  k(N −i) = 0.
k→∞
1− q k→∞
−kN q N k→∞ q
p p p

This result makes sense, since p < 21 implies that A should lose a game
with high degree of certainty over the long run.

2.4.3 problem 53
See https://math.stackexchange.com/a/2706032/649082

2.4.4 problem 54
(a) pk = ppk−1 + qpk+1 with boundary condition p0 = 1.
(b) Let Aj be the event that the drunk reaches k before reaching −j. Then,
Aj ⊆ Aj+1 since to reach −(j + 1) the drunk needs to pass −j. Note

that Aj is equivalent to the event that the drunk ever reaches k,
S
j=1
since the complement of this event, namely the event that the drunk
reaches −j before reaching k for all j implies that the drunk never has
the time to reach k.

By assumption, P ( Aj ) = limn→+∞ P (An ). P (An ) can be found as a
S
j=1
result of a gambler’s ruin problem.
If p = 21 ,
n
P (An ) = → 1.
n+k
If p > 21 ,  n
1− q
p
P (An ) =  n+k → 1.
1− q
p

If p < 12 ,  n
1− q !k
p p
P (An ) =  n+k → .
1− q q
p
CHAPTER 2. CONDITIONAL PROBABILITY 45

2.5 Simpson’s Paradox


2.5.1 problem 57
(a) Suppose C1 contains 7 green gummi bears and 8 red ones, M1 contains
1 green gummi bear and 2 red gummi bears, C2 contains 5 green gummi
bears and no red gummi bears, M2 contains 12 green gummi bears and
5 red gummi bears.
The proportion of green gummi bears in C1 is 7/15, which is larger than that of M1, which is 1/3. The proportion of green gummi bears in C2 is 5/5, which is larger than that of M2, which is 12/17. However, the proportion of green gummi bears in C1 + C2 is 12/20, which is less than that of M1 + M2, which is 13/20.

(b) We can imagine that it is much more difficult to get a green gummi
bear out of a jar with subscript 1 than it is out of a jar with subscript
2. C jars have a lower overall success rate, because most of their green
gummi bears are in C1 , which is harder to sample from compared to
the jars with subscript 2.
Let A be the event that a sampled gummi bear is green. Let B be the
event that the jar being sampled from is an M jar. Let C be the event
that the jar being sampled from has subscript 1.
Then, by Simpson’s Paradox, P (A|B, C) < P (A|B c , C), P (A|B, C c ) <
P (A|B c , C c ), however, P (A|B) > P (A|B c ).

2.5.2 problem 58
(a) If A and B are independent, then

P (A|B, C) = P (A|B c , C) = P (A|C).

P (A|B, C c ) = P (A|B c , C c ) = P (A|C c ).

Thus, Simpson’s Paradox does not hold.

(b) If A and C are independent, then P (A|B, C) < P (A|B c , C) =⇒


P (A|B) < P (A|B c ). Thus, Simpson’s Paradox does not hold.

(c) If B and C are independent, then

P (A|B) = P (C)P (A|B, C) + P (C c )P (A|B, C c ).

P (A|B c ) = P (C)P (A|B c , C) + P (C c )P (A|B c , C c ).

Since P (A|B, C) > P (A|B c , C) and P (A|B, C c ) > P (A|B c , C c ), P (A|B) >
P (A|B c ), so Simpson’s Paradox does not hold.

2.6 Mixed Problems


2.6.1 problem 60
Let D be the event that a person has the disease. Let T be the event that a
person tests positive for the disease.

(a)

P(D|T) = P(D)P(T|D) / P(T)
       = p (P(A|D)P(T|D,A) + P(B|D)P(T|D,B)) / (P(A)P(T|A) + P(B)P(T|B))
       = p ((1/2)a1 + (1/2)b1) / ((1/2)(p a1 + (1 − p)(1 − a2)) + (1/2)(p b1 + (1 − p)(1 − b2)))
       = (1/2) p (a1 + b1) / ((1/2)(p a1 + (1 − p)(1 − a2)) + (1/2)(p b1 + (1 − p)(1 − b2)))

(b)

P(A|T) = P(A)P(T|A) / P(T)
       = (1/2)(p a1 + (1 − p)(1 − a2)) / ((1/2)(p a1 + (1 − p)(1 − a2)) + (1/2)(p b1 + (1 − p)(1 − b2)))
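Both posteriors are easy to evaluate numerically; here is a hedged R sketch, where the numeric values of p, a1, a2, b1, b2 are arbitrary placeholders:

p <- 0.01; a1 <- 0.95; a2 <- 0.90; b1 <- 0.80; b2 <- 0.85
pT_A <- p*a1 + (1 - p)*(1 - a2)   # P(T | lab A used)
pT_B <- p*b1 + (1 - p)*(1 - b2)   # P(T | lab B used)
pT   <- 0.5*pT_A + 0.5*pT_B       # P(T)
c(P_D_given_T = 0.5*p*(a1 + b1)/pT,   # part (a)
  P_A_given_T = 0.5*pT_A/pT)          # part (b)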

2.6.2 problem 61
(a)

P(D | T1 ∩ · · · ∩ Tn) = P(D) P(T1 ∩ · · · ∩ Tn | D) / P(T1 ∩ · · · ∩ Tn)
                       = p a^n / (p a^n + q b^n)

(b)

P(D | T1 ∩ · · · ∩ Tn)
  = p (P(G) P(T1 ∩ · · · ∩ Tn | D, G) + P(G^c) P(T1 ∩ · · · ∩ Tn | D, G^c)) / P(T1 ∩ · · · ∩ Tn)
  = p (1/2 + (1/2) a0^n) / (P(G) P(T1 ∩ · · · ∩ Tn | G) + P(G^c) P(T1 ∩ · · · ∩ Tn | G^c))
  = p (1/2 + (1/2) a0^n) / (1/2 + (1/2)(p a0^n + (1 − p) b0^n))
  = p (1 + a0^n) / (1 + p a0^n + (1 − p) b0^n)

2.6.3 problem 62
Let D be the event that the mother has the disease. Let Ci be the event that
the i-th child has the disease.

(a)

P(C1^c ∩ C2^c) = P(D)P(C1^c ∩ C2^c | D) + P(D^c)P(C1^c ∩ C2^c | D^c)
             = (1/3)(1/4) + 2/3
             = 9/12

(b) The two events are not independent. If the elder child has the disease, the mother has the disease, which means the younger child has probability 1/2 of having the disease. Unconditionally, the younger child has probability 1/6 of having the disease.

(c)

P(D | C1^c ∩ C2^c) = P(D)P(C1^c ∩ C2^c | D) / P(C1^c ∩ C2^c)
                   = (1/3)(1/4) / ((1/3)(1/4) + 2/3)
                   = 1/9

2.6.4 problem 63
This problem is similar to the variations on example 2.2.5 (Two Children) in
the textbook.
It is true that, conditioned on a specific two of the three coins matching, the probability of the third coin matching is 1/2, but the way the problem statement is phrased, at least two of the coins match. According to the Two Children problem, the result is no longer 1/2. In fact, the probability of all the coins matching given that at least two match is 1/4.

2.6.5 problem 64
Let Ri , Gi , and Bi be the events that the i-th drawn ball is red, green or
blue respectively. Let A be the event that a green ball is drawn before a blue
ball.

(a) Note that if a red ball is drawn, it is placed back, as if the experiment
never happened. Draws continue until a green or a blue ball is drawn.
The red balls are irrelevant in the experiment. Thus, the problem
reduces to removing all the red balls, and finding the probability of the
first, randomly drawn ball being green.

P(A) = P(R1)P(A|R1) + P(R1^c)P(A|R1^c)
     = r P(A) + (g + b) · g/(g + b)
     = r P(A) + g

Thus,
P(A) = g/(1 − r) = g/(g + b).
(b) We are interested in draws in which the first ball is green. Each com-
pleted sequence of g + b + r draws is equally likely. Since the red balls
are once again irrelevant, we focus on the g + b draws of green or blue
balls.
Thus,
 
P(A) = \binom{g+b−1}{g−1} / \binom{g+b}{g} = g/(g + b).

(c) Let Ai,j be the event that type i occurs before type j. Generalizing
part (a), we get
P(Ai,j) = pi / (pi + pj).

2.6.6 problem 65
(a) All (n+1)! permutations of the balls are equally likely, so the probability that we draw the defective ball is 1/(n + 1) irrespective of when we choose to draw.

(b) Consider the extreme case of the defective ball being super massive
(v >> nw). Then, it is more likely that a person draws the defective
ball rather than a non-defective ball, so we want to draw last. On the other hand, if v is much smaller than nw, then, at any stage of the experiment, drawing the defective ball is less likely than not, but after each draw of a non-defective ball, the probability of drawing it increases since there are fewer balls left in the urn. Thus, we want to be one of the first to draw.
So the answer depends on the relationship of w and v.

2.6.7 problem 66
Let Si,k be the event that the sum after i rolls of the die is k. Let l denote the roll on which the running total first reaches at least 100. Let Xi be the event that a roll of the die lands on i.

P(Sl,100) = Σ_{i=94}^{99} P(Sl−1,i) P(X_{100−i} | Sl−1,i) = (1/6) Σ_{i=94}^{99} P(Sl−1,i)
P(Sl,101) = Σ_{i=95}^{99} P(Sl−1,i) P(X_{101−i} | Sl−1,i) = (1/6) Σ_{i=95}^{99} P(Sl−1,i)
P(Sl,102) = (1/6) Σ_{i=96}^{99} P(Sl−1,i)
P(Sl,103) = (1/6) Σ_{i=97}^{99} P(Sl−1,i)
P(Sl,104) = (1/6) Σ_{i=98}^{99} P(Sl−1,i)
P(Sl,105) = (1/6) P(Sl−1,99)

Each successive sum drops one nonnegative term, so P(Sl,100) is the largest. Thus, Sl,100 is the most likely.
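A short simulation in R (a sketch) agrees that 100 is the most likely value of the first total to reach at least 100:

first_total_geq_100 <- function() {
  total <- 0
  while (total < 100) total <- total + sample(1:6, 1)
  total
}
sims <- replicate(1e4, first_total_geq_100())
table(sims) / length(sims)   # 100 has the highest relative frequency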

2.6.8 problem 67
(a) Unconditionally, each of the c + g + j donuts is equally likely to be the last one. Thus, the probability that the last donut is a chocolate donut is c/(c + g + j).

(b) We are interested in the event that the last donut is chocolate and the last donut that is either glazed or jelly is jelly. The probability that the last donut is chocolate is c/(c + g + j). Since any ordering of glazed and jelly donuts is equally likely, the probability that the last of those is a jelly donut is j/(g + j). Thus, the probability of the desired event is (c/(c + g + j)) · (j/(g + j)).

2.6.9 problem 68
(a)
OR = (P(D|C)/P(D^c|C)) · (P(D^c|C^c)/P(D|C^c))

Since the disease is rare among both exposed and unexposed groups, P(D^c|C) ≈ 1 and P(D^c|C^c) ≈ 1. Thus,

OR ≈ P(D|C)/P(D|C^c) = RR

(b)
P(C, D)P(C^c, D^c) / (P(C, D^c)P(C^c, D)) = P(C)P(D|C)P(C^c)P(D^c|C^c) / (P(C)P(D^c|C)P(C^c)P(D|C^c)) = OR

(c) Since P (C, D) also equals P (D)P (C|D), reversing the roles of C and
D in part b gives the result.

2.6.10 problem 69
(a)
y = dp + (1 − d)(1 − p)

(b) The worst choice for p is 1/2, because then the fraction of "yes" responses is 1/2 irrespective of the fraction of drug users. In other words, the number of "yes" responses tells us nothing.

(c) We can extend the result from part a.


A drug user says "yes" either if they get a "Have you used drugs" slip,
or if they get a "I was born in winter" slip and they are, in fact, born
in winter.

A person who has not used drugs says "yes" only in the case that they
get a "I was born in winter" slip and they were, in fact, born in winter.

y = d(p + (1/4)(1 − p)) + (1 − d)(1/4)(1 − p) = dp + (1 − p)/4

Thus,

d = (4y + p − 1) / (4p)

2.6.11 problem 70
Let F be the event that the coin is fair, and let Hi be the even that the i-th
toss lands Heads.

(a) Both Fred and his friend are correct. Fred is correct in that the probability of every toss in the sequence landing Heads is very small. For example, there are \binom{92}{45} sequences with 45 Heads and 47 Tails, but only 1 sequence of all Heads.
On the other hand, Fred’s friend is correct in his assessment that any
particular sequence has the same likelihood of occurrence as any other
sequence.

(b)
P(F | H_{1≤i≤92}) = P(F)P(H_{1≤i≤92}|F) / (P(F)P(H_{1≤i≤92}|F) + P(F^c)P(H_{1≤i≤92}|F^c)) = p(1/2)^{92} / (p(1/2)^{92} + (1 − p))

(c) For P(F | H_{1≤i≤92}) to be larger than 1/2, p must be greater than 2^{92}/(2^{92} + 1), which is approximately equal to 1, whereas for P(F | H_{1≤i≤92}) to be less than 1/20, p must be less than 2^{92}/(2^{92} + 19), which is also approximately equal to 1. In other words, unless we know for a fact that the coin is fair, 92 Heads in a row will convince us otherwise.

2.6.12 problem 71
(a) To have j toy types after sampling i toys, we either have j −1 toy types
after sampling i−1 toys, and the i-th toy is of a previously unseen type,
or, we have j toy types after sampling i − 1 toys, and the i-th toy has
an already seen type.
Thus,
p_{i,j} = ((n − j + 1)/n) p_{i−1,j−1} + (j/n) p_{i−1,j}
(b) Note that p1,0 = 0, p1,1 = 1 and pi,j = 0 for j > i. Using strong
induction, a proof of the recursion in part a follows.
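A small R sketch of the recursion (the function and variable names are mine, assuming n toy types in total):

# p[i, j] = P(exactly j distinct toy types after i sampled toys), for n types
toy_type_probs <- function(i_max, n) {
  p <- matrix(0, nrow = i_max, ncol = n)
  p[1, 1] <- 1
  if (i_max >= 2) for (i in 2:i_max) {
    for (j in 1:min(i, n)) {
      from_new  <- if (j >= 2) (n - j + 1)/n * p[i - 1, j - 1] else 0
      from_seen <- j/n * p[i - 1, j]
      p[i, j] <- from_new + from_seen
    }
  }
  p
}
rowSums(toy_type_probs(10, 4))   # each row sums to 1, as it should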

2.6.13 problem 72
(a)
pn = an a + (1 − an )b = (a − b)an + b

an+1 = an a + (1 − an )(1 − b) = an (a + b − 1) + 1 − b

(b)
pn+1 = (a − b)an+1 + b
pn+1 = (a − b)((a + b − 1)an + 1 − b) + b
pn+1 = (a − b)((a + b − 1)(pn − b)/(a − b) + 1 − b) + b
pn+1 = (a + b − 1)pn + a + b − 2ab

(c) Let p = limn→∞ pn . Taking the limit of both sides of the result of part
b, we get
p = (a + b − 1)p + a + b − 2ab
p = (a + b − 2ab) / (2 − (a + b))

2.6.14 problem 74
a. See the first paragraph of part d.

b.
 
P(B|Cj) = P(B ∧ Cj)/P(Cj) = [\binom{48}{j−1} (j − 1)! · 4 · 3 · (52 − j − 1)!] / [\binom{48}{j−1} (j − 1)! · 4 · (52 − j)!] = 3/(52 − j)

The first equality is the definition of conditional probability. The \binom{48}{j−1} (j − 1)! terms count the ways to have the first j − 1 cards be non-aces. The 4 · 3 in the numerator counts the ordered choices of two adjacent aces. The ending factorials in the numerator and denominator come from ordering the rest of the cards.

c. We have
P(Cj) = \binom{48}{j−1} (j − 1)! · 4 · (52 − j)! / 52!
With the LOTP, part b, and the power of R, we can compute
P(B) = Σ_{j=1}^{49} P(B|Cj) P(Cj)
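For instance, one such computation in R (a sketch) recovers 1/13:

j   <- 1:49
pCj <- choose(48, j - 1) * factorial(j - 1) * 4 * factorial(52 - j) / factorial(52)
sum((3 / (52 - j)) * pCj)   # equals 1/13 up to floating-point error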

d. Argument by symmetry: Consider the events "the first card after the
first ace is an ace" and "the last card after the first ace is an ace". The second
event is equivalent to the last card in the deck being an ace. In addition,
the two events must have the same probability, as every card drawn after the
first ace is equally likely to be an ace. Therefore, the probability of the first
event is 1/13.

For a proof using conditional probability:

Consider the ace of hearts and the ace of spades. The probability that
the ace of hearts is the first ace to appear followed immediately by the ace
of spades (call this event A) is the probability that they appear adjacent to
each other in that order (call this event B) and that those two aces appear
before the other two aces (call this event C).
We have

P (B ∧ C) = P (C|B) ∗ P (B)

Now, to compute P (C|B) consider that if the aces of hearts and of spades
appear adjacent to each other in that order, we can consider them as a card
"glued together". There are 3!=6 possible orderings of the glued together
card and the other 2 aces - in 2 of them, the glued together card is first. So
P (C|B) = 1/3.

We also have P(B) = 51 · 50!/52! = 1/52, as there are 51 pairs of adjacent positions where the two aces could sit, and the rest of the cards can be ordered in 50! ways.

Then, we now find P (B ∧ C) = (1/3) ∗ (1/52).

Now, instead of the aces of hearts and spades specifically, consider there
are 12 possible pairs of aces that can be adjacent to each other. Then the
total probability that the card after the first ace is another ace is

12 ∗ (1/3) ∗ (1/52) = 1/13


Chapter 3

Random Variables and Their


Distributions

3.1 PMFs and CDFs . . . . . . . . . . . . . . . . . . . . . . . . . 56


3.2 Named Distributions . . . . . . . . . . . . . . . . . . . . . . . 60
3.3 Independence of r.v.s . . . . . . . . . . . . . . . . . . . . . . . 67
3.4 Mixed Practice . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.1 PMFs and CDFs


3.1.1 problem 1
If the k-th person’s arrival results in the first birthday match, the first k − 1
people have 365 ∗ 364 ∗ · · · ∗ (365 − k + 2) choices of birthday assignments
such that no two people have the same birthday. The k-th person has k − 1
choices of birthdays, since their birthday must match that of one of the first
k − 1 people.
Thus,

P(X = k) = (365 · 364 · · · (365 − k + 2) / 365^{k−1}) · ((k − 1)/365)

3.1.2 problem 2
(a) Since the trials are independent, the probability that the first k − 1 trials fail is (1/2)^{k−1}, and the probability that the k-th trial is successful is 1/2. Thus, for k ≥ 1,
P(X = k) = (1/2)^{k−1} · (1/2).

(b) This problem reduces to part (a) once a trial is performed. Whatever its outcome, we label it failure and proceed to perform more trials until the opposite outcome is observed. Thus, for k ≥ 2,
P(X = k) = (1/2)^{k−2} · (1/2).

3.1.3 problem 3
P(Y ≤ k) = P(X ≤ (k − µ)/σ) = F((k − µ)/σ).

3.1.4 problem 4
To show that F (x) is a CDF, we need to show that F is increasing, right-
continuous, and converges to 0 and 1 in the limits.
The first condition is true since ⌊x⌋ is increasing.
Since limx→a+ F (x) = F (a) when a ∈ N by the definition of F (x), the
second condition is satisfied.
limx→∞ F (x) = 1 by the definition of F (x), and also, by definition,
limx→−∞ F (x) = 0. Thus, the third condition is satisfied, and F (x) is a
CDF.
The PMF F corresponds to is
P(X = k) = 1/n
for 1 ≤ k ≤ n and 0 everywhere else.

3.1.5 problem 5
(a) p(n) is clearly non-negative. Also,
Σ_{n=0}^∞ p(n) = (1/2) Σ_{n=0}^∞ (1/2)^n = (1/2) · 1/(1 − 1/2) = 1.
Thus, p(n) is a valid PMF.

(b)
F(x) = Σ_{n=0}^{⌊x⌋} p(n) = (1/2) Σ_{n=0}^{⌊x⌋} (1/2)^n = (1/2) · (1 − (1/2)^{⌊x⌋+1})/(1 − 1/2) = 1 − (1/2)^{⌊x⌋+1}
for x ≥ 0, and 0 for x < 0.

3.1.6 problem 7
To find the probability mass function (PMF) of X, we need to determine the
probabilities of Bob reaching each level from 1 to 7.
P (X = 1) is simply the probability of Bob reaching level 1, which is 1
since he starts at level 1.
For 2 ≤ j ≤ 6, P (X = j) is the probability of reaching level j but not
reaching level j + 1. This can be calculated as
P (X = j) = p1 p2 · · · pj−1 (1 − pj ).
Since the game has only 7 levels,
P (X = 7) = p1 p2 p3 p4 p5 p6 .

3.1.7 problem 8
P(X = k) = \binom{k−1}{4} / \binom{100}{5} for k ≥ 5.
P(X = k) = 0 for k < 5.

3.1.8 problem 9
(a) F (x) = pF1 (x) + (1 − p)F2 (x).
Let x1 < x2 . Then
F (x1 ) = pF1 (x1 ) + (1 − p)F2 (x1 ) < pF1 (x2 ) + (1 − p)F2 (x2 ) = F (x2 ).

Since F (x) is a weighted sum of right continuous functions, it is itself


a right continuous function.

lim F (x) = p lim F1 (x) + (1 − p) lim F2 (x) = p + 1 − p = 1.


x→∞ x→∞ x→∞

Similarly,
lim F (x) = 0.
x→−∞

(b) Let X be an r.v. created as described. Let H be the event that coin
lands heads, and T be the event that the coin lands tails.
Then, F (X = k) = P (H)F1 (k) + P (T )F2 (k) = pF1 (k) + (1 − p)F2 (k).
Note that this is the same CDF as in part a.

3.1.9 problem 10
(a) Let P(n) = k/n for n ∈ N. By the axioms of probability, Σ_{n∈N} P(n) must equal 1. But
Σ_{n∈N} P(n) = k Σ_{n∈N} 1/n,
and the sum on the right side is the divergent harmonic series. Hence the aforementioned axiom of probability is violated. Contradiction.

(b) Σ_{n∈N} 1/n² = π²/6. Thus, letting k equal 6/π², the axiom of probability is satisfied.

3.1.10 problem 12
(a) https://drive.google.com/file/d/1vAAxLU7hvihAHOEcHx8Nc-9xapGlzc-I/
view?usp=sharing

(b) Let I ⊂ X be the subset of the support where P1 (x) < P2 (x). Then

Σ_{x∈X} P1(x) = Σ_{x∈I} P1(x) + Σ_{x∈X\I} P1(x) < Σ_{x∈I} P2(x) + Σ_{x∈X\I} P2(x) = 1.

Thus, having such a property in PMFs is impossible.

3.1.11 problem 13
P(X = a) = Σ_{z∈Z} P(Z = z)P(X = a | Z = z) = Σ_{z∈Z} P(Z = z)P(Y = a | Z = z) = P(Y = a).

3.1.12 problem 14
(a)
1 − P (X = 0) = 1 − e−λ

P (X ≥ 2) = 1 − P (X = 0) − P (X = 1) = (1 − e−λ ) − e−λ λ

(b)
P(X = k | X > 0) = P(X = k)/P(X > 0) = λ^k / ((e^λ − 1) k!)

3.2 Named Distributions


3.2.1 problem 15
F_X(x) = P(X ≤ x) =
    0,        if x < 1
    ⌊x⌋/n,    if 1 ≤ x ≤ n
    1,        if x > n
where ⌊x⌋ equals the largest integer that is less than or equal to x.

3.2.2 problem 16
P(X = k | X ∈ B) = (1/|C|) / (|B|/|C|) = 1/|B|

3.2.3 problem 17
P(X ≤ 100) = Σ_{i=0}^{100} \binom{110}{i} (0.9)^i (0.1)^{110−i}
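Numerically, this can be evaluated with a one-liner in R:

pbinom(100, 110, 0.9)   # P(X <= 100) for X ~ Bin(110, 0.9)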

3.2.4 problem 19
 
The PMF of the number of games ending in a draw is P(X = k) = \binom{n}{k} (0.6)^k (0.4)^{n−k} for 0 ≤ k ≤ n.
Let X be the number of games that end in draws. The number of players
whose games end in draws is Y = 2X.

3.2.5 problem 20
 
(a) P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k} for 0 ≤ k ≤ 3.

(b) Using the complement of the desired event,

P(X > 0) = 1 − P(X = 0) = 1 − (1 − p)³ = 1 − (−p³ + 3p² − 3p + 1) = p³ − 3p² + 3p.

To prove the same by inclusion-exclusion,

P(X > 0) = Σ_{i=1}^{3} P(I_{X_i} = 1) − 3p² + P(∩_{i=1}^{3} I_{X_i} = 1) = 3p − 3p² + p³.

(c) Since p2 and p3 go to 0 asymptotically faster than p, when p is small,


3p − 3p2 + p3 ≈ 3p.

3.2.6 problem 22
(a) Let Ci be the event that i-th type of coin is chosen. Let Hk be the
event that k out of the n flips land heads.

P(X = k) = P(C1)P(Hk|C1) + P(C2)P(Hk|C2) = (1/2)\binom{n}{k} p1^k (1 − p1)^{n−k} + (1/2)\binom{n}{k} p2^k (1 − p2)^{n−k}

(b) If p1 = p2 = p, then X ∼ Bin(n, p).

(c) If p1 ≠ p2, then the Bernoulli trials are not independent. If, for instance, p1 is small and p2 is large, and after the first million flips we see two heads, this increases the likelihood that we are using the coin with probability p1 of landing Heads, which in turn tells us that subsequent flips are unlikely to land Heads.

3.2.7 problem 23
Let Ii be the indicator of the i-th person voting for Kodos. Then, P (Ii =
1) = p1 p2 p3 . Since the voters make their decisions independently, we have

n independent Bernoulli trials, which is precisely the story for a Binomial


distribution.
Thus,
P(X = k) = \binom{n}{k} (p1 p2 p3)^k (1 − p1 p2 p3)^{n−k}

3.2.8 problem 24
(a) Since tosses are independent, we expect information about two of the
tosses to not provide any information about the remaining tosses. In
other words, we expect the required probability to be

8 8
! !
(0.5)k (0.5)8−k = (0.5)8
k k
for 0 ≤ k ≤ 8.
To prove this, let X be the number of Heads out of the 10 tosses, and
let X1,2 be the number of Heads out of the first two tosses.

P (X = k ∩ X1,2 = 2)
P (X = k|X1,2 = 2) =
P (X1,2 = 2)
 
(0.5)2 8
k−2
(0.5)k−2 (0.5)8−k+2
=
(0.5)2
8
!
= (0.5)k−2 (0.5)8−k+2
k−2
8
!
= (0.5)8
k−2
 
for 2 ≤ k ≤ 10, which is equivalent to 8
k
(0.5)8 for 0 ≤ k ≤ 8.

(b) Let X≥2 be the event that at least two tosses land Heads.

P (X = k ∩ X≥2 )
P (X = k|X≥2 ) =
X≥2
 
10
k
(0.5)k (0.5)10−k
=
1 − (0.510 + 10 ∗ 0.510 )

for 2 ≤ k ≤ 10.
To see that this answer makes sense, notice that if we over all values of
k from 2 to 10, we get exactly the denominator, which means the said
sum equals to 1.

3.2.9 problem 26
If X ∼ HGeom(w, b, n), then n − X ∼ HGeom(b, w, n).
If X counts the number of items sampled from the set of w items in a
sample of size n, then n − X counts the number of items from the set of b
items in the same sample.
To see this, notice that
  
w b
n−k k
P (n − X = k) = P (X = n − k) = 
w+b

n

3.2.10 problem 27
X is not Binomial because the outcome of a card is not independent of the
previous cards’ outcomes. For instance, if the first n − 1 cards match, then
the probability of the last card matching is 1.
The Hypergeometric story requires sampling from two finite sets, but the
matching cards isn’t a set of predetermined size, so the story doesn’t fit.
 
P(X = k) = \binom{n}{k} · !(n − k) / n!
where !(n − k) is a subfactorial (the number of derangements of n − k items).

3.2.11 problem 30
(a) The distribution is hypergeometric. We select a sample of t employees
and count the number of women in the sample.
  
n m
k t−k
P (X = k) = 
n+m

t

(b) Decisions to be promoted or not are independent from employee to


employee. Thus, we are dealing with Binomial distributions.
Let X be the number of women who are promoted. Then, P (X =
k) = nk pk (1 − p)n−k . The number of women who are not promoted is
Y = n − X and so is also Binomial.
Distribution of the number of employees who are promoted is also Bi-
nomial, since each employee is equally likely to be promoted and pro-
motions are independent of each other.

(c) Once the total number of promotions is fixed, they are no longer inde-
pendent. For instance, if the first t people are promoted, the probability
of the t + 1-st person being promoted is 0.
The story fits that of the hypergeometric distribution. t promoted
employees are picked and we count the number of women among them.

      
n
k
pk (1 − p)n−k m
t−k
pt−k (1 − p)m−t+k n
k
m
t−k
P (X = k|T = t) =   =  
n+m
t
pt (1 − p)n+m−t n+m
t

3.2.12 problem 31
(a) Note that the distribution is not Binomial, since the guesses are not
independent of each other. If, for instance, the woman guesses the first
three cups to be milk-first, and she is correct, then the probability of
her guessing milk-first on subsequent guesses is 0, since it is known in
advance that there are only 3 milk-first cups.
Hypergeometric story fits. Let Xi be the probability that the lady
guesses exactly i milk-first cups correctly.

  
3 3
i 3−i
P (Xi ) =  
6
3

Thus, P (X2 ) + P (X3 ) = 10


= 1
(63) 2

(b) Let M be the event that the cup is milk first, and let T be the event
that the lady claims the cup is milk first. Then,

P (M |T ) P (M ) p1 p1
= =
P (M |T )
c P (M ) 1 − p2
c 1 − p2

3.2.13 problem 32
(a) The problem fits the story of Hypergeometric distributions.
  
P(X = k) = \binom{s}{k} \binom{100−s}{10−k} / \binom{100}{10}

for 0 ≤ k ≤ s.

(b) > x = 75
> y_interval <- sum(dhyper(7:10, x, 100-x, 10))
> y_interval
[1] 0.7853844

3.2.14 problem 33
(a) The probability of a typo being caught is p1 + p2 − p1 p2 . Then,
!
n
P (X = k) = (p1 + p2 − p1 p2 )k (1 − (p1 + p2 − p1 p2 ))n−k
k

(b) When we know the total number of caught typos in advance, the typos
caught by the first proofreader are no longer independent. For example,
if we know that first proofreader has caught the first t typos, and the
total number of caught typos is t, then the probability of the first

proofreader catching subsequent typos is 0, since the total number of


caught typos was t.
 
Thus, we employ a Hypergeometric distribution. Since p1 = p2 , all 2n
t
t-tuples of caught typos are equally likely. Hence,
  
P(X1 = k | X1 + X2 = t) = \binom{n}{k} \binom{n}{t−k} / \binom{2n}{t}

3.2.15 problem 34
(a) Let Y be the number of Statistics students in the sample of size m.

  
n n
! i n−i
n i n−i k m−k
P (Y = k) = P (X = i)P (Y = k|X = i) = p (1−p)
X X
 
i n
i=k i=k m

(b) Consider a student in a random sample of size m. Independently of


other students, the student has probability p of being a statistics major.
Then,
  the probability of k students in the sample being statistics majors
is m
k
pk (1 − p)m−k . Thus, Y ∼ Binom(m, p).

3.2.16 problem 36
(a)
P(X = n/2) = \binom{n}{n/2} (1/2)^n

(b) Using Stirling's formula,
\binom{n}{n/2} = n!/((n/2)!)² ≈ √(2πn)(n/e)^n / (πn (n/(2e))^n) = 2^n √(2/(πn)).
Thus,
P(X = n/2) ≈ 2^n √(2/(πn)) (1/2)^n = √(2/(πn)).
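A quick numerical comparison in R (a sketch, with an arbitrary even n):

n <- 1000   # an arbitrary even n
c(exact = dbinom(n/2, n, 0.5), approx = sqrt(2/(pi*n)))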

3.3 Independence of r.v.s


3.3.1 problem 38
(a) Let Y = X + 1. Then, X and Y are clearly dependent, and P (X <
Y ) = 1.

(b) Let X be the value of a toss of a six sided die, with values 1 to 6. Let
Y be the value of a toss of a six sided die, with values 7 to 12. Tosses
of the two die are independent, but P (X < Y ) = 1.

3.3.2 problem 39
Let X have a discrete uniform distribution over values 1, 2, 3, ...10. Let Y =
11 − X. Then Y is also discrete uniform over the same sample space, but
P (X = Y ) = 0.
If X and Y are independent, then P (X = Y ) = i∈S P (X = i)P (Y =
P

i) > 0.

3.3.3 problem 40
(a) Suppose, toward a contradiction, that X and Y do not have the same
PMF. Then there is at least one k in the support of X such that
P (X = k) and P (Y = k) are not equal.
Note that if P (X = Y ) = 1, then P (X = Y |X = k) = P (X = Y |Y =
k) = P (X = k|Y = k) = P (Y = k|X = k) = 1, as an event with
probability 1 will still have probability 1 conditioned on any non-zero
event.
Using the above and examining Bayes’ theorem, we have P (X = k|Y =
k) = P (Y = k|X = k) ∗ P (X = k)/P (Y = k), which simplifies to
1 = P (X = k)/P (Y = k) as the conditional probabilities equal 1 as
previously shown. However, this equality is impossible if P (X = k) =
/ = P (Y = k). This contradicts the assumption that P (X = Y ) = 1 -
therefore, X and Y must have the same PMF if they are always equal.

(b) Let X, Y be r.v.s with probability 1 of equalling 1, and probability 0


of equalling any other value.
Then for x = y = 1 P (X = x ∧ Y = y) = 1 = P (X = x)P (Y = y),

and for all other possible pairs of values x, y, P (X = x ∧ Y = y) =


0 = P (X = x)P (Y = y). Therefore, X, Y can be independent in this
extreme case.

3.3.4 problem 41
Let X be the event that Tom woke up at 8 in the morning. Let Y be the
event that Tom has blue eyes. Let Z be the event that Tom made it to his
7 a.m. class.
Clearly Tom’s eye color is independent of the time he woke up and whether
he made it to his early morning class or not. However, if Tom woke up at 8,
then he definitely did not make it to his 7 am class.

3.3.5 problem 43
(a) Let X ≡ a (mod b) and Y ≡ X + 1 (mod b). Then, limb→∞ P (X <
Y ) = 1.
For finite random variables X and Y , the case of P (X < Y ) ≥ 1 is
not possible, since then Y can never achieve the smallest value of X,
contradicting the assumption that X and Y have the same distribution.

(b) If X and Y are independent random variables with the same distribution, then P(X < Y) ≤ 1/2.

3.3.6 problem 44
(a) X ⊕ Y ∼ Bern(1/2).

(b) If p ≠ 1/2, X ⊕ Y and Y are not independent. Imagine that X = 0 is extremely unlikely. Then, knowing that Y = 0 makes it very likely that X ⊕ Y = 1. If p = 1/2, then X ⊕ Y and Y are independent.
X ⊕ Y and X are independent, since knowledge of X still keeps the probability of Y = 1 at 1/2.

(c) Let the largest element in J be m.



P (YJ = 1) = P (Xm = 1)P (YJ\{m} = 0) + P (Xm = 0)P (YJ\{m} = 1)


1
= (P (YJ\{m} = 0) + P (YJ\{m} = 1))
2
1
=
2

Thus, YJ ∼ Bern( 12 )

To prove pairwise independence: Let J, J ′ be two arbitrary subsets


of {1...n}. We want to show that P (YJ = a ∧ YJ ′ = b) = P (YJ =
a)P (YJ = b) = 1/4 for a, b ∈ {0, 1}, with the second equality coming
from our knowledge that YJ ∼ Bern(1/2) for all J.

First, let us note that for all J, J ′ that are disjoint, YJ , YJ ′ are in-
dependent - this follows from the independence of the Xi .

Now, suppose J, J ′ are not disjoint. Let A = J ∩ J ′ , let B = J \ A, and


let C = J ′ \ B. By definition, A, B, C are disjoint.

Now, we have
P (YJ = a∧YJ ′ = b) = P (YJ = a, YJ ′ = b|YA = 1)P (YA = 1)+P (YJ = a, YJ ′ = b|YA = 0)P (YA =

using the LOTP. Continuing, we have

= P (YB = 1−a, YC = 1−b|YA = 1)P (YA = 1)+P (YB = a, YC = b|YA = 0)P (YA = 1)

by noting that if x ⊕ 1 = y, we must have y = 1 − x and if x ⊕ 1 = y,


we have y = x. Continuing, we get

= P (YB = 1−a)P (YC = 1−b)P (YA = 1)+P (YB = a)P (YC = b)P (YA = 0)

We can remove the conditioning since A, B, C are disjoint, and therefore


YB , YC , YA are all independent r.v.s. Finally, we realize that since all Y
are Bern(1/2), this results in

= (1/2)3 + (1/2)3 = 1/4 = P (YJ = a)P (YJ ′ = b)

as desired - YJ , YJ ′ are independent for any pair J, J ′ .

To prove that the YJ are not all independent, consider the subsets
S = {1}, S ′ = {2}, S ′′ {1, 2}. It is clear that if YS = 1 and YS ′ = 1, then
YS ′′ = YS ⊕ YS ′ = 0. However, this implies that

P( YJ = 1) = 0
\

J⊆{1..n}

i.e. it is impossible for all Y to simultaneously equal 1. However, we


know that

P (YJ = 1) = (1/2)2n−1 ̸= 0
Y

J⊆{1..n}

Thus, the YJ are not independent.

3.4 Mixed Practice


3.4.1 problem 46
If a failure is seen on the first trial, then there are 0 successes and 1 failure,
so it is clearly possible that there are more than twice as many failures as
successes.

(a) If we think of the Bernoulli trial success as a win for player A, and the
Bernoulli trial failure as a loss for player A, then have more than twice
as many failures as successes is analogous to A losing the Gambler’s
Ruin starting with 1 dollar. For instance, if A wins the first gamble,
then A has 3 dollars, and B needs 2 ∗ 1 + 1 gamble wins for A to lose
the entire game.
Thus, we need to find p1 .

(b) pk = (1/2) p_{k+2} + (1/2) p_{k−1} with conditions p0 = 1 and lim_{k→∞} pk = 0.

The characteristic equation is (1/2)t³ − t + 1/2 = 0, with roots 1 and (−1 ± √5)/2.
Thus,
pk = c1 + c2 ((−1 + √5)/2)^k + c3 ((−1 − √5)/2)^k.
Using the hint that lim_{k→∞} pk = 0, c1 and c3 must be 0. Thus,
pk = c2 ((−1 + √5)/2)^k.
Using p0 = 1, we get that c2 = 1. Thus,
pk = ((−1 + √5)/2)^k.

(c)
p1 = (−1 + √5)/2

3.4.2 problem 47
(a) Consider the simple case of m < n/2. Then, the trays don't have enough
pages to print n copies. Desired probability is 0.
On the other hand, if m ≥ n, then desired probability is 1, since each
tray individually has enough pages.
Now, consider the more interesting case that n/2 ≤ m < n. Associate n
pages being taken from the trays with n independent Bernoulli trials.
Sample from the first tray on success, and sample from the second tray
on failure. Thus, the assignment of trays can be modeled as a Binomial
random variable, X ∼ Bin(n, p). As long as not too few pages are
sampled from the first tray, the remaining pages can be sampled from
the second tray. What is too few? n − m − 1 is too few, because
n − m − 1 + m < n.
Hence,
P =
    0,                                            if m < n/2
    pbinom(m, n, p) − pbinom(n − m − 1, n, p),    if n/2 ≤ m < n
    1,                                            if m ≥ n

(b) Typing out the hinted program in the R language, we get that the
smallest number of papers in each tray needed to have 95 percent con-
fidence that there will be enough papers to make 100 copies is 60.
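One way such a program could look in R (a sketch assuming n = 100 copies and p = 1/2; the function name is mine):

n <- 100; p <- 0.5
prob_enough <- function(m) {
  if (m >= n) return(1)
  if (m < n/2) return(0)
  pbinom(m, n, p) - pbinom(n - m - 1, n, p)
}
m <- ceiling(n/2)
while (prob_enough(m) < 0.95) m <- m + 1
m   # smallest number of pages per tray giving at least 95% confidence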
Chapter 4

Expectation

4.1 Expectations And Variances . . . . . . . . . . . . . . . . . . . 73


4.2 Named Distributions . . . . . . . . . . . . . . . . . . . . . . . 80
4.3 Indicator r.v.s . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.4 LOTUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.5 Poisson approximation . . . . . . . . . . . . . . . . . . . . . . 92
4.6 Mixed practice . . . . . . . . . . . . . . . . . . . . . . . . . . 95

4.1 Expectations And Variances


4.1.1 problem 1
Let N be the number of amoebas in the pond after one minute.
E(N) = (1/3)(0 + 2 + 1) = 1
Var(N) = E(N²) − (E(N))² = (1/3)(0 + 4 + 1) − 1 = 2/3

4.1.2 problem 2
Let N be the number of days in a randomly chosen year.
E(N) = (3/4)(365) + (1/4)(366) = 365.25

Var(N) = E(N²) − (E(N))² = (3/4)(365²) + (1/4)(366²) − 365.25² = 0.1875

4.1.3 problem 3
(a) Let D be the value of the die roll.
E(D) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5

(b) Let T4 be the total sum of the four die rolls, and let Di be the value of
the i-th roll. Note that T4 = D1 + D2 + D3 + D4 . Then, by linearity of
expectation,

E(T4) = 4E(Di) = 4 · 3.5 = 14

4.1.4 problem 4
Let’s start defining some convenient r.v.s for this problem:

• Di : value of the i-th roll.

• w1 : winning if one keeps playing after the first roll.

• w2 : winning if one keeps playing after the second roll.

The optimal strategy is to stop if the value of the last roll is greater than
the expected winning if one keeps playing. In other words, keep rolling if
doing so brings winnings that are, on average, greater than the last roll:

1. If D1 > E(w1 ), STOP after 1 roll.

2. Else if D2 > E(w2 ), STOP after 2 rolls.

Since the rolls are independent, we can calculate the expectations of w1


and w2 in reverse order.
The winning w2 is equal to the value of the third roll:

6
1
E(w2 ) = E(D3 ) = x P (D3 = x) = ∗ (1 + 2 + 3 + 4 + 5 + 6)
X

x=1 6

E(w2 ) = 3.50 dollars


This reveals the second part of the optimal strategy: stop after 2 rolls if
D2 ≥ 4.
The PMF of w1 is given by

3 1
P (w1 = x) = P (D2 < 4, D3 = x) = ∗
6 6
3
= for x = 1, 2, 3
36

1 3 1
P (w1 = x) = P (D2 = x ∪ D2 < 4, D3 = x) = + ∗
6 6 6
9
= for x = 4, 5, 6
36
Using those probabilities to calculate the expected winning w1 from the
definition of expectation:
6
E(w1 ) = x P (w1 = x) = 4.25 dollars
X

x=1
Now we can fully describe the optimal strategy, which maximizes the
expected winnings:

1. If the value of the first roll is ≥ 5, STOP after 1 roll.


2. Else if the value of the second roll is ≥ 4, STOP after 2 rolls.

Finally, let’s calculate the expected winning W ∗ of the optimal strategy.


The PMF of W ∗ is calculated below

4 3 1
P (W ∗ = x) = P (D1 < 5, D2 < 4, D3 = x) = ∗ ∗
6 6 6
12
= for x = 1, 2, 3
63

P (W ∗ = 4) = P (D1 < 5, D2 = 4 ∪ D1 < 5, D2 < 4, D3 = 4)


4 1 4 3 1
= ∗ + ∗ ∗
6 6 6 6 6
36
= 3
6
P (W ∗ = x) = P (D1 = x ∪ D1 < 5, D2 = x ∪ D1 < 5, D2 < 4, D3 = x)
1 4 1 4 3 1
= + ∗ + ∗ ∗
6 6 6 6 6 6
72
= 3 for x = 5, 6
6
Plugging these probabilities into the definition of expectation:
E(W∗) = Σ_{x=1}^{6} x P(W∗ = x) = 4.67 dollars
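A simulation sketch in R of this strategy (stop after roll 1 if it is at least 5, after roll 2 if it is at least 4, otherwise take roll 3):

play <- function() {
  d <- sample(1:6, 3, replace = TRUE)
  if (d[1] >= 5) return(d[1])
  if (d[2] >= 4) return(d[2])
  d[3]
}
mean(replicate(1e5, play()))   # close to 14/3, i.e. about 4.67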

4.1.5 problem 5
Let X ∼ DUniform(n).

E(X) = (1/n) Σ_{i=1}^n i = (n + 1)/2

Var(X) = E(X²) − (E(X))² = (1/n) Σ_{i=1}^n i² − ((n + 1)/2)² = (n + 1)(2n + 1)/6 − (n + 1)²/4

4.1.6 problem 6
Let N be the number of games played. Then the probability that N = i is the probability of exactly 3 wins in the first i − 1 games, and the last game being a win: P(N = i) = 2 \binom{i−1}{3} (1/2)³ (1/2)^{i−1−3} (1/2) = 2 \binom{i−1}{3} (1/2)^i. The factor of 2 accounts for either of the two players winning after i games.
Then,
E(N) = Σ_{i=4}^{7} i · 2 \binom{i−1}{3} (1/2)^i ≈ 5.81

Var(N) = E(N²) − (E(N))² ≈ 1.03



4.1.7 problem 7
(a) Let R be the birthrank of the chosen child. Then,
P(R = 3) = (20/100)(1/3) = 4/60
P(R = 2) = (50/100)(1/2) + (20/100)(1/3) = 19/60
P(R = 1) = 30/100 + (50/100)(1/2) + (20/100)(1/3) = 37/60
E(R) = 1 · 37/60 + 2 · 19/60 + 3 · 4/60 = 29/20
Var(R) = E(R²) − (E(R))² = 149/60 − 841/400 ≈ 0.38

(b) E(R) = 1 · 100/190 + 2 · 70/190 + 3 · 20/190 = 30/19
Var(R) = 56/19 − (30/19)² ≈ 0.45

4.1.8 problem 8
(a) Let Ci be the population of the i-th city, such that the first four cities
are in the Northern region, the next three cities are in the Eastern
region, the next two cities are in the Southern region, and the last city
is in the Western region.
Let C be the population of a randomly chosen city.
Then E(C) = Ci = 2million.
1 P10
10 i=1

(b) Var(C) = E(C 2 ) − (E(C))2 . E(C 2 ) can not be computed without the
knowledge of population sizes of individual cities.

(c) Var(C) = 41 ( 14 3million + 31 4million + 12 5million + 8million) ≈ 3million

(d) Since regions with smaller population have more cities, if a city is ran-
domly selected, it is more likely that the city belongs to a low popu-
lation region. On the other hand, if a region is selected uniformly at
random first, then a randomly selected city is as likely to belong to
a region with a large population as it is to belong to a region with a
smaller population.

4.1.9 problem 9
Let X be the amount of money Fred walks away with.

(a) E(X) = 16000. There is no variance under this scenario, since Fred’s
take-home amount is fixed.

(b) E(X) = (1/2)(1000) + (1/2)(3/4)(32000) + (1/2)(1/4)(64000) = 20500. Var(X) = E(X²) − (E(X))² ≈ 4.76 · 10^8.

(c) E(X) = (3/4)(1000) + (1/4)(1/2)(32000) + (1/4)(1/2)(64000) = 12750. Var(X) = E(X²) − (E(X))² ≈ 4.78 · 10^8.

Option b has a higher expected win than option c, but it also has a higher
variance.

4.1.10 problem 10
The probability that the game lasts n rounds is 1/2^n.

Thus, if the winnings for n rounds is n, we must compute Σ_{i=1}^∞ i/2^i.

We know that Σ_{i=1}^∞ x^i = x/(1 − x). Differentiating both sides with respect to x gives Σ_{i=1}^∞ i x^{i−1} = 1/(1 − x)². Multiplying both sides by x gives Σ_{i=1}^∞ i x^i = x/(1 − x)². Plugging in x = 1/2 gives the answer 2.

For the second part of the problem we need to find Σ_{i=1}^∞ i²/2^i.

We know Σ_{i=1}^∞ i x^i = x/(1 − x)². Differentiating both sides with respect to x again, using the quotient rule, gives Σ_{i=1}^∞ i² x^{i−1} = (1 + x)/(1 − x)³. Multiplying both sides by x gives Σ_{i=1}^∞ i² x^i = (x + x²)/(1 − x)³. Plugging in x = 1/2 gives the answer of 6.

4.1.11 problem 11
Note that 31 = 24 + 23 + 22 + 21 + 1. Thus, Martin can play at most 5 rounds.
For every possible win, Martin makes 1 dollar. If the game reaches the fifth
round, it is also possible that Martin loses and walks away with nothing.
Let X be Martin’s winnings.

Then,
E(X) = Σ_{i=1}^{5} (1/2^i)(1) + (1/2^5)(0) = 31/32 ≈ 0.97

4.1.12 problem 12
Since P (X = k) = P (X = −k), i=1 (iP (X = i) + (−i)P (X = −i)) = 0.
Pn

Hence, E(X) = 0.

4.1.13 problem 14
E(X) = c ∞ i=1 p = c( 1−p − 1) = − log(1−p) 1−p
k 1 1 p
P

E(X) = E(X 2 )−(E(X))2 = c ∞ i=1 ip −(− log(1−p) 1−p ) = − log(1−p) (p−1)2 −


i 1 p 2 1 p
P

(− log(1−p)
1 p 2
1−p
)

4.1.14 problem 15
(a) Let X be the earnings by player B. Suppose B guesses a number j
with probability bj . Then,

100
E(X) =
X
jpj bj
j=1

To maximize E(X) then, B should set bj = 1 for the j for which jpj is
maximal. Since pj are known, this quantity is known.

(b) Suppose player P (A = k) = cA


k
, and P (B = k) = bk . Then,

100
cA
E(X) = (k bk ) = c A
X

k=1 k
.
Thus, irrespective of what strategy B adopts, their expected earnings
are the same, so B has no incentive to change strategies. Similar argu-
ment can be made for A.

(c) part b answers this part as well.



4.1.15 problem 16
(a) From the student's perspective, the average class size is E(X) = (200/360)(100) + (160/360)(10) = 60. From the dean's perspective, the average class size is E(X) = (16/18)(10) + (2/18)(100) = 20. The discrepancy comes from the fact that
when surveying the dean, there are only two data points with a large
number of students. However, when surveying students, there are two
hundred data points with a large number of students. In a sense, the
student’s perspective overcounts the classes.

(b) Let C be a set of n classes with ci students for 1 ≤ i ≤ n. The dean’s


view of average class size then is E(X) = ni=1 cni . The students’ view of
P

average class size is E(X) = ni=1 (ci Pnci ci ). In the dean’s perspective,
P
i=1
all ci are equally weighted - n1 . However, in the students’ perspective,
weights scale with the size of the class. Thus, the students’ perspective
will always be larger than the dean’s, unless all classes have the same
number of students.

4.1.16 problem 17
(a) The expected number of children in a randomly selected family during
a particular era is E(X) = ∞ P∞nk =m .
P
k=0 k
1
n m0k
k=0

(b) The expected number of children in the family of a randomly selected


child is E(X) = ∞ P∞knk =m .
P
k=0 k
2
kn m1
k
k=0

(c) answer in part b is larger than the answer in part a. Since the average
in part a is taken over randomly selected families, families with fewer
children are weighted the same as families with more children. The
average in part b, on the other hand, is taken over individual children,
skewing the weights in favor of families with more children.

4.2 Named Distributions


4.2.1 problem 20
(a) This is not possible, since Y has a non-zero probability of being a
number larger than 100, where as X is capped at 100.

(b) Let X be the number of contestants who enter a tournament, and let
Y be the number of contestants who pass the first round. Clearly,
P (X ≥ Y ) = 1.

(c) This is not possible, because if X always produces values smaller or


equal to the values produced by Y , then E(X) ≤ E(Y ). However,
E(X) = 90, and E(Y ) = 50.

4.2.2 problem 24
One way to think about the problem is that the event X < r counts all
sequences of n independent Bernoulli trials, where the number of failures
is larger than n − r. If we extend the number of trials indefinitely, this
implies that more than n − r failures occured before the r-th success, because
otherwise, we’d have X ≥ r. The probability of this event is P (Y > n − r).
Implication in the reverse direction can be shown analogously.

4.2.3 problem 26
(a) Let Z represent the number of flips until both Nick and Penny flip Heads simultaneously. Then Z ∼ FS(p1 p2), since Nick's and Penny's flips are independent. E(Z) = 1/(p1 p2).

(b) The logic is analogous to part (a), but the success probability is p1 + p2 − p1 p2.

(c)
P(X1 = X2) = Σ_{k=1}^∞ ((1 − p)²)^{k−1} p² = p/(2 − p)

(d) By symmetry,
P(X1 < X2) = (1 − p/(2 − p))/2 = (1 − p)/(2 − p)

4.2.4 problem 28
Let Ik be the indicator variable for the k-th location, so that Ik = 1 if k-th
location has a treasure and Ik = 0 otherwise.

Let X be number of locations William checks to get t treasures, and


Xj ∼ HGeom(t − 1, n − 1 − (t − 1), j) be the number of treasures found
within j checked locations.
By symmetry. P (Ik = 1) = nt . Then,

    
t−1 n−1−t+1 n−t
t t−1 k−1−t+1 t k−t
P (X = k) = P (Ik = 1)P (Xk−1 = t − 1) = 
n−1
 = 
n−1

n n
k−1 k−1

 
n−t
n n
t k−t (n + 1)t
E(X) = kP (Ik = 1)P (Xk−1 = t − 1) = (k n−1 ) =
X X

k=t k=t n t+1


k−1

4.2.5 problem 29
Random variable f (X) takes values that are the probabilities of a random
value taken by X. Since X ∼ Geom(p), f (X) ∈ {(1 − p)k p|k ∈ Z≥0 }, and
each value (1 − p)k p of f (X) occurs with probability (1 − p)k p. Thus,

p
E(X) = ((1 − p)k p)2 = −
X

k=0 p−2
for |p − 1| < 1.
2

4.2.6 problem 30
(a)

e−λ (λ)x
E(Xg(X)) =
X
xg(x)
x=0 x!

e−λ (λ)x
=
X
xg(x)
x=1 x!

e−λ (λ)x−1

X
g(x)
x=1 (x − 1)!

e−λ (λ)x
=λ g(x + 1) = λE(g(X + 1))
X

x=0 (x)!

(b)

E(X 3 ) = E(XX 2 )
= λE((X + 1)2 )
= λ(E(X 2 ) + E(2X) + 1)
= λ(λE(X + 1) + 2λ + 1) = λ(λ(λ + 1) + 2λ + 1)
= λ(λ2 + 3λ + 1)

4.2.7 problem 31
(a)
p + (1 − p)Poiss(X = k) k=0
(
P (X) =
(1 − p)Poiss(X = k) k>0

(b) First, notice that (1 − I)Y ∈ {0, 1, 2, . . . }. (1 − I)Y = 0 if I = 1,


or Y = 0. Thus P ((1 − I)Y = 0) = p + (1 − p)P (Y = 0). For any
other value k of (1 − I)Y , it is achieved if I = 0 and Y = k. Thus,
P ((1 − I)Y = k) = (1 − p)P (Y = k).
−λ λk
(c) E(X) = (1 − p) = (1 − p)E(Poss(λ)) = (1 − p)λ.
P∞
k=1 ke k!
E(X) = E((1 − I)Y ) = E(1 − I)E(Y ) = (1 − p)λ.
k
(d) Var(X) = E(X 2 ) − (E(X))2 . E(X 2 ) = (1 − p)e−λ ∞ k=1 k k! = (1 −

P

p)λ(1 + λ). Thus, Var(X) = (1 − p)λ(1 + λ) − ((1 − p)λ)2 = (1 − p)λ(1 +


pλ).

4.2.8 problem 33
Suppose w = r = 1. The white ball is equally likely to be any of the w + b
balls. Also, note that the event k-th drawn ball is the white ball is equivalent
to the event k−1 black balls are drawn until the white ball is drawn. Thus, for
X ∼ NHGeom(1, n − 1, 1), P (X = k) = P (k + 1-th drawn ball is white) = n1
for 0 ≤ k ≤ n − 1.
(1+k−1)(1+n−1−1−k)
P (X = k) = 1−1 1+n−11−1 = n1
( 1 )

4.3 Indicator r.v.s


4.3.1 problem 38
Let Ij be the indicator random variable for j-th person drawing the slip of
paper containing their name.
Let X = Σ_{j=1}^n Ij be the number of people who draw their name. Then, by linearity of expectation, E(X) = E(Σ_{j=1}^n Ij) = Σ_{j=1}^n E(Ij) = Σ_{j=1}^n 1/n = 1.
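A one-line simulation in R (a sketch with an arbitrary n) confirms that the answer does not depend on n:

n <- 10
mean(replicate(1e5, sum(sample(n) == 1:n)))   # close to 1 for any n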

4.3.2 problem 39
Let Ij,1 and Ij,2 be the indicator random variables for the j-th person being
sampled by the first and second researchers respectively.
(N −1) (N −1)
P (Ij,1 = 1) = m−1 . P (Ij,2 = 1) = n−1 . Since sampling is done
(m)
N
(Nn )
(N −1)(Nn−1−1
)
independently by the two researchers, P (Ij,1 = 1, Ij,2 = 1) = m−1 .
( N
m )(N
n )
Let X = j=1 (Ij,1 Ij,2 ) be the number of people sampled by both re-
Pn

searchers. Then,
     
n n n N −1 N −1 N −1 N −1
m−1 n−1 m−1 n−1
E(X) = E( (Ij,1 Ij,2 )) = E(Ij,1 Ij,2 ) = =n
X X X
     
N N N N
j=1 j=1 j=1 m n m n

4.3.3 problem 40
Let Ij be the indicator random variable for the HTH pattern starting on the j-th toss. Since the tosses are independent, P(Ij = 1) = 1/8 for 1 ≤ j ≤ n − 2.
Let X = Σ_{j=1}^{n−2} Ij be the number of HTH patterns in n independent coin tosses. Then,
E(X) = E(Σ_{j=1}^{n−2} Ij) = Σ_{j=1}^{n−2} E(Ij) = (n − 2)/8

4.3.4 problem 41
Let Ij be the indicator variable for j-th card being red. Let Rj = Ij Ij+1 be the
indicator variable for the j-th and j + 1-st cards being red. Let X = 51
P
j=1 Rj

be the number of consecutive red pairs in a well shuffled deck of 52 cards.


Then,    
51 51 51 26 26
2 2
E(X) = E( Rj ) = E(Rj ) = = 51 52
X X X
 
52
j=1 j=1 j=1 2 2

4.3.5 problem 42
Let Ij be the indicator variable for the j-th toy being of a new type. The number of toy types after collecting t toys is X = Σ_{j=1}^t Ij, and P(Ij = 1) = ((n − 1)/n)^{j−1}. Thus,
E(X) = E(Σ_{j=1}^t Ij) = Σ_{j=1}^t E(Ij) = Σ_{j=1}^t ((n − 1)/n)^{j−1} = n − n((n − 1)/n)^t

4.3.6 problem 43
(a) This problem is a special case of problem 42 with t = k and n−1 floors.
Thus, the expected number of stops is (n − 1) − (n − 1)( n−2
n−1
)k .
(b) Let Ij be the indicator variable for the j-th floor being selected for
2 ≤ j ≤ n. Then, the number of stops is X = nj=2 Ij . Thus,
P

n n n
E(X) = E( Ij ) = E(Ij ) = (1 − (1 − pj )k )
X X X

j=2 j=2 j=2

4.3.7 problem 45
Notice that n
I(A1 ∩ A2 · · · ∩ An ) ≥ I(Ai ) − n + 1
X

i=1
because the left-hand side is either 0, or 1, so the question reduces to whether
the left-hand side is ever 0, while the right-hand side is 1. Notice that this
is not possible, becuase if the left-hand side is 0, then Aj = 0 for some j.
Thus, ni=1 I(Ai ) < n =⇒ R.H.S. < 1.
P

Then,

n n n n n n
Ai ) ≥ I(Ai )−n+1 =⇒ E(I( Ai )) ≥ E( I(Ai )−n+1) =⇒ P ( Ai ) ≥ P (Ai )−n+1
\ X \ X \ X
I(
i=1 i=1 i=1 i=1 i=1 i=1

4.3.8 problem 46
Let X ∼ NHGeom(4, 48, 1) be the number of non-aces before the first ace.
Then, E(X) = w+1 rb
= 1∗48
5
= 9.6.
Let Y ∼ NHGeom(4, 48, 2) be the number of non-aces before the second
ace is drawn. Then, E(X) = w+1 rb
= 2∗48
5
= 19.2
Let Z = Y −X. Notice that Z represents the number of non-aces between
the first and the second ace. E(Z) = E(Y ) − E(X) = 19.2 − 9.6 = 9.6.

4.3.9 problem 47
(a) Let X = 52 I be the number of cards that are called correctly.
P
P52 i=1 i
E(X) = i=1 P (Ii = 1) = 52 52
1
= 1.

(b) Source: https://math.stackexchange.com/a/1078747/649082


Let X = 52 I be the number of cards that are called correctly.
P
P i=1 i
E(X) = 52 i=1 P (Ii = 1). To find P (Ii = 1), consider the first i cards,
with the i-th card correctly guessed. Let k be the number of correctly
guessed cards within the i cards. For instance, for i = 5, k = 2, Y
representing a correctly guessed card and N representing an incorrectly
guessed card, one possible sequence of i draws is N Y N N Y .

51 1 50 49 1 1 1
P (N Y N N Y ) = × = ×
52 51 51 50 49 52 51
Notice that the second N in the sequence has probability 50 51
, because
the second card is guessed correctly. The only piece of information we
have is that the third card is not the card that was correctly guessed,
leaving a total of 51 possibilities. Generalizing, the probability of a
string of length i with k Y s is (52−k)!
52!
. There are k−1
i−1
strings of length
i with k Y s that end in a Y , and since 1 ≤ k ≤ i,
i
i − 1 (52 − k)!
!
P (Ii = 1) =
X

k=1 k−1 52!

Thus,

i
52 X
i − 1 (52 − k)!
!
E(X) =
X

i=1 k=1 k−1 52!


52 X52
i − 1 (52 − k)!
!
=
X

k=1 i=k k−1 52!


52
(52 − k)! X52
i−1
!
= ( )
X

k=1 52! i=k k − 1


52
(52 − k)! 52
!
= ( )
X

k=1 52! k
52
1
=
X

k=1 k!

P∞ xi
Note that ex = i=0 i! =⇒ e1 ≈ 1 + E(X) + 10−15 . Thus,

E(X) ≈ e − 1

(c) Since at any given time, we know all the cards remaining in the deck,
the probability of the i-th card being the card guessed correctly is
. Thus, E(X) = 52 i=1 E(Ii ) = i=1 P (Ii = 1) = i=1 52−i+1 =
1 P P52 P52 1
52−i+1
i=0 52−i ≈ 4.54.
P51 1

4.3.10 problem 49
Let Ij be the indicator variable for the j-th prize being selected. The value
received from the j-th prize is jIj. Then, the total value is X = Σ_{j=1}^n jIj.
P

(n−1) Pn
E(X) = nj=1 jP (Ij = 1) = nj=1 j k−1 = j=1 j nk = nk n(n+1) = k(n+1)
P P
.
(nk) 2 2

4.3.11 problem 50
Let C1 be a random chord that spans a minor arch of length x on a circle
of radius r. To generate a chord C2 , with endponts A and B, such that C2
intersects C1 , either A is on the minor arch and B is on the major arch, or
A is on the major arch and B is on the minor arch.

Let Ix be the indicator variable for C2 intersecting C1 when the minor


arch generated by C1 has length x. Then, P (Ix = 1) = 2 2π∗r
x
∗ 2π∗r−x
2π∗r
.
Since the length of the minor arch x generated by C1 can span from 0 to
2π ∗ r, we integrate P (Ix = 1).
1 Z 2π∗r x 2π ∗ r − x 1
2 ∗ dx = .
2π ∗ r 0 2π ∗ r 2π ∗ r 3

4.3.12 problem 52
Let Ij be the indicator variable for the j-toss landing on an outcome different
from the previous toss for 2 ≤ j ≤ n. Then, the total number of such
tosses is X = nj=2 Ij . The total number of runs is Y = X + 1. Since
P

E(X) = nj=2 P (Ij = 1) = nj=2 21 = n−1 , E(Y ) = n−1 + 1 = n+1 .


P P
2 2 2

4.3.13 problem 53
Let Ij be the indicator variable for tosses j and j + 1 landing heads for
1 ≤ j ≤ 3. Then, the expected number of such pairs is E(X) = 3j=1 P (Ij =
P

1) = 3p2 . Var(X) = E(X 2 ) − 9p4 .


E(X 2 ) = E(( 3j=1 Ij )2 ) = E((I1 + I2 )2 + 2(I1 + I2 )I3 + I32 ) = E(I12 + 2I1 I2 +
P

I22 + 2I1 I3 + 2I2 I3 + I32 ).


Note that Ij2 = Ij .
Note that E(I1 I2 ) = E(I2 I3 ) = p3 as these require 3 consecutive heads
to equal 1, but E(I1 I3 ) = p4 as this requires 4 consecutive heads to equal 1.
Thus, E(X 2 ) = (p2 + 2p3 + p2 + 2p4 + 2p3 + p2 ) = 4p3 + 3p2 + 2p4 .
Thus,
Var(X) = 4p3 + 3p2 − 7p4

4.3.14 problem 54
(Nn−1
−1
)1
(a) Since P (Wj = yk ) for 1 ≤ k ≤ N is = n 1
= 1
,
(Nn ) n N n N

1 XN
E(Wj ) = yk .
N k=1

Thus,

1X n
1 X
N
E(W ) = ( yk )
n j=1 N k=1
1 XN
= yk
N k=1
=y

(b) Since
1X N
W = (Ij yj )
n j=1
where Ij is the indicator variable for the j-th person being in the sam-
ple. Then,

1X N
n
E(W ) = yj
n j=1 N
1 XN
= yk
N k=1
=y

4.3.15 problem 56
(a) Let Ij be the indicator variable for shots j to j + 6 being successful.
The total number of successful, consecutive 7-shot windows is X = Σ_{j=1}^{n−6} Ij.

Then,
n−6 n−6 n−6
E(X) = E(Ij ) = P (Ij = 1) = p7 = (n − 6)p7
X X X

i=1 i=1 i=1

(b) Thinking of each block of 7 shots as a single trial with probability p7 of


success, let Y ∼ Geom(p7 ) be the number of failed 7-block shots taken
until the first successful 7-shot block. Then,

7 − 7p7 7p7 + 7 − 7p7 7


E(X) = 7(1 + E(Y )) = 7 + 7
= 7
= 7
p p p

Note that it is possible that a consecutive sequence of 7 shots could


happen "between" blocks - for example, this way of solving the problem
does not consider the scenario where shots 2 to 8 are made. Therefore,
the above calculation is a "worst case scenario" that assumes the con-
secutive 7 made shots must always happen in the last possible block
- the actual number of blocks (and therefore shots) taken to make 7
consecutive shots is strictly less than or equal to the above calculated
expectation.

4.3.16 problem 59
(a) WLOG, let m1 > m be the second median of X. Then, by the definition
of medians, P (X ≤ m) ≥ 12 and P (X ≥ m1 ) ≥ 12 . Then, P (X ∈
(m, m1 )) = 0. If m1 > m + 1, then there exists an m2 ∈ (m, m1 ), such
that P (X = m2 ) = 0. This implies that m2 = 1, since that is the only
value of X with probability 0. However, then m < 1, which precludes
m from being a median. Thus, m1 must be 1 + m. Since we know 23
to be a median of X, we need to check whether 22 or 24 are medians
of X. Computation via the CDF of X shows that neither 22 nor 24
are medians. Hence, 23 is the only median of X.

(b) Let Ij be the indicator variable for the event X ≥ j. Notice that the
event X = k (the first occurance of a birthday match happens when
there are k people) implies that Ij = 1 for j ≤ k and vice versa. Thus,
366
X=
X
Ij
j=1

.
Then,
366 366 366
E(X) = P (Ij = 1) = 1 + 1 + P (Ij = 1) = 2 +
X X X
pj
j=1 j=3 j=3

(c) 2 + 22.61659 = 24.61659

(d) E(X 2 ) = E(I12 + · · · + I366 +2 i=1 (Ii Ij )). Note that Ii2 = Ii and
2 P366 Pj−1
j=2
Ii Ij = Ij for i < j. Thus,

366 j−1
E(X ) = E(I1 + · · · + I366 + 2
2
Ij )
X X

j=2 i=1
366 366
=2+ pj + 2 ((j − 1)E(Ij ))
X X

j=3 j=2
366 366
=2+ pj + 2 ((j − 1)pj )
X X

j=3 j=2

≈ 754.61659

Var(X) ≈ 754.61659 − (E(X))2 ≈ 754.61659 − 605.98 = 148.63659.

4.3.17 problem 60
(a) By the story of the problem, X ∼ NHGeom(n, N − n, m). Then,
Y = X + m.
−n)
(b) According to part a, E(Y ) = E(X) + m = m(N n+1
+ m. The implied
indicator variables are the same as in the proof of the expectation of
Negative Hypergeometric random variables.

(c) The problem can be modeled with a Hypergeometric random variable


−n)
Z ∼ HGeom(n, N − n, E(Y )). Then, E(Z) = E(Y ) Nn = ( m(Nn+1
+
m) N = m × n+1 × N . Since n+1 × N < 1 =⇒ (n + 1)N (n − N ) <
n N +1 n N +1 n

0 =⇒ (n+1)N
N −n
> 0 =⇒ n < N for positive n and N , E(Z) < m.

4.4 LOTUS
4.4.1 problem 62
(2λ)k
E(2X ) = 2k P (X = k) = e−λ = e−λ e2λ = eλ .
P∞ P∞
k=0 k=0 k!

4.4.2 problem 63
E(2X ) = 2k (1 − p)k p = p k=0 (2 − 2p)k = when 2 − 2p < 1 =⇒
P∞ P∞ p
k=0 2p−1
p > 12 .

E(2−X ) = ∞ k=0 2 (1 − p) p = p k=0 ( 2 ) = = when <1


P −k k P∞ 1−p k p 2p 1−p
1+p 1+p 2
2
which is always true.

4.5 Poisson approximation


4.5.1 problem 69
 
There are 1000
2
pairs of sampled individuals - each pair has a 1016 chance of
being the same person. Therefore, we can estimate the "rate of occurrence"
106
of a pair being the same person as 10002
∗ 1016 = 1000∗999
2∗106
≈ 2∗10 6 = 1/2.

Therefore, the number of pairs in the sample that are the same person can
be approximated by Pois(1/2).

Then the probability that there is at least one pair in the sample that are
the same person is 1 − e−0.5 = 0.393. This can be verified as a close approx-
imation in R - the probability that every individual in the sample is unique
is the last value resulting from the command cumprod(1-(0:999)/1000000),
which is .6067. 1 minus this value gives .3933, the actual probability some two
sampled individuals are the same person, which is very close to our Poisson
approximation.

4.5.2 problem 71
Let Ij be the indicator random variable for pair j having the aforementioned
property. P (Ij = 1) = 365 1
2 , under the assumption that the probability

of being born on a particular day is 365 1


. Note that since we don’t know
anything about the age of the kids, we are assuming their mothers are also
equally likely to be born on any of the 365 days.
Then, the
 expected number of pairs with the aforementioned property is
E(X) = 90 1
2 3652
≈ 0.03.
Let Z ∼ Poiss(0.03) model the distribution of pairs with the desired
property. Then, probability that there is at least one such pair is 1 − P (Z =
0) = 1 − e−0.03 ≈ 1 − (1 − 0.03) = 0.03 = 100
3
.

4.5.3 problem 72
(a) Suppose the population consists of n people (excluding me). Let Ij be
the indicator variable for the j-th person having the same birthday as
me. Then, the expected number of people with the same birthday as
me is E(X) = ni=1 P (Ij = 1) = ni=1 365 1
= 365
n
.
P P

Let Z ∼ Poss( 365 n


) model the distribution of the number of people in the
population with the same birthday as me. Then, the probability that
there is at least one person with the same birthday as me is 1 − P (Z =
n
0) = 1 − e− 365 .

n n
1 − e− 365 >= 0.5 =⇒ > − ln(0.5) =⇒ n > 252
365

(n2 ) (n2 )
(b) By similar logic to part a, E(X) = 365×24
. 1 − P (Z = 0) = 1 − e − 365×24
.
 
(n2 ) n
2
1−e − 365×24
>= 0.5 =⇒ > − ln(0.5) =⇒ n > 110
365 × 24
(c) Since Poisson approximation is completely determined by the expecta-
tion of the underlying random variable, we need to increase the popula-
tion size so that the expectation of the number of pairs with the desired
property is the same as the expectation of the number of pairs with
the same birthday when population size is 23. Since, E(X) = 24 1
E(Y ),
where Y is the number of pairs of people that share a birthday, the
population needs to be increased to have 24 times more pairs.

23
! !
n
= 24 ∗ =⇒ n ≈ 110
2 2
.
(d) Let X be the number of triplets with the same birthday. Let Ij be
the indicator random
  variable for triplet j having the same birthday.
Then, E(X) = 100 3
( 365
1 2
) ≈ 1.21. Then, X can be approximated with
Z ∼ Poiss(1.21). P (at least one triplet with the same birthday) ≈ 1 −
P (Z = 0) = 1 − e−1.21 ≈ 0.7.
Another way to approximate the desired probability is to let Ij be the
indicator variable that there is a triplet born on day j. P (Ij = 1) =
CHAPTER 4. EXPECTATION 94
 
1 − (( 364
365
)100 + 100 365 ( 365 ) + 100
1 364 99 1
( 364 )98 ) ≈ 0.003. Then, the
2 3652 365
expected number of days for which there is a triplet born on that day
is approximately equal to 365 ∗ 0.003 = 1.095.
Then, the probability that there is at least one triplet born on the same
day can be approximated using Z ∼ Poiss(1.095) - the number of days
for which there is a triplet born on that day. The desired probability
is 1 − P (Z = 0) = 1 − e−1.095 ≈ 0.66.
Thus, the second method is a closer approximation for the desired
probability.

4.5.4 problem 73
(a) Let X be the number of people that play the same opponent in both
rounds. Let Ij be the indicator variable that person j plays against the
same opponent twice. P (Ij = 1) = 99 1
. Then, E(X) = 100 j=1 P (Ij =
P

1) = 100/99.

(b) There is a strong dependence between trials. For instance, if we know


that the first 50 players played the same opponent twice, then all of the
players played the same opponents twice. Moreover, knowing each of
the Ii gives us perfect information about one other I - they are strongly
pairwise dependent.

(c) Consider the 50 pairs that played each other in round one. Let Ij be
the indicator variable for pair j playing each other again in the second
round. P (Ij = 1) = 991
. Then, the expected number of pairs that play
the same opponent twice is E(Z) = 50 99
≈ 12 .
We can approximate the number of pairs that play against one another
in both rounds with Z ∼ Poiss( 12 ). Note that X = 2Z. P (X = 0) ≈
1
P (Z = 0) = e− 2 ≈ 0.6.
1
( 12 )1 e− 2
P (X = 2) ≈ P (Z = 1) = 1!
≈ 0.32.
Note that the approximation in part C is more accurate - the indepen-
dence of the same pairs playing against each other is much stronger
than the independence of individuals who play the same opponent.
Knowing that the players in Game 1 of round 2 played against each
other in round 1 gives us very little information about whether players
CHAPTER 4. EXPECTATION 95

in any other games also played against each other. Whereas, knowing
that Player 1 in round 2 plays against the same player (say, player
71) guarantees that we know that that player 71 also plays against the
same player.

4.6 Mixed practice


4.6.1 problem 79
(a) Let X ∼ F S( m1 ) be the number of guesses made by the hacker. Then,
E(X) = m.

(b) Suppose w1 , w2 , w3 , . . . , wm is the sequence of passwords sampled by


the hacker. Since, every permutation of the m words is equally likely,
the probability that the correct password is wi is (m−1)! m!
= m1 . Then
E(X) = m1 m i=1 i = m
1 m(m+1)
= m+1 .
P
2 2

(c) Both m and m+12


are positively sloped lines, intersecting at m = 1. For
m = 2, m > 2 . Thus, m > m+1
m+1
2
for all m > 1. This makes intuitive
sense since when the hacker samples passwords without replacement,
the number of possible passwords reduces.

(d) With replacement, P (X = k) = ( m−1


m
)k−1 m1 for 1 ≤ k < n and P (X =
n) = ( m−1
m
)n−1 m1 + ( m−1
m
)n .
In the case of sampling without replacement, since all orderings of the
passwords sampled by the hacker are equally likely, P (hacker samples k passwords) =
m
1
for 1 ≤ k < n, and P (hacker samples n passwords) = m1 + m−n m
.

4.6.2 problem 80
(a) X ∼ FS( 20−m+1
20
) =⇒ E(X) = 20−m+1
20

√ √ √ m−1 i−1 20−m+1


(b) E( X) = x∈X xP (X = x) = 20 i( 20 )
P P
i=1 20

4.6.3 problem 86
(nxA )(nyB )(nzC )
(a) P (X = x, Y = y, Z = z) =
(mn )
CHAPTER 4. EXPECTATION 96

(b) Let Ij be the indicator variable for person j in the sample being a
member of party A. Then, X = m =⇒ E(X) = m nnA by
P
i=1 Ii
symmetry.
(c) Let’s find E(X 2 ). If we square the expression for the sum of X’s con-
stituent indicator r.v.s, we get

E(X 2 ) = E(Ii2 ) + 2 ∗ E(Ii Ij )


Pm P
i=1 i<j,1≤j≤m

Since Ii2 = Ii , we have E(Ii2 ) = m∗nA


Pm
i=1 n

Additionally, for any pair i, j, the r.v. Ii Ij equals 1 only when some
pair of samples are both members
  of party A, which occurs with prob-
(nA −1)
ability nAn(n−1) . There are m2 pairs i, j. Therefore, the expression
nA m(m−1)(nA −1)
2∗ E(Ii Ij ) evaluates to .
P
i<j,1≤j≤m n(n−1)

Finally, we have E(X 2 ), so now we can write


m ∗ nA nA m(m − 1)(nA − 1) m2 n2A
V ar(X) = E(X 2 ) − EX 2 = + −
n n(n − 1) n2

When m = 1, Var(X) = nA
n
− ( nnA )2 = nA
n
(1 − nA
n
) = nA
n
× nB +nC
n
.
When m = n, Var(X) = 0. This makes sense, as if the sample is the
entire population, we always get the same number of members of party
A in our sample (all of them), so there is no variation.

4.6.4 problem 87
(a) Let Ij be the indicator random variable for j person in the sample being
a democrat. Let X be the total number of democrats in the sample.
Then, E(X) = cj=1 Ij = c 100d
P

(b) Let Ij be the indicator random variable for state j being represented by
(98c )
at least one person in the sample. Then P (Ij = 1) = 1 − 100 . Then,
(c)
the expected number of states represented in the sample is E(X) =
(98c )
50(1 − 100 ).
(c)
CHAPTER 4. EXPECTATION 97

(c−2
98
)
(c) Similarly to part b, E(X) = 50 100 .
(c)
(50k )(20−k
50
)
(d) P (X = k) = for 0 ≤ k ≤ 20.
( 20 )
100

Letting Ij be the indicator variable for person j in the sample being a


junior senator of a state, E(X) = 20 100 50
= 10.

18)
(98
(e) Similar to part b, E(X) = 50 100 .
( 20 )

4.6.5 problem 88
(a) X ∼ Geom( g+b
g
) =⇒ E(X) = b
g

(b) The answer should be equal to b


g
, as since X is geometrically dis-
tributed.

4.6.6 problem 89
(a) Since E(NC ) = 115pC , Var(NC ) = k 2 pkC − (115pC )2 .
P115
k=0

(b) Let Ij be the indicator random variable that CATCAT starts at position
j. Then, the expected number of CATCAT is E(X) = 110(pC pA pT )2 .
(c) In a sequence of length 6, the desired options are CATxxx, xxxCAT.
Thus, P (at least one CAT) = 2(pC pA pT (1 − pC pA pT )) + (pC pA pT )2 .

4.6.7 problem 90
(a) Let Ij be the indicator variable that j person in Bob’s sample is also
sampled by Alice. Then, P (Ij = 1) = 10 1
. Then, the expected number
of people in Bob’s and Alice’s samples is 2.
(b) |A ∪ B| = 100 + 20 − |A ∩ B| =⇒ E(|A ∪ B|) = 100 + 20 − E(|A ∩ B|) =
100 + 20 − 2 = 118.
(c) Let Ij be the indicator random variable for couple j being in Bob’s
(998
18 )
sample. Then P (Ij = 1) = 1000 . Thus, the expected number of
( 20 )
(998
18 )
couples in Bob’s sample is E(X) = 500 1000 ≈ 0.2.
( 20 )
CHAPTER 4. EXPECTATION 98

4.6.8 problem 91
(a) If F = G, Then, Xj is equally likely to be in any of the m + n positions
in the ordered list.
m m
(m + n)(m + n + 1) 1 m+n+1
E(R) = E(Rj ) = =m
X X
.
j=1 j=1 2 m+n 2

(b) Rj = ( nk=1 IYk + k̸=j IXk + 1) where IYk are the indicator random
P P

variables for Xj being larger than Yk and IXk are the indicator random
variables for Xj being larger than Xk . Note that E(IYk ) = p for all k
since the Ys are iid, and E(IXk ) = 1/2 - Xj and Xk are iid and never
equal, so they are equally likely to be bigger or smaller than the other.
Then E(Rj ) = np + (m − 1)/2 + 1.Thus, E(R) = m(np + (m − 1)/2 + 1).

4.6.9 problem 92
(a) Let S be the sum of the ranks of the dishes we eat during both phases.
S = (m − k + 1)X + k−1 j=1 Rj , where Rj is the rank of dish j, excluding
P

the highest ranked dish, from the exploration phase. Since E(Rj ) =
(X−2
k−2 )
(X−1)X
× (k−1)(X−1
= (X−1)X 1
× X−1 = X2 , E(S) = (m − k + 1)E(X) +
k−1 )
2 2

(k − 1) E(X)
2
= (m − k)E(X) + (k + 1) E(X) 2
.

(x−1
k−1)
(b) P (X = x) = .
(nk)
CHAPTER 4. EXPECTATION 99

(c)

1 X
n
i−1
!
E(X) = n i
i=k k−1
k
1 X
n
!
i
= n k
i=k k
k
n
!
k X i
= n
ki=k k

n+1
!
k
= n
k+1
k
k(n + 1)
=
k+1

(d) Plugging k(n+1)


k+1
into the result of part b and derivating (m − k) k(n+1)
k+1
+
q
k n+1
2
with respect to k provides an extremum of k = 2(m + 1) − 1.
Chapter 5

Continuous Random Variables

5.1 PDFs and CDFs . . . . . . . . . . . . . . . . . . . . . . . . . . 100


5.2 Mixed Practice . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.3 Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.4 Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.5 Uniform and Universality . . . . . . . . . . . . . . . . . . . . . 114

5.1 PDFs and CDFs


5.1.1 problem 1
The PDF is
2 /2
f (x) = xe−x
Z a
P (x ≤ a) = f (x)dx (5.1)
Z0a
2 /2
= xe−x dx (5.2)
0
2 /2
= 1 − e−a (5.3)

(a)

P (1 < X < 3) = P (X ≤ 3) − P (X ≤ 1) (5.4)


=e −1/2
−e −9/2
(5.5)

100
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 101

(b) For first quantile q1


1
P (X ≤ q1 ) =
4
2 1
1 − e−q1 /2 =
4
q1 = 0.54
For second quantile q2
2
P (X ≤ q2 ) =
4
2 2
1 − e−q2 /2 =
4
q1 = 0.83
For third quantile q3
1
P (X ≤ q3 ) =
4
2 3
1 − e−q3 /2 =
4
q3 = 1.17

5.1.2 problem 2
Take Unif(0, 12 ),
1
f (x) = {2, x ∈ (0, )}
2
Z
f (x)dx = 1
C

where C is the complete space. Say, f (x) > 1 in a given domain X


f (x) > 0, so (x)dx ≤ (x)dx
R R
DRf C f
Here D 1dx < D f (x)dx
R

We can say, |D| < 1


CHAPTER 5. CONTINUOUS RANDOM VARIABLES 102

5.1.3 problem 3
(a) The new PDF is,
g(x) = 2F (x)f (x)
g(x) ≥ 0 in the same range as f
Z ∞ Z ∞
g(x)dx = 2F (x)f (x)dx (5.6)
−∞ −∞
Z ∞
= d(F 2 (x)) (5.7)
−∞
= x→∞
lim F 2 (x) − lim F 2 (x) (5.8)
x→−∞
=1−0 (5.9)

(b) The new PDF is,


1
g(X) = (f (x) + f (−x))
2

Z ∞
1Z ∞ 1Z ∞
g(x)dx = f (x)dx + f (x)dx (5.10)
−∞ 2 −∞ 2 −∞
=1 (5.11)

5.1.4 problem 5
a. We have A = πR2 , so E(A) = πE(R2 ). We have E(R2 ) = ∗ 1 dx =
R1 2
0 x
1/3, since the PDF of R is always 1. Then E(A) = π/3.

We have V ar(A) = E(A2 ) − E(A)2 = π 2 E(R4 ) − π 2 /9 using linearity.


E(R4 ) = 01 x4 ∗ 1 dx = 1/5, so V ar(A) = π 2 /5 − π 2 /9 = 4π 2 /45
R

q q
b. CDF: P (A < k) = P (πR2 < k) = P (R < k/π) = k/π for
0 < k < π using the CDF of Unif(0,1). The CDF of A is 0 for k < 0 and
k > π.
q
PDF: d
dk
( k/π) = √1
2 kπ
for 0 < k < π and 0 elsewhere.
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 103

5.1.5 problem 6
a. For X ∼ U nif (0, 1),
√ F (k) = k for 0 < k < 1, E(X) = 1/2, V ar(X) =
1/12, ST D(X) = 1/2 3.
Then
√ the probability
√ X is√within one√ standard deviation of its mean is
P ( 2 3 < X < 2 3 ) = F ( 23+1
3−1
√ 3+1
√ √ ) − F ( 3−1
3
√ ) = √1 .
2 3 3

The probability that X is within two standard deviations


√ of its mean is
1, as the mean plus two standard deviations 1/2 + 1/ 3 exceeds 1 and the
mean minus two standard deviations is less than 0 - since X always takes val-
ues between 0 and 1, X is always within 2 standard deviations of its mean.
Similarly, it is always within 3 standard deviations of the mean.

b. We have E(X) = 1 and V ar(X) = 1. F (k) = 1 − e−k . Also note that


P (X < 0) = 0 for an exponential distribution.

1 standard deviation: P (0 < X < 2) = F (2) − F (0) = F (2) = 1 − e−2


2 standard deviation: P (−1 < X < 3) = P (0 < X < 3) = F (3) − F (0) =
F (3) = 1 − e−3
3 standard deviation: P (−2 < X < 4) = P (0 < X < 4) = F (4) − F (0) =
F (4) = 1 − e−4

c. If Y ∼ Expo(1/2), then Y = 2X where X ∼ Expo(1), E(Y ) = 2,


V ar(Y ) = 4, ST D(Y ) = 4. In general, we note that if Y ∼ Expo(λ) then
Y = X/λ and E(Y ) = 1/λ and ST D(Y ) = 1/λ.

Then we can realize the following pattern: the probability that Y is n


standard deviations away from its mean is P (−(n − 1)/λ < Y < (n + 1)λ) =
P (0 < Y < (n + 1)/λ) = F ((n + 1)/λ) = 1 − eλ(n+1)/λ = 1 − en+1

5.1.6 problem 7
a. F(x) is continuous at its given endpoints: F (1) = π2 arcsin(1) = 2π
π2
=1
and F (0) = 0, and F(x) is differentiable between 0 and 1:

b. F ′ (x) = f (x) = 2 d
π dx
(arcsin( x)) = π2 ( √1−x
1
)( √1x )

This is a valid PDF despite the discontinuities at 0 and 1 as the integral


CHAPTER 5. CONTINUOUS RANDOM VARIABLES 104

of f (x) from 0 to 1 converges, and f (x) is always positive - the same reason
why √1x has a discontinuity at x = 0 but can be integrated from 0 to any
positive real number.

5.2 Mixed Practice


5.2.1 problem 50
By symmetry, and by X and Y being independent and identically distributed,
the chance that X should be smaller than Y should not be different than the
chance that Y should be smaller than X. Furthermore, from independence,
joint pdf, g(x, y) = fX (x)fY (y)
Z ∞ Z y
P (X < Y ) = fX (x)fY (y)dtdy (5.12)
y=−∞ t=−∞
Z ∞
= F (y)f (y)dy (5.13)
y=−∞
1
= (5.14)
2
When X and Y are not independent say X = Y + 1, (assume the existence
of such X and Y), then, P (X < Y ) = 0 and P (Y < X) = 1 When X and
Y are not identically distributed, say X ∼ Unif(0, 1) and Y ∼ Unif(−1, 0)
then, P (X < Y ) = 1 and P (Y < X) = 0

5.2.2 problem 51
(a) We know that, X 2 ≤ X with probability 1. So E[X 2 ] ≤ EX

V (X) = E[X 2 ] − E[X]2 (5.15)


≤ µ − µ2 (5.16)
1
≤ taking the minimum of the above quadratic (5.17)
4
(b) I have to show that V (X) = 1/4 leads to a unique distribution. From
(a), V (X) ≤ µ − µ2 ≤ 1/4 implies that, µ = 1/2 Now

E[(X − 1/2)2 ] = 1/4


CHAPTER 5. CONTINUOUS RANDOM VARIABLES 105

But
0 ≥ (X − 1/2)2 ≤ 1/4
with probability 1. To get E[(X − 1/2)2 ] = 1/4, we need (X − 1/2)2 = 1/4
with probability 1. So,

0 with prob p
X= (5.18)
1 with prob 1 - p

Using µ = 1/2 gives us, p = 1/2.

5.2.3 problem 52
Z ∞
E[X] = xf (x)dx (5.19)
0
Z ∞
x2
= x2 e− 2 dx (5.20)
0
1 Z ∞ 2 − x2
= x e 2 dx (5.21)
2 −∞
1
= (5.22)
2

Z ∞
E[X ] =
2
x2 f (x)dx (5.23)
Z0∞
x2
= x3 e 2 dx (5.24)
0
Z ∞
= 2ueu du = 2 (5.25)
0
(5.26)

5.2.4 problem 56
Z ∞ Z x
x2
E[Z Φ(z)] =
2
x 2
e− 2 dx (5.27)
−∞ −∞

Now Z ∞ Z ∞
f (z)dz f (z)dz
−∞ −∞
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 106

if f (z)dz is meant to be lima→∞ f (z)dz So


R∞ Ra
−∞ −a

x2
Z
E[z 2 Φ(z)] = x2 Φ(x)e− 2 dx (5.28)
x2
Z
= x2 Φ(−x)e− 2 dx (5.29)
x2
Z
= x2 (1 − Φ(x))e− 2 dx (5.30)

So
Z
x2
E[Φ(z)z ] = x Φ(x)e − )dx
2 2 (
(5.31)
2
1 Z 2 − x2
= xe 2 (5.32)
2
1
= (5.33)
2
(b)
2 2 2
P (Φ(z) ≤ ) = P (z ≤ Φ−1 ( )) = Φ(Φ−1 ( ))
3 3 3
(c)
1
3!

5.2.5 problem 57
(a)

Φ(W )(t) = P (Φ(z)2 ≤ t) (5.34)



= P (Φ(z) ≤ t) (5.35)

= P (z ≤ Φ−1 ( t)) (5.36)

= Φ(Φ−1 ( t)) (5.37)

= t (5.38)

(b) E[W ] = 01 w3 fW (w)dw E[W ] = 01 Φ(z)6 ϕ(z)dz


R R

(c)√P (X + 2Y < 2Z + 3) = P (X + 2Y − 2Z < 3) here, X + 2Y − 2Z ∼


N (0, 12 + 22 + 22 ) So, P (X + 2Y < 2Z + 3) = Φ(1)
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 107

5.2.6 problem 58
(a)
1 Z ∞
E[Y ] = · 0 + xf (x)dx (5.39)
2 0
Z ∞
1 x2
= √ xe− 2 dx (5.40)
0 2π

(b) N is the first success distribution with p = 1


2
E[N ] = 2
(c) 
0 y<0
FY (y) =  (5.41)
Φ(y) y ≥ 0

5.2.7 problem 59
(a) Length biased sampling

L1 + L2 + L3 = 2π

E[L1 ] = E[L2 ] = E[L3 ] =
3
But our point is more likely to be a part of the longest arc. If there was a 1
3
chance of the point being in any one of the three points then E[L] = 2π 3
(b)
θ1 = Unif(0, 2π)
θ2 = Unif(0, 2π)
θ3 = Unif(0, 2π)

L1 = min(θ1 , θ2 , θ3 )
CDF,

F (y) = 1 − P (min(θ1 , θ2 , θ3 ) > y) (5.42)


2π − y 3
=1− (5.43)

CHAPTER 5. CONTINUOUS RANDOM VARIABLES 108

PDF,
d
f (y) = F (y) (5.44)
dy
3 y 2
= (1 − ) (5.45)
2π 2π
(c)

E[L] = 2E[L1 ] (5.46)


Z 2π
3 y 2
=2 y(1 − ) dy (5.47)
0 2π 2π
3 Z 2π y3 y2
= y + 2 − dy (5.48)
π 0 4π π
3 4π 2
1 16π 4
1 8π 3
= [ + 2 − ] (5.49)
π 2 4π 4 π 3
=π (5.50)

5.2.8 problem 61
(a) 
1 if k arrives when fun
Ik = (5.51)
0 if k arrives when not fun
Now I = I1 + · · · + In

E[I] = nE[I1 ] (5.52)


n
= (5.53)
3
as P (Ik = 1) = 1
3
(b)
P (I1 I2 ) = P (I1 )P (I2 )
Given that Jaime and Robert are guests 1 and 2

P (I1 I2 ) = P (both 1 and 2 arrive when fun)

Out of the possible 4! orderigns of Tyrion, Cersei, 1, and 2 for both 1 and 2
to arrive when fun, the following orderings are possible
Tyrion 1 2 Cersei Tyrion 2 1 Cersei Cersei 1 2 Tyrion Cersei 2 1 Tyrion
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 109

So P (I1 I2 ) = 1
4!
4 = 1
6

P (I1 I2 ) ̸= P (I1 )P (I2 )


(c)
We already know the answer. Conditioning on the event that 1 arrives
when it’s fun, the chances of 2 arriving when it’s fun are higher than the
unconditional probability of 2 arriving when it’s fun. When we have infor-
mation of 1 arriving when it’s fun, we know that there’s someone arriving
betweenTyrion and Cersei and this forces the conditional sample space to
have a skew towards having Tyrion and Cersei further apart than if we have
no information about 1. This skewing of the the conditional sample space
increasing chances that 2 arrives at a time when it’s fun.
However, the events ARE conditionally independent. If we know the
length of the interval of time between Cersei and Tyrion’s arrivals L, then
the probability of any other guest arriving at a fun time just becomes L. It
is no longer true that any order of Tyrion, Cersei, and some specific guest is
equally likely - for example, if we know Tyrion is the first guest and Cersei is
the last guest, it is obvious that the only possible ordering is Tyrion Guest
Cersei. Moreover, since the arrival times of other guests are independent of
each other, knowing that Jaime arrives at a fun time no longer makes it more
likely that Robert will arrive at a fun time - Robert still has to arrive in the
interval L, as opposed to Jaime’s arrival making the expected amount of fun
time larger. Therefore, the probability of both Jaime and Robert arriving
when it is fun, given that the amount of time between Cersei and Tyrion’s
arrival is L, is L2 .

5.2.9 problem 62
(a) 
1 if k sets a low or high record
Ik = (5.54)
0 if k doesn’t set a low or high record

P (I1 ) = 1
P (I2 ) = 1
2
P (I3 ) =
3
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 110

2
P (I4 ) =
4
and so on.
Now I = I1 + · · · + In
E[I] = 1 + 1 + 32 · · · 100
2

(b) 
1 if k sets a low followed by a high
Ik =  (5.55)
0 otherwise
1 1
P (Ik ) =
kk+1
Now I = I1 + · · · + In
E[I] = 1·2
1
+ · · · + 100·101
1

E[I] = 1 − 101
1

(c)

P (N > n) = P (all of 2 to n + 1 fall short of 1) (5.56)


1
= (5.57)
n+1

P (N = n) = P (N > n − 1) − P (N > n) (5.58)


1
= (5.59)
n(n + 1)

(d)

E[N ] = iP (N = i) (5.60)
X

i=1

1
= (5.61)
X

1 i+1

is unbounded.
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 111

5.3 Exponential
5.3.1 problem 37
a. We need to find the value of t such that F (t) = 1/2 - this will indicate
that there is a 1/2 chance that the particle has decayed before time t.

1 − e−λt = 1/2 implies ln(2) = λt, so t = ln(2)/λ

b. We need to compute P (t < T < t+ϵ|T > t) = P (t < T < t+ϵ)/P (T >
−λ(t+ϵ) )−(1−e−λt )
t). This is (1−e e−λt
= 1 − e−λϵ . Using the approximation given in
the hint and the assumption that ϵ is small enough that ϵλ ≈ 0, this is about
1 − (1 − ϵλ) = ϵλ.

c. P (L > t) = P (T1 > t)P (T2 > t)...P (Tn > t) = e−nλt , so L ∼
Expo(nλ). Therefore, if X ∼ Expo(1), we have L = X/nλ. Then since
E(X) = 1 and V ar(X) = 1, we can get E(L) = 1/(nλ) and V ar(L) =
1/(n2 λ2 )

d. M must be equal to the sum of D1 + D2 + D3 + ... + Dn , where Di is the


amount of time between the i − 1th and ith decay event. We observe that
Di must then be the minimum of n − i + 1 Expo(λ) variables - for example,
D1 is the first particle to decay out of n particles, D2 is the first particle
to decay out of the remaining n-1 particles, etc. Since Expo is memoryless,
Di+1 is independent of Di as the amount of time it takes for the next particle
to decay is not affected by the amount of time it took the previous particle
to decay. Therefore, Di ∼ Expo((n − i + 1)λ).

Then E(M ) = E(D1 ) + E(D2 ) + ... + E(Dn ) = λ1 ( i=1 (1/n)) = λ1 (ln(n) +


Pn

0.577)

5.3.2 problem 39
Let O1 , O2 , O3 be the offers received, all distributed as Expo(1/12000). We
want to find E(max(O1 , O2 , O3 )). Imagine ordering the offers as lowest,
middle, and highest price. Let D1 be the lowest price, let D2 be how much
more the middle price is than the lowest price, and D3 be how much more
the highest price is than the middle price. Then D1 is the minimum of 3
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 112

Expo(1/12000) variables (since it is just the minimum of the 3 offers), D2


is the minimum of 2 Expo(1/12000) variables (by the memoryless property,
given that all other offers are at least the lowest offer, the likelihood that
any offer is at least k more than the lowest is just the probability the of-
fer was at least k in the first place), and similarly, D3 is the minimum of
1 Expo(1/12000) variable. In addition, we realize that D1 + D2 + D3 must
equal the highest offer.

Then E(D1 + D2 + D3 ) = 1

+ 1

+ 1
λ
= 4000 + 6000 + 12000 = 22000.

5.3.3 problem 45
Let N be the number of emails received within the first 0.1 hours. Then
P (T > 0.1) = 1 − P (T < 0.1) = 1 − P (N ≥ 3) =
1−(1−P (N = 0)−P (N = 1)−P (N = 2)) = P (N = 0)+P (N = 1)+P (N =
2)

That is, to find the probability that it takes longer than 0.1 hours for
3 emails to arrive, we find the probability it takes less than 0.1 hours for 3
emails to arrive. To do this, we realize that this is equivalent to at least 3
emails arriving in the first 0.1 hours. And to find that probability, we realize
we can find the probabilities of exactly 0, 1, or 2 emails arriving in the first
0.1 hours.

Next, we note that N ∼ P ois(0.1 ∗ λ) = P ois(2), per the definition of a


Poisson process. Then we get

P (T > 0.1) = P (N = 0)+P (N = 1)+P (N = 2) = e−2 +2e−2 +4e−2 /2 = 5e−2 ≈ 0.68

5.4 Normal
5.4.1 problem 26
a. Let Tw ∼ N (w, σ 2 ) be the time it takes Walter to arrive, Tc ∼ N (c, 4σ 2 )
be the time it takes Carl to arrive. We have −Tw ∼ N (−c, σ 2 ) since flipping
the sign of the rv flips the sign of the expectation but does not change the
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 113

variance. Then Tc − Tw ∼ N (c − w, 5σ 2 ) per


√ the important fact given in the
problem. If Z ∼ N (0, 1), then Tc − Tw = 5σZ + c − w.

For Carl to arrive first, we require Tc − Tw < 0 (the time it takes Carl is
less than the time it takes Walter). Let us find this probability:
w−c w−c
P (Tc − Tw < 0) = P (Z < √ ) = Φ( √ )
σ 5 σ 5
√ ) >
b. If Carl has a greater than 1/2 chance of arriving first, then Φ( w−c
σ 5
1/2. Since Φ is an increasing function and equals 1/2 when its input is 0,
this implies we need w−c√ > 0, which in turn implies c > w. So, as long as
σ 5
Carl’s car lets him be faster on average than Walter’s walking, Carl has a
better than 1/2 chance of arriving first.

c. To make it to the meeting at time, either individual needs to make


sure the amount of time they take to arrive is less than w + 10.

w + 10 − c
P (Tc < w + 10) = P (2σZ + c < w + 10) = Φ( )

10
P (Tw < w + 10) = P (σZ + w < w + 10) = Φ( )
σ
Since Φ is an increasing function, if we want Carl to have a greater chance
than Walter to make it on time, then we require w+10−c 2σ
> 10
σ
. This then
implies that we need w > c + 10.

5.4.2 problem 35
Let g(Z) = max(Z − c, 0). We have g(Z) = 0 for Z < c and g(Z) = Z − c
for Z > c.

Then
Z ∞ Z ∞
E(g(Z)) = g(k)φ(k)dk = (k − c)φ(k)dk
−∞ c

since the expression inside the integral is 0 for k < c. Next we have
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 114

2 2
Z ∞ Z ∞
ke−k /2 Z ∞
e−c /2 Z −c
(k−c)φ(k)dk = √ dk− cφ(k)dk = √ −c φ(k) = φ(c)−cΦ(c)
c c 2π c 2π −∞

with the first equality following from splitting the integral via subtraction
and expanding out φ for the left integral, the second equality comes from
observing that the antiderivative of kφ(k) is −φ(k) and that we can change
the limits of the right integral due to the symmetry of tail areas of the curve
of φ, and the last equality comes from applying the definitions of the PDF
and CDF of the standard normal distribution.

5.5 Uniform and Universality


5.5.1 problem 13
Recall from problem 12 that the CDF of Y, the length of the longer piece, is
F (k) = 2k − 1.
a. Let us find the CDF of X/Y .

1−Y 1 2 2k
P (X/Y < k) = P ( < k) = P (Y > )=1−( − 1) =
Y k+1 k+1 k+1
To find the PDF, we derive the CDF using the quotient rule:
d 2k
( ) = 2(k + 1)−2
dk k + 1
b. Note that X/Y is minimized at when X is 0 and Y is 1 and maxi-
mized at 1 when X and Y are both 1/2. So, to find E(X/Y ), we must find
0 2k(k + 1) dk. This can be done with integration by parts, factoring out
R1 −2

the constant 2, with u = k, dv = (k + 1)−2 dk, du = 1, and v = −(k + 1)−1 :

Z 1
−k 1 Z 1 1
2k(k+1)−2 dk = 2(( )|0 − −(k+1)−1 dk) = 2(− +ln(2)) = 2ln(2)−1
0 k+1 0 2

c. Following similar steps as in part A, the CDF of Y /X is P ( 1−Y


Y
< k) =
P (Y < k+1 ) = k+1 . Then the PDF using the quotient rule is 2(k + 1)−2 , the
k k−1
CHAPTER 5. CONTINUOUS RANDOM VARIABLES 115

same as the PDF for X/Y!

Then, we need to evaluate the same integral as in part b to find E(Y /X),
but now with the limits set from 1 to infinity, since Y /X is minimized when
X=Y and maximized when Y=1 and X=0:

Z ∞
−k ∞ Z ∞
2k(k+1) dk = 2((
−2
)| − −(k+1)−1 dk) = 2((−1/2)+ln(∞)−ln(2)) = ∞
1 k+1 1 1
Chapter 6

Moments

6.1 Means, Medians, Modes, Moments . . . . . . . . . . . . . . . 116

6.1 Means, Medians, Modes, Moments


6.1.1 problem 1
The median is (a + b)/2, the solution to P (X < k) = k−a
b−a
= 1/2.
The mode is every real number between a and b, since the PDF is a
constant.

116

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy