Week 1
Adam Sawicki
with additions and edition by Nikita Kalinin
Winter 2024
The primary reference is A First Course in Probability by Sheldon Ross; we refer to it as [S].
Probability theory attempts to build a formal mathematical theory modeling random events in the real world.
Historically, it stems from the desire to win at card and dice games, and to optimize or predict processes
that have random components (how to design an optimal pension system based on life expectancy, how
to find most defective products without inspecting the entire production run, etc.).
During the first lecture, we discuss the Monty Hall paradox and Bertrand’s box paradox as instructive examples.
Introduction
Assume we perform a random experiment. We want to describe it mathematically. First, we can speak about
the results of our experiment. We call them elementary events. The set of all elementary events is called the
sample space and is denoted by $\Omega$. Elements of $\Omega$, that is, elementary events, are denoted by $\omega \in \Omega$.
Example 1. • We throw a coin. There are two results: heads or tails. Thus $\Omega = \{H, T\}$, $|\Omega| = 2$.
• We throw a die. There are six possible results. Thus $\Omega = \{1, 2, 3, 4, 5, 6\}$, $|\Omega| = 6$.
Often, we are not interested in a concrete result of an experiment, but we want to know if this result
belongs to a given subset of $\Omega$. Such subsets are called events; we denote them with capital letters: A, B, C, D,
etc.
• We throw two dice. Let A be the event that the sum of spots is equal to 5. Then
$\Omega = \{(i, j) \mid 1 \le i \le 6,\ 1 \le j \le 6\}$ and $A = \{(1,4), (2,3), (3,2), (4,1)\}$.
• We throw a coin until we get heads. Let A be the event that at most three throws were made. Then
$\Omega = \{(H), (T, H), (T, T, H), (T, T, T, H), \dots\}$ and $A = \{(H), (T, H), (T, T, H)\}$.
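The two-dice example can be checked by brute-force enumeration. The following is a minimal Python sketch; the variable names `omega` and `A` are chosen here for illustration and are not part of the notes.

```python
from itertools import product

# Sample space for two dice: all ordered pairs (i, j) with 1 <= i, j <= 6.
omega = set(product(range(1, 7), repeat=2))

# Event A: the sum of spots equals 5.
A = {(i, j) for (i, j) in omega if i + j == 5}

print(len(omega))   # 36
print(sorted(A))    # [(1, 4), (2, 3), (3, 2), (4, 1)]
```

Representing events as Python sets keeps the code close to the set-theoretic language used in the text.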
Remark 1. 1. $\Omega$ - the certain event
2. $\emptyset$ - the impossible event
3. $A \cap B$ - events A and B both occurred
4. $A \cap B = \emptyset$ - events A and B are mutually exclusive
5. $A \cup B$ - at least one of the events A, B occurred
6. $\bar{A} = \Omega \setminus A$ - A did not happen
7. $A \setminus B = A \cap \bar{B}$ - A happened and B did not happen
8. $A \subseteq B$, $A \neq \emptyset$ - whenever A occurs, B occurs as well
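The correspondence between events and set operations in Remark 1 can be tried out on the two-dice sample space. This is only an illustrative sketch; the event `B` ("the first die shows 1") is an assumption introduced for the example.

```python
from itertools import product

omega = set(product(range(1, 7), repeat=2))     # two dice
A = {w for w in omega if w[0] + w[1] == 5}      # sum of spots equals 5
B = {w for w in omega if w[0] == 1}             # hypothetical event: first die shows 1

both = A & B          # item 3: A and B both occurred
either = A | B        # item 5: at least one of A, B occurred
not_A = omega - A     # item 6: A did not happen
A_not_B = A - B       # item 7: A happened and B did not

# Item 7: A \ B equals A intersected with the complement of B.
assert A_not_B == A & (omega - B)
print(both)           # {(1, 4)}
print(len(either))    # 9
```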
Assume we have a fixed sample space $\Omega$. We want to distinguish a family $\mathcal{F}$ of events that we want to
consider. A first choice is to take $2^\Omega$, the family of all subsets of $\Omega$. This is a good choice when $\Omega$ is an at most countable
set. When $|\Omega| > \aleph_0$, $2^\Omega$ is too big, and there are problems with defining probability on $2^\Omega$. We will discuss
this problem later. That is why we need to choose a smaller family. On the other hand, $\mathcal{F}$ should be closed
with respect to taking unions, intersections and complements.
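Definition 1. A family $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-algebra iff:
1. $\emptyset \in \mathcal{F}$
2. $A \in \mathcal{F} \Rightarrow \bar{A} \in \mathcal{F}$
3. $A_1, A_2, \dots, A_n, \dots \in \mathcal{F} \Rightarrow \bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$
A pair $(\Omega, \mathcal{F})$ is called a measurable space.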
Remark 2. For any $\Omega$ the pair $(\Omega, 2^\Omega)$ is a measurable space. Also, for any nonempty subset $A \subset \Omega$ the
smallest $\sigma$-algebra that contains A is $\sigma(A) = \{\emptyset, A, \bar{A}, \Omega\}$.
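For a finite sample space, the three conditions of Definition 1 can be checked mechanically. The sketch below is a hypothetical helper (the name `is_sigma_algebra` is not from the notes); it relies on the observation that for a finite family, closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check Definition 1 for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    if frozenset() not in family:                       # condition 1
        return False
    if any(omega - s not in family for s in family):    # condition 2
        return False
    # Condition 3: pairwise unions suffice because the family is finite.
    return all(s | t in family for s, t in combinations(family, 2))

omega = {1, 2, 3, 4, 5, 6}
A = {1, 2}
sigma_A = [set(), A, omega - A, omega]

print(is_sigma_algebra(omega, sigma_A))             # True
print(is_sigma_algebra(omega, [set(), A, omega]))   # False: the complement of A is missing
```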
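Fact 1. Let $(\Omega, \mathcal{F})$ be a measurable space. Then
1. $A, B \in \mathcal{F} \Rightarrow A \setminus B \in \mathcal{F}$
2. $A_1, A_2, \dots, A_n, \dots \in \mathcal{F} \Rightarrow \bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$
Proof. 1. $A \setminus B = A \cap \bar{B} = \overline{\bar{A} \cup B}$, and $\bar{A} \cup B \in \mathcal{F}$ because a finite union is a countable union padded with copies of $\emptyset$.
2. $\bigcap_{k=1}^{\infty} A_k = \overline{\overline{\bigcap_{k=1}^{\infty} A_k}} = \overline{\bigcup_{k=1}^{\infty} \bar{A}_k}$, but $\bar{A}_k \in \mathcal{F}$.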
Definition 2. The Borel $\sigma$-algebra of $\mathbb{R}^d$, $\mathcal{B}(\mathbb{R}^d)$, is the smallest $\sigma$-algebra that contains all open subsets of $\mathbb{R}^d$.
Elements of $\mathcal{B}(\mathbb{R}^d)$ are called Borel subsets.
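Example 2. Let d = 1. $\mathcal{B}(\mathbb{R})$ is the smallest $\sigma$-algebra that contains all open sets in $\mathbb{R}$.
• Intervals $(a, b)$, where $a, b \in \mathbb{R}$, are in $\mathcal{B}(\mathbb{R})$,
• $(a, b] \in \mathcal{B}(\mathbb{R})$ as $(a, b] = \bigcap_{n=1}^{\infty} \left(a, b + \frac{1}{n}\right)$,
• $[a, b) \in \mathcal{B}(\mathbb{R})$ as $[a, b) = \bigcap_{n=1}^{\infty} \left(a - \frac{1}{n}, b\right)$,
• $[a, b] \in \mathcal{B}(\mathbb{R})$ as $[a, b] = \bigcap_{n=1}^{\infty} \left(a - \frac{1}{n}, b + \frac{1}{n}\right)$,
• $\{a\} \in \mathcal{B}(\mathbb{R})$ as $\{a\} = \bigcap_{n=1}^{\infty} \left(a - \frac{1}{n}, a + \frac{1}{n}\right)$,
• $(-\infty, a) \in \mathcal{B}(\mathbb{R})$ as $(-\infty, a) = \bigcup_{n=1}^{\infty} (-n, a)$, hence $[a, \infty) \in \mathcal{B}(\mathbb{R})$ as its complement,
• $(a, \infty) \in \mathcal{B}(\mathbb{R})$ as $(a, \infty) = \bigcup_{n=1}^{\infty} (a, n)$, hence $(-\infty, a] \in \mathcal{B}(\mathbb{R})$ as its complement,
• $\mathbb{Q} \in \mathcal{B}(\mathbb{R})$ as a countable union of points,
• $\mathbb{R} \setminus \mathbb{Q} \in \mathcal{B}(\mathbb{R})$ as the complement of $\mathbb{Q}$.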
Remark 3. Not every subset of $\mathbb{R}^d$ is a Borel subset. An example of a non-Borel set is a Vitali$^1$ set $V \subset [0, 1]$.
We next define the notion of probability. First, however, to get some intuition, we consider the frequency of
an event.
1 here is the link https://e.math.cornell.edu/people/belk/measuretheory/NonMeasurableSets.pdf See also measurable
Assume that we can repeat an experiment n times. Each repetition happens under the same conditions. We
define the relative frequency of an event $A \in \mathcal{F}$ in the series of n experiments by:
$$\rho_n(A) = \frac{\#\{\text{experiments in which } A \text{ happened}\}}{n}.$$
When n is large, we expect that $\rho_n(A)$ is close to the chance that A occurs in a single trial. We easily check that
$\rho_n$ takes values in [0, 1] and
1. $\rho_n(\Omega) = 1$
2. if $A_1, \dots, A_j$ are pairwise disjoint, then
$$\rho_n\Big(\bigcup_{k=1}^{j} A_k\Big) = \sum_{k=1}^{j} \rho_n(A_k).$$
This is because the number of experiments in which $\bigcup_{k=1}^{j} A_k$ happened is $\sum_{k=1}^{j}$ (number of experiments in which $A_k$ happened).
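The behaviour of the relative frequency can be observed in a small simulation. The sketch below assumes a fair six-sided die simulated with Python's `random` module; the seed, the number of throws, and the events `A1`, `A2` are arbitrary choices made for illustration.

```python
import random

def relative_frequency(event, outcomes):
    """Fraction of the recorded outcomes that fall in the event (a set)."""
    return sum(1 for w in outcomes if w in event) / len(outcomes)

rng = random.Random(0)                                  # arbitrary fixed seed
outcomes = [rng.randint(1, 6) for _ in range(10_000)]   # n = 10000 throws of a die

omega = {1, 2, 3, 4, 5, 6}
A1, A2 = {1, 2}, {6}                                    # two disjoint events

print(relative_frequency(omega, outcomes))   # 1.0
print(relative_frequency(A1, outcomes))      # close to 1/3 for large n

# Additivity for disjoint events holds up to floating-point rounding.
lhs = relative_frequency(A1 | A2, outcomes)
rhs = relative_frequency(A1, outcomes) + relative_frequency(A2, outcomes)
print(abs(lhs - rhs) < 1e-12)                # True
```

Both properties hold for every run, whereas the closeness of the frequency of `A1` to 1/3 is only expected for large n.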
Definition 3. Let $(\Omega, \mathcal{F})$ be a measurable space. A function $P : \mathcal{F} \to [0, 1]$ is called a probability iff
1. $P(\Omega) = 1$
2. For any pairwise disjoint $A_1, \dots, A_n, \dots \in \mathcal{F}$ we have: $P\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} P(A_k)$
A triple $(\Omega, \mathcal{F}, P)$ is called a probability space.
Theorem 1. Assume $(\Omega, \mathcal{F}, P)$ is a probability space, and $A, B, A_1, \dots, A_n, \dots \in \mathcal{F}$. Then
1. $P(\emptyset) = 0$
2. $P(\bar{A}) = 1 - P(A)$
3. If $A \subseteq B$, then $P(B \setminus A) = P(B) - P(A)$ and $P(B) \ge P(A)$
4. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
5. $P\left(\bigcup_{k=1}^{\infty} A_k\right) \le \sum_{k=1}^{\infty} P(A_k)$
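The properties in Theorem 1 can be sanity-checked on a concrete finite model. The sketch below assumes two fair dice with all 36 outcomes equally likely, so that the probability of an event is its size divided by 36; the events `A`, `B`, `C` are chosen here purely for illustration. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

omega = frozenset(product(range(1, 7), repeat=2))   # two dice

def P(event):
    """Assumed model: equally likely outcomes, P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] + w[1] == 5}    # sum equals 5
B = {w for w in omega if w[0] + w[1] <= 5}    # sum at most 5, so A is a subset of B
C = {w for w in omega if w[0] == 1}           # first die shows 1

assert P(set()) == 0                              # point 1
assert P(omega - A) == 1 - P(A)                   # point 2
assert A <= B and P(B - A) == P(B) - P(A)         # point 3
assert P(A | C) == P(A) + P(C) - P(A & C)         # point 4
assert P(A | C) <= P(A) + P(C)                    # point 5 for two events
print(P(A), P(B), P(A | C))                       # 1/9 5/18 1/4
```

A check on one model is of course not a proof; it only illustrates what the statements say.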
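Proof. (by induction): By Theorem 1 (point 4) we know that Theorem 2 is true for n = 2. Assume it is true
for n ≥ 2. We need to show that it is true for n + 1. Applying Theorem 1 (point 4) to the events $A_1 \cup \dots \cup A_n$ and $A_{n+1}$ gives
$$P(A_1 \cup A_2 \cup \dots \cup A_n \cup A_{n+1}) = P(A_1 \cup A_2 \cup \dots \cup A_n) + P(A_{n+1}) - P((A_1 \cap A_{n+1}) \cup \dots \cup (A_n \cap A_{n+1})).$$
By the induction hypothesis applied to the n events $A_i \cap A_{n+1}$,
$$P((A_1 \cap A_{n+1}) \cup \dots \cup (A_n \cap A_{n+1})) = \sum_{i=1}^{n} P(A_i \cap A_{n+1}) - \sum_{i<j}^{n} P(A_i \cap A_j \cap A_{n+1}) + \sum_{i<j<k}^{n} P(A_i \cap A_j \cap A_k \cap A_{n+1}) - \dots + (-1)^{n+1} P(A_1 \cap A_2 \cap \dots \cap A_n \cap A_{n+1}),$$
and by the induction hypothesis applied to $A_1, \dots, A_n$,
$$P(A_1 \cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j}^{n} P(A_i \cap A_j) + \sum_{i<j<k}^{n} P(A_i \cap A_j \cap A_k) - \dots + (-1)^{n+1} P(A_1 \cap A_2 \cap \dots \cap A_n).$$
Substituting the last two expansions into the first identity and collecting terms by the number of intersected events gives the formula for n + 1.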
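The alternating sum appearing in this proof can be evaluated directly for a small example. The sketch below assumes a toy model of 30 equally likely outcomes and three events (multiples of 2, 3 and 5); both the model and the helper name `inclusion_exclusion` are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 31))      # assumed toy model: 30 equally likely outcomes

def P(event):
    return Fraction(len(event), len(omega))

# A_1, A_2, A_3: multiples of 2, 3 and 5 in omega.
events = [frozenset(w for w in omega if w % d == 0) for d in (2, 3, 5)]

def inclusion_exclusion(events):
    """Alternating sum over all non-empty sub-collections of the events."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)       # + for single events, - for pairs, + for triples, ...
        for group in combinations(events, r):
            total += sign * P(frozenset.intersection(*group))
    return total

union = frozenset.union(*events)
print(P(union), inclusion_exclusion(events))   # 11/15 11/15
```

The sum has $2^n - 1$ terms, so this direct evaluation is only practical for a small number of events.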
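Theorem 3 (Continuity of probability). Assume $(\Omega, \mathcal{F}, P)$ is a probability space and $(A_n)_{n=1}^{\infty}$ is a sequence
of events.
1. If $A_n$ is increasing, i.e. $A_1 \subseteq A_2 \subseteq A_3 \subseteq \dots$, then $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n)$
2. If $A_n$ is decreasing, i.e. $A_1 \supseteq A_2 \supseteq A_3 \supseteq \dots$, then $P\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n)$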
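Proof. 1. Let $B_1 = A_1$, $B_2 = A_2 \setminus A_1$, $B_3 = A_3 \setminus A_2$, and in general $B_n = A_n \setminus A_{n-1}$. Then
- the $B_n$'s are mutually exclusive,
- $\bigcup_{k=1}^{n} B_k = A_n$,
- $\bigcup_{k=1}^{\infty} B_k = \bigcup_{k=1}^{\infty} A_k$.
Hence
$$P\Big(\bigcup_{k=1}^{\infty} A_k\Big) = P\Big(\bigcup_{k=1}^{\infty} B_k\Big) = \sum_{k=1}^{\infty} P(B_k) = \lim_{n \to \infty} \sum_{k=1}^{n} P(B_k) = \lim_{n \to \infty} P\Big(\bigcup_{k=1}^{n} B_k\Big) = \lim_{n \to \infty} P(A_n).$$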
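2. If $A_n$ is decreasing, then $\bar{A}_n$ is increasing. Using part 1 for the sequence $\bar{A}_n$ in the third equality,
$$P\Big(\bigcap_{k=1}^{\infty} A_k\Big) = P\Big(\overline{\bigcup_{k=1}^{\infty} \bar{A}_k}\Big) = 1 - P\Big(\bigcup_{k=1}^{\infty} \bar{A}_k\Big) = 1 - \lim_{n \to \infty} P(\bar{A}_n) = 1 - \lim_{n \to \infty} (1 - P(A_n)) = \lim_{n \to \infty} P(A_n).$$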
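Part 1 of Theorem 3 can be illustrated with the experiment of throwing a coin until heads appears. The sketch below assumes a fair coin, so that the first heads appears on throw k with probability $2^{-k}$; this model and the function names are assumptions made for the illustration. The events $A_n$ = "heads appears within the first n throws" are increasing, and their union is the event that heads appears at all.

```python
from fractions import Fraction

def p_first_heads_at(k):
    """Assumed model: fair coin, first heads on throw k has probability 2**-k."""
    return Fraction(1, 2 ** k)

def P_A(n):
    """P(A_n), where A_n = 'heads appears within the first n throws'."""
    return sum(p_first_heads_at(k) for k in range(1, n + 1))

for n in (1, 2, 3, 10, 20):
    print(n, P_A(n), float(P_A(n)))
```

In this model $P(A_n) = 1 - 2^{-n}$, which increases to 1, so the theorem gives probability 1 for the event that heads eventually appears.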
Example 3. 1. Classical probability scheme: $\Omega$ is a finite set, $\mathcal{F} = 2^\Omega$, and all elementary events have the
same probability. Then for any $A \in \mathcal{F}$
$$P(A) = \frac{|A|}{|\Omega|}.$$
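2. Countable sample space: let $\Omega = \{\omega_1, \omega_2, \dots\}$ and let $p_i \ge 0$ satisfy $\sum_{i=1}^{\infty} p_i = 1$.
We can choose $\mathcal{F} = 2^\Omega$ and $P(\{\omega_i\}) = p_i$. This choice defines the probability space $(\Omega, \mathcal{F}, P)$, and for
any $A \in \mathcal{F}$ we have
$$P(A) = \sum_{k=1}^{\infty} 1_A(\omega_k)\, p_k,$$
where
$$1_A(\omega) = \begin{cases} 1 & \omega \in A \\ 0 & \omega \notin A \end{cases}$$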
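3. Geometric probability: We choose $\Omega \in \mathcal{B}(\mathbb{R}^d)$, that is, $\Omega$ is a Borel subset of $\mathbb{R}^d$, and we assume that
$0 < |\Omega| < \infty$, where$^2$
$$|\Omega| = \int_{\mathbb{R}^d} 1_\Omega(x)\, dx. \qquad (1)$$
Let $\mathcal{F} = \mathcal{B}(\Omega)$ be the smallest $\sigma$-algebra that contains all open subsets of $\Omega$, and for $A \in \mathcal{F}$ let
$$P(A) = \frac{|A|}{|\Omega|}.$$
Then $(\Omega, \mathcal{F}, P)$ is a probability space. We use this probability space to describe experiments in which
one or more points are chosen at random from $\Omega$.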
Example 4. Alice and Bob have agreed to meet at a bar between noon and 1 pm. The one who arrives first
waits 20 minutes for the other, after which he/she leaves. What is the probability of their meeting if their
arrivals occur at random and independently in the interval between noon and 1 pm?
x - arrival time of Alice (in minutes after noon)
y - arrival time of Bob (in minutes after noon)
$\Omega = \{(x, y) \mid x \in [0, 60],\ y \in [0, 60]\}$
E - the event that Alice and Bob meet
$E = \{(x, y) \in \Omega : |x - y| \le 20\}$
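The probability of E in the geometric model is the ratio of areas $|E| / |\Omega|$. The sketch below computes it from elementary geometry and cross-checks it with a Monte Carlo simulation; the seed and the sample size are arbitrary choices, and the simulation is only an approximate check.

```python
import random
from fractions import Fraction

# Exact value: the complement of E in the 60 x 60 square consists of two
# right triangles with legs of length 40.
area_omega = Fraction(60 * 60)
area_E = area_omega - 2 * Fraction(40 * 40, 2)
p_exact = area_E / area_omega
print(p_exact)                                  # 5/9

# Monte Carlo check: draw both arrival times uniformly from [0, 60].
rng = random.Random(1)                          # arbitrary fixed seed
n = 200_000
hits = sum(abs(rng.uniform(0, 60) - rng.uniform(0, 60)) <= 20 for _ in range(n))
print(hits / n)                                 # should be close to 5/9
```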
Example 5. The coefficients p and q of the quadratic equation
$$x^2 + px + q = 0$$
are chosen at random in the interval [0, 1]. What is the probability that the roots of the equation are real
numbers?
$\Omega = \{(p, q) \mid p, q \in [0, 1]\}$, $|\Omega| = 1$
E - the event that the roots are real, $E = \{(p, q) \in \Omega \mid p^2 - 4q \ge 0\}$
2 This integral is actually the Lebesgue integral. For a Riemann integrable function, the Lebesgue integral is equal to the
Riemann integral. We will always consider subsets of Rd whose characteristic functions are Riemann integrable.
Figure 2: Solutions of Example 5
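The probability asked for in Example 5 is the area of the region $p^2 - 4q \ge 0$ inside the unit square. For fixed p this means $q \le p^2/4$, so the area is $\int_0^1 \frac{p^2}{4}\, dp = \frac{1}{12}$. The sketch below cross-checks this value with a Monte Carlo simulation; the seed and the sample size are arbitrary choices.

```python
import random
from fractions import Fraction

# Area under the parabola q = p**2 / 4 over [0, 1], computed by hand above.
p_exact = Fraction(1, 12)

rng = random.Random(2)          # arbitrary fixed seed
n = 200_000
hits = 0
for _ in range(n):
    p, q = rng.random(), rng.random()
    if p * p - 4 * q >= 0:      # discriminant is non-negative: real roots
        hits += 1

print(float(p_exact), hits / n)   # the two numbers should be close
```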
Conditional Probability
Example 6.