Introdiscreteprobas v1.2
Michaël Baudin
June 2011
Abstract
In this article, we present an introduction to probabilities with Scilab. Numerical experiments are based on Scilab. The first section presents discrete random variables and conditional probabilities. In the second section, we present combinatorics problems, tree diagrams and Bernoulli trials. In the third section, we present the simulation of random processes with Scilab. Coin simulations are presented, as well as the Galton board.
Contents
1 Discrete random variables
1.1 Sets
1.2 Distribution function and probability
1.3 Properties of discrete probabilities
1.4 Uniform distribution
1.5 Conditional probability
1.6 Life table
1.7 Bayes' formula
1.8 Independent events
1.9 Notes and references
1.10 Exercises
2 Combinatorics
2.1 Tree diagrams
2.2 Permutations
2.3 The gamma function
2.4 Overview of functions in Scilab
2.5 The gamma function in Scilab
2.6 The factorial and log-factorial functions
2.7 Computing factorial and log-factorial with Scilab
2.8 Stirling's formula
2.9 Computing permutations and log-permutations with Scilab
2.10 The birthday problem
2.11 A modified birthday problem
2.12 Combinations
2.13 Computing combinations and log-combinations with Scilab
2.14 The poker game
2.15 Bernoulli trials
2.16 Computing the binomial distribution
2.17 The hypergeometric distribution
2.18 Computing the hypergeometric distribution with Scilab
2.19 Notes and references
2.20 Exercises
4 Acknowledgments
5 Answers to exercises
5.1 Answers for section 1
5.2 Answers for section 2
Bibliography
Index
Copyright © 2008-2010 - Consortium Scilab - Digiteo - Michael Baudin
This file must be used under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License:
http://creativecommons.org/licenses/by-sa/3.0
1 Discrete random variables
In this section, we present discrete random variables. The first section presents general definitions for sets, including unions and intersections. Then we present the definition of the discrete distribution function and the probability of an event. In the third section, we give properties of probabilities, such as, for example, the probability of the union of two disjoint events. The fourth section is devoted to the very common discrete uniform distribution function. Then we present the definition of conditional probability. This leads to Bayes' formula, which allows us to compute the posterior conditional probability, given a set of hypothesis probabilities. This section finishes with the definition of independent events.
1.1 Sets
A set is a collection of elements. In this document, we consider sets of elements in
a fixed non empty set Ω, to be called a space.
Assume that A is a set of elements. If x is a point in that set, we write x ∈ A. If there is no point in A, we write A = ∅. If the number of elements in A is finite, let us denote by #(A) the number of elements in the set A. If the number of elements in A is infinite, the cardinality cannot be computed (for example A = N).
The set A^c is the set of all points in Ω which are not in A:
A^c = {x ∈ Ω / x ∉ A} . (1)
The set A^c is called the complementary set of A.
The set B is a subset of A if any point in B is also in A, and we write B ⊂ A. The two sets A and B are equal if A ⊂ B and B ⊂ A. The difference set A − B is the set of all points of A which are not in B:
A − B = {x ∈ A / x ∉ B} . (2)
The intersection A ∩ B of two sets A and B is the set of points common to A and
B:
A ∩ B = {x / x ∈ A and x ∈ B} . (3)
The union A ∪ B of two sets A and B is the set of points which belong to at least
one of the sets A or B:
A ∪ B = {x / x ∈ A or x ∈ B} . (4)
The operations that we have defined are presented in figure 1. Such figures are often referred to as Venn diagrams.
Two sets A and B are disjoint, or mutually exclusive, if their intersection is empty, i.e. A ∩ B = ∅.
In the following, we will use the fact that we can always decompose the union of two sets as the union of three disjoint subsets. Indeed, assume that A, B ⊂ Ω. We have
A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A). (5)
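These set operations can be experimented with in Scilab, where the union and intersect functions are built in. The following sketch checks the decomposition 5 on a small example; since we also need the difference of two sets, we implement it with a small setdifference helper of our own.
function d = setdifference ( a , b )
// Returns the elements of the row vector a which are not in b , i.e. a - b
d = [];
for x = a
if ( and ( x <> b ) ) then
d = [ d x ];
end
end
endfunction
A = [1 2 3 4];
B = [3 4 5 6];
AmB = setdifference ( A , B ) // 1 2 , i.e. A - B
AnB = intersect ( A , B ) // 3 4 , i.e. the intersection of A and B
BmA = setdifference ( B , A ) // 5 6 , i.e. B - A
// The union of the three disjoint parts recovers A union B , as in equation 5:
union ( union ( AmB , AnB ) , BmA ) // 1 2 3 4 5 6
union ( A , B ) // 1 2 3 4 5 6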
Figure 1: Venn diagrams for the complementary set A^c, the difference A − B, the intersection A ∩ B and the union A ∪ B.
The Cartesian product A × B of two sets A and B is the set of ordered pairs
A × B = {(x, y) / x ∈ A, y ∈ B} . (6)
Example 1.1 (Die with 6 faces) Consider a die with 6 faces. The space for this experiment is
Ω = {1, 2, 3, 4, 5, 6} . (8)
The set of even numbers is A = {2, 4, 6} and the set of odd numbers is B = {1, 3, 5}. Their intersection is empty, i.e. A ∩ B = ∅, which proves that A and B are disjoint. Since their union is the whole sample space, i.e. A ∪ B = Ω, these two sets are mutually complementary, i.e. A^c = B and B^c = A.
1.2 Distribution function and probability
Assume that Ω is a set, called the sample space. In this document, we will consider the case where the sample space is finite, i.e. the number of elements in Ω is finite. Assume that we are performing random trials, so that each trial is associated with one outcome x ∈ Ω. Each subset A of the sample space Ω is called an event. We say that the event A ∩ B occurs if both the events A and B occur. We say that the event A ∪ B occurs if the event A or the event B occurs.
Example 1.2 (Die with 6 faces) Consider a die with 6 faces which is rolled once. The sample space is:
Ω = {1, 2, 3, 4, 5, 6} . (9)
We can now define the distribution function.
Definition 1.1. (Distribution function) Assume that Ω is a finite sample space. A distribution function on Ω is a function f which satisfies
0 ≤ f(x) ≤ 1, (10)
for all x ∈ Ω, and
Σ_{x∈Ω} f(x) = 1. (11)
Example 1.3 (Die with 6 faces) Assume that a die with 6 faces is rolled once. The sample space for this experiment is
Ω = {1, 2, 3, 4, 5, 6} . (12)
Assume that the die is fair. This means that the probability of each of the six outcomes is the same, i.e. the distribution function is f(x) = 1/6 for x ∈ Ω, which satisfies the conditions of the definition 1.1.
Definition 1.2. (Probability of an event) Assume that Ω is a finite sample space and that f is a distribution function on Ω. The probability of the event A ⊂ Ω is
P(A) = Σ_{x∈A} f(x). (13)
Example 1.4 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. Assume that the distribution function is f(x) = 1/6 for x ∈ Ω. The event
A = {2, 4, 6} (14)
corresponds to the statement that the result of the roll is an even number. From the definition 1.2, the probability of the event A is
P(A) = f(2) + f(4) + f(6) (15)
= 1/6 + 1/6 + 1/6 (16)
= 1/2. (17)
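This computation is easily reproduced in Scilab. In the following sketch, the distribution function is stored as a vector f, and the probability of the event A is the sum of the entries of f corresponding to the outcomes in A.
f = ones (1 ,6) / 6; // fair die : f(x) = 1/6 for x = 1, 2, ..., 6
A = [2 4 6]; // the event "the result is even"
PA = sum ( f ( A ) ) // 1/6 + 1/6 + 1/6 = 0.5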
1.3 Properties of discrete probabilities
Figure 2: Two disjoint sets A and B.
Proposition 1.4. (Probability of disjoint events) Assume that Ω is a sample space and that f is a distribution function on Ω. If A and B are two disjoint subsets of Ω, then
P(A ∪ B) = P(A) + P(B). (23)
The figure 2 presents the situation of two disjoint sets A and B. Since the two sets have no intersection, it suffices to add the probabilities associated with each event.
Proof. Assume that A and B are two disjoint subsets of Ω. We can decompose A ∪ B as A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A), so that
P(A ∪ B) = Σ_{x∈A∪B} f(x) (24)
= Σ_{x∈A−B} f(x) + Σ_{x∈A∩B} f(x) + Σ_{x∈B−A} f(x). (25)
Since A and B are disjoint, we have A ∩ B = ∅, A − B = A and B − A = B, so that P(A ∪ B) = P(A) + P(B), which concludes the proof.
Figure 3: Two sets A and B with a nonempty intersection.
Proof. We have Ω = A ∪ A^c, where the sets A and A^c are disjoint. Therefore, from proposition 1.4, we have P(Ω) = P(A) + P(A^c). Since P(Ω) = 1, this implies P(A^c) = 1 − P(A), which concludes the proof.
Proposition 1.7. (Probability of the union) Assume that Ω is a sample space and that f is a distribution function on Ω. Assume that A and B are two subsets of Ω, not necessarily disjoint. We have:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B). (31)
The figure 3 presents the situation where two sets A and B have a nonempty intersection. When we add the probabilities of the two events A and B, the intersection is added twice. This is why it must be removed by subtraction.
Proof. Assume that A and B are two subsets of Ω. The proof is based on the analysis of the Venn diagram presented in figure 3. The idea of the proof is to compute the probability P(A ∪ B), by making disjoint sets on which the equality 23 can be applied. We can decompose the union of the two sets A and B as the union of disjoint sets:
A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A). (32)
By proposition 1.4, this implies
P(A ∪ B) = P(A − B) + P(A ∩ B) + P(B − A). (33)
The next part of the proof is based on the computation of P(A − B) and P(B − A). We can decompose the set A as the union of disjoint sets
A = (A − B) ∪ (A ∩ B), (34)
so that P(A) = P(A − B) + P(A ∩ B), which implies P(A − B) = P(A) − P(A ∩ B). Similarly, we have P(B − A) = P(B) − P(A ∩ B). Plugging these two equalities into the equation 33 leads to P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which concludes the proof.
1.4 Uniform distribution
Definition 1.8. (Uniform distribution) Assume that Ω is a finite, nonempty sample space. The uniform distribution function on Ω is
f(x) = 1/#(Ω), (38)
for all x ∈ Ω.
Proposition 1.9. (Probability with uniform distribution) Assume that Ω is a finite,
nonempty, sample space and that f is a uniform distribution function. Then the
probability of the event A ⊂ Ω is
P(A) = #(A) / #(Ω). (39)
Proof. The definition 1.2 implies
P(A) = Σ_{x∈A} f(x) (40)
= Σ_{x∈A} 1/#(Ω) (41)
= #(A) / #(Ω), (42)
which concludes the proof.
Example 1.8 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. In the previous analysis of this example, we have assumed that the distribution function is f(x) = 1/6 for x ∈ Ω. This is consistent with definition 1.8, since #(Ω) = 6. Such a die is a fair die, meaning that all faces have the same probability. The event A = {2, 4, 6} corresponds to the statement that the result of the roll is an even number. The number of outcomes in this event is #(A) = 3. From proposition 1.9, the probability of this event is P(A) = 1/2.
1.5 Conditional probability
Definition 1.10. (Conditional distribution) Assume that Ω is a finite sample space, that f is a distribution function on Ω and that A ⊂ Ω is an event with P(A) > 0. The conditional distribution function of x given A is
f(x|A) = f(x) / Σ_{x∈A} f(x), if x ∈ A,
and f(x|A) = 0 otherwise.
Proof. We must prove that the function f(x|A) is a distribution function. Let us prove that the function f(x|A) satisfies the equality
Σ_{x∈Ω} f(x|A) = 1. (44)
Indeed, we have
Σ_{x∈Ω} f(x|A) = Σ_{x∈A} f(x|A) + Σ_{x∉A} f(x|A) (45)
= Σ_{x∈A} f(x) / Σ_{x∈A} f(x) (46)
= 1, (47)
which concludes the proof.
This leads us to the following definition of the conditional probability of an event
A given an event B.
Proposition 1.11. (Conditional probability) Assume that Ω is a finite sample space and A and B are two subsets of Ω. Assume that P(B) > 0. The conditional probability of the event A given the event B is
P(A|B) = P(A ∩ B) / P(B). (49)
The figure 5 presents the situation where we consider the event A|B. The probability P(A) is with respect to Ω, while the probability P(A|B) is with respect to B.
Figure 5: The conditional probability P(A|B) measures the probability of the set A ∩ B with respect to the set B.
Proof. Assume that A and B are subsets of the sample space Ω. The conditional
distribution function f (x|B) can be used to compute the probability of the event A
given the event B. Indeed, we have
P(A|B) = Σ_{x∈A} f(x|B) (50)
= Σ_{x∈A∩B} f(x|B), (51)
since f(x|B) = 0 if x ∉ B. Hence,
P(A|B) = Σ_{x∈A∩B} ( f(x) / Σ_{x∈B} f(x) ) (52)
= ( Σ_{x∈A∩B} f(x) ) / ( Σ_{x∈B} f(x) ) (53)
= P(A ∩ B) / P(B). (54)
We notice that
(#(B)/#(Ω)) · (#(A ∩ B)/#(B)) = #(A ∩ B)/#(Ω), (57)
for all A, B ⊂ Ω. The previous equation could have been directly found based on the equation 49.
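In the uniform case, the equation 49 combined with proposition 1.9 implies P(A|B) = #(A ∩ B)/#(B), so that conditional probabilities can be computed by simply counting elements. The following sketch considers a fair die with 6 faces, the event A that the result is even and the event B that the result is at least 3.
A = [2 4 6]; // even results
B = [3 4 5 6]; // results greater than or equal to 3
AnB = intersect ( A , B ); // 4 6
// P(A|B) = #(A inter B) / #(B) , for a uniform distribution
PAgivenB = size ( AnB , "*" ) / size ( B , "*" ) // 2/4 = 0.5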
1.6 Life table
Age Group Male Female
<1 100000 100000
1-4 99276 99391
5-9 99156 99292
10-14 99085 99232
15-19 98989 99164
20-24 98573 98991
25-29 97887 98758
30-34 97223 98484
35-39 96526 98133
40-44 95665 97621
45-49 94396 96823
50-54 92487 95603
55-59 89643 93850
60-64 85726 91384
65-69 80364 87726
70-74 72889 82275
75-79 62860 74398
80-84 49846 63218
85-89 34096 48086
90-94 18315 30289
95-99 7198 14523
100+ 1940 4804
Figure 6: A life table: for each age group, the number of male and female survivors out of 100 000 births.
This table allows us to compute conditional probabilities. For example, what is the probability that a woman who is already 60 lives to 80? By the proposition 1.11, we have
P({a ≥ 80}|{a ≥ 60}) = P({a ≥ 60} ∩ {a ≥ 80}) / P({a ≥ 60}) (59)
= P({a ≥ 80}) / P({a ≥ 60}) (60)
= 0.63218 / 0.91384 (61)
= 0.6918, (62)
with 4 significant digits. In other words, a woman who is already 60 has a 69.18 % chance of living to 80.
It is easy to gather the data into Scilab variables, as in the following Scilab script.
The ages variable contains the age classes. It is made of 22 entries, where the class
#k is from age ages(k-1)+1 to ages(k). The number of male survivors in the class
#k is males(k), while the number of female survivors is females(k).
ages = [0;4;9;14;19;24;29;34;39;44;49;54;..
59;64;69;74;79;84;89;94;99;100];
males = [100000;99276;99156;99085;98989;98573;..
97887;97223;96526;95665;94396;92487;89643;85726;..
80364;72889;62860;49846;34096;18315;7198;1940];
females = [100000;99391;99292;99232;99164;98991;98758;..
98484;98133;97621;96823;95603;93850;91384;..
87726;82275;74398;63218;48086;30289;14523;4804];
The following lifeprint function prints the data and displays a table similar to
the figure 6.
function lifeprint ( ages , males , females )
nc = size ( ages , "*" );
mprintf ( " <1 %6d %6d\n" , males (1) , females (1));
for k = 2 : nc
amin = ages (k -1) + 1;
amax = ages ( k );
mprintf ( "%3d - %3d %6d %6d\n" ,..
amin , amax , males ( k ) , females ( k ));
end
endfunction
We are now interested in computing the required probabilities with Scilab. The following lifeproba function returns the probability that a person lives to age a, given the data in the tables ages and survivors. In practice, the survivors variable will be either equal to males or to females. The algorithm first searches in the ages table for the class k which contains the age a. We compute the index k such that a is contained in the age class from ages(k-1)+1 to ages(k). Then the probability of living to age a is computed by using the number of survivors associated with the class k.
function p = lifeproba ( a , ages , survivors )
nc = size ( ages , "*" );
found = %f;
for k = 2 : nc
if ( a >= ages (k -1) + 1 & a <= ages ( k ) ) then
found = %t;
break
end
end
if ( ~found ) then
error ( "Age not found in table" )
end
p = survivors ( k ) / survivors (1)
endfunction
Although the previous algorithm is rather naive (it does not make use of vectorized
statements), this is sufficient for small life tables such as in our case.
The following session shows how the lifeproba function computes the proba-
bility of living to age 60.
--> pa = lifeproba (60 , ages , females )
pa =
0.91384
The following lifecondproba function returns the probability that a person with age a lives to age b, given the data in the tables ages and survivors. We assume that a < b. The algorithm is a straightforward application of the proposition 1.11.
function p = lifecondproba (a , b , ages , survivors )
if ( a >= b ) then
p = 1
return
end
pa = lifeproba (a , ages , survivors )
pb = lifeproba (b , ages , survivors )
p = pb / pa
endfunction
In the following session, we compute the probability that a woman lives to age 80,
given that she is 60.
--> pab = lifecondproba (60 , 80 , ages , females )
pab =
0.6917841
It is now easy to compute the probability that a woman lives to various ages, given that she is 40. This is done in the following script, which produces the figure 7.
bages = floor ( linspace (41 ,99 ,20));
for k = 1 : 20
pab ( k ) = lifecondproba (40 , bages ( k ) , ages , females );
end
plot ( bages , pab , " bo - " );
xtitle ( "Probability of living to age B, for a woman of age 40." ,..
"B" , "Probability" );
Figure 7: Probability that a woman lives to various ages, given that she is 40.
1.7 Bayes' formula
Proposition 1.12. (Bayes' formula) Assume that the sample space Ω can be decomposed into a sequence of pairwise disjoint events, which are called hypotheses. Let us denote by H1, H2, . . . , Hm these hypotheses, so that
Ω = H1 ∪ H2 ∪ . . . ∪ Hm. (63)
Assume that E ⊂ Ω is an event with P(E) > 0. Then, for i = 1, 2, . . . , m,
P(Hi|E) = P(E|Hi)P(Hi) / Σ_{k=1,m} P(E|Hk)P(Hk). (64)
In order to use Bayes' formula, assume that the probability of each hypothesis is known, that is, assume that the probabilities P(Hi) are given for 1 ≤ i ≤ m. Assume that the probabilities P(E|Hi) are known for 1 ≤ i ≤ m. Therefore, we are able to compute P(Hi|E), that is, if the event E has occurred, we are able to compute the probability of each hypothesis Hi. In practice, we consider the most likely hypothesis, i.e. the hypothesis Hi for which the probability P(Hi|E) is maximum over i = 1, m.
Proof. By definition of the conditional probability, we have
P(Hi|E) = P(Hi ∩ E) / P(E), (65)
for 1 ≤ i ≤ m. By the definition of the conditional probability, we also have
P(Hi ∩ E) = P(E|Hi)P(Hi). (66)
On the other side, we must compute the denominator P(E). This can be done by using the fact that the sequence of hypotheses Hi is a decomposition of the whole sample space Ω. Hence, we can decompose the event E as
E = (E ∩ H1) ∪ (E ∩ H2) ∪ . . . ∪ (E ∩ Hm). (67)
Since the events E ∩ Hk are pairwise disjoint, the proposition 1.4 implies
P(E) = Σ_{k=1,m} P(E ∩ Hk) = Σ_{k=1,m} P(E|Hk)P(Hk). (69)
We can now plug 66 and 69 into 65, which concludes the proof.
The following example is presented in [17], in chapter 1, "Elements of probability".
Example 1.9 Consider the situation where an insurance company tries to compute the probability of having an accident. This company makes the assumption that people either are or are not accident prone, and considers the probability of having an accident during the one year period following the insurance policy purchase. They assume that an accident-prone person will have an accident with probability 0.4. For a non-accident-prone person, the probability of having an accident is 0.2. Assume that 30% of the population is accident prone. Assume that a person has an accident. What is the probability that this person is accident prone?
Let us denote by E the event that the person will have an accident within the year of purchase and denote by H1 the event that the person is accident prone. By hypothesis, the sample space Ω of all the persons can be decomposed with the pairwise disjoint sets H1 and H2 = H1^c, which are, in this particular situation, the hypotheses. By hypothesis, we know that P(E|H1) = 0.4 and P(E|H2) = 0.2. We also know that P(H1) = 0.3, which implies that P(H2) = 1 − P(H1) = 0.7. We want to compute P(H1|E). By Bayes' formula 64, we have
want to compute P (H1 |E). By Bayes’ formula 64, we have
P (E|H1 )P (H1 )
P (H1 |E) = (70)
P (E|H1 )P (H1 ) + P (E|H2 )P (H2 )
0.4 × 0.3
= (71)
0.4 × 0.3 + 0.2 × 0.7
≈ 0.4615, (72)
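This computation is easily reproduced in Scilab with a few vectorized statements, as in the following sketch.
pH = [0.3 0.7]; // P(H1) , P(H2)
pEH = [0.4 0.2]; // P(E|H1) , P(E|H2)
pJoint = pEH .* pH; // P(E|Hi) * P(Hi)
// Posterior probabilities , by Bayes' formula 64:
pPost = pJoint / sum ( pJoint ) // 0.4615385 0.5384615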
1.8 Independent events
In this section, we define independent events and give examples of such events.
Definition 1.13. (Independent events) Assume that Ω is a finite sample space. Two events A, B ⊂ Ω are independent if P(A) > 0, P(B) > 0 and
P(A|B) = P(A). (74)
In the previous definition, the roles of A and B can be reversed, which leads to the following result. If two events A, B ⊂ Ω are independent, then
P(A ∩ B) = P(A)P(B). (75)
Proof. Assume that the two events A, B ⊂ Ω satisfy P(A) > 0 and P(B) > 0.
In the first part of this proof, let us prove that 75 is satisfied. By the definition of the conditional probability 1.11, we have P(A ∩ B) = P(A|B)P(B). Since A and B are independent, we have P(A|B) = P(A), which leads to P(A ∩ B) = P(A)P(B) and concludes the first part.
In the second part, let us assume that 75 is satisfied, and let us prove that the events A and B are independent. By the definition of the conditional probability 1.11, we have
P(A|B) = P(A ∩ B) / P(B). (76)
By the equality 75, this implies
P(A|B) = P(A)P(B) / P(B) = P(A), (77)
which concludes the proof.
Example 1.10 (Die with 6 faces) Assume that a fair die with 6 faces is rolled twice (instead of once) and consider the problem of choosing the correct sample space. The correct sample space takes into account the order of the rolls and is
Ω = {(i, j) / i, j = 1, 6} . (78)
The set Ω has 6 × 6 = 36 elements. The wrong (for our purpose), unordered, sample space is
Ω2 = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 3), (3, 4), (3, 5), (3, 6), (4, 4), (4, 5), (4, 6),
(5, 5), (5, 6), (6, 6)}. (79)
The set Ω2 has 21 elements. The fact that the set Ω2 is the wrong sample space for this experiment is linked to the fact that the two rolls are independent. Therefore, the equality 75 can be applied, stating that the probability of two independent events is the product of the probabilities. Therefore, the probability of any outcome (i, j) is
P((i, j)) = 1/36, (81)
for all i, j = 1, 6. The only sample space which leaves this probability consistent with the uniform distribution of a finite space 38 is Ω. If the sample space Ω2 were chosen, the equality 75 would be violated, so that the two events would become dependent.
1.10 Exercises
Exercise 1.1 (Head and tail) Assume that we have a coin which is tossed twice. We record the outcomes so that the order matters, i.e. the sample space is Ω = {HH, HT, TH, TT}. Assume that the distribution function is uniform, i.e. the head and the tail have an equal probability.
1. What is the probability of the event A = {HH, HT, TH}?
2. What is the probability of the event A = {HH, HT}?
Exercise 1.2 (Two dice) Assume that we are rolling a pair of dice. Assume that each face has an equal probability.
1. What is the probability of getting a sum of 7?
2. What is the probability of getting a sum of 11?
3. What is the probability of getting a double one, i.e. snake eyes?
Exercise 1.3 (Méré's experiments) This exercise is presented in [7], in the historical remarks of section 1.2, "Discrete Probability Distributions". The famous letters between Pascal and Fermat were prompted by a request for help from a French nobleman and gambler, the Chevalier de Méré. It is said that de Méré had been betting that, in four rolls of a die, at least one six would turn up (event A). He was winning consistently and, to get more people to play, he changed the game to bet that, in 24 rolls of two dice, a pair of sixes would turn up (event B). It is claimed that de Méré lost with 24 rolls and felt that 25 rolls were necessary to make the game favorable (event C). What is the probability of the three events A, B and C? Can you compute with Scilab the probability of the event A for a number of rolls equal to 1, 2, 3 or 4? Can you compute with Scilab the probability of the events B or C for a number of rolls equal to 10, 20, 24, 25, 30?
Exercise 1.4 (Independent events) Assume that Ω is a finite sample space. Assume that the
two events A, B ⊂ Ω are independent. Prove that P (B|A) = P (B).
Exercise 1.5 (Boole's inequality) Assume that Ω is a finite sample space. Let (Ei)i=1,n be a sequence of finite sets included in Ω and n > 0. Prove Boole's inequality:
P( ∪_{i=1,n} Ei ) ≤ Σ_{i=1,n} P(Ei). (82)
2 Combinatorics
In this section, we present several tools which allow us to compute probabilities of discrete events. One powerful analysis tool is the tree diagram, which is presented in the first part of this section. Then, we detail permutation and combination numbers, which allow us to solve many probability problems.
2.1 Tree diagrams
Figure 8: Tree diagram - The task is made of 3 steps. There are 2 choices for step #1, 3 choices for step #2 and 2 choices for step #3. The total number of ways to perform the full sequence of steps is n = 2 · 3 · 2 = 12.
2.2 Permutations
In this section, we present permutations, which are ordered subsets of a given set.
Without loss of generality, we can assume that the finite set A can be ordered and numbered from 1 to n = #(A), so that we can write A = {1, 2, . . . , n}. To define a particular permutation, one can write a matrix with 2 rows and n columns, which contains, in the first row, the indices 1 to n and, in the second row, the image of each index by the permutation.
Figure 9: Tree diagram for the computation of permutations of the set A = {1, 2, 3}.
For example, consider the permutation of the set A = {a1, a2, a3, a4} defined by the following mapping:
• a1 → a2,
• a2 → a1,
• a3 → a4,
• a4 → a3.
With the matrix notation introduced above, this permutation is written with the indices 1 to 4 in the first row and their images in the second row:
(1 2 3 4)
(2 1 4 3). (83)
Since the first row is always the same, there is no additional information provided
by this row. This is why the permutation can be written by uniquely defining the
second row. This way, the previous mapping can be written as
σ= 2 1 4 3 . (84)
We can try to count the number of possible permutations of a given set A with
n elements.
The tree diagram associated with the computation of the number of permutations
for n = 3 is presented in figure 9. In the first step, we decide which number to place
at index 1. For this index, we have 3 possibilities, that is, the numbers 1, 2 and
3. In the second step, we decide which number to place at index 2. At this index,
we have 2 possibilities left, where the exact numbers depend on the branch. In the
third step, we decide which number to place at index 3. At this last index, we only
have one number left.
This leads to the following proposition, which defines the factorial function.
Proposition 2.2. (Number of permutations) The number of permutations of a set A with n elements is
n! = n · (n − 1) . . . 2 · 1. (85)
Proof. #1 Let us pick an element to place at index 1. There are n elements in
the set, leading to n possible choices. For the element at index 2, there are n − 1
elements left in the set. For the element at index n, there is only 1 element left.
The total number of permutations is therefore n · (n − 1) . . . 2 · 1, which concludes
the proof.
Proof. #2 The element at index 1 can be located at indexes 1, 2, . . . , n so that there
are n ways to set the element #1. Once the element at index 1 is placed, there
are n − 1 ways to set the element at index 2. The last element at index n can
only be set at the remaining index. The total number of permutations is therefore
n · (n − 1) . . . 2 · 1, which concludes the proof.
When n = 0, it seems that we cannot define the number 0!. For reasons which will become clearer when we introduce the gamma function, it is convenient to define 0! as equal to one:
0! = 1. (86)
Example 2.1 Let us compute the number of permutations of the set A = {1, 2, 3}. By the equation 85, we have 3! = 3 · 2 · 1 = 6 permutations of the set A. These permutations are:
(1 2 3)
(1 3 2)
(2 1 3)
(87)
(2 3 1)
(3 1 2)
(3 2 1)
The previous permutations can also be directly read from the tree diagram 9, from the root of the tree to each of the 6 leaves.
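This list can be reproduced in Scilab. The following sketch assumes that the perms function is available; it returns the permutations of its input vector, one permutation per row.
p = perms ( 1:3 ) // the 6 permutations of {1, 2, 3}, one per row
size ( p , "r" ) // 6 , that is 3!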
In some situations, not all the elements in the set A are involved in the permutation. Assume that j is a positive integer, so that 0 ≤ j ≤ n. A j-permutation is a permutation of a subset of j elements in A. The general counting method used for the previous proposition allows us to count the total number of j-permutations of a given set A.
Proposition 2.3. (Number of j-permutations) The number of j-permutations of a set A with n elements is
(n)j = n · (n − 1) . . . (n − j + 1). (88)
Proof. The element at index 1 can be located at indexes 1, 2, . . . , n so that there are
n ways to set the element at index 1. Once element at index 1 is placed, there are
n − 1 ways to set the element at index 2. The element at index j can only be set at
the remaining n − j + 1 indexes. The total number of j-permutations is therefore
n · (n − 1) . . . (n − j + 1), which concludes the proof.
Notice that the number of j-permutations of n elements and the factorial of n are equal when j = n. Indeed, we have
(n)n = n · (n − 1) . . . 2 · 1 = n!. (89)
By convention, we define
(n)0 = 1. (90)
Example 2.2 Let us compute the number of 2-permutations of the set A = {1, 2, 3, 4}.
By the equation 88, we have (4)2 = 4 · 3 = 12 permutations of the set A. These
permutations are:
(1 2) (2 1) (3 1) (4 1)
(1 3) (2 3) (3 2) (4 2) (91)
(1 4) (2 4) (3 4) (4 3)
2.3 The gamma function
Definition 2.4. (Gamma function) For any real x > 0, the gamma function Γ(x) is defined by
Γ(x) = ∫_0^1 (− log(t))^{x−1} dt. (92)
The previous definition is not the usual form of the gamma function, but the following proposition allows us to get it.
Proposition 2.5. (Gamma function) Let x be a real with x > 0. The gamma function satisfies
Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt. (93)
Proof. Let us consider the change of variable u = − log(t). Therefore, t = e^{−u}, which leads, by differentiation, to dt = −e^{−u} du. We get (− log(t))^{x−1} dt = −u^{x−1} e^{−u} du. Moreover, if t = 0, then u = ∞ and if t = 1, then u = 0. This leads to
Γ(x) = −∫_∞^0 u^{x−1} e^{−u} du. (94)
For any continuously differentiable function f and any real numbers a and b, we have
∫_a^b f(x) dx = −∫_b^a f(x) dx. (95)
We reverse the bounds of the integral in the equality 94 and get the result.
The gamma function satisfies
Γ(1) = ∫_0^∞ e^{−t} dt = [−e^{−t}]_0^∞ = 0 + e^0 = 1. (96)
The following proposition makes the link between the gamma and the factorial
functions.
Proposition 2.6. ( Gamma and factorial) Let x be a real with x > 0. The gamma
function satisfies
Γ(x + 1) = xΓ(x) (97)
and
Γ(n + 1) = n! (98)
for any integer n ≥ 0.
Proof. Let us prove the equality 97. We want to compute
Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt. (99)
The proof is based on the integration by parts formula. For any continuously differentiable functions f and g and any real numbers a and b, we have
∫_a^b f(t) g′(t) dt = [f(t)g(t)]_a^b − ∫_a^b f′(t) g(t) dt. (100)
Let us define f(t) = t^x and g′(t) = e^{−t}. We have f′(t) = x t^{x−1} and g(t) = −e^{−t}. By the integration by parts formula 100, the equation 99 becomes
Γ(x + 1) = [−t^x e^{−t}]_0^∞ + ∫_0^∞ x t^{x−1} e^{−t} dt. (101)
Let us introduce the function h(t) = −t^x e^{−t}. We have h(0) = 0 and lim_{t→∞} h(t) = 0, for any x > 0. Hence,
Γ(x + 1) = ∫_0^∞ x t^{x−1} e^{−t} dt = xΓ(x), (102)
which proves the equality 97. The equality 98 then follows by induction, since Γ(1) = 1 = 0!.
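The equalities 97 and 98 are easily checked numerically in Scilab, as in the following sketch.
// Gamma(x+1) = x * Gamma(x) , from the equality 97:
gamma ( 4.5 ) - 3.5 * gamma ( 3.5 ) // zero , up to rounding errors
// Gamma(n+1) = n! , from the equality 98:
gamma ( 6 ) - factorial ( 5 ) // zero : both are equal to 120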
The gamma function is not the only function f which satisfies f(n) = n!. But the Bohr-Mollerup theorem proves that the gamma function is the unique function f which satisfies the equalities f(1) = 1 and f(x + 1) = xf(x), and such that log(f(x)) is convex [2].
It is possible to extend this function to negative values by inverting the equation 97, which implies
Γ(x) = Γ(x + 1) / x, (103)
for x ∈ ]−1, 0[. This allows us to compute, for example, Γ(−1/2) = −2Γ(1/2). By induction, we can also compute the value of the gamma function for x ∈ ]−2, −1[. Indeed, the equation 103 implies
Γ(x + 1) = Γ(x + 2) / (x + 1), (104)
which leads to
Γ(x) = Γ(x + 2) / (x(x + 1)). (105)
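We can check these equalities numerically, as in the following sketch.
// Gamma(-1/2) = -2 * Gamma(1/2) = -2 * sqrt(pi) , from the equation 103:
gamma ( -0.5 ) + 2 * sqrt ( %pi ) // zero , up to rounding errors
// Gamma(-1.5) = Gamma(0.5) / ((-1.5)*(-0.5)) , from the equation 105:
gamma ( -1.5 ) - gamma ( 0.5 ) / ( (-1.5) * (-0.5) ) // zero , up to rounding errors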
By induction on the intervals ]−n−1, −n[, with n a positive integer, this formula allows us to compute values of the gamma function for all x ≤ 0, except the nonpositive integers 0, −1, −2, . . .. This leads to the following proposition.
Proposition 2.7. (Gamma function for negative arguments) For any nonzero integer n and any real x such that x + n > 0,
Γ(x) = Γ(x + n) / (x(x + 1) . . . (x + n − 1)). (106)
Proof. The proof is by induction on n. The equation 103 proves that the equality is true for n = 1. Assume that the equality 106 is true for n and let us prove that it also holds for n + 1. By the equation 103 applied to x + n, we have
Γ(x + n) = Γ(x + n + 1) / (x + n). (107)
Therefore, we have
Γ(x) = Γ(x + n + 1) / (x(x + 1) . . . (x + n − 1)(x + n)), (108)
which proves that the statement holds for n + 1 and concludes the proof.
The gamma function is singular for negative integer values of its argument, as stated in the following proposition.
Proposition 2.8. (Gamma function for negative integer arguments) For any nonnegative integer n,
Γ(−n + h) ∼ (−1)^n / (n! h), (109)
when h is small.
The Scilab functions related to the factorial are the following.
factorial returns n!
gamma returns Γ(x)
gammaln returns log(Γ(x))
Proof. By the proposition 2.7 applied to x = −n + h, we have
Γ(−n + h) = Γ(h) / ((h − n)(h − n + 1) . . . (h − 1)). (110)
But Γ(h) = Γ(h + 1)/h, which leads to
Γ(−n + h) = Γ(h + 1) / ((h − n)(h − n + 1) . . . (h − 1)h). (111)
When h is small, the expression Γ(h + 1) converges to Γ(1) = 1. On the other hand, the expression (h − n)(h − n + 1) . . . (h − 1)h converges to (−n)(−n + 1) . . . (−1)h = (−1)^n n! h, which leads to the term (−1)^n and concludes the proof.
We have reviewed the main properties of the gamma function. In practical situations, we use the gamma function in order to compute the factorial number, as we are going to see in the next sections. The main advantage of the gamma function over the factorial is that it avoids forming the product n! = n · (n − 1) . . . 1, which saves a significant amount of CPU time and computer memory.
Figure 11: The gamma function.
The following script plots the gamma function on the interval [−4, 4] and produces the figure 11.
x = linspace ( -4 , 4 , 1001 );
plot ( x , gamma ( x ) );
h = gcf ();
h.children.data_bounds = [
-4. -6
4. 6
];
Notice that the two floating point signed zeros +0 and -0 are associated with the function values +∞ and −∞ respectively. This is consistent with the value of the limit of the function from either side of the singular point. This contrasts with the value of the gamma function at negative integer points, where the function value is %nan. This is consistent with the fact that, at these singular points, the function is equal to −∞ on one side and +∞ on the other side. Therefore, since the argument x has one single floating point representation when it is a negative nonzero integer, the only solution consistent with the IEEE 754 standard is to set the result to %nan.
Notice that we used 1001 points to plot the gamma function. This allows us to get
points exactly located at the singular points. These values are ignored by the plot function, which makes a nice plot. Indeed, if 1000 points were used instead, vertical lines corresponding to the y-values immediately at the left and the right of the singularity would be displayed.
Figure 12: The factorial function for n = 1, 2, . . . , 10.
The result is presented in figure 12. We see that the growth rate of the factorial function is large.
The largest value of n such that n! is representable as a double precision floating point number is n = 170. In the following session, we check that 171! is not representable as a Scilab double.
--> factorial (170)
ans =
7.257+306
--> factorial (171)
ans =
Inf
Figure 13: The logarithm of n!, for n in the interval [0, 170].
Notice that we used the base-e logarithm function log, that is, the inverse of the exponential function.
The factorial number n! grows exponentially, but its logarithm grows much more slowly. In the figure 13, we plot the logarithm of n! on the interval [0, 170]. We see that the y coordinate varies only from 0 up to 800. Hence, there is a large number of integers n for which n! may not be representable as a double but fln(n) is still representable as a double.
The implementation of the factorial function in Scilab accepts both matrix and hypermatrix input arguments. In order to be fast, it uses vectorization. The following factorialScilab function represents the computational core of the actual implementation of the factorial function in Scilab.
function f = factorialScilab ( n )
n ( n ==0)=1
t = cumprod (1: max ( n ))
v = t ( n (:))
f = matrix (v , size ( n ))
endfunction
The statement n(n==0)=1 sets all zeros of the matrix n to one, so that the next statements do not have to manage the special case 0! = 1. Then, we use the cumprod function in order to compute a column vector containing cumulated products, up to the maximum entry of n. The use of cumprod allows us to get all the results in one call, but also produces unnecessary values of the factorial function. In order to get just what is needed, the statement v = t(n(:)) extracts the required values. Finally, the statement f = matrix(v,size(n)) reshapes the matrix of values so that the shape of the output argument is the same as the shape of the input argument.
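For example, the following sketch computes the factorial of each entry of a 2-by-2 matrix in one single call.
n = [3 5 ; 0 4];
factorialScilab ( n ) // [6 120 ; 1 24]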
The following function allows to compute n! based on the prod function, which
computes the product of its input argument.
function f = factorial_naive ( n )
f = prod (1: n )
endfunction
The factorial_naive function has two drawbacks. The first one is that it can-
not manage matrix input arguments. Furthermore, it requires more memory than
necessary.
In practice, the factorial function can be computed based on the gamma function.
The following implementation of the factorial function is based on the equality 98.
function f = myfactorial ( n )
if ( or ( n (:) < 0 ) | or ( n (:) <> round ( n (:) ) ) ) then
error ( "myfactorial: n must all be nonnegative integers" );
end
f = gamma ( n + 1 )
endfunction
The myfactorial function also checks that the input argument n is nonnegative. It also checks that n is an integer by using the condition or(n(:) <> round(n(:))). Indeed, if the value of n is different from the value of round(n), this means that the input argument n is not an integer.
The main drawback of the factorialScilab function is that it uses more memory than necessary. It may fail to produce a result when it is given a large input argument. In the following session, we use the factorial function with a very large input integer. In this particular case, it is obvious that the correct result is Inf. This should have been the result of the function, which should not have generated an error. On the other hand, the myfactorial function works perfectly.
--> factorial (1. e10 )
! - - error 17
stack size exceeded !
--> myfactorial (1. e10 )
ans =
Inf
We now consider the computation of the log-factorial function fln. We can use the gammaln function, which directly provides the correct result.
function flog = factoriallog ( n )
flog = gammaln ( n +1)
endfunction
The advantage of this method is that matrix input arguments can be managed by the factoriallog function.
There is another possible implementation for the log-factorial function, based on the logarithm function. We have
log(n!) = log(1) + log(2) + . . . + log(n).
The previous equation can be simplified since log(1) = 0. This leads to the following implementation.
function flog = factoriallog_lessnaive ( n )
flog = sum ( log (2: n ))
endfunction
The previous function has several drawbacks. The first problem is that this function may require a large array if n is large. Hence, using this function is limited to relatively small values of n. Moreover, it requires evaluating the log function at least n − 1 times, which leads to a performance issue. Finally, it is not possible to directly let the variable n be a matrix of doubles. All these issues make the factoriallog_lessnaive function a less than perfect implementation of the log-factorial function.
2.8 Stirling's formula
The following proposition states Stirling's formula for the gamma function.
Proposition 2.10. (Stirling's formula for the gamma function) For x → ∞, the gamma function satisfies
Γ(x) ∼ e^{−x} x^{x−1/2} √(2π). (116)
We shall not give the proof of this formula here (see [2, 19] for a complete
derivation).
The previous proposition allows to directly derive an asymptotic behavior for
the factorial function.
Proposition 2.11. (Stirling's formula) For any positive integer n → ∞,
n! ∼ (n/e)^n √(2πn). (117)
Proof. By the proposition 2.10, we have
n! = Γ(n + 1) ∼ e^{−(n+1)} (n + 1)^{(n+1)−1/2} √(2π) (118)
= e^{−n−1} (n + 1)^{n+1/2} √(2π) (119)
∼ e^{−n−1} (n + 1)^n √(2πn). (120)
We can simplify the previous expression for large values of n. Indeed, we have (n + 1)^n = n^n (1 + 1/n)^n ∼ e · n^n, so that e^{−n−1} (n + 1)^n ∼ e^{−n} n^n, which concludes the proof.
In the following script, we compare Stirling's formula and the factorial function for various values of n. Moreover, we compute the number of significant digits produced by Stirling's formula, by using the equation d = −log10( |n! − Sn| / n! ), where Sn is Stirling's approximation given by the equation 117.
-->n = (1:20:185) ’;
-->f = factorial ( n );
-->s = sqrt (2* %pi .* n ).*( n ./ %e ).^ n ;
-->d = - log10 ( abs (f - s )./ f );
- - >[ n f s d ]
ans =
1. 1. 0.9221370 1.1086689
21. 5.109 D +19 5.089 D +19 2.4022947
41. 3.345 D +49 3.338 D +49 2.692415
61. 5.076 D +83 5.069 D +83 2.8648116
81. 5.79 D +120 5.79 D +120 2.9878919
101. 9.42 D +159 9.41 D +159 3.0836832
121. 8.09 D +200 8.08 D +200 3.1621171
141. 1.89 D +243 1.89 D +243 3.2285294
161. 7.59 D +286 7.58 D +286 3.2861201
181. Inf Inf Nan
We see that the number of significant digits increases with n, reaching more than 3
when n is close to its upper limit.
It is not straightforward to prove Stirling's formula. Still, we can easily prove the following proposition, which focuses on the log-factorial function.
Proposition 2.12. (Log-factorial for large n) For any positive integer n → ∞,
log(n!) ∼ n log(n) − n. (121)
Proof. We have
log(n!) = log(1) + log(2) + . . . + log(n).
The previous equation can be simplified, since log(1) = 0. The log function is a nondecreasing function of x for x > 0. Therefore,
∫_{k−1}^k log(x) dx < log(k) < ∫_k^{k+1} log(x) dx, (124)
for any integer k > 1. Summing these inequalities leads to
∫_0^n log(x) dx < log(n!) < ∫_1^{n+1} log(x) dx. (125)
We must now compute the two integrals which appear in the previous inequalities. We recall that the anti-derivative of the log(x) function is x log(x) − x, since (x log(x) − x)′ = log(x) + x · (1/x) − 1 = log(x). Furthermore, we recall that the limit of the function x log(x) is zero when x converges to zero. Therefore,
∫_0^n log(x) dx = [x log(x) − x]_0^n (126)
= n log(n) − n (127)
and
∫_1^{n+1} log(x) dx = [x log(x) − x]_1^{n+1} (128)
= (n + 1) log(n + 1) − (n + 1) + 1 (129)
= (n + 1) log(n + 1) − n. (130)
We plug the two previous results into the equation 125 and get
n log(n) − n < log(n!) < (n + 1) log(n + 1) − n, (131)
which concludes the proof.
Stirling’s formula 116 is consistent with the asymptotic equation 117, which
allows to directly derive the equation
n n √
log(n!) = log 2πn (132)
e
n n √
= log + log( 2πn) (133)
en √ √
= n log + log( 2π) + log( n) (134)
e
1 1
= n log(n) − n log(e) + log(2π) + log(n) (135)
2 2
1 1
= n log(n) − n + log(2π) + log(n), (136)
2 2
since log(e) = 1. This immediately implies the equation 121.
In the following session, we compare the asymptotic equation 121 with the gam-
maln function. The session displays the column vectors [n f s d], where f is
computed from the gammaln function, s is computed from the asymptotic equation
121 and d is the number of significant digits in the asymptotic equation. We see that,
for n greater than 1010 , we have more than 10 significant digits in the asymptotic
equation.
-->n = logspace (1 ,10 ,10) ’;
-->f = gammaln ( n +1);
-->s = n .* log ( n ) - n ;
-->d = - log10 ( abs (f - s )./ f );
- - >[ n f s d ]
ans =
10. 15.104413 13.025851 0.8613409
100. 363.73938 360.51702 2.0526167
1000. 5912.1282 5907.7553 3.1309743
10000. 82108.928 82103.404 4.1721275
100000. 1051299.2 1051292.5 5.1972489
1000000. 12815518. 12815511. 6.2141578
10000000. 1.512 D +08 1.512 D +08 7.2263182
1.000 D +08 1.742 D +09 1.742 D +09 8.2354866
1.000 D +09 1.972 D +10 1.972 D +10 9.2426477
1.000 D +10 2.203 D +11 2.203 D +11 10.248397
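2.9 Computing permutations and log-permutations with Scilab
The number of j-permutations can be computed with the factorial function, from the equality (n)j = n!/(n − j)!. The following permutations_verynaive function is a direct implementation of this formula; it is a sketch consistent with the sessions below.
function p = permutations_verynaive ( n , j )
// (n)_j = n! / (n-j)! , computed directly from the factorials
p = factorial ( n ) ./ factorial ( n - j )
endfunction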
In the following session, we see that the previous function works for small values of
n and j.
-->n = [5 5 5 5 5 5] ’;
-->j = [0 1 2 3 4 5] ’;
- - >[ n j p er m u t at i o n s _v e r y n ai v e (n , j )]
ans =
5. 0. 1.
5. 1. 5.
5. 2. 20.
5. 3. 60.
5. 4. 120.
5. 5. 120.
For larger values of n and j, the previous function produces Inf. This is caused by an overflow during the computation of the factorial function. There is unfortunately no way to fix this problem when (n)j is large, since the result is, indeed, not representable as a double precision floating point number.
On the other hand, the permutations_verynaive function performs poorly in cases where n is large, whatever the value of j, as presented in the following session.
--> permutations_verynaive ( 171 , 0 )
ans =
Nan
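A first improvement is to use the gamma function instead of the factorials. The following permutations_naive function is one possible such implementation, based on the exp and gammaln functions.
function p = permutations_naive ( n , j )
// (n)_j = Gamma(n+1) / Gamma(n-j+1) = exp( gammaln(n+1) - gammaln(n-j+1) )
p = exp ( gammaln ( n + 1 ) - gammaln ( n - j + 1 ))
endfunction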
In the following session, we check the values of the function (n)j for n = 5 and j = 0, 1, . . . , 5.
-->n = 5;
--> for j = 0 : 5
--> p = permutations_naive ( n , j );
--> disp ([ n j p ]);
--> end
5. 0. 1.
5. 1. 5.
5. 2. 20.
5. 3. 60.
5. 4. 120.
5. 5. 120.
In order to manage large values of n, we can compute the logarithm first. By the equation 139, we have
log((n)j) = log( n! / (n − j)! ) (140)
= log(n!) − log((n − j)!) (141)
= log(Γ(n + 1)) − log(Γ(n − j + 1)). (142)
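This leads to the following permutationslog function, which returns the logarithm of (n)j; it is a sketch based on the gammaln function, consistent with its use in the next sections.
function plog = permutationslog ( n , j )
// log((n)_j) = log(Gamma(n+1)) - log(Gamma(n-j+1))
plog = gammaln ( n + 1 ) - gammaln ( n - j + 1 )
endfunction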
2.10 The birthday problem
Assume that n > 0 persons are gathered in a room. Can we compute the
probability that two persons in the room have the same birthday ?
To perform this computation, we assume that the year is made of 365 days. We
assume that each day has the same probability of a birthday.
Let us denote by Ω ⊂ N^n the sample space, which is the space of all possible combinations of birthdays for the n persons. It is defined by
Ω = {(b1, b2, . . . , bn) / bi ∈ {1, 2, . . . , 365}, i = 1, n} . (143)
The searched event E is the event that two persons have the same birthday. To
compute this event more easily, we can compute the probability of the comple-
mentary event E c , i.e., the event that all persons have a distinct birthday. In-
deed, once we have computed P (E c ), we can deduce the searched probability with
P (E) = 1 − P (E c ).
By hypothesis, all days in the year have the same probability, so that we can apply proposition 1.9 for uniform discrete distributions. Therefore,
P(E^c) = #(E^c) / #(Ω).
The birthday of each person can be chosen among 365 days. Since the birthdays of the several persons are independent events, this leads to
#(Ω) = 365^n. (144)
Let us now compute the size of the complementary event E^c. The birthday of the first person can be chosen among 365 possible days. Once chosen, the birthday for the second person can be chosen among 365 − 1 = 364 days, so that the birthdays are different. By repeating this process for the n persons, we get the following size for the event E^c:
#(E^c) = 365 · 364 . . . (365 − n + 1) = (365)n. (145)
Let us define the probability Q(E) as the probability of the complementary event:
Q(E) = P(E^c). (146)
Therefore,
Q(E) = (365)n / 365^n, (147)
which leads to the required probability
P(E) = 1 − Q(E) = 1 − (365)n / 365^n. (148)
The following twobirthday_verynaive function returns the probability that two persons have the same birth time in a group of n persons, in a period of d days. In order to compute (365)n, we choose the formula (365)n = 365!/(365 − n)!.
function p = twobirthday_verynaive ( n , d )
p = 1 - factorial ( d )./ factorial ( d - n ) ./ d ^ n
endfunction
We are going to use the previous function with d = 365. In the following session, we see that the twobirthday_verynaive function does not allow us to compute any result, whatever the value of n.
-->n = (1:5) ’;
-->[ n twobirthday_verynaive (n ,365)]
ans =
1. Nan
2. Nan
3. Nan
4. Nan
5. Nan
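The reason is that 365! is not representable as a double, so that the computation produces the ratio Inf/Inf, which is Nan. A first improvement is to compute (365)n with the exp and gammaln functions instead of the factorials. The following twobirthday_naive function is one possible such implementation, consistent with the sessions below.
function p = twobirthday_naive ( n , d )
// (d)_n = d! / (d-n)! is computed as exp( gammaln(d+1) - gammaln(d-n+1) )
p = 1 - exp ( gammaln ( d + 1 ) - gammaln ( d - n + 1 )) ./ d .^ n
endfunction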
In the following session, we compute the probability that two persons have the same birthday, for n going from 1 to 5.
-->n = (1:5) ’;
- - >[ n twobirthday_naive (n ,365)]
ans =
1. 0.
2. 0.0027397260273972490197
3. 0.0082041658847813447863
4. 0.0163559124665503263785
5. 0.0271355736996392593596
We can explore larger values of n and see when the probability p breaks the p = 0.5 threshold. The following Scilab session shows how to compute the required probability for n = 20, 21, . . . , 25.
-->n = (20:25) ’;
- - >[ n twobirthday_naive (n ,365)]
ans =
20. 0.4114383835806317835093
21. 0.4436883351652584073221
22. 0.4756953076625709542213
23. 0.5072972343239776638057
24. 0.5383442579144532835755
25. 0.5686997039695230737877
This shows that if 23 or more persons are gathered, the probability that two persons have the same birthday is greater than 0.5.
2.11 A modified birthday problem
We now consider a modified problem: how many persons must be gathered so that the probability of having two persons with the same birth day and hour is greater than 0.5?
We assume that a year is made of 365 days and that a day is made of 24 hours.
This computation should be easy to perform. It suffices to use the twobirthday_naive function with d = 365 · 24. In the following session, we explore the values of P(E) for n = 20, 21, . . . , 25.
-->n = (20:25) ’;
- - >[ n twobirthday_naive (n ,365*24)]
ans =
20. 0.0214717378650828294440
21. 0.0237058206494269452236
22. 0.0260462518925431707473
23. 0.0284922544755111806225
24. 0.0310430168113396964813
25. 0.0336976934779864567560
We see that the probability is much lower than previously. The problem is that,
for larger values of n, the previous function fails, as can be seen in the following
session.
--> twobirthday_naive (100 ,365*24)
ans =
Nan
Indeed, the number (365 · 24)100 is too large to be representable as a double precision floating point number, so that it evaluates to Inf. The same issue occurs for (365 · 24)^100. This leads to the ratio Inf/Inf, which is equal to the IEEE Nan. In order to solve this issue, we can use logarithms of the intermediate results. We have
log(Q(E)) = log( (365)n / 365^n ) (149)
= log((365)n) − log(365^n) (150)
= log((365)n) − n log(365). (151)
This leads to the following twobirthday_lessnaive function, which uses the permutationslog function.
function p = twobirthday_lessnaive ( n , d )
q = exp ( permutationslog ( d , n ) - n * log ( d ))
p = 1 - q
endfunction
We check in the following session that our less naive function allows us to compute the required probability for n = 100.
--> twobirthday_lessnaive (100 ,365*24)
ans =
0.4329003041011145747063
In order to search for the number of persons which makes the probability greater than p = 0.5, we perform a while loop. We quit this loop when the probability breaks the threshold.
n = 1;
while ( %t )
p = twobirthday_lessnaive ( n , 365*24 );
if ( p > 0.5 ) then
mprintf ( "n = %d, p = %e\n" , n , p )
break
end
n = n + 1;
end
We are now interested in computing the probability of getting two persons with the same birth day and hour in a group of 500 persons. The following session shows the result of calling the twobirthday_lessnaive function with n = 500.
-->p = twobirthday_lessnaive ( 500 , 365*24 )
p =
0.9999995
This probability is very close to 1. In order to get more significant digits, we use
the format function in the following session.
--> format (25)
-->p
p =
0.9999995054082023715480
This allows us to get a larger number of significant digits. But the result is accurate to at most 17 significant digits. Since 6 of these digits are 9s, there are only 17 − 6 = 11 digits available for the required probability p. The reason for this inaccuracy is that q is very close to zero, which makes p = 1 − q very close to 1. This implies that, because of our way of computing the probability p, we can at best expect 11 accurate digits for p. We emphasize that this is independent from the actual accuracy of the intermediate computations and is only generated by the way of representing the solution of the problem with double precision floating point numbers.
One possible solution is to compute q instead of p. Indeed, the q value is close to zero, where floating point numbers can be represented with limited but sufficient accuracy. The following twobirthday function allows us to compute both p and q.
function [ p , q ] = twobirthday ( n , d )
q = exp ( permutationslog ( d , n ) - n * log ( d ))
p = 1 - q
endfunction
In the following session, we display the result of the computation and use the
mprintf function with the %.17e format in order to display 17 digits after the
decimal point.
-->[p , q ] = twobirthday ( 500 , 365*24 );
--> mprintf ( "p = %.17e\n" , p )
p = 9.99999505408202370e-001
--> mprintf ( "q = %.17e\n" , q )
q = 4.94591797644640870e-007
We now have all the available digits for q and we know that the probability of having two persons with the same birth day and hour in a group of n = 500 persons is close to p = 1 − 4.94591797644640870 · 10^−7.
But this does not imply that all these digits are exact. Indeed, floating point evaluations of elementary operators like +, -, *, / and of elementary and special functions like exp, log and gamma are associated with rounding errors and various approximations.
We have computed the probability for n = 500 with the symbolic computation system Wolfram Alpha [16], using the expression:
(365*24)!/(365*24 - 500)!/((365*24)^500)
We found that the exact value of q is in this case
z = 4.94591797640207720e-7,
rounded to 17 digits. In the following session, we compute the relative error between the computed and the exact probabilities.
-->z = 4.9459179764020772e-7
z =
0.0000005
--> abs (q - z )/ z
ans =
8.963D-12
-->- log10 ( abs (q - z )/ z )
ans =
11.047534
We see that there are approximately 11 accurate digits in this case, which is sufficiently accurate for our purpose.
2.12 Combinations
In this section, we present combinations, which are unordered subsets of a given set.
The number of distinct subsets with j elements which can be chosen from a set A with n elements is the binomial coefficient and is denoted by C(n, j). The following proposition gives an explicit formula for the binomial number.
Proposition 2.13. (Binomial) The number of distinct subsets with j elements which can be chosen from a set A with n elements is the binomial coefficient, defined by
C(n, j) = ( n · (n − 1) . . . (n − j + 1) ) / ( 1 · 2 . . . j ). (152)
The following proof is based on the fact that subsets are unordered, while per-
mutations are based on the order.
Proof. Assume that the set A has n elements and consider subsets with j > 0 elements. By proposition 2.3, the number of j-permutations of the set A is (n)j = n · (n − 1) . . . (n − j + 1). Notice that the order does not matter in creating the subsets, so that the number of subsets is lower than the number of permutations. This is why each subset is associated with one or more permutations. By proposition 2.2, there are j! ways to order a set with j elements. Therefore, the number of subsets with j elements is given by C(n, j) = ( n · (n − 1) . . . (n − j + 1) ) / ( 1 · 2 . . . j ), which concludes the proof.
The expression for the binomial coefficient can be simplified if we use the number of j-permutations and the factorial number, which leads to
C(n, j) = (n)j / j!. (153)
The equality (n)j = n!/(n − j)! leads to
C(n, j) = n! / ((n − j)! j!). (154)
This immediately leads to
C(n, j) = C(n, n − j). (155)
The following Scilab function performs the computation of the binomial number for
positive values of n and j.
function c = nchoosek ( n , j )
c = exp ( gammaln ( n +1) - gammaln ( j +1) - gammaln (n - j +1))
if ( and ( round ( n )== n ) & and ( round ( j )== j ) ) then
c = round ( c )
end
endfunction
In the following session, we compute the value of the binomial coefficients for n = 0, 1, . . . , 5. The values in this table are known as Pascal's triangle.
--> for n =0:5
--> for j =0: n
--> c = nchoosek ( n , j );
--> mprintf ( " %2d " ,c );
--> end
--> mprintf ( " \ n " );
--> end
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
We now explain why we chose the exp and gammaln functions to perform our computation in the nchoosek function. Indeed, we could have used a more naive method, based on the prod function, as in the following example:
function c = nchoosek_naive ( n , j )
c = prod ( n : -1 : n - j +1 )/ prod (1: j )
endfunction
For small integer values of n, the two previous functions produce the same result. Unfortunately, even for moderate values of n, the naive method fails. In the following session, we compute the value of C(n, j) with n = 10000 and j = 134.
--> nchoosek ( 10000 , 134 )
ans =
2.050+307
--> nchoosek_naive ( 10000 , 134 )
ans =
Inf
The reason why the naive computation fails is that the products involved in the intermediate variables of the naive method generate an overflow. This means that these values are too large to be stored in a double precision floating point variable. This is a pity, since the final result can be stored in a double precision floating point variable. The nchoosek function, on the other hand, first computes the logarithm of the result. This logarithm cannot overflow, because if x is a double precision floating point number, then log(x) can always be represented, since its exponent is always smaller than the exponent of x. In the end, the combination of the exp and gammaln functions allows us to compute the result accurately, in the sense that, if the result is representable as a double precision floating point number, then nchoosek will produce a result as accurate as possible.
Notice that we use the round function in our implementation of the nchoosek function. This is because the nchoosek function in fact manages real double precision floating point input arguments. Consider the example where n = 4 and j = 1, and let us compute the associated binomial coefficient C(n, j). In the following Scilab session, we use the format function so that we display at least 15 significant digits.
--> format (20);
-->n = 4;
-->j = 1;
-->c = exp ( gammaln ( n +1) - gammaln ( j +1) - gammaln (n - j +1))
c =
3.9999999999999822
We see that there are 15 significant digits, which is the best that can be expected
from the exp and gammaln functions. But the result is not an integer anymore,
i.e. it is very close to the integer 4, but not exactly equal to it. This is why in
the nchoosek function, if n and j are both integers, we round the number c to the
nearest integer with a call to the round function.
Finally, notice that our implementation of the nchoosek function uses the and function. This allows us to use arrays of integers as input variables. In the following session, we compute C(5, j) for j = 0, 1, . . . , 5 in one single call. This is a consequence of the fact that exp and gammaln both accept matrix input arguments.
-->n = 5 * ones (6 ,1);
-->j = (0:5) ’;
-->c = nchoosek ( n , j );
- - >[ n j c ]
ans =
5. 0. 1.
5. 1. 5.
5. 2. 10.
5. 3. 10.
5. 4. 5.
5. 5. 1.
As we will see later in this document, the number of combinations appears in several probability computations. For example, this function is used as an intermediate computation in the hypergeometric distribution function, which will be presented later in this document. We will see that the numerical issue associated with the use of floating point numbers is solved by the use of the logarithm of the number of combinations, and this is why we now focus on this computation.
Let us introduce the function clog as the logarithm of the binomial coefficient:
clog(n, j) = log( C(n, j) ). (159)
1 2 3 4 5 6 7 8 9 10 J Q K
1♥ 2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ 9♥ 10♥ J♥ Q♥ K♥
1♦ 2♦ 3♦ 4♦ 5♦ 6♦ 7♦ 8♦ 9♦ 10♦ J♦ Q♦ K♦
1♣ 2♣ 3♣ 4♣ 5♣ 6♣ 7♣ 8♣ 9♣ 10♣ J♣ Q♣ K♣
1♠ 2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠
Figure 14: Cards of a 52 cards deck - "J" stands for Jack, "Q" stands for Queen and "K" stands for King.
which implies
clog(n, j) = log(Γ(n + 1)) − log(Γ(j + 1)) − log(Γ(n − j + 1)). (161)
The following nchooseklog function computes clog(n, j) from the equation 161.
function c = nchooseklog ( n , k )
c = gammaln ( n + 1 ) - gammaln ( k + 1) - gammaln ( n - k + 1)
endfunction
2.14 The poker game
To answer this question, we will compute the probability of each event. Since the order of the cards can be changed by the player, we are interested in combinations (and not in permutations). We make the assumption that the process of choosing the cards is really random, so that all combinations of 5 cards have the same probability, i.e. the distribution function is uniform. Since the order of the cards does not matter, the sample space Ω is the set of all combinations of 5 cards chosen from 52 cards. Therefore, the size of Ω is
#(Ω) = C(52, 5) = 2598960. (162)
Figure 16: A Bernoulli process with 3 trials. – The letter "S" indicates "success" and
the letter "F" indicates "failure".
1. Each experiment has two possible outcomes, which we may call success and
failure.
2. The probability p ∈ [0, 1] of success is the same for each experiment.
The table 17 presents the value of the distribution function for each outcome x ∈ Ω.
We can check that the sum of the probabilities of all events is equal to 1. Indeed,

\sum_{i=1,8} f(x_i) = p^3 + p^2 q + p^2 q + p q^2 + p^2 q + p q^2 + p q^2 + q^3. (166)
x     f(x)
SSS   p^3
SSF   p^2 q
SFS   p^2 q
SFF   p q^2
FSS   p^2 q
FSF   p q^2
FFS   p q^2
FFF   q^3

Grouping the outcomes with the same number of successes leads to

b(3, p, 3) = p^3 (170)
b(3, p, 2) = 3 p^2 q (171)
b(3, p, 1) = 3 p q^2 (172)
b(3, p, 0) = q^3 (173)
The following proposition extends the previous analysis to the general case.
Proposition 2.16. (Binomial probability) In a Bernoulli process with n > 0 trials
with success probability p ∈ [0, 1], the probability of exactly j successes is

b(n, p, j) = \binom{n}{j} p^j q^{n-j}, (174)

where 0 ≤ j ≤ n and q = 1 − p.
Proof. We denote by A ⊂ Ω the event that one process is associated with exactly j
successes. By definition, the probability of the event A is

b(n, p, j) = P(A) = \sum_{x \in A} f(x). (175)

Each outcome x ∈ A has probability

f(x) = p^j q^{n-j}. (176)

The size of the set A is the number of subsets of j elements in a set of size n. Indeed,
the order does not matter, since we only require that, during the whole process, the
total number of successes is exactly j, no matter the order of the successes and
failures. The number of outcomes with exactly j successes is therefore
#(A) = \binom{n}{j}, which, combined with equation 176, concludes the proof.
Example 2.3 A fair coin is tossed six times. What is the probability that exactly 3
heads turn up? This process is a Bernoulli process with n = 6 trials. Since the coin
is fair, the probability of success at each trial is p = 1/2. We can apply the
proposition 2.16 with j = 3 and get

b(6, 1/2, 3) = \binom{6}{3} (1/2)^3 (1/2)^3 = 0.3125. (177)
We can check that the naive implementation binopdf_naive of the binomial
distribution function is correct against two simple examples.
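The definition of the binopdf_naive function does not appear in this part of the
document; a minimal sketch consistent with its usage below (a direct application of
the equation 174) might be the following.

function p = binopdf_naive ( x , n , pb )
    // Direct formula : prone to overflow and underflow
    // for extreme values of the parameters .
    p = nchoosek ( n , x ) .* pb ^ x .* (1 - pb ) ^ ( n - x )
endfunction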
Example 2.4 A fair coin is tossed six times. What is the probability that exactly
3 heads turn up ?
-->n = 6; pb = 0.5;
-->p = binopdf_naive ( 3 , n , pb )
p =
0.3125
Example 2.5 Assume that we work in a factory producing n = 200 puffins each
day. The probability of producing a defective puffin is 2%. What is the probability
that exactly 0 defective puffins are produced?
-->n = 200; pb = 2/100;
-->p = binopdf_naive ( 0 , n , pb )
p =
0.0175879
We can now consider an example where the success of each Bernoulli
trial is far more likely. Assume that the probability of success is p = 1 − 10^{-20},
i.e. there is an extremely strong probability of success. Assume that we make
n = 100 trials. What is the probability of having 90 successes?
-->n = 100; pb = 1 - 1. e -20;
--> binopdf_naive ( 90 , n , pb )
ans =
0.
In order to check our computation, we can use Wolfram Alpha [16] to evaluate the
exact expression. The exact result is small, but it is representable by the double
precision floating point numbers used in Scilab. The reason for the failure of the
naive implementation is that the probability of success is so close to 1 that it has
been rounded to one by Scilab, as shown in the following session.
--> format ( " e " ,25)
--> pb = 1 - 1. e -20
pb =
1.000000000000000000D+00
Hence, the probability of failure qb is represented by the floating point number
zero, leading to an inaccurate computation. In fact, any probability smaller than
the machine epsilon ≈ 10^{-16} would lead to the same issue.
The following binopdf_lessnaive function is a more accurate implementation
of the Binomial distribution function. Instead of computing the complementary
probability qb from pb, it takes it directly as an input argument.
function p = binopdf_lessnaive ( x , n , pb , qb )
p = nchoosek (n , x ) .* pb ^ x .* qb ^( n - x )
endfunction
In the previous example, there is an obvious difference between the naive and the
accurate implementations. But there are cases where the difference between the two
functions is less obvious, leading to the false feeling that the naive implementation
is accurate. This happens in particular when the complementary probability q is
close to zero, but larger than the machine precision, that is, larger than ≈ 10^{-16}
in the context of Scilab. In this case, the computed value of qb=1-pb is nonzero,
but does not have full accuracy: its digits are mainly driven by the rounding errors.
There is still something wrong with our less naive implementation. Indeed,
consider the following case, where we use a probability pb close to one and a
particularly chosen value x.
-->n = 1. e9 ; pb = 1 - 1. e -14; qb = 1. e -14;
-->x = 1. e9 - 48
x =
999999952.
--> binopdf_lessnaive ( x , n , pb , qb )
ans =
Nan
We can compute the exact result with Wolfram Alpha and get

e = 8.05538642991600721e-302
The previous number is close to the smallest positive double precision normalized
floating point number. We can analyze what happens here by computing the
intermediate terms which appear in the computation, as in the following session.
intermediate terms which appear in the computation, as in the following session.
--> nchoosek (n , x )
ans =
Inf
--> pb ^ x
ans =
0.9999900080431765037048
--> qb ^( n - x )
ans =
0.
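The accurate binopdf function is used in the next session but is not reproduced in
this part of the document; a minimal sketch consistent with the behaviour described
here (computing through logarithms, with qb as a separate input argument) might
be the following — the actual implementation may differ.

function p = binopdf ( x , n , pb , qb )
    // Sketch : compute through logarithms , so that neither the
    // binomial coefficient nor the powers overflow or underflow .
    p_log = nchooseklog ( n , x ) + x .* log ( pb ) + ( n - x ) .* log ( qb )
    p = exp ( p_log )
endfunction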
In the following session, we check that our implementation gives an accurate result.
-->n = 1. e9 ; pb = 1 - 1. e -14; qb = 1. e -14;
-->x = 1. e9 - 48
x =
999999952.
-->p = binopdf ( x , n , pb , qb )
 p =
   8.055383937519171497D-302
By looking more closely at the previous result, we see that the order of magnitude
is good, but that not all digits are exact. In the following session, we compute the
number of significant decimal digits from the formula d = − log10(|p − e|/e), where
e is the exact result and p is the computed result.
-->d = - log10 ( abs (p - e )/ e )
d =
6.5094691877632762100347
We see that the accurate implementation has about 6 significant digits, which is
far less than the maximum achievable precision, i.e. about 15 significant digits for
Scilab doubles. In order to measure the sensitivity of the output argument with
respect to the input arguments, we compute the condition number of the binomial
distribution function for this particular value of x.
In the following session, we compute the probability p1 of a slightly modified input
argument x + 2 * %eps * x. Then we compute the relative difference of the input
rx and the relative difference of the output ry. The condition number is defined as
the ratio between these two relative differences.
--> p1 = binopdf ( x + 2 * %eps * x , n , pb , qb )
 p1 =
   8.055461212394521478D-302
--> rx = 2* %eps
 rx =
   4.440892098500626162D-16
--> ry = abs ( p1 - p )/ p
 ry =
   9.817532328886864651D-07
-->c = ry / rx
 c =
   2.210711746903634071D+09
We see that the condition number is approximately 2 × 10^9. This means that a
relatively small variation of the input argument x generates a relatively large change
in the output probability p. In our particular case, a very small variation of x can
make the probability vary suddenly from values close to zero to values close to 1.
Hence, the computation is numerically difficult, and this explains why the accuracy
of the binopdf function is sometimes less than maximum. We emphasize that this
is not a problem which is specific to the binopdf function: it comes from the
behaviour of the function itself, which varies greatly for particular values of the
input argument x.
Therefore, the probability of selecting x red balls is given by the hypergeometric
distribution function, defined by

P(X = x) = h(x, m, k, n) = \frac{\binom{k}{x} \binom{m-k}{n-x}}{\binom{m}{n}}. (180)
The actual computation of this distribution function is not straightforward, as
we are going to see in the next section.
An example of application of this distribution function is given in the exercise 2.9,
which considers the probability of predicting earthquakes "by chance". Another
example of this distribution function is given in the next section.
and $\binom{1030}{500}$. The following session shows that the last term is
represented as the Infinity number of the IEEE-754 standard.
--> nchoosek (1030 ,515)
ans =
Inf
We can solve the problem by computing first the logarithm of the probability
and then exponentiating the result. Let us introduce the function hlog, defined by

hlog(x, m, k, n) = clog(k, x) + clog(m-k, n-x) - clog(m, n), (182)

so that

h(x, m, k, n) = \exp( hlog(x, m, k, n) ). (183)

The computation of the log-combination function, i.e. the $clog(n, k) = \log \binom{n}{k}$
function, has been presented in the section 2.13. The following hygepdf function
finally computes the hypergeometric distribution function from the equations 182
and 183.
function p = hygepdf ( x , m , k , n )
c1 = nchooseklog ( k, x )
c2 = nchooseklog ( m -k , n - x )
c3 = nchooseklog ( m , n )
p_log = c1 + c2 - c3
p = exp ( p_log )
endfunction
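As a quick sanity check (our own example, not part of the original text), the
hypergeometric probabilities over all possible values of x should sum to one.

// Check that the probabilities sum to 1 for m =10 , k =5 , n =5.
p = hygepdf ( 0:5 , 10 , 5 , 5 );
// sum ( p ) should be close to 1
disp ( sum ( p ) )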
2.19 Notes and references
The notation for the hypergeometric distribution function h presented in section
2.17 is taken from Matlab. The letters chosen for the parameters and the order of
the arguments are the ones chosen in Matlab.
In an earlier version of this document, we used the notations of [7], section 5.1,
"Important Distributions". Indeed, Grinstead and Snell chose to denote the number
of items (or balls) in the urn by the capital letter N, which may lead to bugs in
Scilab implementations, because it is easily confused with the variable n, denoting
the number of samples (number of balls selected). Also, in Matlab, the total number
of balls in the urn (i.e. m) comes as the first parameter of the distribution function,
while it is the last one in Grinstead and Snell. For practical reasons, we chose to
keep Matlab's conventions.
The gamma function presented in section 2.3 is covered in many textbooks, such as
[1]. An in-depth presentation of the gamma function is given in [18].
2.20 Exercises
Exercise 2.1 (Recurrence relation of binomial) Prove the proposition 2.14, which is the
following. For integers n > 0 and 0 < j < n, the binomial coefficients satisfy

\binom{n}{j} = \binom{n-1}{j} + \binom{n-1}{j-1}. (184)
Exercise 2.2 (Number of subsets) Assume that Ω is a finite set with n ≥ 0 elements. Prove
that there are 2^n subsets of Ω. Consider that ∅ and Ω both count as subsets of Ω.
Exercise 2.3 (Probabilities of Poker hands) Why does computing the probability of a straight
flush force us to take into account the probability of the royal flush? Explain other possible
conflicts between Poker hands. Compute the probabilities for all Poker hands in figure 15.
This exercise is partly given in [7], in section 3.2, ”Combinations”.
Exercise 2.4 (Bernoulli trial for a die experiment) A die is rolled n = 4 times. What is the
probability that we obtain exactly one ”6” ? What is the probability for n = 1, 2, . . . , 12 ?
Exercise 2.5 (Probability of a flight crash) Assume that there are 20 000 flights of airplanes
each day in the world. Assume that there is one accident every 500 000 flights. What is the
probability of getting exactly 5 crashes in 22 days? What is the probability of getting at least 5
crashes in 22 days? What is the probability of getting exactly 3 crashes in 42 days? What is the
probability of getting at least 3 crashes in 42 days? Consider now that the year is a sequence of
16 periods of 22 days (ignoring the 13 days left in the year). What is the probability of having one
period in the year which contains at least 5 crashes?
This exercise is presented in ”La loi des séries noires”, by Janvresse and de la Rue [9].
Exercise 2.6 (Binomial function maximum) Consider the discrete distribution function of a
Bernoulli process, as defined by the proposition 2.16. Show that

b(n, p, j) = \frac{p}{q} \, \frac{n-j+1}{j} \, b(n, p, j-1). (185)
Consider the experiment presented in section 3.4, which consists in tossing a coin 10 times and
counting the number of heads. With a Scilab simulation, can you compute what is the number of
heads which is the most likely to occur ?
This exercise is given in [7], in the exercise part of section 3.2, ”Combinations”.
Exercise 2.7 (Binomial coefficients and Pascal’s triangle) This exercise is given in [7], in
chapter 3, "Combinatorics". Let a, b be two real numbers and let n be a positive integer. Prove
the binomial theorem, which states that

(a + b)^n = \sum_{j=0,n} \binom{n}{j} a^j b^{n-j}. (187)
The binomial coefficients $\binom{n}{j}$ can be written in a triangle, where each row corresponds
to n and each column corresponds to j, as in the following array:

A =
    1.
    1.   1.
    1.   2.   1.
    1.   3.   3.   1.
    1.   4.   6.   4.   1.
(188)
Use the binomial theorem in order to prove that the sum of the terms in the n-th row is 2^n.
Prove that if the terms are added with alternating signs, then the sum is zero.
Binomial coefficients can also be represented in a matrix called Pascal's matrix, where the
binomial coefficients are stored in the anti-diagonals of the matrix:

A =
    1.   1.   1.   1.   1.
    1.   2.   3.   4.   5.
    1.   3.   6.  10.  15.
    1.   4.  10.  20.  35.
    1.   5.  15.  35.  70.
(189)

Design a Scilab script to compute Pascal's triangle and Pascal's matrix and check that you find
the same results as presented in 188 and 189.
Exercise 2.8 (Binomial identity) Prove the following binomial identity:

\binom{2n}{n} = \sum_{j=0,n} \binom{n}{j}^2. (190)

To help prove this result, consider a set with 2n elements, where n elements are red and n
elements are blue. Compute the number of ways to choose n elements in this set.
This exercise is given in [7], in chapter 3, "Combinatorics".
Exercise 2.9 (Earthquakes and predictions) Assume that a person ”predicts” the dates of
major earthquakes (with magnitude larger than 6.5 or with a large number of deaths, etc...) in
the world during 3 years, i.e. in a period of 1096 days. Assume that the ”specialist” predicts 169
earthquakes. Assume that, during the same period, 196 major earthquakes really occur, so that 33
earthquakes were correctly predicted by the ”specialist”. What is the probability that earthquakes
are predicted by chance ?
This exercise is presented by Charpak and Broch in [4].
Exercise 2.10 (Log-factorial function) There is another possible implementation of the log-
factorial function. Indeed, we have, by definition, $n! = \prod_{k=1,n} k$, which implies
$\log(n!) = \sum_{k=1,n} \log(k)$.
3 Simulation of random processes with Scilab
In this section, we present how to simulate random events with Scilab. The problem
of generating random numbers is more complex and will not be detailed in this
chapter. We begin with a brief overview of random number generation and detail
the random number generator used in the rand function. Then we analyze how to
generate random numbers in the interval [0, 1] with the rand function. We present
how to generate random integers in a given interval [0, m − 1] or [m1, m2]. In the
final part, we present a practical simulation of a game based on tossing a coin.
3.1 Overview
In this section, we present a special class of random number generators, so that we
can have a general representation of what a random number generator is.
The goal of a uniform random number generator is to generate a sequence of real
values un ∈ [0, 1] for n ≥ 0. Most uniform random number generators are based on
the fraction

u_n = \frac{x_n}{m}, (194)

where m is a large integer and x_n is a positive integer such that 0 < x_n < m. In
many random number generators, the integer x_{n+1} is computed from the previous
element x_n in the sequence.
The linear congruential generators [10] are based on the sequence

x_{n+1} = (a x_n + c) \mod m,

where a is the multiplier, c is the increment and m is the modulus, starting from
an initial seed x_0. Specific rules allow us to design the parameters of a uniform
random number generator.
As a practical example, consider the Urand generator [12] which is used by Scilab
in the rand function. Its parameters are
• m = 2^31,
• a = 843314861,
• c = 453816693,
• x0 arbitrary.
Each call to the rand function produces a new random number in the interval [0, 1],
as presented in the following session.
--> rand ()
ans =
0.2113249
--> rand ()
ans =
0.7560439
--> rand ()
ans =
0.0002211
--> rand ()
ans =
0.6040239
--> rand ()
ans =
0.0079647
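The following is a minimal sketch (an assumption on our side, not the actual Scilab
source code) of one step of this linear congruential generator. The product a · x_n
may exceed 2^53 and lose exactness in double precision arithmetic, so the
multiplication is split into high and low 16-bit parts, which keeps every intermediate
value exactly representable.

function [u , x ] = urandnext ( x )
    // One step of x ( n +1) = ( a * x ( n ) + c ) mod m , Urand parameters .
    m = 2^31
    a = 843314861
    c = 453816693
    xh = floor ( x / 2^16 )  // high bits of x
    xl = x - xh * 2^16       // low 16 bits of x
    t = modulo ( a * xl , m )
    s = modulo ( modulo ( a * xh , m ) * 2^16 , m )
    x = modulo ( t + s + c , m )
    u = x / m
endfunction

Starting from the seed x_0 = 0, the first call returns u = 453816693/2^31 ≈ 0.2113249,
which is consistent with the first value produced by rand in the session above.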
In most random processes, several random numbers are needed at the same time.
Fortunately, the rand function can generate a matrix of random numbers,
instead of a single value. The user must then provide the number of rows and
columns of the matrix to generate, as in the following syntax.
rand ( nr , nc )
The use of this feature is presented in the following session, where a 2 × 3 matrix of
random numbers is generated.
--> rand (2 ,3)
ans =
0.6643966 0.5321420 0.5036204
0.9832111 0.4138784 0.6850569
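The generateInRange0M1 function used in the next session was defined earlier in
this document and is not reproduced here; a minimal sketch consistent with its
name and its usage (an nr × nc matrix of random integers in the set
{0, 1, . . . , m − 1}) might be the following.

function ri = generateInRange0M1 ( m , nr , nc )
    // rand ( nr , nc ) is in [0 ,1) , so floor ( rand * m ) is in {0 ,... , m -1}.
    ri = floor ( rand ( nr , nc ) * m )
endfunction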
In the following session, we generate random integers in the set {0, 1, . . . , 4}.
-->r = generateInRange0M1 ( 5 , 4 , 4 )
r =
2. 0. 3. 0.
1. 0. 2. 2.
0. 1. 1. 4.
4. 2. 4. 0.
To check that the generated integers are uniform in the interval, we compute the
distribution of 10 000 integers in the set {0, 1, . . . , 4}. We use the bar function
to plot the result, which is presented in the figure 18. We check that the probability
of each integer is close to 1/5 = 0.2.
-->r = generateInRange0M1 ( 5 , 100 , 100 );
Figure 18: Distribution of random integers from 0 to 4.
--> counter = zeros (1 ,5);
--> for i = 1:100
--> for j = 1:100
--> k = r (i , j ) + 1;
--> counter ( k ) = counter ( k ) + 1;
--> end
--> end
--> counter = counter / 10000;
--> counter
counter =
0.2023 0.2013 0.1983 0.1976 0.2005
--> bar ( counter )
We emphasize that the previous verifications check that the empirical
distribution function is the expected one, but this does not guarantee that the
uniform random number generator is of good quality. Indeed, consider the sequence
x_n = n (mod 5). This sequence produces uniform integers in the set {0, 1, . . . , 4},
but, obviously, it is far from being truly random. Testing uniform random number
generators is a much more complicated problem which will not be presented here.
It is easy to adapt the previous function to various needs. For example, the fol-
lowing function returns a matrix with size nr×nc, where entries are random integers
in the set {1, 2, . . . , m}.
function ri = generateInRange1M ( m , nr , nc )
ri = ceil ( rand ( nr , nc ) * m )
endfunction
The following function returns a matrix with size nr × nc, where the entries are
random integers in the set {m1, m1 + 1, . . . , m2}.
function ri = generateInRangeM12 ( m1 , m2 , nr , nc )
f = m2 - m1 + 1
ri = floor ( rand ( nr , nc ) * f ) + m1
endfunction
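As a quick check (our own example, not part of the original text), we can verify
that all the generated integers fall within the requested range.

// All entries of r should be in the set {10 , 11 , ... , 20}.
r = generateInRangeM12 ( 10 , 20 , 100 , 100 );
mprintf ( "min = %d , max = %d\n" , min ( r ) , max ( r ) )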
3.4 Simulation of a coin
Many practical experiments are very difficult to analyze by theory and, most of the
time, very easy to experiment with a computer. In this section, we give an example
of a coin experiment which is simulated with Scilab. This experiment is simple, so
that we can check that our simulation matches the result predicted by theory. In
practice, when no theory is able to predict a probability, it is much more difficult to
assess the result of simulation.
The following Scilab function generates a random number with the rand function
and uses the floor function in order to get a random integer, either 1, associated
with "Head", or 0, associated with "Tail". It prints out the result and returns the
value.
// tossacoin --
// Prints " Head " or " Tail " depending on the simulation .
// Returns 1 for " Head " , 0 for " Tail "
function face = tossacoin ( )
face = floor ( 2 * rand () );
if ( face == 1 ) then
mprintf ( " Head \ n " )
else
mprintf ( " Tail \ n " )
end
endfunction
With such a function, it is easy to simulate the toss of a coin. In the following
session, we toss a coin 4 times. The "seed" argument of the rand function is used
so that the seed of the uniform random number generator is initialized to 0. This
allows us to get consistent results across simulations.
rand ( " seed " ,0)
face = tossacoin ();
face = tossacoin ();
face = tossacoin ();
face = tossacoin ();
Assume that we are tossing a fair coin 10 times. What is the probability that
we get exactly 5 heads ?
This is a Bernoulli process, where the number of trials is n = 10 and the
probability of success is p = 1/2. The probability of getting exactly j = 5 heads is
given by the binomial distribution and is

b(10, 1/2, 5) = \binom{10}{5} (1/2)^{10} = \frac{252}{1024} \approx 0.2460938.
The following Scilab session shows how to perform the simulation: we perform
10 000 simulations of the process. The floor function is used in combination with
the rand function to generate integers in the set {0, 1}. The sum function counts
the number of heads in the experiment. If the number of heads is equal to 5, the
number of successes is updated accordingly.
--> rand ( " seed " ,0);
--> nb = 10000;
--> success = 0;
--> for i = 1: nb
--> faces = floor ( 2 * rand (1 ,10) );
--> nbheads = sum ( faces );
--> if ( nbheads == 5 ) then
--> success = success + 1;
--> end
--> end
--> pc = success / nb
pc =
0.2507
Figure 19: A Galton board.
// simulgalton --
// Performs one simulation of the Galton board with n stages ,
// and returns the index j = 1 , 2 , ... , n +1 where the ball falls .
function j = simulgalton ( n , verbose )
if exists ( " verbose " ," local " )==0 then
verbose = 0
end
jmin = 1
for k = 1 : n
if verbose == 1 then
mprintf ( " Step # %d ( %d )\ n " ,k , jmin )
end
r = rand ()
if r <0.5 then
if verbose == 1 then
mprintf ( " To the left !\ n " )
end
else
if verbose == 1 then
mprintf ( " To the right !\ n " )
end
jmin = jmin + 1
end
end
j = jmin
endfunction
In the following Scilab script, we perform 10 000 experiments of the Galton board.
The cups variable stores the number of balls in each cup. For each experiment, we
update the number of balls in the cup which has been randomly selected by the
process. The bar function allows us to plot the result.
rand ( " seed " , 0 )
n = 10
cups = zeros (1 , n +1)
nshots = 10000
for k = 1: nshots
j = simulgalton ( n );
cups ( j ) = cups ( j ) + 1;
end
bar (1: n +1 , cups )

Figure 20: Simulation of a Galton board with n = 10 stages and 100 simulations. –
The line is the binomial distribution function.
The figures 20, 21 and 22 present the results of the simulation of a Galton board
with n = 10 stages and 100, 1000 and 10 000 simulations, as bar plots.
In the figure 23, we present the Scilab functions which allow us to manage the
binomial distribution. The cdfbin function, for the cumulative distribution, will be
presented in the more general context of cumulative distribution functions.
In the following session, we use the binomial function to compute the proba-
bilities of the binomial distribution function. The plot function generates the plot
which is presented in the previous figures, where the binomial distribution function
is computed with n = 10 and p = 0.5.
pr = binomial (0.5 , n )
plot (1: n +1 , pr )
In the figures 20, 21 and 22, we see that the bar plot representing the Galton
simulations converges toward the line plot representing the binomial distribution
function.
Figure 21: Simulation of a Galton board with n = 10 stages and 1000 simulations.
Figure 22: Simulation of a Galton board with n = 10 stages and 10 000 simulations.
In the following session, we use the perms function to compute all the permutations
of the vector (1, 2, 3).
--> perms (1:3)
ans =
3. 2. 1.
3. 1. 2.
2. 3. 1.
2. 1. 3.
1. 3. 2.
1. 2. 3.
When the number of elements in the array is small, lower than 5 for example,
we may generate all the possible permutations and randomly choose one of them.
In the following session, we use n = 4 and store in p all the possible permutations
of the column vector (1, 2, 3, 4)^T. Then we generate a random integer j in the
interval [1, n!] and use it to get the permutation stored in the j-th row of p.
-->n = 4
n =
4.
-->p = perms ( (1: n ) ’ )
p =
4. 3. 2. 1.
4. 3. 1. 2.
4. 2. 3. 1.
[...]
1. 3. 2. 4.
1. 2. 4. 3.
1. 2. 3. 4.
-->j = floor ( rand ( ) * factorial ( n )) + 1
j =
4.
-->v = p (j ,:)
v =
4. 2. 1. 3.
The previous method is feasible, but only for very small values of n. Indeed, when
n grows, the required memory grows as fast as n!, which is impractical even for
moderate values of n.
The following randperm function returns a random permutation of the integers
in the interval [1, n]. It is based on the grand function, which is used to generate
uniform random numbers in the interval [0, 1[. Then we use the gsort function in
order to sort the array and compute the ranks of the generated values. Combined,
these two functions allow us to generate a random permutation, at the cost of
sorting n values.
function p = randperm ( n )
[ ignore , p ] = gsort ( grand (1 ,n , " def " ) , " c " ," i " );
endfunction
For moderate values of n, the randperm function performs well, but for large values
of n, sorting the array might be expensive.
The grand function provides an algorithm to compute a random permutation
over a given array of values. This function is presented in figure 24. In the following
session, we use the grand function three times and get three independent
permutations of the vector (1:10)'. A side effect of the call to this function is that
it updates the state of the uniform random number generator used by grand.
-->s = grand ( 1 , " prm " , (1:10) ’ ) ’
s =
5. 1. 4. 3. 8. 7. 2. 10. 6. 9.
-->s = grand ( 1 , " prm " , (1:10) ’ ) ’
s =
4. 2. 3. 9. 6. 8. 10. 7. 5. 1.
-->s = grand ( 1 , " prm " , (1:10) ’ ) ’
s =
7. 4. 8. 9. 5. 2. 10. 1. 6. 3.
The source code used by grand to produce random permutations has been
implemented by Bruno Pinçon. The algorithm is presented in [10], section 3.4.2,
"Random sampling and shuffling". According to Knuth, this algorithm was first
published by Moses and Oakford [13] in 1963 and by Durstenfeld [5] in 1964.
The following genprm function is a simplified version of this algorithm. We
assume that the size of the input matrix x is n. The algorithm proceeds in n steps:
at step i, we compute a random integer k, uniform in the interval [i, n], and then
we exchange the values at the indices i and k.
function x = genprm ( x )
n = size (x , " * " )
for i = 1: n
t = grand (1 ,1 , " unf " ,0 ,1)
k = floor ( t * ( n - i + 1)) + i
elt = x ( k )
x(k) = x(i)
x ( i ) = elt
end
endfunction
In the following session, we use the genprm function in order to generate three
independent permutations of the vector (1:10)’.
--> genprm ( 1:10 )
ans =
10. 2. 9. 8. 5. 3. 4. 7. 6. 1.
--> genprm ( 1:10 )
ans =
7. 4. 2. 9. 3. 1. 6. 5. 10. 8.
--> genprm ( 1:10 )
ans =
5. 2. 9. 6. 3. 4. 10. 1. 7. 8.
4 Acknowledgments
I would like to thank John Burkardt for his comments about the numerical
computation of the permutation function. Thanks are also addressed to Samuel
Gougeon, who suggested to improve the performance of the computation of the
Pascal matrix.
5 Answers to exercises
5.1 Answers for section 1
Answer of Exercise 1.1 (Head and tail) Assume that we have a coin which is tossed twice. We
record the outcomes so that the order matters, i.e. the sample space is Ω = {HH, HT, TH, TT}.
Assume that the distribution function is uniform, i.e. the head and the tail have an equal
probability. The size of the sample space Ω is #(Ω) = 4. The distribution is uniform, so that
P(x) = 1/4 for all x ∈ Ω.
1. What is the probability of the event A = {HH, HT, TH}? The number of elements in the
event is #(A) = 3. By the proposition 1.9, the probability of the event A is

P(A) = \frac{#(A)}{#(\Omega)} = \frac{3}{4}. (202)

2. For the second event, the number of elements is #(A) = 2, so that

P(A) = \frac{#(A)}{#(\Omega)} = \frac{2}{4} = \frac{1}{2}. (203)
Answer of Exercise 1.2 (Two dice) Assume that we are rolling a pair of dice. Assume that
each face has an equal probability. The sample space is Ω = {(i, j) / i, j = 1, 6}, with
#(Ω) = 36.
1. The event that the sum of the two dice is 7 is the set

A = {(1, 6), (6, 1), (2, 5), (5, 2), (4, 3), (3, 4)}. (205)

The number of elements in the event is #(A) = 6. The probability is therefore
P(A) = 6/36 = 1/6.
2. The number of elements in the second event is #(A) = 2. The probability is therefore
P(A) = 2/36 = 1/18.
3. What is the probability of getting a double one, i.e. snake eyes? The event is the set
A = {(1, 1)}, with #(A) = 1. Its probability is P(A) = 1/36.
Answer of Exercise 1.3 (Méré's experiments) The two proofs are based on the fact that
an event and its complementary event satisfy P(A) + P(A^c) = 1. Since P(A) is complex
to compute directly, we compute instead P(A^c) and then use P(A) = 1 − P(A^c).
The event A is that, with four rolls of a die, at least one six turns up. The fact that a die is
rolled four times corresponds to the sample space

\Omega = \{i \,/\, i = 1, 6\}^4. (207)

The size of the sample space is #(Ω) = 6^4. To make the computation easier, we consider the
complementary event A^c, which is

A^c = \{i \,/\, i = 1, 5\}^4. (208)
The size of A^c is #(A^c) = 5^4. The probability of the event A is therefore
P(A) = 1 − (5/6)^4 ≈ 0.5177469. Since P(A) > 1/2, de Méré wins consistently.
De Méré claimed that, in 24 rolls of two dice, a pair of 6 would turn up (event B). The sample
space is now \Omega = \{(i, j) \,/\, i, j = 1, 6\}^{24}. The size of the sample space is
#(Ω) = 36^24. The complementary event is the set of sequences of 24 rolls which contain no
double six. The size of B^c is #(B^c) = (25 + 5 + 5)^24 = 35^24. The probability of the event B
is therefore P(B) = 1 − (35/36)^24 ≈ 0.4914039 < 1/2, which explains why de Méré loses
consistently.
De Méré claimed that 25 rolls were necessary to make the game favorable (event C). The same
derivation leads to the probability P(C) = 1 − (35/36)^25 ≈ 0.5055315 > 1/2.
Can you compute with Scilab the probability for event A and a number of rolls equal to 1, 2,
3 or 4 ? The following Scilab session shows how to perform the computation.
-->i =1:4
i =
1. 2. 3. 4.
-->1 -(5/6)^ i
ans =
0.1666667 0.3055556 0.4212963 0.5177469
Can you compute with Scilab the probability for the event B or C for a number of rolls equal to
10, 20, 24, 25, 30 ?
The following Scilab session shows how to perform the computation.
-->i =[10 20 24 25 30]
i =
10. 20. 24. 25. 30.
-->1 -(35/36)^ i
ans =
0.2455066 0.4307397 0.4914039 0.5055315 0.5704969
Answer of Exercise 1.4 (Independent events) Assume that Ω is a finite sample space.
Assume that the two events A, B ⊂ Ω are independent. Let us prove that

P(B|A) = \frac{P(B \cap A)}{P(A)}, (212)
Let E1, E2 ⊂ Ω be two sets, not necessarily disjoint. Therefore, we have

P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) \leq P(E_1) + P(E_2). (214)

The equality in 214 is the result of the proposition 1.7, which states that
P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2). The inequality in 214 can be deduced from
the fact that P(E1 ∩ E2) ≥ 0, by definition of a probability.
The inequality 213 is therefore true for n = 2. To finish the proof, we use induction. Let us
assume that the inequality is true for n, and let us prove that it is true for n + 1. Let us denote
by Fn ⊂ Ω the set defined by

F_n = \bigcup_{i=1,n} E_i. (215)

We want to compute the probability P(\bigcup_{i=1,n+1} E_i) = P(F_{n+1}). We see that
F_{n+1} = E_{n+1} \cup F_n. We use the inequality 214, which leads to
with 4 significant digits. This might be surprising, since the probability of a false positive is 1 %,
which is quite small.
Consider the case where 5 % of the population has the disease.

P(D|E) = \frac{0.99 \times 0.05}{0.99 \times 0.05 + 0.01 \times 0.95} (222)
       \approx 0.8389 (223)
P(E|D^c)    P(D|E)
0.100000    0.047391
0.050000    0.090494
0.010000    0.332215
0.005000    0.498741
0.001000    0.832632
0.000500    0.908674
0.000100    0.980295

Figure 25: Probabilities of having the disease, given that the test is positive –
P(E|D) = 0.99, P(D) = 0.005. The bold data is the data presented in the text of
the exercise, which corresponds to P(E|D^c) = 0.01. The lower the probability of a
"false positive", the more accurate the result.
P(D)        P(D|E)
0.500000    0.990000
0.100000    0.916667
0.050000    0.838983
0.010000    0.500000
0.005000    0.332215
0.001000    0.090164
0.000500    0.047188

Figure 26: Probabilities of having the disease, given that the test is positive –
P(E|D) = 0.99, P(E|D^c) = 0.01. The bold data is the data presented in the text
of the exercise, which corresponds to P(D) = 0.005. The rarer the disease, the
less accurate the result.
with 4 significant digits. This shows that when more people have the disease (which is certainly
not desirable), the probability is higher.
Consider the case where the probability of a false positive is 0.1 % (but keep the probability
of a true positive equal to 99 %).

P(D|E) = \frac{0.99 \times 0.05}{0.99 \times 0.05 + 0.001 \times 0.95} (224)
       \approx 0.9811 (225)
with 4 significant digits. This shows that when the probability of a false positive is lower, the
probability of having the disease, given that the test is positive, is higher.
The previous results might be a little surprising, but they are the consequence of the false
positives. These results are presented in the figures 25 and 26, which present the probability
P(D|E) computed with varying parameters P(E|D^c) and P(D). The conclusion of these
experiments is that a false positive can reduce the reliability of the test if the disease is rare,
or if the probability of a false positive is high.
To make the previous computations clearer, consider the example where the population counts
1000 persons. Since 0.5 % of the population has the disease, this makes 0.005 × 1000 = 5 persons
who have the disease and 0.995 × 1000 = 995 who do not have the disease. From the 5 persons
who have the disease, there will be 0.99 × 5 = 4.95 persons who will have a positive test. Similarly,
from the 995 person who do not have the disease, there will be 0.01 × 995 = 9.95 persons who will
have a positive test. Therefore, given that the test is positive, the probability that the person has
the disease is

\frac{4.95}{4.95 + 9.95} \approx 0.3322. (226)
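The computations of this answer can be gathered in a small Scilab function (a
sketch of ours, not part of the original answer), which reproduces the entries of the
figures 25 and 26.

// Probability of having the disease given a positive test , by the
// Bayes formula : P ( D | E ) = P ( E | D ) P ( D ) / P ( E ).
function p = probdisease ( pED , pEDc , pD )
    p = pED * pD / ( pED * pD + pEDc * ( 1 - pD ) )
endfunction

For instance, probdisease ( 0.99 , 0.01 , 0.005 ) returns approximately 0.332215,
which is the bold entry of the figures 25 and 26.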
5.2 Answers for section 2
Answer of Exercise 2.1 (Recurrence relation of binomial) The proof is based on expanding
the definition of the binomial coefficient. By the definition 152, we have

\binom{n-1}{j} = \frac{(n-1) \cdots ((n-1)-j+1)}{j (j-1) \cdots 1} (228)
              = \frac{(n-1) \cdots (n-j)}{j (j-1) \cdots 1}, (229)

and

\binom{n-1}{j-1} = \frac{(n-1) \cdots ((n-1)-(j-1)+1)}{(j-1)(j-2) \cdots 1} (230)
                 = \frac{(n-1) \cdots (n-j+1)}{(j-1)(j-2) \cdots 1}. (231)
Answer of Exercise 2.2 (Number of subsets) Assume, by induction, that the set
Ω_n = {1, 2, . . . , n} has 2^n subsets, denoted by (A_j)_{j=1,2^n}. Since Ω_n ⊂ Ω_{n+1},
these subsets satisfy

A_j \subset \Omega_n \subset \Omega_{n+1}, (235)

for j = 1, 2^n. We can construct additional subsets by considering the sets

B_j = A_j \cup \{n+1\}, (236)

for j = 1, 2^n. For each j = 1, 2^n, we have B_j ⊂ Ω_{n+1}. We have found 2^n sets
(A_j)_{j=1,2^n} and 2^n sets (B_j)_{j=1,2^n} which are subsets of Ω_{n+1}. The total number
of subsets is therefore 2^n + 2^n = 2^{n+1}, which concludes the proof.
Answer of Exercise 2.3 (Probabilities of Poker hands) Why does computing the probability
of a straight flush force us to take into account the probability of the royal flush?
Consider the event where the 5 cards are in sequence with a single suit. This might be a straight
flush, if the last card in the sequence is not an ace. If the last card is an ace, then the hand is not
a straight flush anymore, since it is a royal flush. Therefore, the event which is associated with a
straight flush is ”a sequence of cards with a single suit, but not a royal flush”. When computing the
probability of a straight flush, the simplest is to compute the probability of the event ”sequence of
cards with a single suit”, and to remove the probability of the royal flush.
Explain other possible conflicts between Poker hands.
The following is a list of conflicts between Poker hands.
• A straight is a hand with 5 cards in sequence, where at least two cards have different
suits. If the cards are in sequence and all have the same suit, this is not a straight anymore,
this is a straight flush.
• A double pair is when there are two pairs in the hand. But the two pairs must have different
ranks, since, if they all have the same rank, this is not a double pair anymore, this is a four of
a kind.
• A flush is a hand where all the cards have the same suit. But the cards must not be in
sequence, since they would not form a flush anymore, but would form a straight flush or a
royal flush.
Compute the probabilities for all Poker hands in figure 15.
The probability of a pair is computed as follows. There are 13 different ranks in the deck and,
once the rank is chosen, there are $\binom{4}{2}$ different pairs of this rank. Therefore, there
are $13 \binom{4}{2}$ different pairs in the deck. The remaining 3 cards in the hand are chosen
so that they have a different rank from the current pair (if not, there would be a "three of a
kind" or even a four of a kind) and so that they have 3 different ranks. Their ranks must
therefore be chosen among the remaining 12 ranks, which makes $\binom{12}{3}$ different sets
of 3 ranks. For each card, there are 4 different suits, so that there are
$\binom{12}{3} \cdot 4^3$ different combinations for the remaining 3 cards in the hand. The
probability of a pair is therefore

P(pair) = \frac{13 \binom{4}{2} \binom{12}{3} 4^3}{\binom{52}{5}} = \frac{1098240}{2598960} \approx 0.4225690. (237)
To compute the probability of a double pair, we must take into account the fact that the two
pairs must have different ranks. If not, that would be a four of a kind, instead of a double pair.
This is why we begin by choosing two different ranks in the set of 13 ranks available. Once done,
each pair can have two of the four suits. Therefore, the total number of double pairs in the first
4 cards is $\binom{13}{2} \binom{4}{2}^2$. The 5th card must be chosen from the 11 remaining
ranks (if not, one of the pairs would become a three of a kind). Once the rank of the 5th card is
chosen, it can have one of the 4 available suits. Therefore, there are 11 · 4 different choices for
the 5th card. In the end, the probability of a double pair is

P(double pair) = \frac{\binom{13}{2} \binom{4}{2}^2 \cdot 11 \cdot 4}{\binom{52}{5}} = \frac{123552}{2598960} \approx 0.0475390. (238)
The probability of a three of a kind is computed as follows. There are 13 different ranks and
there are $\binom{4}{3}$ ways to choose 3 cards in a set of 4. Therefore, there are
$13 \binom{4}{3}$ different ways to select the 3 first cards of the same rank. The last two cards
must be chosen so that they do not create a four of a kind or a full house: their ranks are two
different ranks chosen in the set of the remaining 12 ranks, which makes $\binom{12}{2}$
different ways. After the ranks are chosen, each of the 2 cards can have one of the 4 suits, so
that there are $4^2$ ways to select the suits. All in all, there are $\binom{12}{2} \cdot 4^2$
different ways to select the last 2 cards. In the end, the probability of a three of a kind is

P(three of a kind) = \frac{13 \binom{4}{3} \binom{12}{2} 4^2}{\binom{52}{5}} = \frac{54912}{2598960} \approx 0.0211285. (239)
The probability of a four of a kind and of a full house have already been computed in the text.
The probability of a straight flush is computed as follows. The following is the list of the 10
possible sequences:

1 2 3 4 5 – 2 3 4 5 6 – 3 4 5 6 7 – 4 5 6 7 8 – 5 6 7 8 9 – 6 7 8 9 10
7 8 9 10 J – 8 9 10 J Q – 9 10 J Q K – 10 J Q K 1,

where all the cards of a sequence take the same suit among the 4 possible suits ♦, ♥, ♣ and ♠.
Therefore, the total number of such hands is 4 · 10 = 40. In order for these hands to be straight
flushes, and not royal flushes, we must remove the 4 royal flushes. Finally, the total number of
straight flushes is 4 · 10 − 4 and the probability of this hand is

P(straight flush) = \frac{4 \cdot 10 - 4}{\binom{52}{5}} = \frac{36}{2598960} \approx 0.0000139. (240)
The probability of the royal flush is easy to compute, since there are only 4 such hands in the
deck. The probability of the royal flush is therefore

P(royal flush) = \frac{4}{\binom{52}{5}} = \frac{4}{2598960} \approx 0.0000015. (241)
The probability of a straight is computed as follows. The list of the 10 possible sequences is the
same as for the straight flush, but now each card can have one of the 4 suits ♦, ♥, ♣ and ♠
independently. Therefore, the total number of such hands is 4^5 · 10. But we require that a
straight is neither a straight flush, nor a royal flush, so that we have to remove these higher
value hands. Using the previous computation for the straight flush, we get 4^5 · 10 − 4 · 10
different straights. Therefore, the probability of the straight is

P(straight) = \frac{4^5 \cdot 10 - 4 \cdot 10}{\binom{52}{5}} = \frac{10200}{2598960} \approx 0.0039262. (242)
Name Number Probability
total 2598960 1.
no pair 1302540 0.5011774
pair 1098240 0.4225690
double pair 123552 0.0475390
three of a kind 54912 0.0211285
straight 10200 0.0039262
flush 5108 0.0019654
full house 3744 0.0014406
four of a kind 624 0.0002401
straight flush 36 0.0000139
royal flush 4 0.0000015
The probability of a flush is computed as follows. There are 4 suits in the deck. Once the
suit is chosen, there are $\binom{13}{5}$ different sets of 5 cards which have the same suit. The
total number of such hands is therefore $4 \binom{13}{5}$. But we have to remove the straight
flushes and royal flushes from this counting. This leads to $4 \binom{13}{5} - 4 \cdot 10$
different flushes. Therefore, the probability of this hand is

P(flush) = \frac{4 \binom{13}{5} - 4 \cdot 10}{\binom{52}{5}} = \frac{5108}{2598960} \approx 0.0019654. (243)
The probability of the no pair event is simple to compute when we already know the
probabilities of all the other events:

P(no pair) = 1 − P(royal flush) − P(straight flush) − P(four of a kind) − P(full house) (244)
             − P(flush) − P(straight) − P(three of a kind) − P(double pair) − P(pair) (245)
           = \frac{1302540}{2598960} \approx 0.5011774. (246)
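As a cross-check (a script of ours, not part of the original answer), the counts of the
table above can be reproduced in Scilab with the nchoosek function.

npair     = 13 * nchoosek ( 4 , 2 ) * nchoosek ( 12 , 3 ) * 4^3  // 1098240
ndouble   = nchoosek ( 13 , 2 ) * nchoosek ( 4 , 2 )^2 * 11 * 4  // 123552
nthree    = 13 * nchoosek ( 4 , 3 ) * nchoosek ( 12 , 2 ) * 4^2  // 54912
nstraight = 4^5 * 10 - 4 * 10                                    // 10200
nflush    = 4 * nchoosek ( 13 , 5 ) - 4 * 10                     // 5108
nfull     = 13 * nchoosek ( 4 , 3 ) * 12 * nchoosek ( 4 , 2 )    // 3744
nfour     = 13 * 48                                              // 624
nsflush   = 4 * 10 - 4                                           // 36
nroyal    = 4                                                    // 4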
Answer of Exercise 2.4 (Bernoulli trial for a die experiment) A fair die is rolled n = 4
times. What is the probability that we obtain exactly one "6"? For each roll, the probability of
getting a six is p = 1/6. Since each roll is independent of the previous rolls, this is a Bernoulli
process with n = 4 trials. Therefore, the probability of getting exactly one "6" is

b(4, 1/6, 1) = \binom{4}{1} (1/6)^1 (5/6)^3 \approx 0.3858025. (247)
What is the probability for n = 1, 2, . . . , 12 ? The following Scilab function computes the
probability of getting exactly one ”6” in n toss of a fair die.
// one6inNtoss --
// Computes the probability of getting exactly one "6" in n tosses of a fair die .
function bnpj = one6inNtoss ( n )
p = 1/6
q = 1-p
j = 1
bnpj = nchoosek (n , j ) * p ^ j * q ^( n - j )
endfunction
In the following session, we use the function one6inNtoss to compute the probability for n =
1, 2, . . . , 12.
--> for n = 1:12
--> b = one6inNtoss ( n );
--> mprintf ( " In %d toss , p ( one six )= %f \ n " ,n , b );
--> end
In 1 toss , p ( one six )=0.166667
In 2 toss , p ( one six )=0.277778
In 3 toss , p ( one six )=0.347222
In 4 toss , p ( one six )=0.385802
In 5 toss , p ( one six )=0.401878
In 6 toss , p ( one six )=0.401878
In 7 toss , p ( one six )=0.390714
In 8 toss , p ( one six )=0.372109
In 9 toss , p ( one six )=0.348852
In 10 toss , p ( one six )=0.323011
In 11 toss , p ( one six )=0.296094
In 12 toss , p ( one six )=0.269176
Answer of Exercise 2.5 (Probability of a flight crash) Assume that there are 20 000 flights
of airplanes each day in the world. Assume that there is one accident every 500 000 flights. What
is the probability of getting exactly 5 crashes in 22 days?
There are several answers to this question, depending on the accuracy required.
1. Flight by flight. The first approach is based on the analysis of a Bernoulli process, where
each flight has a probability of crash.
2. Time decomposition. The second approach is based on the analysis of a Bernoulli process,
where each days has a probability of crash.
3. Poisson approximation. The third approach is based on the Poisson approximation of the
binomial distribution function.
The first answer is based on the hypothesis that the flights are independent. Therefore, the
process can be considered as a Bernoulli process where each flight has a crash probability equal
to p = 1/500000. The number of steps in the Bernoulli process is equal to the number of flights
n. In 22 days, the number of flights is n = 22 × 20000. The probability of getting exactly 5
crashes in 22 days is

P("exactly 5 crashes in 22 days") = \binom{22 \times 20000}{5} p^5 q^{22 \times 20000 - 5} \approx 0.0018241. (248)

The figure 28 presents the results for various numbers of crashes.
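A short Scilab script (a sketch of ours, based on the nchooseklog function of
section 2.13 and computing through logarithms since n = 440000 is large) can
reproduce the flight by flight probabilities of the figure 28.

n = 22 * 20000;
p = 1 / 500000;
for j = 0 : 10
    pj = exp ( nchooseklog ( n , j ) + j * log ( p ) + ( n - j ) * log ( 1 - p ) );
    mprintf ( "%2d crash in 22 days : %.8f\n" , j , pj );
end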
What is the probability of at least 5 crashes in 22 days? The probability of getting at least 5
crashes can be computed from the probability of getting 0, 1, 2, 3 or 4 crashes. Therefore,

P("at least 5 crashes in 22 days") = 1 − P("0, 1, 2, 3 or 4 crashes in 22 days") (249)
  = 1 - \sum_{j=0,4} \binom{22 \times 20000}{j} p^j q^{22 \times 20000 - j} (250)
  \approx 0.0021294. (251)
Event Probability
0 crash in 22 days 0.41478255
1 crash in 22 days 0.36500937
2 crash in 22 days 0.16060408
3 crash in 22 days 0.04711041
4 crash in 22 days 0.01036424
5 crash in 22 days 0.00182409
6 crash in 22 days 0.00026753
7 crash in 22 days 0.00003363
8 crash in 22 days 0.00000370
9 crash in 22 days 0.00000036
10 crash in 22 days 0.00000003
Figure 28: Crash probabilities for 22 days with one crash every 500 000 flights and
20000 flights each day
Event Probability
0 crash in 42 days 0.18637366
1 crash in 42 days 0.31310838
2 crash in 42 days 0.26301125
3 crash in 42 days 0.14728625
4 crash in 42 days 0.06186013
5 crash in 42 days 0.02078494
6 crash in 42 days 0.00581976
7 crash in 42 days 0.00139674
8 crash in 42 days 0.00029331
9 crash in 42 days 0.00005475
10 crash in 42 days 0.00000920
Figure 29: Crash probabilities for 42 days with one crash every 500 000 flights and
20 000 flights each day
What is the probability of getting exactly 3 crashes in 42 days? What is the probability of
getting at least 3 crashes in 42 days?
The same computations can be performed for 42 days, which represent 6 weeks. The figure 29
presents the results. The probability of having exactly 3 crashes in 42 days is

P("exactly 3 crashes in 42 days") = \binom{42 \times 20000}{3} p^3 q^{42 \times 20000 - 3} (252)
  \approx 0.14728625. (253)
We now present another approach for the computation of the same problem. The method is
based on counting the number of crashes during a given time unit, for example one day. We
consider that the process is a Bernoulli process, where each day is associated with the probability
that exactly 1 crash occurs, which is obviously an approximation, since it is possible that more
than one crash occurs during one day. Since there is one crash every 500 000 flights, the
probability that one flight has no crash is p = 49999/50000. By hypothesis, all flights are
independent, therefore the probability of getting no accident in one day is

P("no crash in 1 day") = \left( \frac{49999}{50000} \right)^{20000} \approx 0.6703174. (257)

The probability of exactly 5 crashes in 22 days is then approximated by

P("exactly 5 crashes in 22 days") \approx \binom{22}{5} \tilde{p}^5 \tilde{q}^{22-5},

where \tilde{p} = 1 - (49999/50000)^{20000} \approx 0.0392106 and \tilde{q} = 1 - \tilde{p}.
This leads to P("exactly 5 crashes in 22 days") \approx 0.0012366.
We see that the result is close, but different from the previous probability, which was equal to
0.00182409. In fact, the result depends on the time unit that we consider for our calculation.
Obviously, if we consider that the time unit is the half day, the formula is changed to

P("exactly 5 crashes in 22 days") \approx \binom{22 \times 2}{5} \tilde{p}^5 \tilde{q}^{22 \times 2 - 5}, (261)

where \tilde{p} = 1 - (49999/50000)^{20000/2} and \tilde{q} = 1 - \tilde{p}. This gives
P("exactly 5 crashes in 22 days") \approx 0.0015155.
If we consider the hour as the time unit, we get P("exactly 5 crashes in 22 days") ≈ 0.0017973.
The third approach is based on the fact that, when n is large, the binomial distribution function
is closely approximated by the Poisson distribution function, that is,

b(n, p, j) \approx \frac{\lambda^j}{j!} \exp(-\lambda), (264)

where λ = np. Here, the parameter λ is equal to λ = 22 × 20000/500000 = 0.88. The result is

P("exactly 5 crashes in 22 days") \approx \frac{\lambda^5}{5!} \exp(-\lambda) (265)
  \approx 0.0018241. (266)
The three approaches are presented in the figure 30. We can see that the flight by flight approach
gives a result which is very close to the Poisson approximation, with 6 common digits. The
various approaches based on time decomposition give different results, with only 1 common digit.
We can check that smaller time units lead to results which are closer to the flight by flight
approach.
Consider now that the year is a sequence of 16 periods of 22 days (ignoring the 13 days left
in the year). What is the probability of having one period in the year which contains at least 5
crashes?
Event Probability
Flight by flight 0.00182409
Time Unit = day 0.0012366
Time Unit = 1/2 day 0.0015155
Time Unit = hour 0.0017973
Poisson 0.0018241
Figure 30: Probability of having exactly 5 crash in 22 days with one crash every 500
000 flights and 20 000 flights each day - Different approaches
This decomposition corresponds to 16 × 22 = 352 days (instead of the usual 365 days of a regular
year), where the remaining 13 days are ignored. We want to compute the probability that one of
the 16 periods contains at least 5 crashes. We have

P("one period contains at least 5 crashes") = 1 − P("all periods contain at most 4 crashes"). (267)

Since the periods are disjoint, the crashes in all 16 periods are independent, so that

P("all periods contain at most 4 crashes") = P("one period contains at most 4 crashes")^{16}. (268)

Now, the probability of having at most 4 crashes in one period is equal to
1 − 0.0021294 ≈ 0.9978706, since it has already been computed that the probability of having at
least 5 crashes in one period of 22 days is approximately 0.0021294. This leads to

P("one period contains at least 5 crashes") = 1 − 0.9978706^{16} \approx 0.0335,

which is approximately 3%. This is much larger than the original probability 0.0021294 ≈ 0.2%
of having at least 5 crashes in 22 days.
The approach presented here is still too simplified. Indeed, we should instead consider the
probability that one period in the year contains at least 5 crashes, considering all possible periods
of 22 days in the year. In this problem, the periods are not disjoint anymore, so that the
computations performed earlier cannot be applied. This kind of problem involves a method called
scan statistics, which will not be presented here (see [9, 6]).
Answer of Exercise 2.6 (Binomial function maximum) Consider the discrete distribution
function of a Bernoulli process, as defined by the proposition 2.16. Let us prove that

b(n, p, j) = \frac{p}{q} \, \frac{n-j+1}{j} \, b(n, p, j-1), (276)
for j ≥ 1. By definition, we have

b(n, p, j-1) = \binom{n}{j-1} p^{j-1} q^{n-j+1}, (277)

where q = 1 − p. By pre-multiplying the previous equality by p/q, we find

\frac{p}{q} \, b(n, p, j-1) = \binom{n}{j-1} p^j q^{n-j}. (278)
function p = tossingcoin ( j )
rand ( " seed " ,0)
nb = 10000;
success = 0;
for i = 1: nb
faces = floor ( 2 * rand (1 ,10) );
nbheads = sum ( faces );
if ( nbheads == j ) then
success = success + 1;
end
end
p = success / nb
endfunction
By using this function with different values of j, we can easily determine the value of j which
maximizes this probability. The following session performs a loop for j = 0, 1, . . . , 10 and prints
out the computed probability for each value of j.
--> for j = 0:10
--> p = tossingcoin ( j );
--> mprintf ( " P ( j = %d )= %f \ n " ,j , p );
--> end
P ( j =0)=0.000800
P ( j =1)=0.009700
P ( j =2)=0.043900
P ( j =3)=0.119600
P ( j =4)=0.206800
P ( j =5)=0.250700
P ( j =6)=0.200000
P ( j =7)=0.114400
P ( j =8)=0.043700
P ( j =9)=0.008800
P ( j =10)=0.001600
We see that j = 5 maximizes the probability, which corresponds to the fact that the most
probable event is that the number of heads in 10 tosses of a coin is 5. Indeed, this corresponds
to the value j_m = np = 10 × 1/2 = 5 that we found by theory.
Answer of Exercise 2.7 (Binomial coefficients and Pascal's triangle) Let a, b be two real
numbers and let n be a positive integer. Let us prove the binomial theorem, which states that

(a + b)^n = \sum_{j=0,n} \binom{n}{j} a^j b^{n-j}. (290)
We expand the product (a + b)^n = (a + b)(a + b) \cdots (a + b). The first term of this product is
a^n, the second term is a^{n-1} b, and so forth, until the last term b^n. The expansion can then
be written as the sum of the terms a^j b^{n-j}, with j = 0, n, where each term is associated with
a coefficient that we have to compute. Consider the term a^j b^{n-j} and let us count the number
of times that this term appears in the expansion. This is equivalent to choosing j elements in a
set of n elements. Indeed, the order of the elements does not count, since ab = ba. Therefore,
each term a^j b^{n-j} appears $\binom{n}{j}$ times, which concludes the proof.
The binomial coefficients $\binom{n}{j}$ can be written in a triangle, where each row corresponds
to n and each column corresponds to j, as in the following array:

L =
    1.
    1.   1.
    1.   2.   1.
    1.   3.   3.   1.
    1.   4.   6.   4.   1.
(292)
Let us use the binomial theorem in order to prove that the sum of the terms in the n-th row is
2^n. We apply the binomial theorem with a = b = 1 and get

(1 + 1)^n = \sum_{j=0,n} \binom{n}{j} (293)
          = 2^n. (294)
Let us prove that if the terms are added with alternating signs, then the sum is zero. We apply
the binomial theorem with a = 1 and b = −1 and get

(1 - 1)^n = \sum_{j=0,n} \binom{n}{j} (1)^j (-1)^{n-j} (295)
          = \binom{n}{0} (1)^0 (-1)^n + \binom{n}{1} (1)^1 (-1)^{n-1} + \ldots + \binom{n}{n} (1)^n (-1)^0 (296)
          = \binom{n}{0} (-1)^n + \binom{n}{1} (-1)^{n-1} + \ldots + \binom{n}{n}. (297)
This last equality proves that, if the sum begins with the last term, alternating the signs of
the terms leads to a zero sum. Additionally, we have $\binom{n}{j} = \binom{n}{n-j}$, so that
Pascal's triangle has a symmetry property. If we use this symmetry property in the binomial
expansion, we have

(a + b)^n = \sum_{j=0,n} \binom{n}{n-j} a^{n-j} b^j. (298)
function c = pascallow ( n )
c = zeros (n , n );
for i = 1: n
c (i ,1: i ) = nchoosek (i -1 ,(1: i ) -1);
end
endfunction
In the following session, we use the pascallow function to check that we get the matrix presented
in 292.
--> pascallow (5)
ans =
    1.    0.    0.    0.    0.
    1.    1.    0.    0.    0.
    1.    2.    1.    0.    0.
    1.    3.    3.    1.    0.
    1.    4.    6.    4.    1.
In order to compute Pascal's symmetric matrix, notice that the anti-diagonal associated with a
constant sum i + j contains the binomial coefficients $\binom{i+j-2}{i-1}$, i.e. one row of
Pascal's triangle. The following function computes Pascal's symmetric matrix of order n.
function c = pascalsym ( n )
c = zeros (n , n );
for i = 1: n
c (i ,1: n ) = nchoosek ( i +(1: n ) -2 ,i -1);
end
endfunction
The following session shows a sample use of this function in order to check that we get the same
result as presented in equation 302.
--> pascalsym (5)
ans =
1. 1. 1. 1. 1.
1. 2. 3. 4. 5.
1. 3. 6. 10. 15.
1. 4. 10. 20. 35.
1. 5. 15. 35. 70.
We can additionally define Pascal's upper triangular matrix, as in the following function.
function c = pascalup ( n )
c = zeros (n , n );
for i = 1: n
c (i , i : n ) = nchoosek ( ( i : n ) -1 , i -1 );
end
endfunction
In the following session, we compute Pascal’s upper triangular matrix of order 5.
--> pascalup ( 5 )
ans =
1. 1. 1. 1. 1.
0. 1. 2. 3. 4.
0. 0. 1. 3. 6.
0. 0. 0. 1. 4.
0. 0. 0. 0. 1.
The lower, upper and symmetric Pascal matrices are related by the equality L · U = S, as shown
in the following session.
-->L = pascallow ( 5 );
-->U = pascalup ( 5 );
-->S = pascalsym ( 5 );
-->L * U - S
ans =
0. 0. 0. 0. 0.
0. 0. 0. 0. 0.
0. 0. 0. 0. 0.
0. 0. 0. 0. 0.
0. 0. 0. 0. 0.
To help us prove this result, we consider a set A with 2n elements, where n elements are red and
n elements are blue. Let us compute the number of ways to choose n elements in this set.
For example, we consider the case n = 3, so that the set is

A = \{R_1, R_2, R_3, B_1, B_2, B_3\}. (304)
We are searching for subsets A_i ⊂ A of size 3, where i = 1, i_max, and where i_max is the
positive integer to be computed. To organize our computation, we order the subsets depending on
the number of red balls in the subset. The following is the list of all possible subsets of size 3
with no red element:

A_1 = \{B_1, B_2, B_3\}. (305)

The following is the list of all possible subsets of size 3 with 1 red element (and, therefore, 2 blue
elements).
The other subsets can be computed with the same method, so that we finally find that there are,
indeed, $\binom{6}{3} = 20$ subsets. Therefore, we write the term $\binom{2n}{n}$ as the sum
over the numbers of subsets with j red elements, where 0 ≤ j ≤ n. We have

\binom{2n}{n} = \sum_{j=0,n} C_j, (309)

where C_j is the number of subsets of A with n elements where j elements are red. There are
$\binom{n}{j}$ ways to choose the j red elements and $\binom{n}{n-j}$ ways to choose the
n − j blue elements, so that C_j = \binom{n}{j} \binom{n}{n-j}. But the symmetry property of
the binomial coefficients states that \binom{n}{j} = \binom{n}{n-j}, which leads to

C_j = \binom{n}{j}^2. (310)
Finally, the previous equality can be plugged into 309 so that the equality 303 holds true, which
concludes the proof.
Answer of Exercise 2.9 (Earthquakes and predictions) Assume that a person "predicts" the
dates of major earthquakes (with magnitude larger than 6.5, or with a large number of deaths,
etc.) in the world during 3 years, i.e. in a period of 1096 days. Assume that the "specialist"
predicts 169 earthquakes. Assume that, during the same period, 196 major earthquakes really
occur, so that 33 earthquakes were correctly predicted by the "specialist". What is the probability
that the earthquakes are predicted by chance?
We consider the set of m = 1096 days, where k = 196 days are earthquakes and m − k =
1096 − 196 are not earthquake days. In this set, we are picking n = 169 days, where x = 33 days
are earthquakes.
The probability of selecting x earthquake days is given by the hypergeometric distribution
function, defined by

P(X = x) = h(x, m, k, n) = \frac{\binom{k}{x} \binom{m-k}{n-x}}{\binom{m}{n}}. (311)

With the values above, this probability is approximately 7 %.
To know if the prediction is based on chance, we perform the computation, with Scilab, of
all the probabilities for x = 0, 1, . . . , 169. The following script computes the required
probability and draws the plot which is presented in the figure 31.
// The number of days in three years
m = 1096
// The number of days selected
n = 169
// The number of earthquake days in three years
k = 196
// The number of earthquake days selected
x = 33
// The probability of picking 169 days , where 33 are earthquakes .
p = hygepdf ( x , m , k , n )
Figure 31: Probability of having x earthquake days while choosing n = 169 days
from m = 1096 days, where k = 196 days are earthquake days.
References
[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables. Dover Publications Inc., 1972.
[2] George E. Andrews, Richard Askey, and Ranjan Roy. Special Functions. Cam-
bridge University Press, Cambridge, 1999.
[4] Georges Charpak and Henri Broch. Devenez sorciers, devenez savants. Odile
Jacob, 2002.
[5] Richard Durstenfeld. Algorithm 235: Random permutation. Commun. ACM,
7(7):420, 1964.
[9] É. Janvresse and T. de la Rue. La loi des séries noires. La Recherche, 393:52–53,
Janvier 2006. http://www.univ-rouen.fr/LMRS/Persopage/Janvresse/Publi/series_noires.pdf.
[12] Michael A. Malcolm and Cleve B. Moler. Urand: a universal random number
generator. Technical report, Stanford University, Stanford, CA, USA, 1973.
[14] World Health Organization. Life table, united states of america, 2009. http:
//www.who.int/.
[17] M. Ross, Sheldon. Introduction to probability and statistics for engineers and
scientists. John Wiley and Sons, 1987.
[19] Edmund Taylor Whittaker and George Neville Watson. A Course of Modern
Analysis. Cambridge Mathematical Library, 1927.
[21] A. Talha Yalta. The accuracy of statistical distributions in microsoft excel 2007.
Comput. Stat. Data Anal., 52(10):4579–4586, 2008.