STA02A2 - Chapter 1
CHAPTER 1: Probability
1. Introduction
Probability refers to the study of randomness and uncertainty. In any situation where a number of different
outcomes may occur, the theory of probability provides methods for quantifying the chance or likelihood
associated with the various outcomes. The idea of probability, chance, or randomness is quite old, but its
axiomatization in mathematical terms occurred relatively recently. The mathematical theory of probability has
been applied to many problems in different disciplines, such as genetics, kinetic theory of gases, computer science,
engineering, operations research, actuarial science, social science, marketing, etc.
this reasoning is faulty, he still made money betting on this game. Given his success in this game, he modified
the game by betting on getting at least one double six in 24 rolls of two dice. Since the probability of getting a
pair of sixes in one roll of two dice is $\frac{1}{36}$, he thought that the probability of getting a pair of sixes in 24 rolls of
two dice is $\frac{24}{36}$. He ultimately lost a lot of money on this game. He turned to his friend Blaise Pascal to explain
why he kept losing a game that had such favourable odds. Pascal liaised with Pierre de Fermat and the two of
them laid out the fundamental principles of probability theory for the first time.
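The flaw in the reasoning about 24 rolls can be checked numerically. The following is a minimal sketch, assuming fair dice and independent rolls (the variable names are illustrative only): the chance of at least one double six is one minus the chance of no double six in any of the 24 rolls.

```python
from fractions import Fraction

# Probability of a double six on a single roll of two fair dice.
p_double_six = Fraction(1, 36)

# Faulty reasoning: adding the single-roll probability 24 times.
faulty = 24 * p_double_six

# Complement of "no double six in 24 independent rolls".
correct = 1 - (1 - p_double_six) ** 24

print(float(faulty))   # about 0.667
print(float(correct))  # about 0.491, so the bet is slightly unfavourable
```

Under these assumptions the true probability is just below one half, which is consistent with the losses described above.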
Chapter 1 Statistics 2A STA02A2
Since then, the theory of probability developed rapidly and was applied to areas other than gambling, such as
demography, physics, engineering, and so on. In 1933, the Russian mathematician Andrei Kolmogorov
established the modern axiomatic foundation of probability theory. He showed that the behaviour of all
probabilities is based on three simple axioms, which are still used today in probability theory.
Random experiment
A random experiment is a procedure that results in an uncertain outcome.
Sample space
The sample space is the set of all possible outcomes of a random experiment. The sample space is denoted by Ω
and an element of Ω is denoted by ω. For example:
➢ A commuter passes through three traffic lights and either stops (s) or continues (c). The sample space is
the set of classifications:
$\Omega = \{ccc, ccs, css, csc, sss, ssc, scc, scs\}$
➢ The sample space of the number of jobs in a print queue of a company is all the nonnegative integers:
$\Omega = \{0, 1, 2, \ldots\}$
➢ The sample space of the length of time between successive earthquakes in a particular region that are
greater than a certain magnitude is the set of all nonnegative real numbers:
$\Omega = \{t \mid t \geq 0\}$
Events
An event is a subset of the sample space. The certain event is an event that is sure to occur. The impossible
event is an event that has no chance of occurring. An elementary event is an event that consists of only one
outcome of the sample space. The empty set is the set that does not contain any of the outcomes of the sample
space and is denoted by $\emptyset = \{\}$. Events are typically denoted with uppercase letters.
For example, let A = the commuter stops at the first light, then $A = \{sss, ssc, scc, scs\}$.
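The traffic-light sample space and the event A can also be enumerated programmatically. This is a small sketch only; the names `omega` and `A` are chosen here to mirror the notation of the notes.

```python
from itertools import product

# Sample space: each of the three lights is either "s" (stop) or "c" (continue).
omega = {"".join(lights) for lights in product("sc", repeat=3)}

# Event A: the commuter stops at the first light.
A = {outcome for outcome in omega if outcome[0] == "s"}

print(sorted(omega))  # the 8 possible outcomes
print(sorted(A))      # the 4 outcomes that start with "s"
print(A <= omega)     # True: an event is a subset of the sample space
```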
Representing probabilities
Venn diagrams are used to visually represent events and how they co-exist in the sample space. The sample space
is represented by a rectangular box and circles are typically used to describe the events. The sizes of the circles
do not necessarily correspond to the sizes of the probabilities. Probability calculations can become difficult using
Venn diagrams. These diagrams are best used to visualise events rather than to calculate probabilities of events.
A contingency table classifies the outcomes (events) of one variable in rows and the other variable in columns. It
can show frequency counts or probabilities. Tree diagrams are discussed in Section 3.3 of the lecture notes.
Complement of an event
The set of all outcomes in the sample space that are not in the event is referred to as the complement of the event.
For an event A, its complement is $A^c = \bar{A} = \{\omega_i : \omega_i \notin A\}$. Note: $A^c$ is the main notation used in the textbook, and
$\bar{A}$ is the main notation used in the notes. Both forms of notation are correct.
For example, let A = the commuter stops at the first light, therefore $\bar{A}$ = the commuter continues at the first light,
then $\bar{A} = \{ccc, ccs, css, csc\}$.
Union of events
Union refers to the combination of events. The union of two events A and B is denoted by $A \cup B$ and contains
the outcomes of event A, or the outcomes of event B, or the outcomes of both A and B, i.e., either A or B or both
events occurred. The union of events $A_1, A_2, A_3, \ldots$ is denoted in set notation as
$\bigcup_{i=1}^{\infty} A_i = \{\omega : \omega \in A_i \text{ for some } i\}$.
Intersection of events
Intersection refers to where events occur together. The intersection of two events A and B is denoted by $A \cap B$
and describes all the outcomes that are common to both A and B, i.e., both A and B occurred. The intersection of
events $A_1, A_2, A_3, \ldots$ is denoted in set notation as
$\bigcap_{i=1}^{\infty} A_i = \{\omega : \omega \in A_i \text{ for all } i\}$.
Exhaustive events
Events are exhaustive if they fill up the sample space, i.e., the union of events is the sample space.
Partitioning
Events that are mutually exclusive and exhaustive form a partitioning of the sample space.
Subset/containment
If an event A is contained within an event B, all elements of A are also elements of B, but not necessarily vice
versa. Containment is denoted in set notation as follows: $A \subseteq B \iff (\omega_i \in A \Rightarrow \omega_i \in B)$.
Equality
If an event A is equal to an event B, all elements of A are also elements of B, and all elements of B are also
elements of A. Equality is denoted in set notation as follows: $A = B \iff A \subseteq B$ and $B \subseteq A$.
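The set relations in this section map directly onto Python's built-in set operations. The sketch below reuses the traffic-light sample space; the event B (the commuter stops at the second light) is a hypothetical event introduced only for illustration.

```python
from itertools import product

omega = {"".join(w) for w in product("sc", repeat=3)}
A = {w for w in omega if w[0] == "s"}  # stops at the first light
B = {w for w in omega if w[1] == "s"}  # stops at the second light (hypothetical)

A_complement = omega - A   # complement of A
union = A | B              # A or B or both
intersection = A & B       # both A and B

# A and its complement are mutually exclusive and exhaustive,
# so together they form a partitioning of the sample space.
is_partition = (A & A_complement == set()) and (A | A_complement == omega)

# Containment: every outcome in the intersection is also in A.
contained = intersection <= A

print(sorted(A_complement), sorted(intersection), is_partition, contained)
```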
Exercise 1
1) Consider the following contingency table showing the rate of absenteeism of 400 employees by their smoking
status.
                                   Smoking status
Absenteeism                Smokers (S)    Non-smokers (N)
Less than 10 days (L)          34              260
10 or more days (M)            78               28
2) A South African wine club has classified its last 200 customers’ orders according to the following criteria.
                                   Age of customer
Type of wine bought      Under 30    30 to <50    50 and over
South African               99          28            16
French                       3           1            18
German                      15           3             9
Other                        2           5             1
d) Age category “under 30” and “30 to < 50” are mutually exclusive events.
3. Probability Measures
This section defines basic laws of set theory, and the axioms and laws of probability. The following
properties/rules of probability measures are consequences of the axioms of probabilities.
Exercise 2
1) Prove that $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$ using:
a. Venn diagrams (one diagram for $(A \cup B) \cap C$ and one for $(A \cap C) \cup (B \cap C)$).
2. If $A \subseteq \Omega$, then $P(A) \geq 0$, therefore $0 \leq P(A) \leq 1$
More generally, if $A_1, A_2, A_3, \ldots$ are mutually disjoint, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
Complement rule
$P(\bar{A}) = 1 - P(A)$
Proof/Derivation:
✓ Events A and $\bar{A}$ are disjoint, and $A \cup \bar{A} = \Omega$, i.e., they form a partitioning of the sample space
✓ Therefore, $P(A) + P(\bar{A}) = 1 \Rightarrow P(\bar{A}) = 1 - P(A)$
Proof/Derivation:
✓ Since $\emptyset = \Omega^c \Rightarrow P(\emptyset) = P(\Omega^c) = 1 - P(\Omega) = 1 - 1 = 0$.
Probabilities of subsets
If $A \subseteq B$, then $P(A) \leq P(B)$
Proof/Derivation:
✓ Therefore, $P(A) = P(B) - P(B \cap \bar{A}) \leq P(B)$
Addition law
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Proof/Derivation:
✓ For two events A and B the following equations are true, where each right-hand side is a union of disjoint events:
$A \cup B = (A \cap \bar{B}) \cup (A \cap B) \cup (\bar{A} \cap B)$, $A = (A \cap \bar{B}) \cup (A \cap B)$, $B = (A \cap B) \cup (\bar{A} \cap B)$
✓ Therefore $P(A) + P(B) = P(A \cap \bar{B}) + P(A \cap B) + P(A \cap B) + P(\bar{A} \cap B)$
$P(A) + P(B) = [P(A \cap \bar{B}) + P(A \cap B) + P(\bar{A} \cap B)] + P(A \cap B)$
$P(A) + P(B) = P(A \cup B) + P(A \cap B)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
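The addition law can be checked on a small example with equally likely outcomes. The sketch below assumes one roll of a fair die; the events and the helper `prob` are illustrative and not part of the notes.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die (assumed)

def prob(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # both equal 2/3
```

Subtracting the probability of the intersection corrects for the outcomes 4 and 6, which would otherwise be counted twice.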
De Morgan’s laws
De Morgan’s first and second laws relate the intersection and union of events by complements.
✓ First law: $\overline{A \cup B} = \bar{A} \cap \bar{B}$
✓ Second law: $\overline{A \cap B} = \bar{A} \cup \bar{B}$
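De Morgan’s laws can be verified on a concrete example with Python sets. This is a numerical check on a hypothetical sample space, not a proof; the sets and names below are assumptions made for illustration.

```python
omega = set(range(1, 11))  # hypothetical sample space {1, ..., 10}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(event):
    """Outcomes of omega that are not in the event."""
    return omega - event

# Complement of a union equals the intersection of the complements.
union_law = complement(A | B) == (complement(A) & complement(B))

# Complement of an intersection equals the union of the complements.
intersection_law = complement(A & B) == (complement(A) | complement(B))

print(union_law, intersection_law)  # True True
```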
Exercise 3
1) Prove De Morgan’s first law using containment of events.
2) Let A and B be two events defined on a sample space Ω such that $P(A) = 0.3$, $P(B) = 0.5$ and $P(A \cup B) = 0.7$.
b. P ( A B )
c. P ( A B )
3) Let A and B be two events defined on a sample space Ω such that P ( A) = 0.7 , P ( A B ) = 0.4 and
Conditional probability
Let A and B be two events with $P(B) > 0$. The conditional probability of A given B is defined as:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
The idea behind this definition is that, if we know that event B occurred, the relevant sample space becomes B
rather than Ω, and the conditional probability is a probability measure on B. Note that, in general, $P(A \mid B) \neq P(B \mid A)$.
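The definition can be illustrated with a small example. The sketch below assumes one roll of a fair die; the helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die (assumed)

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """Conditional probability of a given b; b must have positive probability."""
    if prob(b) == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # the roll is even
B = {5, 6}     # the roll is at least 5

print(cond_prob(A, B))  # 1/2
print(cond_prob(B, A))  # 1/3
```

The two results differ, which illustrates that conditioning on B and conditioning on A are different questions.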
Exercise 4
Let $P(A) = 0.6$, $P(B) = 0.4$ and $P(A \cap B) = 0.3$. Find $P(A \mid B)$ and $P(B \mid A)$ using:
2) A contingency table.
Multiplication law
The multiplication law is derived from the formulation of a conditional probability and expresses the probability
of the intersection of two events in terms of a marginal and a conditional probability.
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \Rightarrow P(A \cap B) = P(A \mid B) P(B) = P(B) P(A \mid B)$
$P(B \mid A) = \frac{P(B \cap A)}{P(A)} \Rightarrow P(A \cap B) = P(B \mid A) P(A) = P(A) P(B \mid A)$
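As an illustration of the multiplication law, consider drawing two cards without replacement from a standard 52-card deck and asking for the probability that both are aces. The card example and variable names are assumptions chosen for this sketch and do not come from the notes.

```python
from fractions import Fraction

# Marginal probability: the first card is an ace (4 aces in 52 cards).
p_first_ace = Fraction(4, 52)

# Conditional probability: the second card is an ace given the first was
# (3 aces remain among 51 cards).
p_second_ace_given_first = Fraction(3, 51)

# Multiplication law: intersection = conditional * marginal.
p_both_aces = p_second_ace_given_first * p_first_ace
print(p_both_aces)  # 1/221
```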
Exercise 5
A batch of 20 parts contains 4 defective parts. Two parts are chosen one at a time without replacement. Let D1 =
the first part is defective, and D2 = the second part is defective.
2) Calculate the probability that at least one of the two parts selected is defective.
It is clear that the four Bi’s form a partitioning of the sample space, and A consists of the four respective
intersections with the four Bi’s. If the marginal and conditional probabilities related to the four Bi’s are known, it
is easy to calculate the probability of event A as the sum of the four conditional probabilities of A | Bi , weighted
by the respective marginal probabilities of Bi. This property is known as the law of total probability.
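event A: $P(A) = \sum_{i=1}^{n} P(A \mid B_i) P(B_i)$
Proof/Derivation:
$P(A) = P(A \cap \Omega) = P\left(A \cap \left(\bigcup_{i=1}^{n} B_i\right)\right) = P\left(\bigcup_{i=1}^{n} (A \cap B_i)\right)$
Since the events $A \cap B_i$ are disjoint for all i, $P\left(\bigcup_{i=1}^{n} (A \cap B_i)\right) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i) P(B_i)$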
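The law of total probability amounts to a weighted sum, which the following sketch computes. The three-supplier scenario and all of its numbers are hypothetical, and `total_probability` is an illustrative helper rather than a library function.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Sum of P(A | B_i) * P(B_i) over a partition B_1, ..., B_n."""
    if sum(priors) != 1:
        raise ValueError("the P(B_i) must sum to one for a partition")
    return sum(l * p for l, p in zip(likelihoods, priors))

# Hypothetical example: three suppliers B_1, B_2, B_3 deliver 50%, 30% and 20%
# of all parts, with defect rates of 1%, 2% and 5% respectively.
priors = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]

p_defective = total_probability(priors, likelihoods)
print(p_defective)  # 21/1000
```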
Exercise 6
A batch of 20 parts contains 4 defective parts. Two parts are chosen one at a time without replacement. Let D1 =
the first part is defective, and D2 = the second part is defective. Calculate the probability that the second part
selected is defective.
Bayes’ Rule
Bayes’ Theorem was developed by Reverend Thomas Bayes in the 18th century to revise probability calculations
in light of new information and calculate posterior probabilities. This theorem is a special application of
conditional probabilities and makes use of the law of total probability. It can be seen as an “inverse” problem,
where we are given the “effect” and must calculate the “cause”.
$P(B_j \mid A) = \frac{P(A \mid B_j) P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i) P(B_i)}$
All the marginal probabilities add up to one. For each conditioning event, all the conditional probabilities add up
to one. All the intersection probabilities add up to one.
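Bayes’ rule can be sketched in a few lines: compute each intersection probability, sum them to obtain the denominator, and divide. The supplier scenario and its numbers below are hypothetical, and `bayes_posteriors` is an illustrative helper name.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return the posterior P(B_j | A) for every j in a partition B_1, ..., B_n."""
    # Intersection probabilities P(A and B_i) = P(A | B_i) * P(B_i).
    joint = [l * p for l, p in zip(likelihoods, priors)]
    # Denominator: P(A) by the law of total probability.
    evidence = sum(joint)
    return [j / evidence for j in joint]

# Hypothetical example: suppliers deliver 50%, 30% and 20% of all parts,
# with defect rates of 1%, 2% and 5%. A is "the part is defective".
priors = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]

posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors)       # 5/21, 2/7 and 10/21
print(sum(posteriors))  # 1
```

In this made-up example the third supplier delivers the fewest parts but is the most likely source of a defective one, which shows how new information revises the prior probabilities.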
Exercise 7
In a manufacturing plant two machines A and B are used to manufacture a mechanical part. Machine A is used
60% of the time, and machine B is used 40% of the time. The probability that machine A produces a defective
part is 0.1, while machine B has a 15% chance of producing a defective part.
1) Construct a tree diagram to represent the given probabilities.
2) What is the probability that a randomly selected part was manufactured by machine A and it was good?
4) A randomly selected part is tested and found to be defective. What is the probability that it was produced by
machine B?
Statistical independence
Two events are statistically independent if the occurrence of one does not change the probability of the other
occurring, that is $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$. If events A and B are independent, then the probability
of the intersection between A and B is equal to the product of the two marginal probabilities. Therefore, if events
A and B are statistically independent it follows from the multiplication law that:
$P(A \cap B) = P(A \mid B) P(B) = P(A) P(B)$
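Independence can be tested by comparing the probability of the intersection with the product of the marginal probabilities. The sketch below assumes two fair dice; the events and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice (assumed).
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))

def independent(a, b):
    """True if P(a and b) equals P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

A = {w for w in omega if w[0] == 6}     # first die shows a six
B = {w for w in omega if sum(w) == 7}   # the total is seven
C = {w for w in omega if sum(w) == 12}  # the total is twelve

print(independent(A, B))  # True
print(independent(A, C))  # False
```

Knowing that the first die shows a six does not change the chance of a total of seven, but it does change the chance of a total of twelve.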
Exercise 8
A device consists of three independent components. The probabilities of the three components functioning
correctly are 0.96, 0.92 and 0.95, respectively. The device can only function properly if all three components
function correctly. What is the probability that the device will not function properly?
4. Counting Methods
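Probabilities are easy to compute for finite sample spaces. If $\Omega = \{\omega_1, \omega_2, \ldots, \omega_N\}$ and $P(\omega_i) = p_i$, then the
probability of an event A is
$P(A) = P\left(\bigcup_{i: \omega_i \in A} \{\omega_i\}\right) = \sum_{i: \omega_i \in A} p_i$
If Ω consists of N elements, and all elements are equally likely, then $P(\omega_i) = \frac{1}{N}$. Therefore, the probability of any
event A is the number of elements in A divided by N.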
Consider a game where a fair coin is tossed twice. Therefore, the sample space is $\Omega = \{hh, ht, th, tt\}$. Let A denote
the event that at least one coin landed on heads. Then $A = \{hh, ht, th\}$ and $P(A) = \frac{3}{4}$. This probability can be
directly calculated from the formula above since each outcome of the sample space is equally likely. If we count
the number of times the coin lands on heads in two tosses of the coin, then the sample space is $\Omega = \{0, 1, 2\}$. In
applying the formula above, $P(A) = \frac{2}{3} \neq \frac{3}{4}$, which is not the correct probability of the event. This is because the
elements in the sample space of the second experiment are not equally likely. If the number of elements of a
sample space is easy to write out and count, it is easy to calculate the probabilities of events. However, if the
sample space consists of many outcomes, we use counting methods to determine the total number of elements in
the sample space and in subsets of the sample space.
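The coin example can be reproduced by enumeration. The sketch below, with illustrative variable names, shows that the four toss sequences are equally likely while the three possible head counts are not.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Equally likely sample space of two tosses of a fair coin.
omega = ["".join(t) for t in product("ht", repeat=2)]  # hh, ht, th, tt

# Event A: at least one head.
A = [w for w in omega if "h" in w]
p_A = Fraction(len(A), len(omega))

# Sample space of head counts {0, 1, 2} and the probability of each value.
head_counts = Counter(w.count("h") for w in omega)
p_counts = {k: Fraction(v, len(omega)) for k, v in head_counts.items()}

# Naive count over {0, 1, 2}: two of the three values have at least one head.
naive = Fraction(2, len(head_counts))
# Weighting each value by its actual probability recovers the right answer.
correct = p_counts[1] + p_counts[2]

print(p_A, naive, correct)  # 3/4 2/3 3/4
```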
different objects in the second set, …, $n_m$ different objects in the mth set, and if the different sets are disjoint, then
the total number of ways to select an object from one of the m sets is $n_1 + n_2 + \cdots + n_m$, i.e., the total number of
distinct objects.
Example
A total of 18 students are registered for Statistics and 12 students are registered for Computer Science. If none of the
students are registered for both modules, then there are 18 + 12 = 30 different students. If 7 students are registered
for both modules, then the groups are no longer disjoint, but we can create disjoint sets consisting of students
registered for “Statistics only” (namely 18 – 7 = 11 students), “Computer Science only” (namely 12 – 7 = 5
students) and “both modules” (7 students). Therefore, there are 11 + 5 + 7 = 23 different students.
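The student example can be mirrored with Python sets. The student numbers below are hypothetical IDs chosen so that exactly 7 students appear in both groups.

```python
# Hypothetical student IDs: 18 Statistics students and 12 Computer Science
# students, arranged so that IDs 11 to 17 are registered for both modules.
statistics = set(range(0, 18))
computer_science = set(range(11, 23))

both = statistics & computer_science
statistics_only = statistics - computer_science
computer_science_only = computer_science - statistics

# The three groups are disjoint, so the addition principle applies.
total = len(statistics_only) + len(computer_science_only) + len(both)
print(len(statistics_only), len(computer_science_only), len(both), total)  # 11 5 7 23
```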
outcomes in the second stage, …, and $n_p$ outcomes in the pth stage. If the number of outcomes at each stage is
independent of the choices in the previous stages and if the composite outcomes are all distinct, then the
experiment has $n_1 \times n_2 \times \cdots \times n_p$ different composite outcomes.
Example
There are 5 different Statistics books, 6 different Mathematics books, and 8 different Computer Science books on
a bookshelf. A student selects 2 books at random from different subject areas. How many different ways are there
to select an (unordered) pair of books from different subject areas? By the multiplication principle, there
are 5 × 6 = 30 ways to select one Statistics and one Mathematics book, 5 × 8 = 40 ways to select one Statistics
and one Computer Science book, and 6 × 8 = 48 ways to select one Mathematics and one Computer Science book.
These three types of selections are distinct, and so by the addition principle there are 30 + 40 + 48 = 118 ways in
total.
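The count of 118 can be confirmed by brute force. The sketch below labels each book with its subject and a hypothetical index, then counts the unordered pairs whose subjects differ.

```python
from itertools import combinations

# Each book is a (subject, index) pair; the indices are arbitrary labels.
books = (
    [("Statistics", i) for i in range(5)]
    + [("Mathematics", i) for i in range(6)]
    + [("Computer Science", i) for i in range(8)]
)

# Unordered pairs of books taken from two different subject areas.
pairs = [(a, b) for a, b in combinations(books, 2) if a[0] != b[0]]

# Multiplication principle within each pair of subjects,
# addition principle across the three pairs of subjects.
by_formula = 5 * 6 + 5 * 8 + 6 * 8

print(len(pairs), by_formula)  # 118 118
```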
Example
An 8-bit binary number consists of a sequence of 8 binary digits, i.e., 0’s and 1’s only. The total number of 8-bit
binary numbers is equal to $2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 = 2^8 = 256$.
Consider a set of n distinct objects. A sample of size r is selected from the n objects where duplication is not
allowed, and the samples are ordered, i.e., order is important. The first object can be chosen in n ways, the second
object can be chosen in n – 1 ways, the third object can be chosen in n – 2 ways, …, the rth object can be selected
in n – r + 1 ways, so there are $n(n-1)(n-2)\cdots(n-r+1)$ different samples. This is known as the permutation of
r objects out of n, termed “n permutation r”, and is expressed in terms of factorial notation as $P(n, r) = \frac{n!}{(n-r)!}$.
Example
The total number of different 5-letter “words” with no repeated letters that can be formed from the 26-letter
alphabet is $P(26, 5) = 7893600$.
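The permutation formula is straightforward to evaluate. The sketch below defines an illustrative helper `n_permutations`, compares it with `math.perm` from the standard library (available from Python 3.8), and checks a small case by brute force.

```python
import math
from itertools import permutations

def n_permutations(n, r):
    """Number of ordered samples of size r from n distinct objects, no repeats."""
    return math.factorial(n) // math.factorial(n - r)

# 5-letter "words" with no repeated letters from a 26-letter alphabet.
print(n_permutations(26, 5))  # 7893600
print(math.perm(26, 5))       # 7893600

# Brute-force check on a small case: ordered pairs from four letters.
small_case = list(permutations("abcd", 2))
print(len(small_case), n_permutations(4, 2))  # 12 12
```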
Combinations
Consider a set of n distinct objects. A sample of size r is selected from the n objects where duplication is not
allowed, and the samples are not ordered, i.e., order is not important. This is referred to as a combination, which
is essentially a permutation where the duplicate subsets are removed. Since there are r! ways to order a sample of
size r, the number of unordered samples is $\frac{P(n, r)}{r!} = \frac{n!}{(n-r)!\,r!} = C(n, r) = \binom{n}{r}$, termed “n combination r”.
The numbers $\binom{n}{r}$ are called binomial coefficients because of their role in the binomial expansion
$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$. If $a = b = 1$, this expression reduces to $2^n = \sum_{k=0}^{n} \binom{n}{k}$, which is the total number of
subsets of a set of n objects.
Note that $\binom{n}{r} = \binom{n}{n-r}$, since selecting r out of n is the same as selecting the remaining n – r out of n.
Consider an experiment where a sample of m objects is selected from n objects without replacement, and order is
not important. The n objects consist of r objects of type X and (n – r) objects of type Y. For $k \leq r$, let A be
the event that k objects of type X and (m – k) objects of type Y are selected. Therefore, A can occur in
$\binom{r}{k}\binom{n-r}{m-k}$ ways, and $P(A)$ is the ratio of the number of ways A can occur to the total number of outcomes,
namely:
$P(A) = \frac{\binom{r}{k}\binom{n-r}{m-k}}{\binom{n}{m}}$
Example
Ten equally qualified people apply to be part of a committee consisting of four people. There are 4 female and 6
male applicants. The total number of ways to select the committee with no restriction is $\binom{10}{4} = 210$. The total
number of ways to select the committee consisting of only one female is $\binom{4}{1}\binom{6}{3} = 80$.
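The committee counts, and the corresponding probability, can be computed with `math.comb` (available from Python 3.8). The helper `prob_k_of_type_x` is an illustrative name for the ratio described above, and the random-selection assumption is stated in its docstring.

```python
from fractions import Fraction
from math import comb

def prob_k_of_type_x(n, r, m, k):
    """P(exactly k objects of type X) when m objects are drawn at random
    without replacement from n objects, r of which are of type X."""
    return Fraction(comb(r, k) * comb(n - r, m - k), comb(n, m))

total_committees = comb(10, 4)               # no restriction
one_female_committees = comb(4, 1) * comb(6, 3)  # exactly one female

print(total_committees, one_female_committees)  # 210 80
print(prob_k_of_type_x(10, 4, 4, 1))            # 8/21
```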
$r \leq n$, such that there is repetition of identical objects within each class. The repetition is restricted since each
class has a set number of identical objects. The total number of ways to do this is
$\binom{n}{n_1\, n_2 \cdots n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}$.
These numbers are called multinomial coefficients and occur in the expansion:
$(x_1 + x_2 + \cdots + x_r)^n = \sum \binom{n}{n_1\, n_2 \cdots n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$
where the sum is over all nonnegative integers $n_1, n_2, \ldots, n_r$ such that $\sum_{i=1}^{r} n_i = n$.
Proof/Derivation:
✓ There are $\binom{n}{n_1}$ ways to choose/arrange the objects from the first class.
✓ There are $\binom{n - n_1}{n_2}$ ways to choose/arrange the objects from the second class.
✓ Continue in this manner for all classes.
✓ Then, by the multiplication principle, the total number of choices/arrangements is:
$\binom{n}{n_1\, n_2 \cdots n_r} = \binom{n}{n_1}\binom{n - n_1}{n_2} \cdots \binom{n - n_1 - n_2 - \cdots - n_{r-1}}{n_r}$
$= \frac{n!}{(n - n_1)!\, n_1!} \cdot \frac{(n - n_1)!}{(n - n_1 - n_2)!\, n_2!} \cdots \frac{(n - n_1 - n_2 - \cdots - n_{r-1})!}{0!\, n_r!}$
$= \frac{n!}{n_1!\, n_2! \cdots n_r!}$
Example
A committee of seven members is to be divided into three subcommittees of size three, two and two. The number
of ways to do this is $\binom{7}{3\, 2\, 2} = \frac{7!}{3!\, 2!\, 2!} = 210$.
Example
How many arrangements (“words”) are possible using all the letters from MISSISSIPPI? In this 11-letter word
there are 1 M, 4 I’s, 4 S’s, and 2 P’s. Therefore, the total number of “words” that we can create with all 11 letters
is $\binom{11}{1\, 4\, 4\, 2} = \frac{11!}{1!\, 4!\, 4!\, 2!} = 34650$.
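Multinomial coefficients can be evaluated with a short helper. The function `multinomial` below is an illustrative sketch, not a standard-library function; it divides n! by the factorial of each class size in turn.

```python
from collections import Counter
from math import factorial

def multinomial(*class_sizes):
    """n! / (n_1! n_2! ... n_r!) where n is the sum of the class sizes."""
    result = factorial(sum(class_sizes))
    for size in class_sizes:
        result //= factorial(size)  # each intermediate quotient is an integer
    return result

# Seven members split into subcommittees of sizes 3, 2 and 2.
print(multinomial(3, 2, 2))  # 210

# Arrangements of MISSISSIPPI: the class sizes are the letter frequencies.
letter_counts = Counter("MISSISSIPPI")
print(dict(letter_counts))                   # M: 1, I: 4, S: 4, P: 2
print(multinomial(*letter_counts.values()))  # 34650
```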
Example
How many ways can we select two hot dogs from three varieties of hot dog, where there are at least two of each
variety available to choose from? In this example, r = 2 and n = 3. The number of possible selections of two hot dogs
consisting of any of the three varieties is $\binom{2 + 3 - 1}{2} = \binom{4}{2} = 6$.
Example
How many ways are there to fill a box with a dozen doughnuts chosen from five different varieties such that at
least one doughnut of each variety is picked? In this example we must first select one doughnut from each variety
and then select the remaining seven doughnuts any way we want, i.e., r = 7 and n = 5. Then the total number of
ways to fill the box is $\binom{7 + 5 - 1}{7} = \binom{11}{7} = 330$.
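Selections with repetition, as in the hot dog and doughnut examples, can be counted by formula or listed explicitly. The helper name `multiset_count` and the variety labels are assumptions for this sketch; `itertools.combinations_with_replacement` enumerates the unordered selections.

```python
from itertools import combinations_with_replacement
from math import comb

def multiset_count(r, n):
    """Number of ways to choose r objects from n varieties, repetition allowed."""
    return comb(r + n - 1, r)

# Two hot dogs from three varieties (labels are hypothetical).
hot_dog_selections = list(combinations_with_replacement(["A", "B", "C"], 2))
print(hot_dog_selections)
print(len(hot_dog_selections), multiset_count(2, 3))  # 6 6

# Doughnuts: after taking one of each of the five varieties,
# the remaining seven are chosen freely.
doughnut_selections = sum(1 for _ in combinations_with_replacement(range(5), 7))
print(doughnut_selections, multiset_count(7, 5))  # 330 330
```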
Exercise 9
1) A developer of a new subdivision offers a prospective home buyer a choice of four designs, three different
heating options, a garage or a carport, and a patio or a screened porch. How many different plans are available to
the buyer?
2) How many 4-digit pin numbers are possible where duplication is allowed?
3) What is the probability that a 4-digit pin starts with the number 5, and has no duplicated numbers?
4) How many distinct words can be formed from all the letters in the word “WORD”?
5) In how many ways can you select a sample of two objects from the set {1, 2, 3, 4}?
6) Five accounts are randomly sampled without replacement from a box of forty accounts. The box contains thirty
accounts that have a debit balance. What is the probability of selecting at least four accounts with a credit
balance?
7) How many words can be formed with the letters in the word “STATISTICIAN”?
8) How many ways are there to pick a collection of exactly ten Smarties from a jar with red, blue and purple
Smarties, such that there are at most five red Smarties in the selection? Hint: use the complement rule.