
Introduction to Probability

Lecture 1: Conditional probabilities and Bayes’ theorem


Mateja Jamnik, Thomas Sauerwald

University of Cambridge, Department of Computer Science and Technology


email: {mateja.jamnik,thomas.sauerwald}@cl.cam.ac.uk
Outline

Logistics, motivation, background

Conditional probability

Bayes’ Theorem

Independence



Lecturers

Mateja Jamnik Thomas Sauerwald



Course logistics

Rough syllabus:
Introduction to probability: 1 lecture
Discrete and continuous random variables: 6 lectures
Moments and limit theorems: 3 lectures
Applications/statistics: 2 lectures

Recommended reading:
Ross, S.M. (2014). A First Course in Probability. Pearson (9th ed.).
Dekking, F.M., et al. (2005). A Modern Introduction to Probability and
Statistics. Springer.
Bertsekas, D.P. & Tsitsiklis, J.N. (2008). Introduction to Probability. Athena
Scientific.
Grimmett, G. & Welsh, D. (2014). Probability: An Introduction. Oxford
University Press (2nd ed.).



Why probability?

Gives us mathematical tools to deal with uncertain events.

It is used everywhere, especially in applications of machine learning.

Machine learning: uses probability to compute predictions about and from data.

Probability is not statistics:

Both are about random processes.
Probability: logically self-contained, a few rules for computing, one correct answer.
Statistics: messier, more of an art; take experimental data and try to draw
probabilistic conclusions, no single correct answer.



Applications of probability

Ranking Websites        Matching

A = ⎛ 0 0 0 0 0 1 1 0 0 1 ⎞
    ⎜ 0 0 0 0 0 0 1 1 0 1 ⎟
    ⎜ 0 0 0 0 0 1 1 0 1 0 ⎟
    ⎜ 0 0 0 0 0 1 0 1 1 0 ⎟
    ⎜ 0 0 0 0 0 0 0 1 1 1 ⎟
    ⎜ 1 0 1 1 0 0 0 0 0 0 ⎟
    ⎜ 1 1 1 0 0 0 0 0 0 0 ⎟
    ⎜ 0 1 0 1 1 0 0 0 0 0 ⎟
    ⎜ 0 0 1 1 1 0 0 0 0 0 ⎟
    ⎝ 1 1 0 0 1 0 0 0 0 0 ⎠

Finance        Medicine

Computer Science        Mathematics

Biology        Probability        Physics

...

Data Mining        Deep Learning        Particle Processes

[Figure: particle-process diagram with transition probabilities 7/10 and 3/10]

Prerequisite background

Set theory
Counting: product rule, sum rule, inclusion-exclusion
Combinatorics: permutations
Probability space: sample space, event space
Axioms
Union bound

Look for revision material of above on the course website:


https://www.cl.cam.ac.uk/teaching/2324/IntroProb/



Outline

Logistics, motivation, background

Conditional probability

Bayes’ Theorem

Independence



Definition

Conditional probability
Consider an experiment with sample space S, and two events E and F.
Then the (conditional) probability of event E given that F has occurred
(denoted P[E|F]), with P[F] > 0, is defined by

    P[E|F] = P[E ∩ F] / P[F] = P[EF] / P[F]

Sample space: all possible outcomes consistent with F (i.e., S ∩ F = F)

Event space: all outcomes in E consistent with F (i.e., E ∩ F)

Note: we assume that all outcomes are equally likely

    P[E|F] = (# outcomes in E ∩ F) / (# outcomes in F)
           = ((# outcomes in E ∩ F) / (# outcomes in S)) / ((# outcomes in F) / (# outcomes in S))
           = P[E ∩ F] / P[F]

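Where the counting identity applies, P[E|F] can also be estimated numerically. Below is a minimal Python simulation sketch (the helper name and interface are illustrative, not part of the course material; the events shown anticipate the dice example on the next slide):

    import random

    # Estimate P[E | F] by sampling: keep only trials where F occurred,
    # then measure how often E also occurred among those.
    def cond_prob(event_e, event_f, trials=100_000):
        hits_f, hits_ef = 0, 0
        for _ in range(trials):
            d1, d2 = random.randint(1, 6), random.randint(1, 6)
            if event_f(d1, d2):
                hits_f += 1
                if event_e(d1, d2):
                    hits_ef += 1
        return hits_ef / hits_f

    # E: D1 + D2 = 4, F: D1 = 2; the estimate should be close to 1/6
    print(cond_prob(lambda a, b: a + b == 4, lambda a, b: a == 2))
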


Example

Example
Two dice are rolled, yielding values D1 and D2. Let E be the event that
D1 + D2 = 4.
1. What is P[E]?
2. Let event F be D1 = 2. What is P[E|F]?
Answer
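The answer is hidden in this version of the slides; it can be reconstructed directly from the definition above (assuming fair dice, so all 36 outcomes are equally likely):

\[
P[E] = \frac{|\{(1,3),(2,2),(3,1)\}|}{36} = \frac{3}{36} = \frac{1}{12},
\qquad
P[E \mid F] = \frac{P[E \cap F]}{P[F]} = \frac{1/36}{6/36} = \frac{1}{6}.
\]

Conditioning on F shrinks the sample space to six outcomes, of which only (2, 2) lies in E.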


Rules revisited

Chain rule
Rearranging the definition of conditional probability gives us:

    P[EF] = P[E|F] P[F]

Generalisation of the Chain rule:

Multiplication rule

    P[E1 E2 ⋯ En] = P[E1] P[E2|E1] P[E3|E1 E2] ⋯ P[En|E1 ⋯ En−1]



Example
Example
An ordinary deck of 52 playing cards is randomly divided into 4 piles of
13 cards each. What is the probability that each pile has exactly 1 ace?
Answer
Define:
E1 = ace♥ is in any one pile
E2 = ace♥ and ace♠ are in different piles
E3 = ace♥, ace♠ and ace♣ are in different piles
E4 = all aces are in different piles

    P[E1 E2 E3 E4] = P[E1] P[E2|E1] P[E3|E1 E2] P[E4|E1 E2 E3]

We have P[E1] = 1. For the rest, we consider the complement: the probability
of the next ace landing in the same pile as a previous one. Thus we have:
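The remaining factors are hidden in this version of the slides; they follow from the complement argument above. Given the piles of the earlier aces, the next ace must land in one of the positions outside those piles:

\[
P[E_2 \mid E_1] = \frac{39}{51}, \qquad
P[E_3 \mid E_1 E_2] = \frac{26}{50}, \qquad
P[E_4 \mid E_1 E_2 E_3] = \frac{13}{49},
\]
\[
P[E_1 E_2 E_3 E_4] = 1 \cdot \frac{39}{51} \cdot \frac{26}{50} \cdot \frac{13}{49} \approx 0.105.
\]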


Outline

Logistics, motivation, background

Conditional probability

Bayes’ Theorem

Independence



Law of total probability

The law of total probability (a.k.a. partition theorem)

For events E and F with 0 < P[F] < 1,

    P[E] = P[EF] + P[EF^c] = P[E|F] P[F] + P[E|F^c] P[F^c]

In general, for disjoint events F1, F2, ..., Fn s.t. F1 ∪ F2 ∪ ⋯ ∪ Fn = S,

    P[E] = ∑_{i=1}^{n} P[E|Fi] P[Fi]

Intuition:
We want to know the probability of E. There are two scenarios, F and F^c. If we
know the probabilities of these, and the probability of E conditioned on each
scenario, we can compute the probability of E.


Lightbulb example
Example
There are 3 boxes, each containing a different number of light bulbs.
The first box has 10 bulbs of which 4 are dead, the second has 6 bulbs
of which 1 is dead, and the third box has 8 bulbs of which 3 are dead.
What is the probability of a dead bulb being selected when a bulb is
chosen at random from one of the 3 boxes (each box has an equal chance
of being picked)?
Answer

Let event E = "dead bulb is picked", and F1 = "bulb is picked from the first
box", F2 = "bulb is picked from the second box" and F3 = "bulb is picked
from the third box". We know:

    P[E|F1] = 4/10,    P[E|F2] = 1/6,    P[E|F3] = 3/8

We need to compute P[E], and we know that P[Fi] = 1/3:
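The final computation is hidden in this version of the slides; applying the law of total probability with the values above:

\[
P[E] = \sum_{i=1}^{3} P[E \mid F_i]\, P[F_i]
= \frac{1}{3}\left(\frac{4}{10} + \frac{1}{6} + \frac{3}{8}\right)
= \frac{1}{3}\cdot\frac{113}{120}
= \frac{113}{360} \approx 0.314.
\]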


Bayes’ theorem
How many spam emails contain the word "Dear"?

    P[E|F] = P["Dear"|spam]

But what about the probability that an email containing "Dear" is spam?

    P[F|E] = P[spam|"Dear"]

Bayes’ theorem
For any events E and F where P[E] > 0 and P[F] > 0,

    P[F|E] = P[E|F] P[F] / P[E]

and in expanded form,

    P[F|E] = P[E|F] P[F] / (P[E|F] P[F] + P[E|F^c] P[F^c])
           = P[E|F] P[F] / ∑_{i=1}^{n} P[E|Fi] P[Fi]

using the law of total probability. Note that all events Fi must be
mutually exclusive (non-overlapping) and exhaustive (their union is the
complete sample space).


Example

Example
60% of all email in 2022 is spam. 20% of spam contains the word
"Dear". 1% of non-spam contains the word "Dear". What is the
probability that an email is spam given it contains the word "Dear"?
Answer

Let event E = "Dear", event F = spam.
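The rest of the answer is hidden in this version of the slides; substituting the stated rates into the expanded form of Bayes' theorem:

\[
P[F \mid E]
= \frac{P[E \mid F]\,P[F]}{P[E \mid F]\,P[F] + P[E \mid F^c]\,P[F^c]}
= \frac{0.2 \times 0.6}{0.2 \times 0.6 + 0.01 \times 0.4}
= \frac{0.12}{0.124} \approx 0.97.
\]

So under these rates, roughly 97% of emails containing "Dear" are spam.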


Bayes’ terminology

    P[F|E] = P[E|F] · P[F] / P[E]

    (posterior = likelihood · prior / normalisation constant)

F: hypothesis, E: evidence
P[F]: "prior probability" of the hypothesis
P[E|F]: probability of the evidence given the hypothesis (likelihood)
P[E]: the normalisation constant, calculated by making sure that the
probabilities of all outcomes sum to 1 (they are "normalised")


Confusion matrix (error matrix)

Used in classification tasks for predicting output error.

                                       True condition
  Total population          Condition positive F     Condition negative F^c
  Predicted condition       True positive            False positive
  positive E                P[E|F]                   P[E|F^c]
  Predicted condition       False negative           True negative
  negative E^c              P[E^c|F]                 P[E^c|F^c]


Medical testing example
Example
A test is 98% effective at detecting the disease COVID-19 ("true
positive").
The test has a "false positive" rate of 1%.
0.5% of the population has COVID-19.
What is the likelihood you have COVID-19 if you test positive?
Answer

Let E: test positive, F: actually have COVID-19.

Need to find P[F|E].
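The computation is hidden in this version of the slides; filling it in with Bayes' theorem (the result matches the 33% figure quoted on the next slide):

\[
P[F \mid E]
= \frac{P[E \mid F]\,P[F]}{P[E \mid F]\,P[F] + P[E \mid F^c]\,P[F^c]}
= \frac{0.98 \times 0.005}{0.98 \times 0.005 + 0.01 \times 0.995}
= \frac{0.0049}{0.01485} \approx 0.33.
\]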


Bayesian intuition

A 33% chance of having COVID-19 after testing positive may seem surprising.
But the space of facts is now conditioned on a positive test result (people who test
positive and have COVID-19, and people who test positive and don't have
COVID-19).

                  F yes disease           F^c no disease
  E test+         True positive           False positive
                  P[E|F] = 0.98           P[E|F^c] = 0.01
  E^c test-       False negative          True negative
                  P[E^c|F] = 0.02         P[E^c|F^c] = 0.99

But what is the chance of having COVID-19 if you test and it comes back negative?

    P[F|E^c] = P[E^c|F] P[F] / (P[E^c|F] P[F] + P[E^c|F^c] P[F^c]) ≈ 0.0001

We update our beliefs with Bayes' theorem:
I have a 0.5% chance of having COVID-19. I take the test:
Test is positive: I now have a 33% chance of having COVID-19.
Test is negative: I now have a 0.01% chance of having COVID-19.
So it makes sense to take the test.
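Both belief updates are easy to verify numerically. A minimal Python sketch, assuming the test characteristics above (the function name and interface are illustrative, not from the lecture):

    # Posterior probability of disease given a test result, via Bayes' theorem.
    def posterior(p_pos_given_d, p_pos_given_healthy, prior, positive=True):
        # Likelihood of the observed result under each hypothesis.
        if positive:
            like_d, like_h = p_pos_given_d, p_pos_given_healthy
        else:
            like_d, like_h = 1 - p_pos_given_d, 1 - p_pos_given_healthy
        num = like_d * prior
        return num / (num + like_h * (1 - prior))

    print(posterior(0.98, 0.01, 0.005, positive=True))   # ~0.33   (33%)
    print(posterior(0.98, 0.01, 0.005, positive=False))  # ~0.0001 (0.01%)
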


Outline

Logistics, motivation, background

Conditional probability

Bayes’ Theorem

Independence



Independent events

Independence
Two events E and F are independent if and only if

    P[EF] = P[E] P[F]

Otherwise, they are called dependent events.

In general, n events E1, E2, ..., En are mutually independent if for every
subset {Ea, Eb, ..., Er} of these events (where r ≤ n) it holds that

    P[Ea Eb ⋯ Er] = P[Ea] P[Eb] ⋯ P[Er]

Therefore for 3 events E, F, G to be independent, we must have

    P[EFG] = P[E] P[F] P[G]
    P[EF]  = P[E] P[F]
    P[EG]  = P[E] P[G]
    P[FG]  = P[F] P[G]


Independence of complement

Notice an equivalent definition for independent events E and F (P[F] > 0):

    P[E|F] = P[E]

Proof:

Independence of complement
If events E and F are independent, then E and F^c are independent:

    P[EF^c] = P[E] P[F^c]

Proof:
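Both proofs are hidden in this version of the slides; each is a one-line reconstruction from the definitions. For the equivalent definition:

\[
P[E \mid F] = \frac{P[EF]}{P[F]} = \frac{P[E]\,P[F]}{P[F]} = P[E].
\]

For the complement:

\[
P[EF^c] = P[E] - P[EF] = P[E] - P[E]\,P[F] = P[E]\,(1 - P[F]) = P[E]\,P[F^c].
\]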


Example

Example
Each roll of a die is an independent trial. We have two rolls with values D1
and D2. Let event E: D1 = 1, F: D2 = 6 and event G: D1 + D2 = 7 (thus
G = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}).
1. Are E and F independent?
2. Are E and G independent?
3. Are E, F, G independent?
Answer
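The answer is hidden in this version of the slides; a worked reconstruction (here P[E] = P[F] = P[G] = 1/6, and each specific outcome pair has probability 1/36):

\[
P[EF] = P[\{(1,6)\}] = \tfrac{1}{36} = P[E]\,P[F] \quad\Rightarrow\quad E, F \text{ independent};
\]
\[
P[EG] = P[\{(1,6)\}] = \tfrac{1}{36} = P[E]\,P[G] \quad\Rightarrow\quad E, G \text{ independent};
\]
\[
P[EFG] = \tfrac{1}{36} \neq \tfrac{1}{216} = P[E]\,P[F]\,P[G] \quad\Rightarrow\quad E, F, G \text{ not mutually independent}.
\]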


Conditional independence

Conditional independence
Two events E and F are called conditionally independent given a third
event G if

    P[EF|G] = P[E|G] P[F|G]

Or equivalently (when P[FG] > 0),

    P[E|FG] = P[E|G]

Notice that:
Dependent events can become conditionally independent.
Independent events can become conditionally dependent.
Knowing when conditioning breaks or creates independence is a big part
of building complex probabilistic models.


Example revisited

Example
Each roll of a die is an independent trial. We have two rolls with values D1
and D2. Let event E: D1 = 1, F: D2 = 6 and event G: D1 + D2 = 7 (thus
G = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}).
1. Are E and F independent?
2. Are E and F independent given G?
Answer
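The answer is hidden in this version of the slides; a worked reconstruction: E and F are independent, as shown before, but conditioning on G makes them dependent. Given G, all six outcomes in G are equally likely, so

\[
P[E \mid G] = P[F \mid G] = \tfrac{1}{6}, \qquad
P[EF \mid G] = P[\{(1,6)\} \mid G] = \tfrac{1}{6} \neq \tfrac{1}{36} = P[E \mid G]\,P[F \mid G].
\]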


Summary of conditional probability

Conditioning on event G:

  Name of rule              Original rule                    Conditional rule
  1st axiom of probability  0 ≤ P[E] ≤ 1                     0 ≤ P[E|G] ≤ 1
  Complement                P[E] = 1 − P[E^c]                P[E|G] = 1 − P[E^c|G]
  Chain rule                P[EF] = P[E|F] P[F]              P[EF|G] = P[E|FG] P[F|G]
  Bayes’ theorem            P[F|E] = P[E|F] P[F] / P[E]      P[F|EG] = P[E|FG] P[F|G] / P[E|G]
