01-bayes-answers-hidden-lecture
Conditional probability
Bayes’ Theorem
Independence
Rough syllabus:
Introduction to probability: 1 lecture
Discrete and continuous random variables: 6 lectures
Moments and limit theorems: 3 lectures
Applications/statistics: 2 lectures
Recommended reading:
Ross, S.M. (2014). A First Course in Probability. Pearson (9th ed.).
Dekking, F.M., et al. (2005). A Modern Introduction to Probability and
Statistics. Springer.
Bertsekas, D.P. & Tsitsiklis, J.N. (2008). Introduction to Probability. Athena
Scientific.
Grimmett, G. & Welsh, D. (2014). Probability: An Introduction. Oxford
University Press (2nd ed.).
Machine learning: use probability to compute predictions about and from data.
[Figure: "Applications of probability" diagram, linking Probability to
Computer Science (Data Mining, Ranking Websites), Mathematics, Finance,
and Medicine]
Prerequisite background
Set theory
Counting: product rule, sum rule, inclusion-exclusion
Combinatorics: permutations
Probability space: sample space, event space
Axioms
Union bound
Conditional probability
Consider an experiment with sample space S, and two events E and F.
Then the (conditional) probability of event E given that F has occurred
(denoted P[E|F]), with P[F] > 0, is defined by
P[E|F] = P[E ∩ F] / P[F] = P[EF] / P[F]
Example
Two dice are rolled, yielding values D1 and D2. Let E be the event that
D1 + D2 = 4.
1. What is P [ E ]?
2. Let event F be D1 = 2. What is P [ E∣F ]?
Answer
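Both quantities can be estimated with a quick Monte Carlo sketch in Python
(the sample size and seed are arbitrary choices):

import random

# Monte Carlo estimate of P[E] and P[E|F] for two fair dice.
# E: D1 + D2 = 4, F: D1 = 2.
random.seed(0)
N = 1_000_000
count_E = count_F = count_EF = 0
for _ in range(N):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    e, f = (d1 + d2 == 4), (d1 == 2)
    count_E += e
    count_F += f
    count_EF += e and f

print(count_E / N)         # estimate of P[E],   exact value 3/36
print(count_EF / count_F)  # estimate of P[E|F], exact value 1/6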
Chain rule
Rearranging the definition of conditional probability gives us the
multiplication rule (also called the chain rule):
P[EF] = P[E|F] P[F]
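For example, the probability that the first two cards dealt from a shuffled
deck are both aces is P[A1 A2] = P[A2|A1] P[A1] = (3/51)(4/52) = 1/221.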
Bayes’ Theorem
Intuition:
We want to know the probability of E. There are two scenarios, F and F^c.
If we know the probabilities of these scenarios and the probability of E
conditioned on each scenario, we can compute the probability of E.
Lightbulb example
Example
There are 3 boxes, each containing a different number of light bulbs.
The first box has 10 bulbs of which 4 are dead, the second has 6 bulbs
of which 1 is dead, and the third box has 8 bulbs of which 3 are dead.
What is the probability of a dead bulb being selected when a bulb is
chosen at random from one of the 3 boxes (each box has an equal chance
of being picked)?
Answer
Let event E = "dead bulb is picked", F1 = "bulb is picked from the first
box", F2 = "bulb is picked from the second box", and F3 = "bulb is picked
from the third box". We know:
P[E|F1] = 4/10,  P[E|F2] = 1/6,  P[E|F3] = 3/8
We need to compute P[E], knowing that P[Fi] = 1/3, using the Law of
Total Probability. Note that the events Fi must be mutually exclusive
(non-overlapping) and exhaustive (their union is the complete sample space).
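The total-probability computation can be checked with a short Python sketch
(exact fractions; the event names mirror the answer above):

from fractions import Fraction

# Law of Total Probability: P[E] = sum_i P[E|Fi] * P[Fi]
p_E_given_F = [Fraction(4, 10), Fraction(1, 6), Fraction(3, 8)]
p_F = [Fraction(1, 3)] * 3          # each box is equally likely
p_E = sum(l * p for l, p in zip(p_E_given_F, p_F))
print(p_E, float(p_E))              # 113/360, roughly 0.314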
Example
60% of all email in 2022 is spam. 20% of spam contains the word
"Dear". 1% of non-spam contains the word "Dear". What is the
probability that an email is spam given it contains the word "Dear"?
Answer
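A numeric sketch of this computation in Python, with F = "email is spam"
and E = "email contains 'Dear'" (the variable names are illustrative):

# Bayes' theorem: P[F|E] = P[E|F] P[F] / P[E]
p_F = 0.6             # prior P[F]: fraction of email that is spam
p_E_given_F = 0.20    # P[E|F]: spam containing "Dear"
p_E_given_Fc = 0.01   # P[E|F^c]: non-spam containing "Dear"

# Law of Total Probability for the denominator P[E]
p_E = p_E_given_F * p_F + p_E_given_Fc * (1 - p_F)
p_F_given_E = p_E_given_F * p_F / p_E
print(round(p_F_given_E, 3))  # ~0.968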
Bayes’ theorem:
P[F|E] = P[E|F] P[F] / P[E]
F: hypothesis, E: evidence
P[F|E]: probability of the hypothesis given the evidence (posterior)
P[F]: "prior probability" of the hypothesis (prior)
P[E|F]: probability of the evidence given the hypothesis (likelihood)
P[E]: normalisation constant, calculated by making sure that the
probabilities of all outcomes sum to 1 (they are "normalised")
True condition
Total population                      F (condition positive)     F^c (condition negative)
Predicted condition positive (E)      True positive: P[E|F]      False positive: P[E|F^c]
Predicted condition negative (E^c)    False negative: P[E^c|F]   True negative: P[E^c|F^c]
33% chance of having COVID-19 after testing positive may seem surprising.
But the space of facts is now conditioned on a positive test result (people who
test positive and have COVID-19, and people who test positive and don’t have
COVID-19).

               F (yes disease)          F^c (no disease)
E (test +)     True positive:           False positive:
               P[E|F] = 0.98            P[E|F^c] = 0.01
E^c (test -)   False negative:          True negative:
               P[E^c|F] = 0.02          P[E^c|F^c] = 0.99

But what is the chance of having COVID-19 if you take the test and it comes
back negative?

P[F|E^c] = P[E^c|F] P[F] / (P[E^c|F] P[F] + P[E^c|F^c] P[F^c]) ≈ 0.0001
We update our beliefs with Bayes’ theorem:
I have a 0.5% chance of having COVID-19. I take the test:
Test is positive: I now have a 33% chance of having COVID-19.
Test is negative: I now have a 0.01% chance of having COVID-19.
So it makes sense to take the test.
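Both updates in a short Python sketch (prior and test accuracies as in the
table above):

# Bayes update for the test example: F = "has COVID-19", E = "tests positive"
p_F = 0.005                                   # prior: 0.5%
p_pos_given_F, p_pos_given_Fc = 0.98, 0.01    # P[E|F], P[E|F^c]
p_neg_given_F, p_neg_given_Fc = 0.02, 0.99    # P[E^c|F], P[E^c|F^c]

# Posterior after a positive test
p_F_given_pos = (p_pos_given_F * p_F
                 / (p_pos_given_F * p_F + p_pos_given_Fc * (1 - p_F)))
# Posterior after a negative test
p_F_given_neg = (p_neg_given_F * p_F
                 / (p_neg_given_F * p_F + p_neg_given_Fc * (1 - p_F)))

print(round(p_F_given_pos, 2))   # ~0.33
print(round(p_F_given_neg, 4))   # ~0.0001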
Independence
Two events E and F are independent if and only if
P[EF] = P[E] P[F]
More generally, events Ea, Eb, ..., Er are (mutually) independent if the
product rule holds for every sub-collection of them; in particular,
P[Ea Eb ⋯ Er] = P[Ea] P[Eb] ⋯ P[Er]
For example, three events E, F, G are independent if all of the following hold:
P[EFG] = P[E] P[F] P[G]
P[EF] = P[E] P[F]
P[EG] = P[E] P[G]
P[FG] = P[F] P[G]
Equivalently, if E and F are independent (and P[F] > 0), then
P[E|F] = P[E]
Proof:
Independence of complement
If events E and F are independent, then E and F^c are independent:
P[E F^c] = P[E] P[F^c]
Proof:
Example
Each roll of a die is an independent trial. We have two rolls, D1 and
D2. Let E: D1 = 1, F: D2 = 6, and G: D1 + D2 = 7 (thus
G = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}).
1. Are E and F independent?
2. Are E and G independent?
3. Are E, F , G independent?
Answer
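A minimal enumeration sketch in Python (exact fractions over the 36 equally
likely outcomes) that checks each product directly:

from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def P(event):
    # Exact probability of an event, given as a predicate on outcomes.
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

E = lambda w: w[0] == 1           # E: D1 = 1
F = lambda w: w[1] == 6           # F: D2 = 6
G = lambda w: w[0] + w[1] == 7    # G: D1 + D2 = 7
EF = lambda w: E(w) and F(w)
EG = lambda w: E(w) and G(w)
EFG = lambda w: E(w) and F(w) and G(w)

print(P(EF) == P(E) * P(F))          # True:  E and F are independent
print(P(EG) == P(E) * P(G))          # True:  E and G are independent
print(P(EFG) == P(E) * P(F) * P(G))  # False: E, F, G not mutually independent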
Conditional independence
Two events E and F are called conditionally independent given a third
event G if
P[EF|G] = P[E|G] P[F|G]
Or equivalently (when P[FG] > 0),
P[E|FG] = P[E|G]
Notice that:
Dependent events can become conditionally independent.
Independent events can become conditionally dependent.
Knowing when conditioning breaks or creates independence is a big part
of building complex probabilistic models.
Example
Each roll of a die is an independent trial. We have two rolls, D1 and
D2. Let E: D1 = 1, F: D2 = 6, and G: D1 + D2 = 7 (thus
G = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}).
1. Are E and F independent?
2. Are E and F independent given G?
Answer
Conditioning on event G:
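The conclusion can be verified by extending the enumeration sketch from the
earlier example (self-contained below; P_given is an illustrative helper, not
a library function):

from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))

def P(event):
    # Exact probability of an event, given as a predicate on outcomes.
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def P_given(event, cond):
    # P[event | cond] = P[event ∩ cond] / P[cond]
    return P(lambda w: event(w) and cond(w)) / P(cond)

E = lambda w: w[0] == 1           # E: D1 = 1
F = lambda w: w[1] == 6           # F: D2 = 6
G = lambda w: w[0] + w[1] == 7    # G: D1 + D2 = 7
EF = lambda w: E(w) and F(w)

print(P(EF) == P(E) * P(F))                             # True: independent
print(P_given(EF, G) == P_given(E, G) * P_given(F, G))  # False: not given G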