Lec 07
Dr Mohammed Hadi
m.hadi2@aston.ac.uk
Mathematics for AI
Probability - Introduction
Probability and AI
https://medium.com/swlh/the-power-of-probability-in-ai-bfe07bbea061
Probability vs Statistics
Probability and statistics are related areas of mathematics which concern themselves with
analysing the relative frequency of events. Still, there are fundamental differences in the
way they see the world:
Probability | Statistics
Deals with predicting how likely a future event is to happen. | Involves the analysis of the frequency of past events.
https://www3.cs.stonybrook.edu/~skiena/jaialai/excerpts/node12.html
https://www.mathsisfun.com/data/probability.html
Probability - Definition
Probabilities are given as numbers between 0 (no chance) and 1 (certainty), but
you can multiply this by 100 to get a percentage.
(1) http://www.buffalo.edu/content/dam/www/ccr/hswrkshop/ebp-workshop_stats-2017.pdf
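As a small illustration, assuming a fair six-sided die (not part of the slide), a probability can be computed by counting equally likely outcomes and then multiplied by 100 to express it as a percentage. The helper name `probability` is illustrative only.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (assumed for illustration).
outcomes = [1, 2, 3, 4, 5, 6]

def probability(event, sample_space):
    """Illustrative helper: favourable outcomes / total outcomes (equally likely outcomes)."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

p_even = probability(lambda o: o % 2 == 0, outcomes)
print(p_even)               # 1/2
print(float(p_even) * 100)  # 50.0 (as a percentage)
```

An impossible event (e.g. rolling a 7) gives 0 and the whole sample space gives 1, matching the 0-to-1 range above.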
Probability - Axioms
Definition: If either event A or event B can occur, but never both simultaneously (there is no overlap between them), then they are called disjoint or mutually exclusive.
The result of this approach matches the result of the counting approach.
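A minimal sketch, assuming a fair six-sided die (not from the slide), comparing the addition rule for mutually exclusive events, P(A ∪ B) = P(A) + P(B), with direct counting. The helper `prob` and the events `A`, `B`, `C` are illustrative choices.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # fair six-sided die (assumed for illustration)

def prob(event):
    """Illustrative helper, counting approach: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(outcomes))

A = {1, 2}  # roll a 1 or a 2
B = {5, 6}  # roll a 5 or a 6
print(A.isdisjoint(B))  # True: no overlap, so A and B are mutually exclusive

via_addition = prob(A) + prob(B)  # P(A) + P(B)
via_counting = prob(A | B)        # count the outcomes in A ∪ B directly
print(via_addition, via_counting)  # 2/3 2/3

# With overlapping events the simple sum over-counts the shared outcome (2):
C = {2, 3}
print(prob(A) + prob(C), prob(A | C))  # 2/3 1/2
```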
Probability – Conditional Probability
• Consider a situation where event 𝐵 has occurred, and we would like to calculate the probability of another event 𝐴, taking into account that 𝐵 has occurred.
• The new probability is called the conditional probability of 𝑨 given 𝑩 and is
denoted by 𝑃(𝐴|𝐵) and calculated as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{if } P(B) \neq 0$$
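A minimal sketch of the formula by counting, assuming a fair six-sided die and the illustrative events "roll a 6" and "roll an even number" (chosen here, not from the slide). The helpers `prob` and `conditional` are hypothetical names.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # fair six-sided die (assumed for illustration)

def prob(event):
    """Illustrative helper: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(outcomes))

def conditional(a, b):
    """Illustrative helper: P(A | B) = P(A ∩ B) / P(B), defined only when P(B) != 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be non-zero")
    return prob(a & b) / prob(b)

six = {6}
even = {2, 4, 6}
print(conditional(six, even))  # 1/3: (1/6) / (1/2)
print(conditional(even, six))  # 1: a 6 is always even
```

Knowing that B occurred shrinks the sample space to B, which is why P(A ∩ B) is divided by P(B).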
Q.1/ In a group of kids, if one is selected at random, the probability that they like oranges is 0.6 and the probability that they like oranges AND apples is 0.3. If a kid who likes oranges is selected at random, what is the probability that they also like apples?
Q.2/ A die is rolled. Find the probability that an even number is obtained, given that the number is greater than 3.
Probability – Independent Events
Random variables:
• Discrete Random Variables: Take on distinct or separate values. For example, for a coin toss:
$$X = \begin{cases} 0 & \text{if heads} \\ 1 & \text{if tails} \end{cases}$$
P(X = 1) = …
(1) http://www.buffalo.edu/content/dam/www/ccr/hswrkshop/ebp-workshop_stats-2017.pdf
Probability - Random variables
Example:
Spiegel, M.R., Schiller, J.J. and Srinivasan, R.A., 2013. Schaum's outline of probability and statistics. McGraw-Hill Education.
Probability - Binomial Distribution
A binomial distribution describes the number of successes in a series of repeated trials, each of which has two possible outcomes.
It can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment that is repeated multiple times. It must also meet the following conditions:
1. The number of observations or trials is fixed. In other words, you can only figure out the probability of something happening if you do it a certain number of times. This is common sense: if you toss a fair coin once, your probability of getting tails is 50%. If you toss it 20 times, your probability of getting tails at least once is very, very close to 100%.
2. Each observation or trial is independent. In other words, none of your trials has an effect on the probability of the next trial.
3. Each observation represents one of two outcomes ("success" or "failure"), i.e., the outcome is binary.
4. The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another.
http://www.stat.yale.edu/Courses/1997-98/101/binom.htm
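The claim in condition 1 can be checked with a short sketch. It uses the complement rule (a step not stated on the slide): with independent tosses, P(at least one tails) = 1 − P(no tails). The function name `prob_at_least_one_tails` is hypothetical.

```python
def prob_at_least_one_tails(n, p_tails=0.5):
    """Hypothetical helper: 1 - P(no tails in n independent tosses)."""
    return 1 - (1 - p_tails) ** n

print(prob_at_least_one_tails(1))   # 0.5
print(prob_at_least_one_tails(20))  # about 0.999999
```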
Probability - Bernoulli Trial
A Bernoulli trial is an experiment with only two outcomes: success and failure.
If the Bernoulli trials are independent, then the number of successes in a sequence of Bernoulli trials has a binomial distribution.
The probability of success, 𝑝, remains the same every time the trial is conducted.
Example 1. Consider the experiment of flipping a fair coin. The outcomes are heads “𝐻" or tails “𝑇", each with probability 0.5: P(𝐻) = 0.5 and P(𝑇) = 0.5.
PS: The outcomes of a Bernoulli trial do not need to be a success and failure in lay terms. A
‘success’ is simply the outcome of interest.
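A minimal simulation sketch of the statement above: a Bernoulli trial returns 1 (success) with probability p, and adding up n independent trials gives one draw of the number of successes. The function names, the seed and the choice n = 10, p = 0.5 are illustrative assumptions.

```python
import random

def bernoulli_trial(p, rng):
    """Illustrative helper: 1 ('success') with probability p, otherwise 0 ('failure')."""
    return 1 if rng.random() < p else 0

def count_successes(n, p, rng):
    """Illustrative helper: successes in n independent Bernoulli(p) trials, one Binomial(n, p) draw."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))

rng = random.Random(42)  # fixed seed so the run is reproducible
draws = [count_successes(10, 0.5, rng) for _ in range(10_000)]
print(sum(draws) / len(draws))  # close to 5 successes out of 10, on average
```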
𝑃 (𝑋 = 1) = 1/6
𝑃 (𝑋 = 0) = 5/6
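For a coin with P(𝐻) = 0.6 and P(𝑇) = 0.4 flipped four times, the sequences with exactly one 𝐻 are:
(𝐻, 𝑇, 𝑇, 𝑇): 0.6 × 0.4 × 0.4 × 0.4 = 0.6 × 0.4³ = 0.0384
(𝑇, 𝐻, 𝑇, 𝑇): 0.4 × 0.6 × 0.4 × 0.4 = 0.6 × 0.4³ = 0.0384
(𝑇, 𝑇, 𝐻, 𝑇): 0.4 × 0.4 × 0.6 × 0.4 = 0.6 × 0.4³ = 0.0384
(𝑇, 𝑇, 𝑇, 𝐻): 0.4 × 0.4 × 0.4 × 0.6 = 0.6 × 0.4³ = 0.0384
There are 4 such sequences, so P(X = 1) = 4 × 0.6 × 0.4³ = 4 × 0.0384 = 0.1536.
This number (4) is the number of combinations of 4 choose 1: $C_{4,1}$ or $\binom{4}{1}$.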
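A minimal sketch that enumerates all 16 sequences of four flips, keeps those with exactly one H, and confirms that their count equals 4 choose 1. The names `probs` and `seq_prob` are illustrative.

```python
from itertools import product
from math import comb

probs = {"H": 0.6, "T": 0.4}  # the unfair coin used on these slides

# All 2**4 = 16 sequences of four flips; keep those with exactly one H.
one_head = [seq for seq in product("HT", repeat=4) if seq.count("H") == 1]

def seq_prob(seq):
    """Illustrative helper: independent flips, so multiply the per-flip probabilities."""
    result = 1.0
    for s in seq:
        result *= probs[s]
    return result

for seq in one_head:
    print(seq, round(seq_prob(seq), 4))  # each 0.0384

print(len(one_head), comb(4, 1))          # 4 4
total = sum(seq_prob(s) for s in one_head)
print(round(total, 4))                    # 0.1536
```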
$$P(x \mid n, p) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$
$P(x \mid n, p)$: probability of $x$ successes given that the number of trials is $n$ and the probability of success at each trial is $p$. This is called the probability mass function (PMF), also referred to as the probability distribution function.
$\binom{n}{x}$: "$n$ choose $x$", the number of ways of placing $x$ successes among $n$ trials.
The bar "|" is read "given that": if we know $n$ and $p$, we can calculate the probability of $x$.
The parameters of the distribution are $n$ and $p$.
Example 3. Consider the experiment of flipping an unfair coin 𝟒 times, where P(𝐻) = 0.6 and
P(𝑇) = 0.4.
The parameters of the distribution are 𝑛 = 4 and 𝑝 = 0.6.
What is the probability of getting H twice given that the number of trials is 4 and the probability of
success at each trial is 0.6 ?
$$P(2 \mid 4, 0.6) = \binom{4}{2} \, 0.6^{2} \, (1 - 0.6)^{4-2} = \frac{4!}{2!\,(4-2)!} \, 0.6^{2} \, (1 - 0.6)^{4-2} = 0.3456$$
CS 4755 - Dr Roberto Puch-Solis
Probability - Binomial Distribution: Python code
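The code from this slide did not survive extraction. As a stand-in (not the lecture's original code), here is a minimal standard-library sketch of the binomial PMF for parameters n and p, using `math.comb` (Python 3.8+). The function name `binomial_pmf` is hypothetical. If SciPy is installed, `scipy.stats.binom.pmf(x, n, p)` should give the same values.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Hypothetical helper: P(x | n, p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # impossible number of successes
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.6  # parameters from Example 3
for x in range(n + 1):
    print(x, round(binomial_pmf(x, n, p), 4))

print(round(binomial_pmf(2, n, p), 4))                              # 0.3456
print(round(sum(binomial_pmf(x, n, p) for x in range(n + 1)), 10))  # 1.0
```

The last line checks that the probabilities over all possible x add up to 1, as any PMF must.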
Expected value:
The expected value is the population mean of a random variable and is therefore a measure of the centre of the distribution of a random variable.
Expected value of a discrete random variable 𝑋:
$$\mu = E[X] = \sum_{x \in \Omega} x \times P(x)$$
x        | 0   | 1   | 2   | 3
P(X = x) | 0.1 | 0.2 | 0.4 | 0.3
$E[X] = 0 \times 0.1 + 1 \times 0.2 + 2 \times 0.4 + 3 \times 0.3 = 1.9$
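The same calculation as a short sketch, using the values from the table (the variable names are illustrative):

```python
xs = [0, 1, 2, 3]
ps = [0.1, 0.2, 0.4, 0.3]  # P(X = x) from the table

print(round(sum(ps), 10))  # 1.0: a valid PMF sums to 1

expected = sum(x * p for x, p in zip(xs, ps))  # E[X] = sum of x * P(x)
print(round(expected, 4))  # 1.9
```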
Probability - Binomial Distribution
Expected value:
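For a binomial distribution, if you know the number of trials $n$ and the probability of success $p$, then the expected value is $np$:
$$\text{If } X \sim \text{Binomial}(n, p), \quad E[X] = np$$
For example, with $n = 10$ and $p = 0.7$: $E[X] = 10 \times 0.7 = 7$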
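A quick numerical check, as a sketch: compute $\sum_x x \, P(x \mid n, p)$ directly from the binomial PMF and compare it with $np$ for the example $n = 10$, $p = 0.7$. The helper `binomial_pmf` is a hypothetical name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Hypothetical helper: C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.7
mean_from_pmf = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))  # definition of E[X]
print(round(mean_from_pmf, 6), n * p)  # 7.0 7.0
```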
Probability - Binomial Distribution
Variance, with $\mu = E[X] = 1.9$:
x        | 0   | 1   | 2   | 3
P(X = x) | 0.1 | 0.2 | 0.4 | 0.3
$$V[X] = \sum_{x} (x - \mu)^2 \, P(x) = (0-1.9)^2 \times 0.1 + (1-1.9)^2 \times 0.2 + (2-1.9)^2 \times 0.4 + (3-1.9)^2 \times 0.3 = 0.89$$
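A short sketch reproducing the variance from the table. It also shows the equivalent shortcut $V[X] = E[X^2] - (E[X])^2$, a standard identity that is not stated on the slide.

```python
xs = [0, 1, 2, 3]
ps = [0.1, 0.2, 0.4, 0.3]  # P(X = x) from the table

mu = sum(x * p for x, p in zip(xs, ps))                    # E[X]
variance = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # sum of (x - mu)^2 * P(x)

# Equivalent shortcut: V[X] = E[X^2] - (E[X])^2
shortcut = sum(x**2 * p for x, p in zip(xs, ps)) - mu**2

print(round(mu, 4), round(variance, 4), round(shortcut, 4))  # 1.9 0.89 0.89
```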
Cumulative Distribution Function
$P(X = 1) = \frac{2}{4} = \frac{1}{2}$ (probability of seeing heads once: $\binom{2}{1} (1/2)^1 (1/2)^1 = 1/2$)
$P(X = 2) = \frac{1}{4}$ (probability of seeing heads twice: $\binom{2}{2} (1/2)^2 (1/2)^0 = 1/4$)
x               | 0   | 1   | 2
P(X = x)        | 1/4 | 1/2 | 1/4
F(x) = P(X ≤ x) | 1/4 | 3/4 | 1
Python Example:
https://www.simplilearn.com/tutorials/statistics-tutorial/cumulative-distribution-function
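The linked Python example is not reproduced here. As a stand-in, the following minimal sketch (standard library only, with exact fractions) builds the PMF of X ~ Binomial(2, 1/2) and accumulates it into the CDF F(x) = P(X ≤ x). The names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction
from math import comb

n, p = 2, Fraction(1, 2)  # toss a fair coin twice; X = number of heads

# Illustrative PMF table: P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

def cdf(t):
    """Illustrative helper: F(t) = P(X <= t), the sum of P(X = x) for every x <= t."""
    return sum((pmf[x] for x in pmf if x <= t), Fraction(0))

for x in range(n + 1):
    print(x, pmf[x], cdf(x))
# 0 1/4 1/4
# 1 1/2 3/4
# 2 1/4 1
print(cdf(-1), cdf(0.5), cdf(5))  # 0 1/4 1
```

Because X only takes the values 0, 1 and 2, F stays flat between them: it is 0 below 0 and 1 from 2 onwards.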
Cumulative Distribution Function
Example: Let’s toss a coin twice. Let X be the number of observed heads. Find the CDF of X.
𝑋~Binomial (𝑛 = 2, 𝑝 = 1/2), range of 𝑋 = {0,1,2}
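$P(X = 0) = \frac{1}{4}; \quad P(X = 1) = \frac{2}{4} = \frac{1}{2}; \quad P(X = 2) = \frac{1}{4}$
$F_X(x) = P(X \le x)$
For $x < 0$: $F_X(x) = 0$, since $X$ cannot take negative values
For $0 \le x < 1$: $F_X(x) = P(X \le 0) = P(X = 0) = 1/4$
For $1 \le x < 2$: $F_X(x) = P(X \le 1) = P(X = 0) + P(X = 1) = 3/4$
For $x \ge 2$: $F_X(x) = P(X \le 2) = 1$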
Probability - Binomial Distribution: Example
Yes, because
$$P(x = 90 \mid n = 100, p = 0.8) = \binom{100}{90} \, 0.8^{90} \, (1 - 0.8)^{100-90} = 0.003$$
You can advise the company to improve the quality of their production line so that 95% of the components pass the quality control phase.
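The extracted slide keeps only the answer, not the full question. Reading its equation as X = number of components (out of n = 100) that pass quality control, with pass probability p = 0.8, the value can be checked with this sketch under that reading:

```python
from math import comb

n, x, p = 100, 90, 0.8  # assumed reading: 100 components, 90 pass, pass probability 0.8
prob = comb(n, x) * p**x * (1 - p) ** (n - x)  # binomial PMF P(x | n, p)
print(round(prob, 4))  # 0.0034, which the slide rounds to 0.003
```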
Additional Resources:
Tutorial Questions
Tutorial Questions 1-3
Tutorial#1 Solution
Tutorial Questions 2-3
Tutorial#2 Solution
Tutorial Questions 3-3
Tutorial#3 Solution
End of Lecture 7