
Lec 07

This document discusses probability and statistics concepts relevant to artificial intelligence. It begins with an introduction to probability, including definitions of probability, axioms of probability, and conditional probability. It then discusses random variables and how they are used to model random processes. Specifically, it covers discrete and continuous random variables. Finally, it discusses the binomial distribution, which models binary outcome experiments repeated multiple times, where each trial has a fixed probability of success.

Uploaded by

Tùng Đào

Mathematics for AI

Probability and Statistics

Dr Mohammed Hadi
m.hadi2@aston.ac.uk
Mathematics for AI
Probability - Introduction
Probability and AI

https://medium.com/swlh/the-power-of-probability-in-ai-bfe07bbea061
Probability vs Statistics
Probability and statistics are related areas of mathematics which concern themselves with
analysing the relative frequency of events. Still, there are fundamental differences in the
way they see the world:

Probability:
• Deals with predicting how likely a future event is to happen.
• A theoretical branch of mathematics, which studies the consequences of mathematical definitions.

Statistics:
• Involves the analysis of the frequency of past events.
• An applied branch of mathematics, which tries to make sense of observations in the real world.

https://www3.cs.stonybrook.edu/~skiena/jaialai/excerpts/node12.html
https://www.mathsisfun.com/data/probability.html
Probability - Definition

The probability P of an event A is the number of favourable outcomes f divided by the total number of possible outcomes n (1):

P(A) = f / n

Example (one roll of a six-sided die):
A: getting a 5
f: number of favourable outcomes (getting a 5) = 1
n: total number of possible outcomes = 6

P(A) = 1/6 ≈ 0.16667 = 16.67%

Probabilities are given as numbers between 0 (no chance) and 1 (certainty), but
you can multiply this by 100 to get a percentage.
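
The counting definition above can be sketched in a few lines of Python. This is only an illustration: the helper name `probability` is hypothetical, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Classical probability: favourable outcomes f over total outcomes n.

    Assumes all outcomes in sample_space are equally likely.
    """
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}           # all faces of one six-sided die
p_five = probability({5}, die)      # event A: getting a 5

print(p_five)                       # exact fraction
print(f"{float(p_five):.5f}")       # decimal form
print(f"{float(p_five) * 100:.2f}%")  # percentage form
```

Using `Fraction` keeps the result exact, which makes it easy to compare against the hand calculation.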
(1) http://www.buffalo.edu/content/dam/www/ccr/hswrkshop/ebp-workshop_stats-2017.pdf
Probability - Axioms

• Axiom 1: For any event A, 0 ≤ P(A) ≤ 1.
• Axiom 2: The probability of the sample space Ω (i.e., all possible outcomes) is P(Ω) = 1.
• Axiom 3: If A1, A2, A3, … are disjoint events (i.e., there is no overlap between them), then

$P(A_1 \cup A_2 \cup A_3 \cup \dots) = P(A_1) + P(A_2) + P(A_3) + \dots = \sum_{i=1}^{\infty} P(A_i)$

For example, we cannot roll a die twice at the same time, so it is A1 and then A2, never both at once.

Definition: If either event A or event B can occur but never both simultaneously (there is no overlap between them), then they are called disjoint or mutually exclusive.

If A and B are not disjoint (e.g., throwing two dice at once), then

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
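
As a quick sanity check of the axioms and the addition rule, the sketch below evaluates them on a single fair die. The events chosen here (`even`, `greater_than_3`, `one`, `two`) are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # sample space of one fair die

def P(event):
    """Probability of an event (a set of faces), assuming equally likely faces."""
    return Fraction(len(event & omega), len(omega))

even = frozenset({2, 4, 6})
greater_than_3 = frozenset({4, 5, 6})
one, two = frozenset({1}), frozenset({2})   # two disjoint events

axiom1_holds = all(0 <= P(e) <= 1 for e in (even, greater_than_3, omega))
axiom2_holds = P(omega) == 1
axiom3_holds = P(one | two) == P(one) + P(two)   # disjoint: probabilities add

# Non-disjoint events: subtract the overlap so it is not counted twice.
union_direct = P(even | greater_than_3)
union_by_rule = P(even) + P(greater_than_3) - P(even & greater_than_3)

print(axiom1_holds, axiom2_holds, axiom3_holds)
print(union_direct, union_by_rule)
```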
Probability - Axioms
• Example: Consider the experiment of tossing one fair die.
The sample space is Ω = {1, 2, 3, …, 6}.
There are 6 equiprobable outcomes, i.e., P(i) = 1/6, i ∈ {1, 2, 3, 4, 5, 6}.

This result matches the result of the counting approach.

CS 4755 - Dr Roberto Puch-Solis 7


A mini Vevox Quiz

8
Mathematics for AI
Probability – Conditional Probability
Probability - Conditional Probability

• Consider a situation where event B has occurred and we would like to calculate the probability of another event A, taking into account that B has occurred.
• The new probability is called the conditional probability of A given B. It is denoted by P(A|B) and calculated as:

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{if } P(B) \neq 0$



Probability - Conditional Probability
Consider two fair dice. What is the probability of die #2 showing a number larger than die #1, given that die #1 already shows 1?
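
One way to check this example is to enumerate all 36 outcomes and apply the definition of conditional probability directly. The helper names `prob` and `cond_prob` below are hypothetical, and the sketch assumes both dice are fair.

```python
from fractions import Fraction
from itertools import product

# Every (die1, die2) pair for two fair six-sided dice: 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where event is a predicate over a (die1, die2) outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) != 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return prob(lambda o: a(o) and b(o)) / p_b

def die2_larger(o):
    return o[1] > o[0]

def die1_is_one(o):
    return o[0] == 1

result = cond_prob(die2_larger, die1_is_one)
print(result)
```

Given die #1 shows 1, die #2 is larger whenever it shows 2 to 6, which is 5 of its 6 faces.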



Tutorial Question

Q.1/ In a group of kids, if one is selected at random, the probability that they like oranges is 0.6 and the probability that they like oranges AND apples is 0.3. If a kid who likes oranges is selected at random, what is the probability that they also like apples?

Q.2/ A die is rolled, find the probability that an even number is obtained knowing the
number is greater than 3.
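
As a worked check of the two questions above, applying P(A|B) = P(A ∩ B) / P(B) directly gives the following. The event names O, A, E and G are labels introduced here for illustration, and Q.2 is assumed to refer to a fair six-sided die.

```latex
% Q.1: O = likes oranges, A = likes apples
P(A \mid O) = \frac{P(A \cap O)}{P(O)} = \frac{0.3}{0.6} = 0.5

% Q.2: E = even number, G = number greater than 3,
% so G = \{4, 5, 6\} and E \cap G = \{4, 6\}
P(E \mid G) = \frac{P(E \cap G)}{P(G)} = \frac{2/6}{3/6} = \frac{2}{3}
```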

12
Mathematics for AI
Probability – Independent Events
Probability – Independent Events



Mathematics for AI
Probability - Random variables
Probability - Random variables
Random variables are used to map outcomes of random processes (e.g., rolling a die, flipping a coin) to numbers.
Let the random experiment be the toss of a coin and let the sample space associated with the experiment be C = {H, T}, where H and T represent heads and tails, respectively.
Let X be a function such that X(T) = 1 and X(H) = 0. Thus X is a real-valued function defined on the sample space which takes us from the sample space C to a space of real numbers D = {0, 1}.

$X = \begin{cases} 0 & \text{if heads} \\ 1 & \text{if tails} \end{cases} \qquad P(X = 1) = \dots$

Similarly, if we consider the dice-rolling example,

Y = sum of the upward faces after rolling 7 dice.
P(Y < 10) = …
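
A random variable is just a function from outcomes to numbers, so both examples can be evaluated by brute-force enumeration. The sketch below assumes a fair coin and fair six-sided dice, which the slide does not state explicitly.

```python
from fractions import Fraction
from itertools import product

# X maps coin outcomes to numbers: heads -> 0, tails -> 1.
X = {"H": 0, "T": 1}
# Assuming a fair coin, each outcome has the same weight.
p_x_is_1 = Fraction(sum(1 for outcome in X if X[outcome] == 1), len(X))

# Y maps a roll of 7 dice to the sum of the upward faces.
# Enumerate all 6**7 equally likely rolls and count those with sum < 10.
favourable = sum(1 for roll in product(range(1, 7), repeat=7) if sum(roll) < 10)
p_y_lt_10 = Fraction(favourable, 6 ** 7)

print(p_x_is_1)
print(favourable, p_y_lt_10)
```

Because the smallest possible sum of 7 dice is 7, only the sums 7, 8 and 9 contribute, which is why the probability is so small.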
(1) http://www.buffalo.edu/content/dam/www/ccr/hswrkshop/ebp-workshop_stats-2017.pdf
(2) Hogg, R.V. and Craig, A.T., 1995. Introduction to Mathematical Statistics (5th edition). Englewood Cliffs, New Jersey.
Probability - Random variables

Random variables:
• Discrete Random Variables: Take on distinct or separate values.

$X = \begin{cases} 0 & \text{if heads} \\ 1 & \text{if tails} \end{cases}$

Y = the year in which a random student in the class was born

• Continuous Random Variables: Take on any value in an interval.

Z = the exact weight of a random person in Birmingham (not rounded), e.g., 55, 67.2, 78.26, 95.356

(1) http://www.buffalo.edu/content/dam/www/ccr/hswrkshop/ebp-workshop_stats-2017.pdf 20
Probability - Random variables

21
Probability - Random variables
Example:

Suppose that a coin is tossed twice, so that the sample space is

S = {HH, HT, TH, TT}

Let X represent the number of heads that can come up. With each sample point we can associate a number for X as shown in the following table.

Sample point   HH   HT   TH   TT
X              2    1    1    0

Thus, for example, in the case of HH (i.e., 2 heads), X = 2, while for TH (1 head), X = 1.
It follows that X is a random variable.
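
The mapping from sample points to values of X can also be built programmatically. The following sketch additionally derives the distribution of X under the assumption that the coin is fair, so that all four sample points are equally likely (the example itself does not state this).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two tosses: HH, HT, TH, TT.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

def X(sample_point):
    """Random variable: number of heads in the sample point."""
    return sample_point.count("H")

mapping = {point: X(point) for point in sample_space}

# Distribution of X, assuming each sample point has probability 1/4.
counts = Counter(mapping.values())
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

print(mapping)
print(distribution)
```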

Spiegel, M.R., Schiller, J.J. and Srinivasan, R.A., 2013. Schaum's outline of probability and statistics. McGraw-Hill Education.
Mathematics for AI
Probability - Binomial Distribution
Probability - Binomial Distribution
A Binomial distribution is a type of distribution in which each trial has two possible outcomes.
It can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment that is repeated multiple times. It must also meet the following conditions:
1. The number of observations or trials is fixed. In other words, you can only figure out the probability of something happening if you do it a certain number of times. This is common sense: if you toss a coin once, your probability of getting a tails is 50%. If you toss a coin 20 times, your probability of getting at least one tails is very, very close to 100%.
2. Each observation or trial is independent. In other words, none of your trials has an effect on the probability of the next trial.
3. Each observation represents one of two outcomes ("success" or "failure"), i.e., the outcome is binary.
4. The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another.
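
The coin claim in condition 1 can be checked with the complement rule: the probability of at least one tails in n independent tosses is 1 minus the probability of no tails at all. The function name below is hypothetical, and a fair coin is assumed by default.

```python
def prob_at_least_one_tails(n_tosses, p_tails=0.5):
    """P(at least one tails) = 1 - P(no tails in any of the independent tosses)."""
    return 1 - (1 - p_tails) ** n_tosses

one_toss = prob_at_least_one_tails(1)
twenty_tosses = prob_at_least_one_tails(20)

print(one_toss)
print(twenty_tosses)
```

This relies on conditions 2 and 4: the tosses are independent and the probability of tails does not change between them.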

http://www.stat.yale.edu/Courses/1997-98/101/binom.htm
24
Probability - Bernoulli Trial
A Bernoulli trial is an experiment where there are only two outcomes: success and failure.
If each Bernoulli trial is independent, then the number of successes in a series of Bernoulli trials has a Binomial distribution.

The probability of success, p, remains the same every time the trial is conducted.
Example 1. Consider the experiment of flipping a fair coin. The outcomes are heads “𝐻" or tails
“𝑇", each one with probability 0.5 : P(𝐻) = 0.5 and P(𝑇) = 0.5.

PS: The outcomes of a Bernoulli trial do not need to be a success and failure in lay terms. A
‘success’ is simply the outcome of interest.



Probability - Bernoulli Trial
Example 2. Consider the experiment of tossing a fair die where the outcome of interest is to obtain a four. We can model this experiment with a discrete random variable X that takes values 1 and 0.

$X = \begin{cases} 1 & \text{if the upward face after rolling the die is 4} \\ 0 & \text{if the upward face after rolling the die is in } \{1, 2, 3, 5, 6\} \end{cases}$

P(X = 1) = 1/6
P(X = 0) = 5/6

PS: An event is a set of possible outcomes of an experiment. E.g., {X = 1} is an event because it represents the set {4}, which is a possible outcome.



Probability - Bernoulli Trial
Example 3. Consider the experiment of flipping an unfair coin 𝟒 times, where P(𝐻) = 0.6 and
P(𝑇) = 0.4.
• If X denotes the number of heads obtained in the 4 flips, which values does X take (sample space)?
• Which outcome gives rise to X = 0?



Probability - Bernoulli Trial
Example 4. Consider the experiment of flipping an unfair coin 𝟒 times, where P(𝐻) = 0.6 and
P(𝑇) = 0.4. Ω𝑋 = {0,1,2,3,4}
If X denotes the number of heads we can obtain, what are the configurations that give rise to X = 1?
(H, T, T, T), (T, H, T, T), (T, T, H, T), (T, T, T, H)

Configuration   Probability
(H, T, T, T)    0.6 × 0.4 × 0.4 × 0.4 = 0.6 × 0.4^3 = 0.0384
(T, H, T, T)    0.4 × 0.6 × 0.4 × 0.4 = 0.6 × 0.4^3 = 0.0384
(T, T, H, T)    0.4 × 0.4 × 0.6 × 0.4 = 0.6 × 0.4^3 = 0.0384
(T, T, T, H)    0.4 × 0.4 × 0.4 × 0.6 = 0.6 × 0.4^3 = 0.0384

The number of configurations (4) is the number of combinations of 4 choose 1: C(4,1) or $\binom{4}{1}$.
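
The configurations and their probabilities can be enumerated directly. In the sketch below, `configurations` and `sequence_probability` are hypothetical helper names; the coin probabilities are the ones given in the example.

```python
from itertools import product
from math import comb, isclose

P_H, P_T = 0.6, 0.4   # unfair coin from the example
N_FLIPS = 4

def configurations(k):
    """All sequences of N_FLIPS flips that contain exactly k heads."""
    return [seq for seq in product("HT", repeat=N_FLIPS) if seq.count("H") == k]

def sequence_probability(seq):
    """Probability of one specific sequence of independent flips."""
    prob = 1.0
    for flip in seq:
        prob *= P_H if flip == "H" else P_T
    return prob

configs = configurations(1)
each = [sequence_probability(seq) for seq in configs]
total = sum(each)   # P(X = 1): add up the equally probable configurations

for seq, p in zip(configs, each):
    print(seq, round(p, 4))
print(len(configs), comb(N_FLIPS, 1), round(total, 4))
```

Since every configuration has the same probability, P(X = 1) is the number of configurations times the probability of any one of them.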



Probability - Bernoulli Trial
Example 5. Consider the experiment of flipping an unfair coin 𝟒 times, where P(𝐻) = 0.6 and
P(𝑇) = 0.4. Ω𝑋 = {0,1,2,3,4}
If X denotes the number of heads we can obtain, what are the configurations that give rise to X = 2?



30
Probability - Binomial Distribution

A random variable 𝑋 is distributed according to a Binomial distribution:

𝑃 (𝑥|𝑛, 𝑝) : probability of 𝒙 given that the number of trials is 𝑛 and the probability of success
at each trial is 𝑝. Also called the probability mass function (PMF).
The bar “|” is read “given that”: if we know n and p, we can calculate the probability of x.
The parameters of the distribution are 𝒏 and 𝒑.

𝑿 is distributed according to a Binomial distribution: 𝑋~𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙(𝑛, 𝑝)



32
Probability - Binomial Distribution
A random variable X is distributed according to a Binomial distribution if:

$P(x \mid n, p) = \binom{n}{x} \, p^{x} \, (1 - p)^{n - x}$

where P(x|n, p) is the probability of x successes given n trials and a probability of success p, $\binom{n}{x}$ is "n choose x", and (1 − p) is the probability of failure.
It is also called the probability mass function (PMF) or probability distribution function (PDF).

Example 3. Consider the experiment of flipping an unfair coin 4 times, where P(H) = 0.6 and P(T) = 0.4.
The parameters of the distribution are n = 4 and p = 0.6.

What is the probability of getting H twice given that the number of trials is 4 and the probability of success at each trial is 0.6?

$P(2 \mid 4, 0.6) = \binom{4}{2} \times 0.6^{2} \times (1 - 0.6)^{4 - 2} = \frac{4!}{2!\,(4 - 2)!} \times 0.6^{2} \times (1 - 0.6)^{4 - 2} = 0.3456$
Probability - Binomial Distribution: Python code
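
The code shown on the original slide did not survive extraction. As a stand-in, here is a minimal sketch of the Binomial PMF using only the standard library; the function name `binomial_pmf` is an assumption, not the slide's code.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x | n, p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Unfair coin flipped 4 times with P(H) = 0.6: probability of exactly 2 heads.
p_two_heads = binomial_pmf(2, n=4, p=0.6)
print(round(p_two_heads, 4))

# The PMF over all possible values of x sums to 1.
pmf_values = [binomial_pmf(x, 4, 0.6) for x in range(5)]
print(round(sum(pmf_values), 10))
```

If SciPy is available, `scipy.stats.binom.pmf(x, n, p)` computes the same quantity.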

Dr Farzaneh Farhadi- 2021/22 CS3IOS


34
Probability - Binomial Distribution

Expected value:
The expected value is the population mean of a random variable and is therefore a measure of the centre of the distribution of that random variable.
Expected value of a discrete random variable X: $\mu = E[X] = \sum_{x \in \Omega_X} x \times P(x)$

x        0     1     2     3
P(X=x)   0.1   0.2   0.4   0.3

E[X] = 0 × 0.1 + 1 × 0.2 + 2 × 0.4 + 3 × 0.3 = 1.9
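
The expected-value sum translates almost literally into code. This sketch uses the table values from this slide; the function name `expected_value` is introduced here for illustration.

```python
from math import isclose

def expected_value(values, probs):
    """E[X] = sum over x of x * P(x) for a discrete random variable."""
    if not isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.4, 0.3]

mu = expected_value(values, probs)
print(round(mu, 2))
```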

35
Probability - Binomial Distribution

Expected value:
The expected value is the population mean of a random variable and is therefore a measure of the centre of the distribution of that random variable.
For a Binomial distribution, if you know the number of trials n and the probability of success p, then E[X] is np:
If X ~ Binomial(n, p), E[X] = n p

For example, with n = 10 and p = 0.7: E[X] = 10 × 0.7 = 7
Probability - Binomial Distribution

Variance (of a discrete random variable)

A measure of spread for the distribution of a random variable that determines the degree to which the values of the random variable differ from the expected value.
Variance of a discrete random variable X: $\sigma^2 = V[X] = \sum_{x \in \Omega_X} (x - \mu)^2 \times P(x)$

x        0     1     2     3
P(X=x)   0.1   0.2   0.4   0.3

μ = E[X] = 1.9
V[X] = (0 − 1.9)^2 × 0.1 + (1 − 1.9)^2 × 0.2 + (2 − 1.9)^2 × 0.4 + (3 − 1.9)^2 × 0.3 = 0.89

If X ~ Binomial(n, p), $\sigma^2 = V[X] = n p (1 - p)$
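
The variance definition and the Binomial shortcut can be compared numerically. The sketch below recomputes the table example and then checks np and np(1 − p) against the definitions for n = 10 and p = 0.7, parameters chosen here purely for illustration.

```python
from math import comb, isclose

def mean_and_variance(values, probs):
    """Return (E[X], V[X]) of a discrete random variable from its PMF."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, var

# Table example from the slide.
mu, var = mean_and_variance([0, 1, 2, 3], [0.1, 0.2, 0.4, 0.3])
print(round(mu, 2), round(var, 2))

# Binomial check: the definitions should agree with np and np(1 - p).
n, p = 10, 0.7
xs = list(range(n + 1))
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in xs]
b_mu, b_var = mean_and_variance(xs, pmf)
print(round(b_mu, 6), n * p)
print(round(b_var, 6), n * p * (1 - p))
```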

37
Cumulative Distribution Function

The cumulative distribution function (CDF) of a random variable X, denoted $F_X(x)$, tells us the probability that X takes a value less than or equal to x. Mathematically, it is defined as
$F_X(x) = P(X \le x)$
Example: I toss a coin twice. Let X be the number of observed heads. Find the CDF of X.
X ~ Binomial(n = 2, p = 1/2), range of X = {0, 1, 2}

P(X = 0) = 1/4 (probability of not seeing heads): (2 choose 0) × (1/2)^0 × (1/2)^2 = 1/4
P(X = 1) = 2/4 = 1/2 (probability of seeing heads once): (2 choose 1) × (1/2)^1 × (1/2)^1 = 1/2
P(X = 2) = 1/4 (probability of seeing heads twice): (2 choose 2) × (1/2)^2 × (1/2)^0 = 1/4

x        0     1     2
P(X=x)   1/4   1/2   1/4
F_X(x)   1/4   3/4   1
Python Example:
https://www.simplilearn.com/tutorials/statistics-tutorial/cumulative-distribution-function
38
Cumulative Distribution Function
Example: Let’s toss a coin twice. Let X be the number of observed heads. Find the CDF of X.
X ~ Binomial(n = 2, p = 1/2), range of X = {0, 1, 2}
P(X = 0) = 1/4; P(X = 1) = 2/4 = 1/2; P(X = 2) = 1/4
$F_X(x) = P(X \le x)$

Range           Probability   Expansion                          Value
For x < 0       P(X < 0)      does not exist (no such outcome)   0
For 0 ≤ x < 1   P(X ≤ 0)      P(X = 0)                           1/4
For 1 ≤ x < 2   P(X ≤ 1)      P(X = 0) + P(X = 1)                3/4
For x ≥ 2       P(X ≤ 2)      P(X = 0) + P(X = 1) + P(X = 2)     1

x        0     1     2
P(X=x)   1/4   1/2   1/4
F_X(x)   1/4   3/4   1
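
The piecewise behaviour of the CDF can be reproduced with a small function that accepts any real x, not just the values X can take. This is a sketch under the stated Binomial(2, 1/2) model; `binomial_cdf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb, floor

def binomial_cdf(x, n, p):
    """F_X(x) = P(X <= x) for X ~ Binomial(n, p), for any real x."""
    if x < 0:
        return Fraction(0)            # no outcome of X lies below 0
    k_max = min(floor(x), n)          # largest value of X not exceeding x
    terms = (comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))
    return sum(terms, Fraction(0))

n, p = 2, Fraction(1, 2)
for x in (-0.5, 0, 0.5, 1, 1.9, 2, 10):
    print(x, binomial_cdf(x, n, p))
```

The function is flat between the integer values of X and jumps at 0, 1 and 2, matching the four ranges in the table.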
39
Probability - Binomial Distribution: Example

A company produces electrical components for a computer producer. A total of 100 components are produced in one day, but only 80% of the components pass the quality control phase. The components that pass the quality control can be shipped to the customer. The company needs to deliver 90 components a day.

1. Is a Binomial distribution suitable for this problem?
2. What is the probability that 90 components pass the quality control phase?
3. If they produce more than 90 components, the company can ship them. So, the question of interest is: What is the probability that 90 or more components pass the quality control phase?



40
Probability - Binomial Distribution: Example

A company produces electrical components for a computer producer. A total of 100 components are produced in one day, but only 80% of the components pass the quality control phase. The components that pass the quality control can be shipped to the customer. The company needs to deliver 90 components a day.
1. Is a Binomial distribution suitable for this problem?

Yes, because

(a) There is a finite number of components produced a day (n = 100)
(b) The production process is the same for each component (Bernoulli trial)
(c) Each component passes the quality control phase with equal probability (p = 0.80)



41
Probability - Binomial Distribution: Example

A company produces electrical components for a computer producer. A total of 100 components are produced in one day, but only 80% of the components pass the quality control phase. The components that pass the quality control can be shipped to the customer. The company needs to deliver 90 components a day.
2. What is the probability that 90 components pass the quality control phase?

$P(x = 90 \mid n = 100, p = 0.8) = \binom{100}{90} \, 0.8^{90} \, (1 - 0.8)^{100 - 90} \approx 0.003$
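
This value is tedious to compute by hand but straightforward with the PMF formula. The sketch below uses only the standard library, and the variable names are chosen here for illustration.

```python
from math import comb

n, p = 100, 0.8   # components produced per day, pass probability
x = 90            # exactly 90 components pass quality control

p_exactly_90 = comb(n, x) * p**x * (1 - p) ** (n - x)
print(round(p_exactly_90, 3))
```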



42
Probability - Binomial Distribution: Example
A company produces electrical components for a computer producer. A total of 100 components are produced in one day, but only 80% of the components pass the quality control phase. The components that pass the quality control can be shipped to the customer. The company needs to deliver 90 components a day.
3. If they produce more than 90 components, the company can ship them. So, the question of interest is: What is the probability that 90 or more components pass the quality control phase?

$P(x \ge 90 \mid n = 100, p = 0.8) = 1 - F(89 \mid n = 100, p = 0.8)$, where

F(89|n=100, p=0.8) = P(0|n=100, p=0.8) + P(1|n=100, p=0.8) + … + P(89|n=100, p=0.8)
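
Since "90 or more" is the complement of "89 or fewer", the answer follows from the CDF evaluated at 89. The helper names below are hypothetical, and the sketch sums the PMF directly rather than calling a statistics library.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x | n, p) for a Binomial distribution."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(k, n, p):
    """F(k | n, p) = P(0 | n, p) + P(1 | n, p) + ... + P(k | n, p)."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))

n, p = 100, 0.8
f_89 = binomial_cdf(89, n, p)
p_at_least_90 = 1 - f_89

# Cross-check by summing the upper tail directly.
upper_tail = sum(binomial_pmf(x, n, p) for x in range(90, n + 1))

print(round(f_89, 4), round(p_at_least_90, 4), round(upper_tail, 4))
```

Both routes should agree up to floating-point rounding, and the result shows that meeting the daily target of 90 components is very unlikely at an 80% pass rate.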



43
Probability - Binomial Distribution: Example
A company produces electrical components for computer
producer. A total of 100 components are produced in one
day but only 80% of the components pass the quality
control phase. The components that pass the quality
control can be shipped to the customer. The company
needs to deliver 90 components a day.
3. If they produce more than 90 components, the
company can ship them. So, the question of interest
is: What is the probability that 90 or more
components pass the quality control phase?
F(89|n=100, p=0.8) =
P(0|n=100, p=0.8) +
P(1|n=100, p=0.8) + … +
P(89|n=100, p=0.8)

CS 4755 - Dr Roberto Puch-Solis


44
Probability - Binomial Distribution: Example

You can advise the company to improve the quality of their production line so that 95% of the components pass the quality control phase.
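
To see why this advice helps, one can compare the probability of shipping at least 90 components at the two pass rates. This is an illustrative sketch; `prob_at_least` is a hypothetical helper, and the model is the same Binomial(n = 100, p) as before.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing the upper tail of the PMF."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

before = prob_at_least(90, 100, 0.80)   # current production line
after = prob_at_least(90, 100, 0.95)    # improved production line

print(f"p = 0.80: {before:.4f}")
print(f"p = 0.95: {after:.4f}")
```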

45
Additional Resources:

• Conditional Probabilities Examples and Questions (analyzemath.com)


• What is the binominal distribution?

• Python – Binomial Distribution

• Why Does Zero Factorial Equal One?

• What is a Continuous Random Variable?

• khanacademy - Random variables


• The Binomial Distribution

• Examples - Mean and Variance

46
Tutorial Questions
Tutorial Questions 1-3

48
Tutorial#1 Solution

49
Tutorial#1 Solution

50
Tutorial Questions 2-3

51
Tutorial#2 Solution

52
Tutorial Questions 3-3

53
Tutorial#3 Solution

54
End of Lecture 7
