Lecture03 CH 03 DiscRVs Baron Inf Stats Final FA24
Example: toss a fair coin three times.
w   000  001  010  011  100  101  110  111
X   0    1    1    2    1    2    2    3
Y   3    1    1    1    1    1    1    3
(X = number of heads; Y = |#heads − #tails|)
Discrete random variables (2)
More examples:
• Throwing a die twice, the sum of the two
numbers
• Throwing a die twice, the max of the two
numbers
Sum of the two throws (rows: first die; columns: second die):
sum  1  2  3   4   5   6
1    2  3  4   5   6   7
2    3  4  5   6   7   8
3    4  5  6   7   8   9
4    5  6  7   8   9  10
5    6  7  8   9  10  11
6    7  8  9  10  11  12
Maximum of the two throws (rows: first die; columns: second die):
max  1  2  3  4  5  6
1    1  2  3  4  5  6
2    2  2  3  4  5  6
3    3  3  3  4  5  6
4    4  4  4  4  5  6
5    5  5  5  5  5  6
6    6  6  6  6  6  6
Probability Mass Function
• The probability mass function (PMF) pX of a discrete random variable X is the function pX: ℝ → [0, 1] defined by pX(a) = P(X = a) for −∞ < a < ∞
• If X is a discrete random variable that takes on the values a1, a2, . . ., then
pX(ai) > 0,
pX(a1) + pX(a2) + · · · = 1,
pX(a) = 0 for all other values of a
Probability Mass Function (2)
Example: The PMF for the maximum of two
independent throws of a fair die
a 1 2 3 4 5 6
p(a) 1/36 3/36 5/36 7/36 9/36 11/36
As a formula, it is
pX(a) = (2a − 1)/36 for a ∈ {1, 2, 3, 4, 5, 6}
pX(a) = 0 otherwise
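As a quick sanity check, this PMF can be recovered by enumerating all 36 equally likely outcomes in Python (a minimal sketch; pmf_max is our own name):

from fractions import Fraction
from itertools import product

# Tally the maximum of each of the 36 equally likely (d1, d2) pairs
pmf_max = {a: Fraction(0) for a in range(1, 7)}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf_max[max(d1, d2)] += Fraction(1, 36)

for a, p in pmf_max.items():
    assert p == Fraction(2 * a - 1, 36)   # agrees with (2a - 1)/36
    print(a, p)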
Cumulative distribution function
• The cumulative distribution function (CDF) F of a discrete random variable X is the function FX: ℝ → [0, 1] defined by FX(a) = P(X ≤ a) for −∞ < a < ∞
• This function is also called the distribution function
F(a) can be obtained as Ʃp(a’) for all a’ ≤ a
Also, P(a < X ≤ b) = F(b) – F(a)
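Since F(a) is just a running sum of p(a’), the CDF of a discrete variable takes a few lines of Python (a sketch reusing the die-maximum PMF):

from fractions import Fraction

pmf = {a: Fraction(2 * a - 1, 36) for a in range(1, 7)}

# F(a) = sum of p(a') over all a' <= a
cdf, running = {}, Fraction(0)
for a in sorted(pmf):
    running += pmf[a]
    cdf[a] = running

print(cdf[3])            # 9/36 = 1/4
print(cdf[5] - cdf[2])   # P(2 < X <= 5) = F(5) - F(2) = 21/36 = 7/12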
Distribution function (2)
Example: The p(a) and F(a) for the maximum of
two independent throws of a fair die
a 1 2 3 4 5 6
p(a) 1/36 3/36 5/36 7/36 9/36 11/36
F(a) 1/36 4/36 9/36 16/36 25/36 36/36
Each cell shows the pair (sum, max) for the two throws (rows: first die; columns: second die):
      1      2      3       4       5       6
1    2, 1   3, 2   4, 3    5, 4    6, 5    7, 6
2    3, 2   4, 2   5, 3    6, 4    7, 5    8, 6
3    4, 3   5, 3   6, 3    7, 4    8, 5    9, 6
4    5, 4   6, 4   7, 4    8, 4    9, 5   10, 6
5    6, 5   7, 5   8, 5    9, 5   10, 5   11, 6
6    7, 6   8, 6   9, 6   10, 6   11, 6   12, 6
Try!
Expectation
• The expectation (expected value) or mean of a
random variable X is the weighted average of
its values, written as E[X] (also E(X) or µ)
• It is a constant (a fixed number), not a random quantity
Expectation (2)
Intuitive meaning: the fair price of a gamble,
or the center of gravity i.e. the point of balance
Expectation (3)
a       2    3    4    SUM
p(a)    0.1  0.7  0.2  1
a*p(a)  0.2  2.1  0.8  3.1 = E[X]
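The same computation in a line of Python (a sketch of the table above):

# E[X]: probability-weighted average of the values in the table above
pmf = {2: 0.1, 3: 0.7, 4: 0.2}
print(sum(a * p for a, p in pmf.items()))   # 3.1 (up to float rounding)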
Properties of expectation
• If the n values are equally probable, the
expectation is their average (Ʃai)/n
• The expectation of a discrete random
variable may not be a valid value of the
variable
• The expectation may not be exactly halfway between the min and max values, though it always lies in [min, max]
St. Petersburg paradox
• In a game, a fair coin is tossed. The initial award is 2 dollars and is doubled every time heads appears. The first time tails appears, the game ends and the player wins the current award.
What would be a fair price to pay for entering the game?
Another question: which is the better option, playing the game or taking a guaranteed $1M?
The answer may not seem rational, considering the data below:
PARADOX!
a 2 4 8 16 32 64 … SUM
p(a) 1/2 1/4 1/8 1/16 1/32 1/64 … 1
a*p(a) 1 1 1 1 1 1 … ∞
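A simulation makes the paradox tangible: the empirical average payout keeps growing as more games are played, because E[X] is infinite (a sketch; play_once is our own name):

import random

def play_once():
    # Award starts at $2 and doubles on each heads; the first tails ends the game
    award = 2
    while random.random() < 0.5:
        award *= 2
    return award

games = 100_000
print(sum(play_once() for _ in range(games)) / games)   # drifts upward as `games` grows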
Expectation of a function
If a random variable Y = g(X), then
E[Y] = Ʃg(ai)pX(ai) for all X = ai
If a random variable Z = g(X, Y), then
E[Z] = Ʃg(ai,bj)pXY(ai,bj) for all X = ai, Y = bj
Special cases:
• If Z = aX + bY + c, E[Z] = aE[X] + bE[Y] + c
• If X and Y are independent, E[XY] = E[X]E[Y]
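A numeric check of both special cases on two small independent PMFs (a sketch; the tables are made up for illustration):

# Hypothetical independent PMFs
pX = {0: 0.3, 1: 0.7}
pY = {1: 0.5, 2: 0.5}

EX = sum(a * p for a, p in pX.items())
EY = sum(b * p for b, p in pY.items())

# Independence: the joint PMF factorizes, pXY(a, b) = pX(a) * pY(b)
EXY = sum(a * b * pa * pb for a, pa in pX.items() for b, pb in pY.items())

print(EXY, EX * EY)          # equal: E[XY] = E[X]E[Y]
print(2 * EX + 3 * EY + 1)   # equals E[2X + 3Y + 1] by linearity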
Quiz 2.6: Yates and Goodman Book
Quiz 2.6: Solution
Example 2.31: Yates, Goodman
Let a random variable X have the PMF given below. Also, let another random variable Y be derived from X via the equation given below.
TRY!
Example 2.31: Solution
Variance
Very often, knowing the expectation of a random variable is not enough, since its spread (around the expectation) is also important
Example: X and Y have the same expectation, but are still very different in spread
For X:
a        4    6    SUM
pX(a)    0.5  0.5  1
a*pX(a)  2    3    5

For Y:
a        1    9    SUM
pY(a)    0.5  0.5  1
a*pY(a)  0.5  4.5  5
Variance and standard deviation
The variance Var(X) of a random variable X is
Var(X) = E[(X − µ)²] = Ʃ(ai − µ)² p(ai) over all ai
       = E[X²] − (E[X])²
E[X²] is called the second moment of X
The standard deviation of a random variable is the square root of its variance:
Std(X) = σ = √Var(X)
Var and std: example
For X:
a         4    6    SUM
pX(a)     1/2  1/2  1
a*pX(a)   2    3    5
a²        16   36   -
a²*pX(a)  8    18   26

For Y:
a         1    9     SUM
pY(a)     1/2  1/2   1
a*pY(a)   1/2  9/2   5
a²        1    81    -
a²*pY(a)  1/2  81/2  41

E[X] = 5, E[Y] = 5
E[X²] = 26, E[Y²] = 41
Var(X) = 26 − 25 = 1, Var(Y) = 41 − 25 = 16
Std(X) = 1, Std(Y) = 4
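The same arithmetic in Python (a sketch; var_std is our own helper):

def var_std(pmf):
    # Var(X) = E[X^2] - (E[X])^2, Std(X) = sqrt(Var(X))
    mean = sum(a * p for a, p in pmf.items())
    second = sum(a**2 * p for a, p in pmf.items())
    return second - mean**2, (second - mean**2) ** 0.5

print(var_std({4: 0.5, 6: 0.5}))   # (1.0, 1.0) for X
print(var_std({1: 0.5, 9: 0.5}))   # (16.0, 4.0) for Y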
Var and std: example (2)
E.g. The maximum of two fair die throws
a        1     2      3      4       5       6       SUM
p(a)     1/36  3/36   5/36   7/36    9/36    11/36   1
a*p(a)   1/36  6/36   15/36  28/36   45/36   66/36   161/36
a²       1     4      9      16      25      36      -
a²*p(a)  1/36  12/36  45/36  112/36  225/36  396/36  791/36
TRY YOURSELF!
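Once you have tried it by hand, a sketch to check the answer (exact arithmetic with fractions):

from fractions import Fraction

pmf = {a: Fraction(2 * a - 1, 36) for a in range(1, 7)}
mean = sum(a * p for a, p in pmf.items())        # 161/36
second = sum(a**2 * p for a, p in pmf.items())   # 791/36
var = second - mean**2                           # 2555/1296, about 1.97
print(mean, second, var, float(var) ** 0.5)      # std is about 1.40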
Ex 3.11:
Solution
Chebyshev's inequality
The range of values of a random variable can be estimated
from its expectation and variance
“The chance for the variable to take a value far away from its expectation is small”:
P(|X − μ| > ε) ≤ σ²/ε² for any ε > 0
The inequality shows that only when the variance σ² is large can X differ significantly from its expectation
Chebyshev's inequality (2)
When ε = kσ, Chebyshev's inequality becomes
P(|X − μ| > kσ) ≤ 1/k²
k = 2: P(|X − μ| > 2σ) ≤ 1/4 = 0.25
k = 3: P(|X − μ| > 3σ) ≤ 1/9 ≈ 0.111
k = 4: P(|X − μ| > 4σ) ≤ 1/16 = 0.0625
k = 5: P(|X − μ| > 5σ) ≤ 1/25 = 0.04
k = 10: P(|X − μ| > 10σ) ≤ 1/100 = 0.01
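An empirical check on the die-maximum variable: its actual tail probabilities sit far below the Chebyshev bounds, which are worst-case guarantees (a sketch):

from itertools import product

outcomes = [max(d1, d2) for d1, d2 in product(range(1, 7), repeat=2)]
mu = sum(outcomes) / 36
sigma = (sum(x**2 for x in outcomes) / 36 - mu**2) ** 0.5

for k in (2, 3):
    actual = sum(1 for x in outcomes if abs(x - mu) > k * sigma) / 36
    print(k, actual, "<=", 1 / k**2)   # P(|X - mu| > k*sigma) <= 1/k^2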
Application to Finance
• Chebyshev’s inequality shows that, in general, higher variance implies higher probabilities of large deviations, i.e. a higher risk that a RV takes values far from its expectation
• This finds a number of immediate intuitive
applications e.g. on evaluating risks of financial
deals, allocating funds, and constructing optimal
portfolios
• The same methods can be used for the optimal
allocation of computer memory, CPU time,
customer support, or other resources
Financial application: Ex 3.13 Baron’s Book
We would like to invest $10,000 into shares of companies XX
and YY.
Shares of XX cost $20 per share. The market analysis shows that
their expected return is $1 per share with a standard deviation of
$0.5.
Shares of YY cost $50 per share, with an expected return of $2.50
and a standard deviation of $1 per share, and returns from the two
companies are independent.
In order to maximize the expected return and minimize the risk
(standard deviation or variance), is it better to invest (A) all $10,000
into XX, (B) all $10,000 into YY, or (C) $5,000 in each company?
Ex 3.13: Solution
(a) The value of a share is X, so the total return is A = (10000/20)X = 500X
E[A] = 500 E[X] = (500)(1) = 500
Var(A) = 500² Var(X) = (500²)(0.5²) = 62500, Std(A) = √Var(A) = 250
(b) The value of a share is Y, so the total return is B = (10000/50)Y = 200Y
E[B] = 200 E[Y] = (200)(2.5) = 500
Var(B) = 200² Var(Y) = (200²)(1²) = 40000, Std(B) = √Var(B) = 200
(c) The total return is C = 250X + 100Y
E[C] = 250 E[X] + 100 E[Y] = 250 + 250 = 500
Var(C) = 250² Var(X) + 100² Var(Y) = 25625, Std(C) = √Var(C) ≈ 160
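The three strategies compared in a few lines of Python (a sketch of the arithmetic above):

# (shares of XX, shares of YY) bought for $10,000 under each strategy
portfolios = {"A": (500, 0), "B": (0, 200), "C": (250, 100)}
EX, VarX = 1.0, 0.5**2   # per-share return of XX: mean $1, std $0.5
EY, VarY = 2.5, 1.0**2   # per-share return of YY: mean $2.50, std $1

for name, (nx, ny) in portfolios.items():
    mean = nx * EX + ny * EY
    var = nx**2 * VarX + ny**2 * VarY   # independent returns: variances add
    print(name, mean, var, round(var ** 0.5))   # C: same mean, least risk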
Financial application: Graphic
By diversifying the portfolio, one can keep the
same expectation while reducing the variance
Families of Discrete Distributions: Bernoulli
distribution
A random variable with ONLY two possible values, 0 and 1, is called a Bernoulli variable, and its distribution is the Bernoulli distribution
Ber(p) is a Bernoulli distribution with
parameter p, where 0 ≤ p ≤ 1, and
p(1) = P(X = 1) = p
p(0) = P(X = 0) = 1 − p
E[X] = p, Var(X) = p(1 − p)
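With scipy.stats (introduced at the end of this lecture), the Bernoulli family looks like this (a sketch):

from scipy.stats import bernoulli

p = 0.3
X = bernoulli(p)
print(X.pmf(1), X.pmf(0))   # p and 1 - p
print(X.mean(), X.var())    # p and p(1 - p)
print(X.rvs(size=10))       # ten simulated Ber(0.3) draws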
Families: Binomial distribution
A binomial random variable Bin(n, p) counts the number of 1s (successes) in n independent Bernoulli trials Ber(p)
Special case: Bin(1, p) = Ber(p)
Bin(2, p):
w     00      01       10      11
X     0       1        1       2
P(w)  (1−p)²  (1−p)p   p(1−p)  p²

a     0       1        2
p(a)  (1−p)²  2p(1−p)  p²
Binomial distribution (2)
Bin(3, p):
w     000     001      010      011      100      101      110      111
X     0       1        1        2        1        2        2        3
P(w)  (1−p)³  p(1−p)²  p(1−p)²  p²(1−p)  p(1−p)²  p²(1−p)  p²(1−p)  p³

a     0       1         2         3
p(a)  (1−p)³  3p(1−p)²  3p²(1−p)  p³
In general,
P(k) = C(n, k) p^k (1 − p)^(n−k) for k = 0, 1, . . ., n
where C(n, k) = n! / [k! (n − k)!]
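The formula can be cross-checked against scipy.stats.binom (a sketch):

from math import comb
from scipy.stats import binom

n, p = 3, 0.5
for k in range(n + 1):
    manual = comb(n, k) * p**k * (1 - p)**(n - k)   # C(n,k) p^k (1-p)^(n-k)
    print(k, manual, binom.pmf(k, n, p))            # the two columns agree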
Binomial Dist: Ex 3.16 (Baron’s)
TRY!
Binomial Distribution: Ex 3.16 (Sol)
Let X be the number of new subscribers who get the special promotion. Then the required probability is
P(X ≥ 4) = 1 − P(X ≤ 3) = 1 − P(0) − P(1) − P(2) − P(3)
Binomial distribution examples
Bin(3, 1/2): tossing three fair coins, the number of heads
Binomial distribution examples (2)
Binomial distribution as program
Bin(n, p) can be remembered as Ber(p) repeated n times, with their sum returned (a runnable Python version of the pseudocode):

import random

def bernoulli(p):                 # Ber(p): 1 with probability p, else 0
    return 1 if random.random() < p else 0

def binomial(n, p):               # Bin(n, p) = sum of n independent Ber(p)
    count = 0
    for _ in range(n):
        count += bernoulli(p)
    return count
Binomial Distribution in MATLAB
Binomial distribution features
• If X has a Bin(n, p) distribution, then it can
be written as X = R1 + R2 + ... + Rn, where
each Ri has a Ber(p) distribution, and is
independent of the others
E[X] = E[R1] + E[R2] + ... + E[Rn] = np
Var(X) = Var(R1) + ... + Var(Rn) = np(1−p)
Both are n times the corresponding Ber(p) features, E = p and Var = p(1 − p)
Ex 3.17: Baron’s Book
Geometric distribution
The number of Ber(p) trials needed to get the first 1 (success) has the Geometric distribution, Geo(p)
Example: Geo(0.6)
w     1    01       001       0001      …
a     1    2        3         4         …
p(a)  0.6  0.4*0.6  0.4²*0.6  0.4³*0.6  …
Textbook: Applied Statistics and Probability for Engineers by Douglas C. Montgomery & George C. Runger
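scipy.stats.geom follows the same convention (trials up to and including the first success), so it reproduces the table above (a sketch):

from scipy.stats import geom

p = 0.6
for a in range(1, 5):
    print(a, geom.pmf(a, p))   # (1-p)^(a-1) * p: 0.6, 0.24, 0.096, 0.0384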
Negative Binomial Distribution
• In a sequence of independent Ber(p) trials, the number of trials needed to obtain n 1s (successes) has the Negative Binomial distribution NegBin(n, p); it is also called a Pascal random variable
• The Binomial, by contrast, counts the number of successes in a fixed number of trials, hence the name Negative Binomial
Example: NegBin(2, 0.6)
w     11    011       101       0011       0101       1001       …
X     2     3         3         4          4          4          …
P(w)  0.6²  0.6²*0.4  0.6²*0.4  0.6²*0.4²  0.6²*0.4²  0.6²*0.4²  …

a     2     3      4       …
p(a)  0.36  0.288  0.1728  …
Because the trials are independent, the probability that exactly three errors occur in the first 9 trials and trial 10 results in the fourth error is the product of the probabilities of these two events, namely
[C(9, 3) p³ (1 − p)⁶] · p, where p is the per-trial error probability
Textbook: Applied Statistics and Probability for Engineers by Douglas C. Montgomery & George C. Runger
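The NegBin(2, 0.6) table can be cross-checked with scipy.stats.nbinom; note that scipy counts the failures before the n-th success, so a total trials correspond to a − n failures (a sketch):

from scipy.stats import nbinom

n, p = 2, 0.6
for a in (2, 3, 4):
    # a total trials = (a - n) failures before the n-th success
    print(a, nbinom.pmf(a - n, n, p))   # 0.36, 0.288, 0.1728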
Poisson distribution
• Poisson process: a very large population of independent events, each with a very small probability of occurring (rare, "Poissonian" events), where the average number of occurrences in a given range is roughly stable
• Example: the expected number of telephone calls arriving at a telephone exchange during a time interval [0, t] is E[Nt] = λ, where λ is the average frequency of the event in an interval of length t
Poisson distribution (2)
Poisson distribution (3)
Poisson RV: Properties
The Poisson model describes phenomena occurring randomly in time: the timing of each occurrence is random, but the average number of occurrences per unit time is known
For example, the arrival of information requests at a World Wide
Web server, the initiation of telephone calls, and the emission of
particles from a radioactive source are often modelled as Poisson RVs
To describe a Poisson random variable, we will call the occurrence
of the phenomenon of interest an arrival
A Poisson model often specifies an average rate of α arrivals per second and a time interval of T seconds
In this time interval, the number of arrivals X has a Poisson PMF with λ = αT
Ex 2.19: Yates and Goodman
The number of hits at a website in any time interval is a Poisson random
variable. A particular site has on average α= 2 hits per second. What is the
probability that there are no hits in an interval of 0.25 seconds? What is the
probability that there are no more than two hits in an interval of one second?
Solution: Let H be the number of hits. For the 0.25-second interval, λ = αT = 2(0.25) = 0.5 hits, and the PMF of H is
P(H = h) = e^(−λ) λ^h / h! for h = 0, 1, 2, …
so P(no hits) = P(H = 0) = e^(−0.5) ≈ 0.607
For the one-second interval, λ = 2(1) = 2, so
P(H ≤ 2) = e^(−2)(1 + 2 + 2²/2!) = 5e^(−2) ≈ 0.677
In general, Pois(λ) has PMF e^(−λ) λ^k / k! for k ≥ 0, with E[X] = λ and Var(X) = λ
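Both answers can be verified with scipy.stats.poisson (a sketch):

from scipy.stats import poisson

print(poisson.pmf(0, 0.5))   # P(no hits in 0.25 s), lambda = 0.5: about 0.6065
print(poisson.cdf(2, 2.0))   # P(at most 2 hits in 1 s), lambda = 2: about 0.6767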
Discrete RV in Python
• Based on libraries such as NumPy, SciPy (scientific computation built on NumPy), Matplotlib, and Seaborn (a data-visualization library built on Matplotlib)
• scipy.stats contains a large number of statistical functions, including probability distributions, summary and frequency statistics, correlation functions, and statistical tests; a combined sketch follows the list below
Bernoulli Distribution
Binomial Distribution
Geometric Distribution
Negative Binomial Distribution
Poisson Distribution
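A sketch of how those pieces fit together: draw samples from one of the distributions above with scipy.stats and compare the empirical frequencies against the theoretical PMF (Matplotlib/Seaborn used only for display):

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import binom

n, p = 10, 0.3
samples = binom.rvs(n, p, size=10_000, random_state=0)

ks = np.arange(n + 1)
sns.histplot(samples, bins=np.arange(n + 2) - 0.5, stat="probability")
plt.plot(ks, binom.pmf(ks, n, p), "ro-", label="Bin(10, 0.3) PMF")
plt.legend()
plt.show()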
Further Reading
• Understanding Probability Distributions using Python: An intuitive and comprehensive guide to probability distributions
https://towardsdatascience.com/understanding-probability-distributions-using-python-9eca9c1d9d38
https://github.com/reza-bagheri/probability_distributions
THANKS!