Section 10 - Beta, Gamma, Conditional Expectation, Order Statistics
Statisticians, like artists, have the bad habit of falling in love with
their models.
George Box
Administrivia
Section notes modified from William and Sebastian.
Beta Distribution
Story (Hide-and-Seek) - Suppose you are playing hide-and-seek with three friends, Jules, Vernes, and Nemo, and the time it takes to find each one is distributed exponentially with a mean of 1/3 of a time unit. The time it takes to find both Jules and Vernes is Expo(3) + Expo(3) ∼ Gamma(2, 3). The time it takes to find Nemo is Expo(3) ∼ Gamma(1, 3). Thus the proportion of the total hide-and-seek time that you spend finding Nemo is distributed Beta(1, 2) and is independent of the total time that you've spent playing the game.
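As a sanity check on this story, here is a minimal Python simulation sketch (numpy assumed; the sample size and seed are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000

nemo = rng.exponential(scale=1/3, size=n_sims)                     # Expo(3) = Gamma(1, 3)
others = rng.exponential(scale=1/3, size=(n_sims, 2)).sum(axis=1)  # Gamma(2, 3)

total = nemo + others       # Gamma(3, 3)
fraction = nemo / total     # should look Beta(1, 2)

print("mean of fraction:", fraction.mean(), "(Beta(1,2) mean = 1/3)")
print("corr(fraction, total):", np.corrcoef(fraction, total)[0, 1], "(should be near 0)")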
PDF - The PDF of a Beta is:

f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}, \quad x \in (0, 1)
. . . as the Order Statistics of the Uniform - The jth order statistic of n i.i.d. Unif(0, 1) random variables is Beta-distributed:

U_{(j)} \sim \text{Beta}(j, n - j + 1)

f_{U_{(j)}}(u) = \frac{n!}{(j-1)!(n-j)!} \, u^{j-1} (1-u)^{n-j}
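A quick sketch, assuming scipy is available, that compares the empirical jth order statistic of n Uniforms against the Beta(j, n − j + 1) claim (n = 10 and j = 3 are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, j = 10, 3
u_j = np.sort(rng.uniform(size=(50_000, n)), axis=1)[:, j - 1]  # jth smallest

print("empirical mean:", u_j.mean(), "vs j/(n+1) =", j / (n + 1))
# Kolmogorov-Smirnov test against Beta(j, n - j + 1); a large p-value is expected
print(stats.kstest(u_j, stats.beta(j, n - j + 1).cdf))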
. . . as the Conjugate Prior of the Binomial - A prior is the distribution of a parameter before you observe any data, f(x). A posterior is the distribution of a parameter after you observe data y, f(x|y). Beta is the conjugate prior of the Binomial because if you have a Beta-distributed prior on p (the parameter of the Binomial), then the posterior distribution on p given observed data is also Beta-distributed. This means that in a two-level model:
X|p ∼ Bin(n, p)
p ∼ Beta(a, b)
Then after observing the value X = x, we get a posterior distribution p|(X = x) ∼ Beta(a + x, b + n − x).
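A minimal sketch of this update with scipy (the hyperparameters and data below are arbitrary illustrative values):

from scipy import stats

a, b = 2.0, 3.0          # prior hyperparameters: p ~ Beta(a, b)
n, x = 20, 13            # observed data: 13 successes in 20 trials

posterior = stats.beta(a + x, b + n - x)   # p | (X = x) ~ Beta(a + x, b + n - x)
print("posterior mean:", posterior.mean())
print("check (a + x)/(a + b + n):", (a + x) / (a + b + n))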
Order Statistics
Definition - Let's say you have n i.i.d. random variables X1, X2, X3, ..., Xn. If you arrange them from smallest to largest, the ith element in that list is the ith order statistic, denoted X(i). X(1) is the smallest out of the set of random variables, and X(n) is the largest.
Properties - The order statistics are dependent random variables. The smallest value in a set of random variables will always vary and itself has a distribution. For any i, X(i) ≤ X(i+1).
Distribution - Taking n i.i.d. random variables X1, X2, X3, ..., Xn with CDF F(x) and PDF f(x), the CDF and PDF of X(i) are as follows:

F_{X_{(i)}}(x) = P(X_{(i)} \le x) = \sum_{k=i}^{n} \binom{n}{k} F(x)^k (1 - F(x))^{n-k}

f_{X_{(i)}}(x) = n \binom{n-1}{i-1} F(x)^{i-1} (1 - F(x))^{n-i} f(x)
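To see the CDF formula in action, here is a small numeric check, assuming standard Normal X's (an illustrative choice) and scipy:

import numpy as np
from scipy import stats
from scipy.special import comb

n, i, x = 10, 4, 0.5
F = stats.norm.cdf(x)
# sum_{k=i}^{n} C(n,k) F(x)^k (1 - F(x))^(n-k)
formula = sum(comb(n, k) * F**k * (1 - F)**(n - k) for k in range(i, n + 1))

rng = np.random.default_rng(2)
samples = np.sort(rng.standard_normal((100_000, n)), axis=1)[:, i - 1]  # ith smallest
print("formula:", formula, "empirical:", (samples <= x).mean())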
Universality of the Uniform - We can also express the distribution of the order statistics of n i.i.d. random variables X1, X2, X3, ..., Xn in terms of the order statistics of n Uniforms: F(X(j)) ∼ U(j).
Conditional Expectation
Conditioning on an Event - We can find the expected value of Y given that an event A has occurred, or given that X = x. This means finding the values of E(Y|A) and E(Y|X = x). Note that conditioning on an event results in a number.
                 Discrete Y                                  Continuous Y
Conditional      E(Y|A) = \sum_y y P(Y = y|A)                E(Y|A) = \int_{-\infty}^{\infty} y f(y|A) dy
                 E(Y|X = x) = \sum_y y P(Y = y|X = x)        E(Y|X = x) = \int_{-\infty}^{\infty} y f_{Y|X}(y|x) dy
Regular          E(Y) = \sum_y y P(Y = y)                    E(Y) = \int_{-\infty}^{\infty} y f_Y(y) dy
Conditioning on a Random Variable - We can also find the expected value of Y given the random variable X. The resulting expectation, E(Y|X), is not a number but a function of the random variable X. For an easy way to find E(Y|X), find E(Y|X = x) and then plug in X for all x. This changes the conditional expectation of Y from a function of a number x to a function of the random variable X.
Properties of Conditioning on Random Variables
1. E(Y|X) = E(Y) if X ⊥ Y (independence).
2. E(h(X)|X) = h(X) (taking out what's known).
   More generally, E(h(X)W|X) = h(X)E(W|X).
3. E(E(Y|X)) = E(Y) (Adam's Law, aka Law of Iterated Expectation or Law of Total Expectation); see the simulation sketch after this list.
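A small simulation sketch of Adam's Law, using an illustrative two-level model X ∼ Pois(5), Y|X ∼ Bin(X, 0.3), so that E(Y|X) = 0.3X and E(Y) = E(E(Y|X)) = 0.3 · 5 = 1.5:

import numpy as np

rng = np.random.default_rng(3)
x = rng.poisson(5, size=200_000)
y = rng.binomial(x, 0.3)          # Y | X ~ Bin(X, 0.3)

print("E(Y) empirically:        ", y.mean())
print("E(E(Y|X)) = 0.3 * E(X):  ", 0.3 * x.mean())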
Law of Total Expectation - For any set of events A1, A2, ..., An that partition the sample space (or just simply A, A^c), the following holds:

E(Y) = \sum_{i=1}^{n} E(Y|A_i) P(A_i)
Conditional Variance
Eve's Law (aka Law of Total Variance) - For any random variables X and Y,

Var(Y) = E(Var(Y|X)) + Var(E(Y|X))
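A matching sketch for Eve's Law on the same illustrative model as above (X ∼ Pois(5), Y|X ∼ Bin(X, 0.3)), where Var(Y|X) = X · 0.3 · 0.7 and E(Y|X) = 0.3X:

import numpy as np

rng = np.random.default_rng(4)
x = rng.poisson(5, size=500_000)
y = rng.binomial(x, 0.3)

within = (x * 0.3 * 0.7).mean()   # E(Var(Y|X))
between = np.var(0.3 * x)         # Var(E(Y|X))
print("Var(Y) empirically:", np.var(y))
print("Eve's law total:   ", within + between)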
Practice Problems
Exercise (Generalized Bank and Post Office). Suppose that X1, ..., Xn+1 ∼ Expo(λ) i.i.d. Assume that j < n + 1. Find the distribution of

T = \frac{X_1 + \dots + X_j}{X_1 + \dots + X_{n+1}}
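Not a solution, but an empirical hint: a simulation sketch (n, j, and λ below are illustrative) comparing T against the Beta shape suggested by the bank-post office story:

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, j, lam = 8, 3, 2.0
x = rng.exponential(scale=1/lam, size=(100_000, n + 1))
t = x[:, :j].sum(axis=1) / x.sum(axis=1)

# KS test against Beta(j, n + 1 - j); a large p-value is expected
print(stats.kstest(t, stats.beta(j, n + 1 - j).cdf))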
Exercise (Gamma story). 1. Let X ∼ Pois(λt) and Y ∼ Gamma(j, λ), where j is a positive integer. Show that P(X ≥ j) = P(Y ≤ t).
2. Show that

\int_0^t \frac{1}{(j-1)!} (\lambda u)^j e^{-\lambda u} \, \frac{du}{u} = \sum_{k=j}^{\infty} \frac{e^{-\lambda t} (\lambda t)^k}{k!}
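Part 1 can be checked numerically with scipy (the values j, λ, t below are illustrative):

from scipy import stats

j, lam, t = 4, 2.0, 1.5
p_pois = 1 - stats.poisson(lam * t).cdf(j - 1)    # P(X >= j) for X ~ Pois(lam * t)
p_gamma = stats.gamma(a=j, scale=1/lam).cdf(t)    # P(Y <= t) for Y ~ Gamma(j, lam)
print(p_pois, p_gamma)                            # should match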
Exercise (DNA Sequencing). Suppose I view a DNA sequence and want to find the proportion of base pairs in the sequence that are Adenosine (A). Let this proportion be denoted by p. From my research, I have a prior on p given by p ∼ Beta(a, b). Suppose I analyze more DNA sequences and find that out of n base pairs, k are Adenosine; let X be the number of Adenosine base pairs observed.
a) Find the PMF P(X = k) unconditionally. You must simplify all integrals, but may leave your answer in terms of the Γ function.
b) Find the conditional PDF of p|(X = k) up to a scaling constant. What famous distribution does this
follow and what are its parameters?
c) Find E(p | X = k), and discuss how the parameters a, b of our prior affect this value. What happens as
the amount of data we collect increases?
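For part c), the conjugacy result stated earlier gives posterior p|(X = k) ∼ Beta(a + k, b + n − k), whose mean is (a + k)/(a + b + n). A tiny sketch (illustrative prior and a hypothetical 70% sample proportion) shows how this moves from the prior mean a/(a + b) toward the data as n grows:

a, b = 2.0, 2.0
for n in [10, 100, 10_000]:
    k = int(0.7 * n)                      # suppose 70% of the n base pairs are A
    print(n, (a + k) / (a + b + n))       # approaches 0.7 as n grows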
Exercise (Basketball Practice). A certain basketball player (we’ll call him Jeremy Lin) practices shooting
free throws over and over again. The shots are independent, with probability p of success. Now suppose
that the player keeps shooting until making 7 shots in a row for the first time. Let X be the number of
shots taken. Find E(X).
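A simulation sketch of this exercise (p = 0.6 is an arbitrary illustrative value), compared against the standard closed form (p^{-7} − 1)/(1 − p) for the expected wait until 7 successes in a row:

import numpy as np

rng = np.random.default_rng(6)
p, r = 0.6, 7

def shots_until_streak(rng, p, r):
    # count shots until r consecutive makes
    streak = shots = 0
    while streak < r:
        shots += 1
        streak = streak + 1 if rng.random() < p else 0
    return shots

sims = [shots_until_streak(rng, p, r) for _ in range(20_000)]
print("simulated E(X):", np.mean(sims))
print("closed form:   ", (p**-r - 1) / (1 - p))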
Exercise (Chicken and Egg Revisited). Let's recall the Chicken and Egg problem: there are N ∼ Pois(λ) eggs, of which X | N = n ∼ Bin(n, p) hatch and Y = N − X don't hatch. (A simulation sketch of this setup follows the questions below.)
1. What is the expected number of eggs that will hatch?
2. What is the variance in the number of eggs that will hatch?
3. Given that you know the number of eggs that hatched, what’s the expected number of total eggs?
4. For each egg that hatches, you will receive a commission proportional to the total number of eggs that
you attempted to hatch. How much do you expect to make, given proportionality constant c?
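A simulation sketch of the setup (λ and p below are illustrative); by the chicken-egg story, X ∼ Pois(λp), so its mean and variance should agree:

import numpy as np

rng = np.random.default_rng(7)
lam, p = 10.0, 0.3
n_eggs = rng.poisson(lam, size=300_000)      # N ~ Pois(lam)
hatched = rng.binomial(n_eggs, p)            # X | N ~ Bin(N, p)

print("E(X):  ", hatched.mean(), "vs lam*p =", lam * p)
print("Var(X):", hatched.var(), "vs lam*p =", lam * p)   # Poisson: mean = variance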
Exercise (Normal and Normal squared). Let Z ∼ N(0, 1), Y = Z^2.
1. Find E(Y|Z) and E(Z|Y).
2. Find Var(Y|Z) and Var(Z|Y).
Exercise (Normal Order Statistics). Let X1, ..., X10 ∼ N(0, 1) i.i.d. Additionally, let Y ∼ Bin(10, 0.5). If X(j) denotes the jth order statistic among the Normals, prove that P(X(4) ≤ 0) = P(Y ≥ 4).
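A quick numeric check of the claim (seed and sample size arbitrary): X(4) ≤ 0 exactly when at least 4 of the 10 Normals fall at or below 0, and each does so with probability 1/2.

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x4 = np.sort(rng.standard_normal((200_000, 10)), axis=1)[:, 3]  # 4th smallest
print("P(X_(4) <= 0) empirically:", (x4 <= 0).mean())
print("P(Bin(10, .5) >= 4):      ", 1 - stats.binom(10, 0.5).cdf(3))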
Named and Famed Distributions
Discrete Distributions
Bernoulli, Bern(p): P(X = 1) = p, P(X = 0) = q = 1 − p.
  Mean p, Variance pq, MGF q + pe^t.

Binomial, Bin(n, p): P(X = k) = \binom{n}{k} p^k q^{n-k}, k ∈ {0, 1, 2, ..., n}.
  Mean np, Variance npq, MGF (q + pe^t)^n.

Geometric, Geom(p): P(X = k) = q^k p, k ∈ {0, 1, 2, ...}.
  Mean q/p, Variance q/p^2, MGF p/(1 − qe^t) for qe^t < 1.

Negative Binomial, NBin(r, p): P(X = n) = \binom{n+r-1}{r-1} p^r q^n, n ∈ {0, 1, 2, ...}.
  Mean rq/p, Variance rq/p^2, MGF (p/(1 − qe^t))^r for qe^t < 1.

Hypergeometric, Hyper(w, b, n): P(X = k) = \binom{w}{k}\binom{b}{n-k} / \binom{w+b}{n}, k ∈ {0, 1, 2, ..., n}.
  Mean nw/(b + w).

Poisson, Pois(λ): P(X = k) = e^{-λ} λ^k / k!, k ∈ {0, 1, 2, ...}.
  Mean λ, Variance λ, MGF e^{λ(e^t − 1)}.

Multinomial, Mult_k(n, p⃗): P(X⃗ = n⃗) = \frac{n!}{n_1! n_2! \cdots n_k!} p_1^{n_1} \cdots p_k^{n_k}, where n = n_1 + n_2 + · · · + n_k.
  Mean n p⃗, Var(X_i) = n p_i(1 − p_i), Cov(X_i, X_j) = −n p_i p_j.
Continuous Distributions
Gamma, Gamma(a, λ): f(x) = \frac{1}{\Gamma(a)} (λx)^a e^{-λx} \frac{1}{x}, x ∈ [0, ∞).
  Mean a/λ, Variance a/λ^2, MGF \left(\frac{λ}{λ - t}\right)^a for t < λ.

Beta, Beta(a, b): f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1 − x)^{b-1}, x ∈ [0, 1].
  Mean μ = a/(a + b), Variance μ(1 − μ)/(a + b + 1).