09-Multivariate Distributions
TEXTBOOK
Fundamentals of Probability with Stochastic Processes, Fourth Edition
by Saeed Ghahramani
INSTRUCTOR
Ying-ping Chen
Dept. of Computer Science, NYCU
OVERVIEW
Let p(x1 , x2 , . . ., xn ) = P(X1 = x1 , X2 = x2 , . . ., Xn = xn ) be the joint probability mass function of the discrete random variables X1 , X2 , . . ., Xn , and for 1 ≤ i ≤ n let Ai be the set of possible values of Xi . Note that
(a). p(x1 , x2 , . . ., xn ) ≥ 0.
(b). If for some i, 1 ≤ i ≤ n, xi ∉ Ai , then p(x1 , x2 , . . ., xn ) = 0.
(c). ∑_{xi ∈ Ai , 1 ≤ i ≤ n} p(x1 , x2 , . . ., xn ) = 1.
JOINT DISTRIBUTIONS OF n > 2 RANDOM VARIABLES
If the joint probability mass function of random variables X1 , X2 , . . ., Xn ,
p(x1 , x2 , . . ., xn ), is given, then for 1 ≤ i ≤ n, the marginal probability mass
function of Xi , pXi , can be found from p(x1 , x2 , . . ., xn ) by
pXi (xi ) = P(Xi = xi ) = P(Xi = xi , Xj ∈ Aj , 1 ≤ j ≤ n, j ≠ i)
         = ∑_{xj ∈ Aj , j ≠ i} p(x1 , x2 , . . ., xn ) .   (9.1)
Remark 9.1
A joint probability mass function, such as p(x, y, z) of Example 9.1, is called
multivariate hypergeometric. In general, suppose that a box contains n1
marbles of type 1, n2 marbles of type 2, . . ., and nr marbles of type r. If n
marbles are drawn at random and Xi (i = 1, 2, . . . , r) is the number of the
marbles of type i drawn, the joint probability mass function of X1 , X2 , . . ., Xr is
called multivariate hypergeometric and is given by

p(x1 , x2 , . . ., xr ) = P(X1 = x1 , X2 = x2 , . . ., Xr = xr ) = [C(n1 , x1 ) C(n2 , x2 ) ⋯ C(nr , xr )] / C(n1 + n2 + ⋯ + nr , n) ,

where x1 + x2 + ⋯ + xr = n and C(m, j) = m!/(j!(m − j)!) denotes the binomial coefficient.
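For a quick numerical illustration of the multivariate hypergeometric probability mass function, here is a minimal Python sketch using only the standard library; the function name and the example counts (4, 3, and 5 marbles of three types, with 5 marbles drawn) are illustrative choices, not from the text.

```python
from math import comb, prod

def multivariate_hypergeometric_pmf(x, m):
    """P(X1 = x[0], ..., Xr = x[r-1]) when n = sum(x) marbles are drawn at
    random, without replacement, from a box containing m[i] marbles of type i."""
    n = sum(x)
    if any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return 0.0
    return prod(comb(mi, xi) for mi, xi in zip(m, x)) / comb(sum(m), n)

# Example: draw 5 marbles from a box with 4, 3, and 5 marbles of types 1, 2, 3.
print(multivariate_hypergeometric_pmf((2, 1, 2), (4, 3, 5)))   # 180/792 ≈ 0.2273
```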
Random variables X1 , X2 , . . ., Xn are independent if, for arbitrary sets A1 , A2 , . . ., An of real numbers,

P(X1 ∈ A1 , X2 ∈ A2 , . . ., Xn ∈ An ) = P(X1 ∈ A1 ) P(X2 ∈ A2 ) ⋯ P(Xn ∈ An ) .

Equivalently, X1 , X2 , . . ., Xn are independent if and only if

P(X1 ≤ x1 , X2 ≤ x2 , . . ., Xn ≤ xn ) = P(X1 ≤ x1 ) P(X2 ≤ x2 ) ⋯ P(Xn ≤ xn )

for all x1 , x2 , . . ., xn , and, for discrete random variables, if and only if

P(X1 = x1 , X2 = x2 , . . ., Xn = xn ) = P(X1 = x1 ) P(X2 = x2 ) ⋯ P(Xn = xn ) .   (9.4)
Now, if in (9.5) we let A2 = R, then since the event Y ∈ R is certain and has probability 1, we get

P(X ∈ A1 , Z ∈ A3 ) = P(X ∈ A1 ) P(Z ∈ A3 ) .

This shows that X and Z are independent random variables. In the same way it can be shown that {X, Y} and {Y, Z} are also independent sets.
Definition 9.2
A collection of random variables is called independent if all of its finite
subcollections are independent.
It is important to know that the result of Theorem 8.5 is true for any number
of random variables:
If {X1 , X2 , . . .} is a sequence of independent random variables and
for i = 1, 2, . . ., gi : R → R is a real-valued function, then the se-
quence {g1 (X1 ), g2 (X2 ), . . .} is also an independent sequence of
random variables.
Using Theorems 9.1 and 9.2, an almost identical proof to that of Theorem
8.6 implies that
The expected value of the product of several independent discrete
random variables is equal to the product of their expected values.
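This statement is easy to check by simulation. The sketch below draws three independent discrete random variables (the particular distributions and the functions gi are arbitrary choices for illustration) and compares E[g1(X1) g2(X2) g3(X3)] with the product of the individual expected values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Three independent discrete random variables (arbitrary illustrative choices).
X1 = rng.binomial(5, 0.3, n)
X2 = rng.poisson(2.0, n)
X3 = rng.integers(1, 7, n)              # a fair die

# Functions of independent random variables remain independent.
g1, g2, g3 = X1**2, 2 * X2 + 1, np.sqrt(X3)

print((g1 * g2 * g3).mean())                    # estimate of E[g1(X1) g2(X2) g3(X3)]
print(g1.mean() * g2.mean() * g3.mean())        # estimate of E[g1(X1)] E[g2(X2)] E[g3(X3)]
```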
• Examples:
– Let p(x, y, z) = k(x² + y² + yz) for x = 0, 1, 2; y = 2, 3; z = 3, 4, and p(x, y, z) = 0 otherwise. Find k, pY,Z , pZ , and E(XZ).
□ From

k ∑_{x=0}^{2} ∑_{y=2}^{3} ∑_{z=3}^{4} (x² + y² + yz) = 1 ,

we get k = 1/203. Then

pY,Z (y, z) = (1/203) ∑_{x=0}^{2} (x² + y² + yz) = (1/203)(3y² + 3yz + 5) ,   y = 2, 3; z = 3, 4 ,

pZ (z) = (1/203) ∑_{x=0}^{2} ∑_{y=2}^{3} (x² + y² + yz) = (15/203) z + 7/29 ,   z = 3, 4 ,

and

E(XZ) = (1/203) ∑_{x=0}^{2} ∑_{y=2}^{3} ∑_{z=3}^{4} x z (x² + y² + yz) = 774/203 .
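Because the support is finite, this example can be checked by direct enumeration. Below is a minimal Python sketch (exact arithmetic via fractions) that recomputes k, pZ, and E(XZ); the variable names are mine, not the text's.

```python
from fractions import Fraction
from itertools import product

# Support: x = 0, 1, 2; y = 2, 3; z = 3, 4.
support = list(product(range(0, 3), range(2, 4), range(3, 5)))
weight = {(x, y, z): x**2 + y**2 + y * z for x, y, z in support}

k = Fraction(1, sum(weight.values()))
print(k)                                                   # 1/203

p = {s: k * weight[s] for s in support}
pZ = {z0: sum(p[(x, y, z)] for x, y, z in support if z == z0) for z0 in (3, 4)}
print(pZ)                                                  # pZ(3) = 94/203, pZ(4) = 109/203

E_XZ = sum(x * z * p[(x, y, z)] for x, y, z in support)
print(E_XZ)                                                # 774/203
```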
Joint Probability Density Functions
Definition 9.3
Let X1 , X2 , . . ., Xn be continuous random variables defined on the same
sample space. We say that X1 , X2 , . . ., Xn have a continuous joint
distribution if there exists a nonnegative function of n variables,
f (x1 , x2 , . . ., xn ), on R × R × ⋯ × R ≡ Rn such that for any region R in Rn that
can be formed from n-dimensional rectangles by a countable number of set
operations,
P((X1 , X2 , . . ., Xn ) ∈ R) = ∫∫⋯∫_R f (x1 , x2 , . . ., xn ) dx1 dx2 ⋯ dxn .

In particular, for any sets A1 , A2 , . . ., An of real numbers,

P(X1 ∈ A1 , X2 ∈ A2 , . . ., Xn ∈ An ) = ∫_{An} ∫_{An−1} ⋯ ∫_{A1} f (x1 , x2 , . . ., xn ) dx1 dx2 ⋯ dxn .

The joint probability distribution function of X1 , X2 , . . ., Xn satisfies

F(t1 , t2 , . . ., tn ) = ∫_{−∞}^{tn} ∫_{−∞}^{tn−1} ⋯ ∫_{−∞}^{t1} f (x1 , x2 , . . ., xn ) dx1 dx2 ⋯ dxn ,   (9.8)

and

f (x1 , x2 , . . ., xn ) = ∂ⁿF(x1 , x2 , . . ., xn ) / (∂x1 ∂x2 ⋯ ∂xn ) .   (9.9)
Using Theorems 9.3 and 9.4, an almost identical proof to that of Theorem
8.6 implies that
The expected value of the product of several independent random
variables is equal to the product of their expected values.
• Examples:
– A system has n components, whose lifetimes are exponential
random variables with parameters λ1 , λ2 , . . ., λn , respectively.
Suppose that the lifetimes of the components are independent
random variables, and the system fails as soon as one of its
components fails. Find the probability density function and the
expected value of the time until the system fails.
□ Let X1 , X2 , . . ., Xn be the lifetimes of the n components,
respectively. Then X1 , X2 , . . ., Xn are independent random
variables and, for i = 1, 2, . . . , n, Xi is exponential with parameter λi . Let X = min(X1 , X2 , . . ., Xn ) be the time until the system fails. For t > 0,

P(X > t) = P(X1 > t, X2 > t, . . ., Xn > t) = P(X1 > t) P(X2 > t) ⋯ P(Xn > t) = e^{−λ1 t} e^{−λ2 t} ⋯ e^{−λn t} = e^{−(λ1 + λ2 + ⋯ + λn ) t} ,

so X is exponential with parameter λ1 + λ2 + ⋯ + λn . Hence the probability density function of the time until the system fails is f(t) = (λ1 + λ2 + ⋯ + λn ) e^{−(λ1 + λ2 + ⋯ + λn ) t} for t > 0, and its expected value is 1/(λ1 + λ2 + ⋯ + λn ).
Remark 9.2
If X1 , X2 , . . ., Xn are n independent exponential random variables with
parameters λ1 , λ2 , . . ., λn , respectively, then min(X1 , X2 , . . ., Xn ) is an
exponential random variable with parameter λ1 +λ2 +⋯+λn . Hence
E [min(X1 , X2 , . . ., Xn )] = 1 / (λ1 + λ2 + ⋯ + λn ) .
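Remark 9.2 can also be checked by simulation. In the sketch below the rates λ1 = 0.5, λ2 = 1.0, λ3 = 2.5 are arbitrary illustrative choices, and numpy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([0.5, 1.0, 2.5])                    # rates of the n = 3 components
samples = rng.exponential(1.0 / lam, size=(200_000, lam.size))
system_life = samples.min(axis=1)                  # time until the first component fails

print(system_life.mean())                          # ≈ 1 / (0.5 + 1.0 + 2.5) = 0.25
# The empirical survival function matches e^{-(λ1+λ2+λ3) t}:
t = 0.3
print((system_life > t).mean(), np.exp(-lam.sum() * t))
```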
• Examples:
– Let f (x, y, z, t) = 1/(xyz) if 0 < t ≤ z ≤ y ≤ x ≤ 1, and f (x, y, z, t) = 0 elsewhere. Show that f is a joint probability density function of four random variables X, Y, Z, and T, and find fY,Z,T .
□ Since f is nonnegative and

∫_0^1 ∫_0^x ∫_0^y ∫_0^z 1/(xyz) dt dz dy dx = ∫_0^1 ∫_0^x ∫_0^y 1/(xy) dz dy dx = ∫_0^1 ∫_0^x 1/x dy dx = ∫_0^1 dx = 1 ,

f is a joint probability density function.
To find fY,Z,T , integrate x out of f :

fY,Z,T (y, z, t) = ∫_y^1 1/(xyz) dx = (1/(yz)) [ln x]_y^1 = −(ln y)/(yz) .

Therefore,

fY,Z,T (y, z, t) = −(ln y)/(yz) if 0 < t ≤ z ≤ y ≤ 1, and fY,Z,T (y, z, t) = 0 elsewhere.
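The two integrations above can be reproduced symbolically. The following sketch (assuming sympy is available) verifies that f integrates to 1 over the region 0 < t ≤ z ≤ y ≤ x ≤ 1 and recovers fY,Z,T = −(ln y)/(yz).

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t', positive=True)
f = 1 / (x * y * z)              # joint density on 0 < t <= z <= y <= x <= 1

# Total probability: integrate t, then z, then y, then x over the region.
total = sp.integrate(f, (t, 0, z), (z, 0, y), (y, 0, x), (x, 0, 1))
print(total)                     # 1

# Marginal joint density of (Y, Z, T): integrate x out over (y, 1).
f_yzt = sp.integrate(f, (x, y, 1))
print(sp.simplify(f_yzt))        # -log(y)/(y*z)
```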
Definition 9.4
Let S be a subset of the three-dimensional Euclidean space with volume
Vol(S) ≠ 0. A point is said to be randomly selected from S if for any subset
Ω of S with volume Vol(Ω), the probability that Ω contains the point is
Vol(Ω)/Vol(S).
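Definition 9.4 suggests a direct way to generate a randomly selected point from a bounded region S: sample uniformly from an enclosing box and keep only the points that fall in S (rejection sampling). The sketch below takes S to be the unit ball; that choice, and the sanity check at the end, are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_point_in_unit_ball():
    """Rejection sampling: draw uniformly from the enclosing cube [-1, 1]^3 until
    the point falls in S = {(x, y, z) : x^2 + y^2 + z^2 <= 1}.  The accepted point
    is uniform on S, i.e., P(point in Ω) = Vol(Ω)/Vol(S) for every subset Ω of S."""
    while True:
        p = rng.uniform(-1.0, 1.0, size=3)
        if p @ p <= 1.0:
            return p

pts = np.array([random_point_in_unit_ball() for _ in range(10_000)])
# Sanity check: the half-ball {z > 0} has half the volume, so about half the points.
print((pts[:, 2] > 0).mean())
```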
For example, if the continuous random variables X, Y, and Z have joint probability density function f (x, y, z), then the conditional probability density function of X and Y given Z = z is

fX,Y∣Z (x, y ∣ z) = f (x, y, z) / fZ (z)

at all points z for which fZ (z) > 0. As another example, let X, Y, Z, V, and W be five continuous random variables with the joint probability density function f . Then

fX,Z,W∣Y,V (x, z, w ∣ y, v) = f (x, y, z, v, w) / fY,V (y, v)

at all points (y, v) for which fY,V (y, v) > 0.
• Examples:
– A point X is selected at random from the interval (0, 1). Then a point Y is selected at random from (0, X), and finally a point Z is selected at random from (X, 1). Find the joint probability density function of X, Y, and Z.
□ Clearly,

fX (x) = 1 if 0 < x < 1, and fX (x) = 0 otherwise;

fY∣X (y ∣ x) = 1/x if 0 < y < x, and fY∣X (y ∣ x) = 0 otherwise;

fZ∣X,Y (z ∣ x, y) = 1/(1 − x) if x < z < 1, and fZ∣X,Y (z ∣ x, y) = 0 otherwise.

Thus,

f (x, y, z) = fX (x) fY∣X (y ∣ x) fZ∣X,Y (z ∣ x, y) = 1/(x(1 − x)) if 0 < y < x < z < 1, and f (x, y, z) = 0 otherwise.
In general, the relationship

f (x1 , x2 , . . ., xn ) = fX1 (x1 ) fX2 ∣X1 (x2 ∣ x1 ) fX3 ∣X1 ,X2 (x3 ∣ x1 , x2 ) ⋯ fXn ∣X1 ,X2 ,...,Xn−1 (xn ∣ x1 , x2 , . . ., xn−1 )   (9.12)

holds wherever the conditional probability density functions on the right-hand side are defined.
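Relationship (9.12) also gives a recipe for simulation: sample X1 from fX1, then X2 from fX2|X1, and so on. Below is a minimal sketch for the example above (X uniform on (0, 1), then Y uniform on (0, X), then Z uniform on (X, 1)); the function name and the sample sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_xyz(n):
    """Sample (X, Y, Z) by the chain rule (9.12):
    X ~ Uniform(0, 1), then Y | X = x ~ Uniform(0, x), then Z | X = x ~ Uniform(x, 1)."""
    x = rng.uniform(0.0, 1.0, n)
    y = rng.uniform(0.0, x)          # conditional density 1/x on (0, x)
    z = rng.uniform(x, 1.0)          # conditional density 1/(1 - x) on (x, 1)
    return x, y, z

x, y, z = sample_xyz(100_000)
print(np.all((0 < y) & (y < x) & (x < z) & (z < 1)))   # support is 0 < y < x < z < 1
print(x.mean(), y.mean(), z.mean())                    # ≈ 1/2, 1/4, 3/4
```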
Definition 9.6
Let {X1 , X2 , . . ., Xn } be an independent set of identically distributed
continuous random variables with the common probability density and
distribution functions f and F, respectively. Let X(1) be the smallest value in
{X1 , X2 , . . ., Xn }, X(2) be the second smallest value, X(3) be the third
smallest, and, in general, X(k) (1 ≤ k ≤ n) be the kth smallest value in
{X1 , X2 , . . ., Xn }. Then, X(k) is called the kth order statistic, and the set
{X(1) , X(2) , . . . , X(n) } is said to consist of the order statistics of
{X1 , X2 , . . ., Xn }.
ORDER STATISTICS
Unlike the Xi ’s, the random variables X(i) are neither independent nor identically distributed.
• Examples:
– Suppose that customers arrive at a warehouse from n different
locations. Let Xi , 1 ≤ i ≤ n, be the time until the arrival of the next
customer from location i; then X(1) is the arrival time of the next
customer to the warehouse.
– Suppose that a machine consists of n components with the
lifetimes X1 , X2 , . . ., Xn , respectively, where Xi ’s are independent
and identically distributed. Suppose that the machine remains
operative unless k or more of its components fail. Then X(k) , the
kth order statistic of {X1 , X2 , . . ., Xn }, is the time when the
machine fails. Also, X(1) is the failure time of the first component.
• Examples:
– Let {X1 , X2 , . . ., Xn } be a random sample of size n from a
population with continuous distribution F. Then the following
important statistical concepts are expressed in terms of order
statistics:
1. The sample range is X(n) − X(1) .
2. The sample midrange is [X(n) + X(1) ]/2.
3. The sample median is
⎧
⎪ X(i+1)
⎪ if n = 2i + 1
m = ⎨ X(i) +X(i+1)
⎪
⎪
⎩ 2 if n = 2i .
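In code, the order statistics of a sample are simply its sorted values, and the three statistics above follow directly from them. A minimal Python sketch (the sample used at the end is an arbitrary illustration):

```python
import numpy as np

def order_statistics_summary(sample):
    """Sample range, midrange, and median expressed through the order statistics
    X_(1) <= X_(2) <= ... <= X_(n) obtained by sorting the sample."""
    xs = np.sort(np.asarray(sample, dtype=float))   # xs[k-1] is X_(k)
    n = xs.size
    sample_range = xs[-1] - xs[0]                   # X_(n) - X_(1)
    midrange = (xs[-1] + xs[0]) / 2                 # (X_(n) + X_(1)) / 2
    if n % 2 == 1:                                  # n = 2i + 1  ->  m = X_(i+1)
        median = xs[n // 2]
    else:                                           # n = 2i  ->  m = (X_(i) + X_(i+1)) / 2
        median = (xs[n // 2 - 1] + xs[n // 2]) / 2
    return sample_range, midrange, median

print(order_statistics_summary([3.2, 0.7, 1.9, 4.4, 2.5]))   # ≈ (3.7, 2.55, 2.5)
```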
Theorem 9.5
Let {X(1) , X(2) , . . . , X(n) } be the order statistics of the independent and identically distributed continuous random variables X1 , X2 , . . ., Xn with the common probability density and distribution functions f and F, respectively. Then, for 1 ≤ k ≤ n, Fk and fk , the probability distribution and density functions of X(k) , are given by

Fk (x) = ∑_{i=k}^{n} C(n, i) [F(x)]^i [1 − F(x)]^{n−i} ,   −∞ < x < ∞ ,   (9.13)

and

fk (x) = [n! / ((k − 1)!(n − k)!)] f (x) [F(x)]^{k−1} [1 − F(x)]^{n−k} ,   −∞ < x < ∞ .   (9.14)

Here C(n, i) = n!/(i!(n − i)!) denotes the binomial coefficient.
Proof.
Let −∞ < x < ∞. To calculate P(X(k) ≤ x), note that X(k) ≤ x if and only if at
least k of the random variables X1 , X2 , . . ., Xn are in (−∞, x]. Thus
Fk (x) = P(X(k) ≤ x)
       = ∑_{i=k}^{n} P(exactly i of the random variables X1 , X2 , . . ., Xn are in (−∞, x])
       = ∑_{i=k}^{n} C(n, i) [F(x)]^i [1 − F(x)]^{n−i} ,

where the last equality follows because the number of the random variables X1 , X2 , . . ., Xn that lie in (−∞, x] has a binomial distribution with parameters n and p = F(x).
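Formula (9.13) is easy to validate by simulation: the empirical frequency of the event X(k) ≤ x should match the binomial-tail sum. In the sketch below, the choice of n = 5 standard exponential samples and the point x = 0.4 are arbitrary assumptions for illustration.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(4)
n, k, x0 = 5, 2, 0.4            # check P(X_(2) <= 0.4) for n = 5 Exp(1) samples

# Empirical probability from simulation.
samples = rng.exponential(1.0, size=(200_000, n))
kth = np.sort(samples, axis=1)[:, k - 1]
print((kth <= x0).mean())

# Formula (9.13) with F(x) = 1 - e^{-x}.
F = 1 - np.exp(-x0)
print(sum(comb(n, i) * F**i * (1 - F) ** (n - i) for i in range(k, n + 1)))
```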
• Examples:
– Let X1 , X2 , . . ., X2n+1 be 2n + 1 points selected at random and independently from the interval (0, 1). Find the probability density function of the sample median X(n+1) .
□ By (9.14), with sample size 2n + 1, k = n + 1, f (x) = 1 and F(x) = x on (0, 1),

fn+1 (x) = [(2n + 1)! / (n! n!)] xⁿ (1 − x)ⁿ = [1/B(n + 1, n + 1)] xⁿ (1 − x)ⁿ ,   0 < x < 1 ,

and fn+1 (x) = 0 elsewhere. Hence X(n+1) is beta with parameters (n + 1, n + 1).
More generally, for 1 ≤ k ≤ 2n + 1, X(k) is beta with parameters k and 2n − k + 2.
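A quick simulation check of this result (the sample size 2n + 1 = 9 is an arbitrary choice; numpy and scipy are assumed available): the simulated mean and variance of the sample median should agree with those of a Beta(n + 1, n + 1) random variable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 4                                              # sample size 2n + 1 = 9
u = rng.uniform(size=(100_000, 2 * n + 1))
medians = np.median(u, axis=1)                     # X_(n+1), the sample median

beta = stats.beta(n + 1, n + 1)
print(medians.mean(), beta.mean())                 # both 0.5
print(medians.var(), beta.var())                   # both 1/(4(2n + 3)) ≈ 0.0227
```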
Theorem 9.6
Let {X(1) , X(2) , . . . , X(n) } be the order statistics of the independent and
identically distributed continuous random variables {X1 , X2 , . . ., Xn } with the
common probability density and distribution functions f and F, respectively.
Then, for i < j and x < y, fi j (x, y), the joint probability density function of X(i)
and X( j) , is given by
fij (x, y) = [n! / ((i − 1)!(j − i − 1)!(n − j)!)] f (x) f (y) [F(x)]^{i−1} [F(y) − F(x)]^{j−i−1} [1 − F(y)]^{n−j} .

For x ≥ y, fij (x, y) = 0.
Theorem 9.7
Let {X(1) , X(2) , . . . , X(n) } be the order statistics of the independent and
identically distributed continuous random variables {X1 , X2 , . . ., Xn } with the
common probability density and distribution functions f and F, respectively.
Then, f12⋯n , the joint probability density function of X(1) , X(2) , . . . , X(n) , is
given by
f12⋯n (x1 , x2 , . . ., xn ) = n! f (x1 ) f (x2 ) ⋯ f (xn ) if −∞ < x1 < x2 < ⋯ < xn < ∞, and f12⋯n (x1 , x2 , . . ., xn ) = 0 otherwise.
• Examples:
– The distance between two towns, A and B, is 30 miles. If three gas
stations are constructed independently at randomly selected
locations between A and B, what is the probability that the
distance between any two gas stations is at least 10 miles?
□ Let X1 , X2 , and X3 be the locations at which the gas stations
are constructed. The probability density function of X1 , X2 ,
and X3 is given by
f (x) = 1/30 if 0 < x < 30, and f (x) = 0 elsewhere.
By Theorem 9.7, the joint probability density function of the order statistics X(1) , X(2) , X(3) is

f123 (x1 , x2 , x3 ) = 3! (1/30)³ ,   0 < x1 < x2 < x3 < 30 .

The desired probability is therefore

P(X(2) − X(1) ≥ 10 and X(3) − X(2) ≥ 10) = ∫_{20}^{30} ∫_{10}^{x3−10} ∫_{0}^{x2−10} (6/30³) dx1 dx2 dx3 = 1/27 ≈ 0.037 .
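A Monte Carlo check of this answer (numpy assumed; the simulation size is an arbitrary choice): generate three uniform locations on (0, 30), sort them, and estimate the probability that both gaps are at least 10 miles.

```python
import numpy as np

rng = np.random.default_rng(6)
locations = np.sort(rng.uniform(0.0, 30.0, size=(1_000_000, 3)), axis=1)
gaps = np.diff(locations, axis=1)                  # X_(2)-X_(1) and X_(3)-X_(2)
print((gaps >= 10).all(axis=1).mean(), 1 / 27)     # both ≈ 0.037
```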
MULTINOMIAL DISTRIBUTIONS
• Examples:
– Marginals of Multinomials: Let X1 , X2 , . . ., Xr (r ≥ 4) have the joint
multinomial probability mass function p(x1 , x2 , . . ., xr ) with
parameters n and p1 , p2 , . . ., pr .
□ Find the marginal probability mass function pX1 :
pX1 (x1 ) = ∑_{x2 +x3 +⋯+xr = n−x1} [n! / (x1 ! x2 ! ⋯ xr !)] p1^{x1} p2^{x2} ⋯ pr^{xr}
         = [n! / (x1 !(n − x1 )!)] p1^{x1} ∑_{x2 +x3 +⋯+xr = n−x1} [(n − x1 )! / (x2 ! x3 ! ⋯ xr !)] p2^{x2} p3^{x3} ⋯ pr^{xr}
         = [n! / (x1 !(n − x1 )!)] p1^{x1} (p2 + p3 + ⋯ + pr )^{n−x1}
         = [n! / (x1 !(n − x1 )!)] p1^{x1} (1 − p1 )^{n−x1} ,

where the third equality follows from the multinomial expansion of (p2 + p3 + ⋯ + pr )^{n−x1}. Therefore, X1 is a binomial random variable with parameters n and p1 .
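This conclusion, that each component of a multinomial vector is binomial, can be illustrated with a short simulation; the parameters n = 20 and (p1, p2, p3, p4) below are arbitrary choices, and numpy and scipy are assumed available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 20, [0.1, 0.3, 0.4, 0.2]                    # r = 4 categories (arbitrary)
counts = rng.multinomial(n, p, size=200_000)
x1 = counts[:, 0]                                  # marginal count X1

# Compare with Binomial(n, p1): mean n*p1 = 2, variance n*p1*(1 - p1) = 1.8.
print(x1.mean(), x1.var())
print(stats.binom(n, p[0]).mean(), stats.binom(n, p[0]).var())
```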
Remark 9.4
The method of Example 9.14 can be extended to prove the following theorem:
Let the joint distribution of the random variables X1 , X2 , . . ., Xr be multinomial with parameters n and p1 , p2 , . . ., pr . For any k (k > 1) of these random variables, say Xi1 , Xi2 , . . ., Xik , the marginal joint probability mass function of Xi1 , Xi2 , . . ., Xik is multinomial with parameters n and pi1 , pi2 , . . ., pik , 1 − pi1 − pi2 − ⋯ − pik .