Joint Probability Distribution Reference 1
6.1 INTRODUCTION
In Chapters 4 and 5, we studied various phenomena by introducing a (single) random
variable and examining its distribution and characteristics. In many situations, however,
experiments involve two or more random variables. For example, we may be interested in
the diameter and length of rods; in the numbers of dots showing on two dice rolled
simultaneously, say (X, Y), where 1 ≤ X ≤ 6 and 1 ≤ Y ≤ 6; or in the composition of a
Monel (70% nickel, 30% copper) alloy, where we may focus on the solid content, say X, and
the liquid content, say Y, which again we would record as a joint pair (X, Y).
In this chapter, then, we will study the joint distribution functions of two or more
discrete and continuous random variables.
6.2 DISTRIBUTION FUNCTIONS OF TWO RANDOM VARIABLES
If X(e) and Y(e) are two discrete random variables defined on the same sample space,
taking the possible values x1, x2, . . . , xm and y1, y2, . . . , yn, respectively, with
P(X(e) = xi, Y(e) = yj) = pij,
then the set of all possible values {(xi, yj)} of (X(e), Y(e)) is called the sample space of
(X(e), Y(e)), while the set of associated probabilities pij is the joint probability function
(p.f.) of the pair of discrete random variables (X, Y).
Thus, we may think of k = mn points (xi , yj ) in the xy-plane in which the probabilities
pij are located and are all positive and sum to 1. If we define pi· and p·j such that

pi· = Σ_{j=1}^{n} pij   and   p·j = Σ_{i=1}^{m} pij      (6.2.2)
then
pi· = P(X(e) = xi)   and   p·j = P(Y(e) = yj)      (6.2.3)

The possible values xi, i = 1, 2, . . . , m, of X(e) together with their probabilities pi· constitute
the marginal distribution of the random variable X. This is the probability function of X
obtained by ignoring Y, and is therefore merely the probability function of X itself. In a
similar manner, the possible values yj, j = 1, 2, . . . , n, of Y(e) together with their probabilities
p·j constitute the marginal distribution of the random variable Y.
Geometrically, if x is the usual horizontal axis and y the vertical axis, and if we project
the sum of the probabilities pi1, . . . , pij, . . . , pin located at the points (xi, y1), . . . , (xi, yj),
. . . , (xi, yn) vertically onto the x-axis, we obtain the marginal distribution pi· of the random
variable X. If instead we project the sum of the probabilities p1j, . . . , pij, . . . , pmj horizontally
onto the y-axis, we obtain the marginal distribution p·j of the random variable Y.
The mean μ1 and variance σ1² of X are defined by applying (4.2.1) and (4.2.2) to the
probability distribution pi·. Similarly, the mean μ2 and variance σ2² of Y are defined by
applying those formulas to p·j.
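To make the computation in (6.2.2) concrete, the following sketch (in Python, with a small hypothetical joint p.f.) computes the marginal probability functions and then the mean and variance of X from its marginal; the table of pij values is illustrative only.

```python
# A minimal sketch: computing marginal p.f.'s, means, and variances
# from a (hypothetical) joint probability function p_ij, following (6.2.2).
joint = {
    (1, 1): 0.10, (1, 2): 0.20,
    (2, 1): 0.30, (2, 2): 0.40,
}  # values of p_ij at the points (x_i, y_j); they sum to 1

# Marginal p.f.'s: p_i. = sum over j of p_ij, p_.j = sum over i of p_ij
p_x = {}
p_y = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Mean and variance of X from its marginal distribution
mu_x = sum(x * p for x, p in p_x.items())
var_x = sum((x - mu_x) ** 2 * p for x, p in p_x.items())

print(p_x)          # {1: 0.3, 2: 0.7}
print(p_y)          # {1: 0.4, 2: 0.6}
print(mu_x, var_x)  # 1.7, 0.21
```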
When the probability function pij factors into the product of the two marginal
probability functions, that is, if for all possible (xi, yj) in the sample space of (X, Y),
we have

pij = pi· p·j      (6.2.4)

then X and Y are said to be independent discrete random variables.
Example 6.2.1 (Probability function of two random variables) Roll a pair of fair dice,
of which one die is green and the other is red. Let the random variables X and Y denote
the outcomes on the green die and the red die, respectively. Then, the sample space of (X, Y) is
S = {(1, 1), (1, 2), . . . , (1, 6), . . . , (6, 6)}. Each of the 36 sample points has the probability
1/36. Then, the joint probability function of the random variables X and Y can be written
in tabular form as follows:
Y \ X           1       2       3       4       5       6     Total (p·j)
1             1/36    1/36    1/36    1/36    1/36    1/36      1/6
2             1/36    1/36    1/36    1/36    1/36    1/36      1/6
3             1/36    1/36    1/36    1/36    1/36    1/36      1/6
4             1/36    1/36    1/36    1/36    1/36    1/36      1/6
5             1/36    1/36    1/36    1/36    1/36    1/36      1/6
6             1/36    1/36    1/36    1/36    1/36    1/36      1/6
Total (pi·)    1/6     1/6     1/6     1/6     1/6     1/6        1
This table shows the probabilities assigned to each sample point. Using (6.2.2) for the
probabilities, we easily find the marginal distributions pi· and p·j of the random variables
X and Y, respectively, as shown in the table. The probability function in this example can
also be expressed as
p(x, y) = 1/36,  for x = 1, 2, . . . , 6 and y = 1, 2, . . . , 6
p(x, y) = 0,     otherwise
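As a quick computational check of this example, the sketch below (with illustrative variable names) rebuilds the 6 × 6 table, recovers the marginals via (6.2.2), and confirms that (6.2.4) holds at every point, so X and Y are independent.

```python
# Checking Example 6.2.1: the joint p.f. of two fair dice factors into its
# marginals, so X and Y are independent by (6.2.4).
from fractions import Fraction

joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

p_x = {x: sum(joint[(x, y)] for y in range(1, 7)) for x in range(1, 7)}
p_y = {y: sum(joint[(x, y)] for x in range(1, 7)) for y in range(1, 7)}

print(p_x[3], p_y[5])  # 1/6 1/6
independent = all(joint[(x, y)] == p_x[x] * p_y[y]
                  for x in range(1, 7) for y in range(1, 7))
print(independent)     # True
```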
Example 6.2.2 (Marginal probability functions) Let the joint probability function of
random variables X and Y be defined as
p(x, y) = (x + y)/54,   x = 1, 2, 3, 4;  y = 1, 2, 3
Find the marginal probability functions of X and Y and also examine whether X and Y
are independent.
[Figure: surface plot of a joint probability function p(x, y), with x and y ranging over 1, . . . , 6.]
Solution: From equation (6.2.2), it follows that the marginal probability function of X, say p1(x), is given by

p1(x) = P(X = x) = Σ_{y=1}^{3} (x + y)/54 = [(x + 1) + (x + 2) + (x + 3)]/54 = (3x + 6)/54,   for x = 1, 2, 3, 4

Similarly,

p2(y) = P(Y = y) = Σ_{x=1}^{4} (x + y)/54 = (10 + 4y)/54,   for y = 1, 2, 3
For (x, y) belonging to the sample space of (X, Y), say Sxy = {(x, y) | x = 1, 2, 3, 4; y = 1, 2, 3}, we have that, in general,

p(x, y) ≠ p1(x) × p2(y)

For example, p(1, 1) = 2/54, whereas p1(1)p2(1) = (9/54)(14/54) = 7/162. Hence, the random variables X and Y are not independent.
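The following short sketch reproduces these calculations with exact fractions and exhibits a point at which the factorization fails; the variable names are illustrative.

```python
# Numerical check of Example 6.2.2: marginals of p(x, y) = (x + y)/54
# and the failure of the factorization p(x, y) = p1(x) p2(y).
from fractions import Fraction

p = {(x, y): Fraction(x + y, 54) for x in range(1, 5) for y in range(1, 4)}

p1 = {x: sum(p[(x, y)] for y in range(1, 4)) for x in range(1, 5)}   # (3x + 6)/54
p2 = {y: sum(p[(x, y)] for x in range(1, 5)) for y in range(1, 4)}   # (10 + 4y)/54

print(p1[1], Fraction(3 * 1 + 6, 54))    # 1/6 1/6
print(p2[1], Fraction(10 + 4 * 1, 54))   # 7/27 7/27
print(p[(1, 1)], p1[1] * p2[1])          # 1/27 versus 7/162 -> not independent
```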
Example 6.2.3 (Joint probability function and its marginals) In dealing a hand of 13
cards from a deck of ordinary playing cards, let X1 and X2 be random variables denoting
the numbers of spades and of hearts, respectively. Obviously, 0 ≤ X1 ≤ 13, 0 ≤ X2 ≤ 13,
and 0 ≤ X1 + X2 ≤ 13. Then, we see that p(x1, x2), the p.f. of (X1, X2), is given by

p(x1, x2) = C(13, x1) C(13, x2) C(26, 13 − x1 − x2) / C(52, 13)

where C(n, k) denotes the binomial coefficient "n choose k",
where the sample space of (X1 , X2 ) is all pairs of nonnegative integers (x1 , x2 ) for which
0 ≤ x1 , x2 ≤ 13 and 0 ≤ x1 + x2 ≤ 13. That is, the sample space {(x1 , x2 )} consists of the
105 points:
{(0, 0), . . . , (0, 13), . . . , (12, 0), (12, 1), (13, 0)}
where 0 ≤ x1 ≤ 13.
Summing p(x1, x2) over x2 (and using the identity Σ_{x2} C(13, x2) C(26, 13 − x1 − x2) = C(39, 13 − x1)) gives the marginal p.f. of X1 as p1(x1) = C(13, x1) C(39, 13 − x1) / C(52, 13), x1 = 0, 1, . . . , 13.
In a similar manner, it is easy to find p2(x2) and to show that the random variables
X1 and X2 are not independent; for instance, p(13, 13) = 0 while p1(13)p2(13) > 0.
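A small sketch of this example is given below; it evaluates p(x1, x2), recovers the marginal p.f. of X1 by summation, and exhibits a point showing that X1 and X2 are not independent. The function and variable names are illustrative, and math.comb is assumed to be available (Python 3.8 or later).

```python
# Example 6.2.3: joint p.f. of the numbers of spades (X1) and hearts (X2)
# in a 13-card hand, its marginal p1(x1), and a dependence check.
from math import comb
from fractions import Fraction

def p(x1, x2):
    if x1 < 0 or x2 < 0 or x1 + x2 > 13:
        return Fraction(0)
    return Fraction(comb(13, x1) * comb(13, x2) * comb(26, 13 - x1 - x2), comb(52, 13))

# Marginal of X1 obtained by summing over x2; it agrees with the
# hypergeometric form C(13, x1) C(39, 13 - x1) / C(52, 13).
p1 = {x1: sum(p(x1, x2) for x2 in range(0, 14)) for x1 in range(0, 14)}
print(p1[3] == Fraction(comb(13, 3) * comb(39, 10), comb(52, 13)))  # True

p2 = {x2: sum(p(x1, x2) for x1 in range(0, 14)) for x2 in range(0, 14)}
print(p(13, 13) == 0, p1[13] * p2[13] > 0)  # True True -> not independent
```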
If, for every pair of real numbers (x1, x2), we define F(x1, x2) = P(X1(e) ≤ x1, X2(e) ≤ x2),
then F(x1, x2) is called the cumulative distribution function (c.d.f.) of the pair of random
variables (X1, X2) (dropping e). If there exists a nonnegative function f(x1, x2) such that

F(x1, x2) = ∫_{−∞}^{x1} ∫_{−∞}^{x2} f(t1, t2) dt2 dt1      (6.2.6)
then
f(x1, x2) = ∂²F(x1, x2)/∂x1∂x2
and f (x1 , x2 ) is called the joint probability density function (p.d.f.) of the pair of random
variables (X1 , X2 ). The probability that this pair of random variables represents a point
in a region E, that is, the probability that the event E occurs, is given by
P((X1, X2) ∈ E) = ∫∫_E f(x1, x2) dx2 dx1      (6.2.7)
Note that if E = {(X1 , X2 )|X1 < x1 , X2 < x2 }, then (6.2.7) equals F (x1 , x2 ). Also, if we let
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2      (6.2.8)

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1      (6.2.9)
then f1 (x1 ) and f2 (x2 ) are called the marginal probability density functions of X1 and X2 ,
respectively. This means that f1 (x1 ) is the p.d.f. of X1 (ignoring X2 ), and f2 (x2 ) is the
p.d.f. of X2 (ignoring X1 ).
Geometrically, if we think of f (x1 , x2 ) as a function describing the manner in which
the total probability 1 is continuously “smeared” in the x1 x2 -plane, then the integral in
(6.2.7) represents the amount of probability contained in the region E. Also, f1 (x1 ) is the
p.d.f. one obtains by projecting the probability density in the x1 x2 -plane orthogonally onto
the x1 -axis, and f2 (x2 ) is similarly obtained by orthogonal projection of the probability
density onto the x2 -axis.
If f(x1, x2) factors into the product of the two marginal p.d.f.'s, that is, if

f(x1, x2) = f1(x1) f2(x2)

for all (x1, x2) in the sample space of (X1, X2), then X1 and X2 are said to be
independent continuous random variables.
Example 6.2.4 (Marginal probability density functions) Let the joint probability density function
of the random variables X1 and X2 be defined as

f(x1, x2) = 2e^{−(2x1 + x2)},  for x1 > 0, x2 > 0, and f(x1, x2) = 0 otherwise.

Find the marginal probability density functions of X1 and X2 and examine whether or not
X1 and X2 are independent.
Solution: From equations (6.2.8) and (6.2.9), it follows that for x1 > 0

f1(x1) = ∫_{0}^{∞} 2e^{−(2x1 + x2)} dx2 = 2e^{−2x1} ∫_{0}^{∞} e^{−x2} dx2 = 2e^{−2x1} [−e^{−x2}]_{0}^{∞} = 2e^{−2x1},   x1 > 0

Similarly, for x2 > 0,

f2(x2) = ∫_{0}^{∞} 2e^{−(2x1 + x2)} dx1 = e^{−x2} [−e^{−2x1}]_{0}^{∞} = e^{−x2},   x2 > 0

Here, we clearly have that f(x1, x2) = f1(x1)f2(x2), which implies that the random vari-
ables X1 and X2 are independent.
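The integrations in this example can also be checked symbolically; the following sketch assumes the sympy package is available and uses illustrative variable names.

```python
# Symbolic verification of Example 6.2.4: marginals of
# f(x1, x2) = 2*exp(-(2*x1 + x2)) on x1 > 0, x2 > 0, and independence.
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)
f = 2 * sp.exp(-(2 * x1 + x2))

f1 = sp.integrate(f, (x2, 0, sp.oo))   # 2*exp(-2*x1)
f2 = sp.integrate(f, (x1, 0, sp.oo))   # exp(-x2)

print(f1, f2)
print(sp.simplify(f1 * f2 - f) == 0)   # True -> X1 and X2 are independent
```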
Finally, note that the joint distribution function F(x1, x2) satisfies the properties given below;
in particular, for x11 < x12 and x21 < x22,

F(x12, x22) − F(x11, x22) − F(x12, x21) + F(x11, x21) ≥ 0      (6.2.11)

The reader should verify that the left-hand side of (6.2.11) gives the value of P(x11 <
X1 ≤ x12, x21 < X2 ≤ x22).
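The following numerical sketch illustrates this rectangle identity, under the assumption that (6.2.11) is the inclusion-exclusion combination of F written above, using a pair of independent exponential random variables with c.d.f. F(x1, x2) = (1 − e^{−x1})(1 − e^{−x2}); all numerical values are illustrative.

```python
# Rectangle probability from the joint c.d.f. of two independent Exp(1) variables:
# F(b, d) - F(a, d) - F(b, c) + F(a, c) equals P(a < X1 <= b, c < X2 <= d).
import math

def F(x1, x2):
    return (1 - math.exp(-x1)) * (1 - math.exp(-x2))

a, b, c, d = 0.5, 1.5, 0.2, 1.0
lhs = F(b, d) - F(a, d) - F(b, c) + F(a, c)
direct = (math.exp(-a) - math.exp(-b)) * (math.exp(-c) - math.exp(-d))
print(lhs, direct)  # the two values agree
```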
If g(X1, X2) is a function of the pair (X1, X2), then for the discrete case the expected value of g(X1, X2) is

E(g(X1, X2)) = Σ g(x1i, x2j) p(x1i, x2j)

where the summation is over all pairs (x1i, x2j) in the sample space of (X1, X2), and
for the continuous case we have that

E(g(X1, X2)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x1, x2) f(x1, x2) dx1 dx2      (6.2.13)
We may now state Theorem 6.2.1; the reader should verify equation (6.2.14) given there.

Theorem 6.2.1 If X1 and X2 are independent random variables, then

E(X1X2) = E(X1)E(X2)      (6.2.14)
In many problems, however, we deal with linear functions of two or even more independent
random variables. The following theorem is of particular importance in this connection:
Theorem 6.2.2 Let X1 and X2 be independent random variables such that the
mean and variance of X1 are μ1 and σ1², and the mean and variance of X2 are μ2
and σ2². Then, if c1 and c2 are constants, c1X1 + c2X2 is a random variable having
mean value c1μ1 + c2μ2 and variance c1²σ1² + c2²σ2².
Proof: To prove this theorem, it is sufficient to consider the case of continuous random
variables. (The proof for discrete random variables is obtained by replacing integral signs
by signs of summation.) For the mean value of c1 X1 + c2 X2 , we have, since X1 and X2 are
independent, that
E(c1X1 + c2X2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (c1x1 + c2x2) f1(x1) f2(x2) dx1 dx2
               = c1 ∫_{−∞}^{∞} x1 f1(x1) dx1 ∫_{−∞}^{∞} f2(x2) dx2 + c2 ∫_{−∞}^{∞} x2 f2(x2) dx2 ∫_{−∞}^{∞} f1(x1) dx1
               = c1E(X1) + c2E(X2)
               = c1μ1 + c2μ2
For the variance part, a similar calculation gives

Var(c1X1 + c2X2) = E{[c1(X1 − μ1) + c2(X2 − μ2)]²} = c1²σ1² + c2²σ2² + 2c1c2E[(X1 − μ1)(X2 − μ2)]

and the cross-product term vanishes because X1 and X2 are independent, so that
Var(c1X1 + c2X2) = c1²σ1² + c2²σ2², as claimed. More generally, if X1 and X2 are not
necessarily independent, then

E(c1X1 + c2X2) = c1μ1 + c2μ2
Var(c1X1 + c2X2) = c1²σ1² + c2²σ2² + 2c1c2Cov(X1, X2)
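A simple Monte Carlo check of Theorem 6.2.2 is sketched below; the distributions, constants, and sample size are illustrative, and numpy is assumed to be available.

```python
# Monte Carlo sketch of Theorem 6.2.2: for independent X1 ~ N(2, 3^2) and
# X2 ~ Exp with mean 4 (variance 16), the combination 5*X1 - 2*X2 should have
# mean 5*2 - 2*4 = 2 and variance 25*9 + 4*16 = 289.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.normal(loc=2.0, scale=3.0, size=n)
x2 = rng.exponential(scale=4.0, size=n)

L = 5 * x1 - 2 * x2
print(L.mean())  # approximately 2
print(L.var())   # approximately 289
```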
Theorem 6.2.3 Let X1, X2, . . . , Xn be n random variables such that the mean and
variance of Xi are μi and σi², respectively, and where the covariance of Xi and Xj
is σij, that is, E[(Xi − μi)(Xj − μj)] = σij, i ≠ j. If c1, c2, . . . , cn are constants,
then the random variable L = c1X1 + · · · + cnXn has mean value and variance that
are given by

E(L) = c1μ1 + · · · + cnμn      (6.2.17)

Var(L) = c1²σ1² + · · · + cn²σn² + 2c1c2σ12 + 2c1c3σ13 + · · · + 2cn−1cnσn−1,n      (6.2.18)
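In matrix form, (6.2.17) and (6.2.18) say that E(L) = c'μ and Var(L) = c'Σc, where Σ is the variance-covariance matrix of (X1, . . . , Xn). The following sketch evaluates both expressions for illustrative values of c, μ, and Σ.

```python
# Theorem 6.2.3 in matrix form: E(L) = c'mu and Var(L) = c' Sigma c, where Sigma
# holds the variances sigma_i^2 on the diagonal and the covariances sigma_ij off
# the diagonal (all values below are illustrative).
import numpy as np

c = np.array([1.0, -2.0, 3.0])
mu = np.array([0.5, 1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 9.0, -2.0],
                  [0.5, -2.0, 1.0]])

E_L = c @ mu
Var_L = c @ Sigma @ c
print(E_L)    # 4.5
print(Var_L)  # 72.0, matching (6.2.18) expanded term by term
```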
If (X1, X2) is a pair of discrete random variables with joint probability function p(x1, x2)
and marginal probability function p1(x1), then for any x1 with p1(x1) ≠ 0 we define

p(x2|x1) = p(x1, x2) / p1(x1)      (6.2.20)
Note that p(x2 |x1 ) has all the properties of an ordinary probability function; that is,
as the reader should verify, the sum of p(x2 |x1 ) over all possible values of x2 , for fixed x1 ,
is 1. Thus, p(x2 |x1 ), x2 = x21 , . . . , x2k2 , is a p.f. and is called the conditional probability
function of X2 , given that X1 = x1 .
Note that we can write (6.2.20) as

p(x1, x2) = p1(x1) p(x2|x1)      (6.2.21)

which provides a two-step procedure for finding p(x1, x2): first determine p1(x1), then
p(x2|x1), and multiply the two together.
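The two-step construction in (6.2.21) is illustrated by the small sketch below, in which a marginal p1(x1) and a conditional p(x2|x1), both with illustrative values, are multiplied to produce a joint p.f. that sums to 1.

```python
# Two-step construction of a joint p.f. as in (6.2.21): p(x1, x2) = p1(x1) p(x2|x1).
from fractions import Fraction

p1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}
p_cond = {                      # p(x2 | x1) for each fixed x1; each row sums to 1
    0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
    1: {0: Fraction(1, 3), 1: Fraction(2, 3)},
}

joint = {(x1, x2): p1[x1] * p_cond[x1][x2] for x1 in p1 for x2 in p_cond[x1]}
print(joint)                    # {(0, 0): 1/8, (0, 1): 1/8, (1, 0): 1/4, (1, 1): 1/2}
print(sum(joint.values()))      # 1
```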
In a similar manner, for a pair of continuous random variables (X1, X2) with joint p.d.f.
f(x1, x2) and marginal p.d.f. f1(x1), we define

f(x2|x1) = f(x1, x2) / f1(x1)      (6.2.22)

where f1(x1) ≠ 0, which is the analogue of (6.2.20), now for a pair of continuous random
variables. Note that f(x2|x1) has all the properties of an ordinary probability density
function.
Now, from (6.2.22), we have the analogue of (6.2.21) for obtaining the joint probability
density function of a pair of continuous random variables in two steps, namely

f(x1, x2) = f1(x1) f(x2|x1)      (6.2.22a)
Solution: The joint p.d.f. here is f(x1, x2) = 2 for (x1, x2) in the triangle
S = {(x1, x2) : x1 > 0, x2 > 0, x1 + x2 < 1} and f(x1, x2) = 0 otherwise. Because the
probability density function is constant over the triangle S in the x1x2-plane (see
Figure 6.2.2), we sometimes say that (X1, X2) is uniformly distributed over S.
[Figure 6.2.2 The region S: the triangle in the x1x2-plane with vertices (0, 0), (1, 0), and (0, 1), bounded by the axes and the line x1 + x2 = 1.]
Using (6.2.8), the marginal p.d.f. of X1 is f1(x1) = ∫_{0}^{1−x1} 2 dx2 = 2(1 − x1), for 0 < x1 < 1. Hence,

f(x2|x1) = 2 / [2(1 − x1)] = 1/(1 − x1),   for 0 < x2 < 1 − x1
f(x2|x1) = 0,   otherwise
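The meaning of f(x2|x1) = 1/(1 − x1) can be checked by simulation: sample (X1, X2) uniformly over S and keep only the points whose X1-coordinate is close to a chosen value x1. A sketch, with illustrative value x1 = 0.4 and a narrow band around it, assuming numpy, is given below.

```python
# Monte Carlo sketch for the triangle example: conditionally on X1 near 0.4,
# X2 should look uniform on (0, 1 - x1) = (0, 0.6), as f(x2 | x1) = 1/(1 - x1).
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000
pts = rng.uniform(size=(n, 2))
pts = pts[pts[:, 0] + pts[:, 1] < 1]          # keep points inside the triangle S

x1_target, band = 0.4, 0.01
x2_given = pts[np.abs(pts[:, 0] - x1_target) < band, 1]

print(x2_given.max())   # close to 1 - x1 = 0.6
print(x2_given.mean())  # close to (1 - x1)/2 = 0.3
```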
Note that if (X1 , X2 ) is a pair of discrete random variables, then the conditional mean
and variance of X2 given X1 = x1 are defined as given at (6.2.23) and (6.2.24).
E(X2|X1 = x1) = Σ_{x2} x2 p(x2|x1)      (6.2.23)

Var(X2|X1 = x1) = Σ_{x2} [x2 − E(X2|X1 = x1)]² p(x2|x1)      (6.2.24)
where p(x2 |x1 ) is the conditional probability function of the random variable X2 given
X1 = x1 . The mean and variance for other functions of X2 given X1 = x1 can be defined
in the same manner.
Similarly, for the case of a pair of continuous random variables, we have the following
results.
E(X2|X1 = x1) = ∫_{−∞}^{∞} x2 f(x2|x1) dx2      (6.2.25)

Var(X2|X1 = x1) = ∫_{−∞}^{∞} [x2 − E(X2|X1 = x1)]² f(x2|x1) dx2      (6.2.26)
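Applying (6.2.25) and (6.2.26) to the conditional p.d.f. f(x2|x1) = 1/(1 − x1) obtained above gives E(X2|X1 = x1) = (1 − x1)/2 and Var(X2|X1 = x1) = (1 − x1)²/12; the following sketch verifies this symbolically, assuming sympy is available.

```python
# Conditional mean and variance (6.2.25)-(6.2.26) for the triangle example,
# where f(x2 | x1) = 1/(1 - x1) on 0 < x2 < 1 - x1.
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)
f_cond = 1 / (1 - x1)

mean = sp.integrate(x2 * f_cond, (x2, 0, 1 - x1))
var = sp.integrate((x2 - mean) ** 2 * f_cond, (x2, 0, 1 - x1))

print(sp.simplify(mean))  # mathematically equal to (1 - x1)/2
print(sp.simplify(var))   # mathematically equal to (1 - x1)**2/12
```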
The correlation coefficient of X1 and X2 is defined as ρ = Cov(X1, X2)/(σ1σ2), where σ1
and σ2 are the population standard deviations of X1 and X2, respectively. It can
be shown that −1 ≤ ρ ≤ 1, and hence, we have that −σ1σ2 ≤ Cov(X1, X2) ≤ σ1σ2.
Now, from (6.2.16) and using (6.2.17), we have that if X1 and X2 are independent
random variables, then ρ = 0. The converse need not be true however, as the following
example shows.
Example 6.2.7 (Independence and correlation coefficient) Two random variables X1 and
X2 have joint probability function given by
p(x1, x2) = 1/3,  if (x1, x2) = (0, 0), (1, 1), (2, 0)
p(x1, x2) = 0,    otherwise
The marginal probability function of X1 is then p1(x1) = 1/3 for x1 = 0, 1, 2, and that of X2 is

p2(x2) = 2/3,  if x2 = 0
p2(x2) = 1/3,  if x2 = 1
Hence, p(0, 0) = 1/3 ≠ p1(0)p2(0) = 2/9, and so on, so that X1 and X2 are not independent. Simple calcu-
lations further show that E(X1) = 1, E(X2) = 1/3, and E(X1X2) = 1/3, so that
Cov(X1, X2) = E(X1X2) − E(X1)E(X2) = 0.
Therefore, the correlation coefficient has value ρ = 0, yet X1 and X2 are not independent.
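The calculations of this example are easy to reproduce; the sketch below does so with exact fractions (illustrative variable names).

```python
# Example 6.2.7: zero correlation even though X1 and X2 are not independent.
from fractions import Fraction

p = {(0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3), (2, 0): Fraction(1, 3)}

p1 = {x1: sum(q for (a, _), q in p.items() if a == x1) for x1 in (0, 1, 2)}
p2 = {x2: sum(q for (_, b), q in p.items() if b == x2) for x2 in (0, 1)}

E1 = sum(x1 * q for x1, q in p1.items())             # 1
E2 = sum(x2 * q for x2, q in p2.items())             # 1/3
E12 = sum(x1 * x2 * q for (x1, x2), q in p.items())  # 1/3
print(E12 - E1 * E2)               # 0 -> rho = 0
print(p[(0, 0)] == p1[0] * p2[0])  # False -> not independent
```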
Example 6.2.8 (Joint probability density function) Let the length of life (in years) of both
an operating system and the hard drive of a computer be denoted by the random variables
X1 and X2, respectively. Suppose that the joint distribution of X1 and X2 is given by

f(x1, x2) = (1/64) x1² x2 e^{−(x1 + x2)/2},  if x1 > 0, x2 > 0
f(x1, x2) = 0,  otherwise
Solution:
(a) The marginal probability density function of X1 is given by

f1(x1) = ∫_{0}^{∞} f(x1, x2) dx2 = ∫_{0}^{∞} (1/64) x1² x2 e^{−(x1 + x2)/2} dx2 = (1/64) x1² e^{−x1/2} ∫_{0}^{∞} x2 e^{−x2/2} dx2 = (1/16) x1² e^{−x1/2},   for x1 > 0

since ∫_{0}^{∞} x2 e^{−x2/2} dx2 = 4.
[Figure: surface plot of the joint p.d.f. f(x1, x2).]
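The marginal p.d.f. obtained in part (a) can be verified symbolically, as sketched below (assuming sympy); the check also confirms that f1(x1) integrates to 1.

```python
# Verifying the marginal p.d.f. in Example 6.2.8: integrating
# f(x1, x2) = x1**2 * x2 * exp(-(x1 + x2)/2) / 64 over x2 > 0 gives
# f1(x1) = x1**2 * exp(-x1/2) / 16.
import sympy as sp

x1, x2 = sp.symbols("x1 x2", positive=True)
f = x1**2 * x2 * sp.exp(-(x1 + x2) / 2) / 64

f1 = sp.integrate(f, (x2, 0, sp.oo))
print(sp.simplify(f1 - x1**2 * sp.exp(-x1 / 2) / 16) == 0)  # True
print(sp.integrate(f1, (x1, 0, sp.oo)))                     # 1
```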
This result can be proved in exactly the same manner as in the single random-variable
(univariate) case.
Theorem 6.2.5 Let X1 and X2 be random variables with means μ1 and μ2 , respec-
tively. Then,
Cov(X1 , X2 ) = E(X1 X2 ) − μ1 μ2 (6.2.29)
The reader should now use the result of Theorem 6.2.1 together with equation (6.2.29)
to show the following corollary:

Corollary If X1 and X2 are independent random variables, then Cov(X1, X2) = 0.
A pair of continuous random variables (X1, X2) is said to have the bivariate normal
distribution if its joint p.d.f. is of the form

f(x1, x2) = [1 / (2πσ1σ2√(1 − ρ²))] × exp{ −[1 / (2(1 − ρ²))] [ (x1 − μ1)²/σ1² − 2ρ(x1 − μ1)(x2 − μ2)/(σ1σ2) + (x2 − μ2)²/σ2² ] }      (6.2.30)

for −∞ < x1 < ∞ and −∞ < x2 < ∞.
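The density (6.2.30) is easy to evaluate directly; the sketch below implements it with numpy for illustrative parameter values and checks, by a Riemann sum over a wide grid, that it integrates to approximately 1.

```python
# Direct implementation of the bivariate normal density (6.2.30) with numpy,
# followed by a crude grid check that the total probability is approximately 1.
import numpy as np

def bivariate_normal_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    z = ((x1 - mu1) ** 2 / s1**2
         - 2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2)
         + (x2 - mu2) ** 2 / s2**2)
    norm = 2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2)
    return np.exp(-z / (2 * (1 - rho**2))) / norm

mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.8, 0.6
g1 = np.linspace(mu1 - 8 * s1, mu1 + 8 * s1, 801)
g2 = np.linspace(mu2 - 8 * s2, mu2 + 8 * s2, 801)
X1, X2 = np.meshgrid(g1, g2)
vals = bivariate_normal_pdf(X1, X2, mu1, mu2, s1, s2, rho)

total = vals.sum() * (g1[1] - g1[0]) * (g2[1] - g2[0])
print(total)  # approximately 1
```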