Multivariate Discrete Distributions
Example 2.1 (Coin Tossing). Consider the experiment of tossing a fair coin three times. Let X be the number of heads among the first two tosses, and Y the number of heads among the last two tosses. If we consider X and Y individually, we realize immediately that they are each Bin(2, .5) random variables. But the individual distributions hide part of the full story. For example, if we knew that X was 2, then that would imply that Y must be at least 1. Thus, their joint behavior cannot be fully understood from their individual distributions; we must study their joint distribution.
Here is what we mean by their joint distribution. The sample space of this experiment consists of the eight equally likely outcomes

HHH, HHT, HTH, HTT, THH, THT, TTH, TTT.

Each sample point has an equal probability 1/8. Denoting the sample points as ω1, ω2, ..., ω8, we see that if ω1 prevails, then X(ω1) = Y(ω1) = 2, but if ω2 prevails, then X(ω2) = 2, Y(ω2) = 1. The combinations of all possible values of (X, Y) are

(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2).
The joint distribution of (X, Y) provides the probability p(x, y) = P(X = x, Y = y) for each such combination of possible values (x, y). Indeed, by direct counting using the eight equally likely sample points, we see that

p(0, 0) = 1/8, p(0, 1) = 1/8, p(0, 2) = 0, p(1, 0) = 1/8, p(1, 1) = 1/4;
p(1, 2) = 1/8, p(2, 0) = 0, p(2, 1) = 1/8, p(2, 2) = 1/8.
            Y
  X       0      1      2
  0      1/8    1/8     0
  1      1/8    1/4    1/8
  2       0     1/8    1/8
Such a layout is a convenient way to present the joint distribution of two discrete
random variables with a small number of values. The distribution itself is called the
joint pmf; here is a formal definition.
Definition 2.1. Let X, Y be two discrete random variables with respective sets of values x1, x2, ..., and y1, y2, ..., defined on a common sample space Ω. The joint pmf of X, Y is defined to be the function p(x_i, y_j) = P(X = x_i, Y = y_j), i, j ≥ 1, and p(x, y) = 0 at any other point (x, y) in R².
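The joint pmf in Example 2.1 is easy to verify by brute-force enumeration of the eight equally likely outcomes. The short Python sketch below (our own illustration; the variable names are ours, not from the text) rebuilds the table p(x, y) exactly.

    from fractions import Fraction
    from itertools import product
    from collections import Counter

    # The 8 equally likely outcomes of three fair coin tosses.
    outcomes = list(product("HT", repeat=3))

    # X = number of heads among the first two tosses, Y = among the last two.
    counts = Counter((w[:2].count("H"), w[1:].count("H")) for w in outcomes)
    joint_pmf = {xy: Fraction(c, len(outcomes)) for xy, c in counts.items()}

    for x in range(3):
        print([str(joint_pmf.get((x, y), Fraction(0))) for y in range(3)])
    # ['1/8', '1/8', '0'], ['1/8', '1/4', '1/8'], ['0', '1/8', '1/8'] -- the table above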
For the experiment of rolling a fair die twice, with X the larger and Y the smaller of the two rolls (Example 2.2), the joint pmf is laid out in the same way:

            Y
  X       1      2      3      4      5      6
  1      1/36    0      0      0      0      0
  2      1/18   1/36    0      0      0      0
  3      1/18   1/18   1/36    0      0      0
  4      1/18   1/18   1/18   1/36    0      0
  5      1/18   1/18   1/18   1/18   1/36    0
  6      1/18   1/18   1/18   1/18   1/18   1/36
The individual pmfs of X, Y are easily recovered from the joint distribution. For example,

P(X = 1) = Σ_{y=1}^{6} P(X = 1, Y = y) = 1/36, and

P(X = 2) = Σ_{y=1}^{6} P(X = 2, Y = y) = 1/18 + 1/36 = 1/12,
and so on. The individual pmfs are obtained by summing the joint probabilities over all values of the other variable. They are:

  x        1      2      3      4      5      6
  pX(x)   1/36   3/36   5/36   7/36   9/36   11/36

  y        1      2      3      4      5      6
  pY(y)   11/36  9/36   7/36   5/36   3/36   1/36
As another illustration, consider two random variables X, Y with the joint pmf p(x, y) = c(x + y), 1 ≤ x, y ≤ n, where c is a normalizing constant. The constant is found from the requirement that the joint probabilities add to one:

Σ_{x=1}^{n} Σ_{y=1}^{n} p(x, y) = 1
⟺ c Σ_{x=1}^{n} Σ_{y=1}^{n} (x + y) = 1
⟺ c Σ_{x=1}^{n} [nx + n(n + 1)/2] = 1
⟺ c [n²(n + 1)/2 + n²(n + 1)/2] = 1
⟺ c n²(n + 1) = 1
⟺ c = 1/(n²(n + 1)).
The joint pmf is symmetric between x and y (because x + y = y + x), and so X, Y have the same marginal pmf. For example, X has the pmf

pX(x) = Σ_{y=1}^{n} p(x, y) = [1/(n²(n + 1))] Σ_{y=1}^{n} (x + y)
      = [1/(n²(n + 1))] [nx + n(n + 1)/2]
      = x/(n(n + 1)) + 1/(2n),   1 ≤ x ≤ n.
Suppose now we want to compute P(X > Y). This can be found by summing p(x, y) over all combinations for which x > y. But this longer calculation can be avoided by using a symmetry argument that is often very useful. Note that because the joint pmf is symmetric between x and y, we must have P(X > Y) = P(Y > X) = p (say). But, also,

P(X > Y) + P(Y > X) + P(X = Y) = 1  ⟹  2p + P(X = Y) = 1
                                    ⟹  p = [1 − P(X = Y)]/2.
Now,

P(X = Y) = Σ_{x=1}^{n} p(x, x) = c Σ_{x=1}^{n} 2x
         = [1/(n²(n + 1))] n(n + 1) = 1/n.

Therefore, P(X > Y) = p = (n − 1)/(2n) ≈ 1/2 for large n.
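These formulas are easy to check numerically. The following sketch (our own code; the choice n = 6 is only illustrative) verifies the normalizing constant, the marginal pmf, and P(X > Y) with exact rational arithmetic.

    from fractions import Fraction

    n = 6
    c = Fraction(1, n**2 * (n + 1))
    pmf = {(x, y): c * (x + y) for x in range(1, n + 1) for y in range(1, n + 1)}

    assert sum(pmf.values()) == 1                        # the joint pmf adds to one
    p_x_gt_y = sum(q for (x, y), q in pmf.items() if x > y)
    assert p_x_gt_y == Fraction(n - 1, 2 * n)            # P(X > Y) = (n - 1)/(2n)

    # The marginal pmf of X agrees with x/(n(n + 1)) + 1/(2n).
    for x in range(1, n + 1):
        marginal = sum(pmf[(x, y)] for y in range(1, n + 1))
        assert marginal == Fraction(x, n * (n + 1)) + Fraction(1, 2 * n)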
Example 2.4 (Dice Rolls Revisited). Consider again the example of two rolls of a fair die, and suppose X, Y are the larger and the smaller of the two rolls. We have worked out the joint distribution of (X, Y) in Example 2.2. Suppose we want to find the distribution of the difference, X − Y. The possible values of X − Y are 0, 1, ..., 5, and we find P(X − Y = k) by using the joint distribution of (X, Y):

P(X − Y = 0) = p(1, 1) + p(2, 2) + ··· + p(6, 6) = 1/6;
P(X − Y = 1) = p(2, 1) + p(3, 2) + ··· + p(6, 5) = 5/18;
P(X − Y = 2) = p(3, 1) + p(4, 2) + p(5, 3) + p(6, 4) = 2/9;
P(X − Y = 3) = p(4, 1) + p(5, 2) + p(6, 3) = 1/6;
P(X − Y = 4) = p(5, 1) + p(6, 2) = 1/9;
P(X − Y = 5) = p(6, 1) = 1/18.
There is no way to find the distribution of X − Y except by using the joint distribution of (X, Y).
Suppose now we also want to know the expected value of X − Y. Now that we have the distribution of X − Y worked out, we can find the expectation by directly using the definition of expectation:

E(X − Y) = Σ_{k=0}^{5} k P(X − Y = k)
         = 5/18 + 4/9 + 1/2 + 4/9 + 5/18 = 35/18.

Alternatively, by the linearity of expectation,

E(X − Y) = E(X) − E(Y) = 161/36 − 91/36 = 35/18.
Sometimes we want to know what the expected value is of one of the variables,
say X, if we knew the value of the other variable Y. For example, in the die tossing
experiment above, what should we expect the larger of the two rolls to be if the
smaller roll is known to be 2?
To answer this question, we have to find the probabilities of the various values
of X , conditional on knowing that Y equals some given y, and then average by
using these conditional probabilities. Here are the formal definitions.
Definition 2.4 (Conditional Distribution). Let (X, Y) have the joint pmf p(x, y). The conditional distribution of X given Y = y is defined to be

p(x|y) = P(X = x | Y = y) = p(x, y)/pY(y),

provided pY(y) > 0.
Definition 2.5 (Independence). Let (X, Y) have the joint pmf p(x, y). Then X, Y are said to be independent if p(x|y) = pX(x) for all x, y; equivalently, if p(y|x) = pY(y) for all x, y; equivalently, if p(x, y) = pX(x) pY(y) for all x, y.
The third equivalent condition in the above list is usually the most convenient one to verify and use.
One more frequently useful fact about conditional expectations is the following.

Proposition. Suppose X, Y are independent random variables. Then, for any function g(X) such that the expectations below exist, and for any y,

E[g(X) | Y = y] = E[g(X)].
Example 2.5 (Maximum and Minimum in Dice Rolls). In the experiment of two rolls of a fair die, we have worked out the joint distribution of X, Y, where X is the larger and Y the smaller of the two rolls. Using this joint distribution, we can now find the conditional distribution of either variable given the other.
Consider next a general 2×2 joint distribution, in which X and Y each take only the values 0 and 1, with the joint pmf given by the table

            Y
  X       0      1
  0       s      t
  1       u      v

Then,

E(X | Y = 0) = [0·p(0, 0) + 1·p(1, 0)] / [p(0, 0) + p(1, 0)] = u/(s + u);
E(X | Y = 1) = [0·p(0, 1) + 1·p(1, 1)] / [p(0, 1) + p(1, 1)] = v/(t + v).

Therefore,

E(X | Y = 1) − E(X | Y = 0) = v/(t + v) − u/(s + u) = (vs − ut)/[(t + v)(s + u)].

Since y takes only the two values 0 and 1, these can be combined into the single formula

E(X | Y = y) = u/(s + u) + [(vs − ut)/((t + v)(s + u))] y,

so E(X | Y = y) is a linear function of y in the 2×2 case.
Returning to the dice example, the conditional expectations of the maximum X given the minimum Y are now easy to compute. For example,

E(X | Y = 1) = [1·(1/36) + (1/18)(2 + 3 + 4 + 5 + 6)] / [1/36 + 5·(1/18)] = 41/11 ≈ 3.73;

as another example,

E(X | Y = 3) = [3·(1/36) + (1/18)(4 + 5 + 6)] / [1/36 + 3·(1/18)] = 33/7 ≈ 4.71;

and,

E(X | Y = 5) = [5·(1/36) + 6·(1/18)] / [1/36 + 1/18] = 17/3 ≈ 5.67.
We notice that E(X | Y = 5) > E(X | Y = 3) > E(X | Y = 1); in fact, E(X | Y = y) is increasing in y in this example. This does make intuitive sense.
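The whole sequence of conditional expectations E(X | Y = y), y = 1, ..., 6, can be generated at once from the joint pmf; a minimal sketch (our own code) follows.

    from fractions import Fraction
    from itertools import product
    from collections import defaultdict

    joint = defaultdict(Fraction)
    for r1, r2 in product(range(1, 7), repeat=2):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

    def cond_exp_X_given_Y(y):
        """E(X | Y = y) = sum over x of x p(x, y), divided by pY(y)."""
        p_y = sum(q for (x, yy), q in joint.items() if yy == y)
        return sum(x * q for (x, yy), q in joint.items() if yy == y) / p_y

    for y in range(1, 7):
        print(y, float(cond_exp_X_given_Y(y)))
    # approximately 3.73, 4.22, 4.71, 5.20, 5.67, 6.00 -- increasing in y, as noted above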
Just as in the case of a distribution of a single variable, we often also want a mea-
sure of variability in addition to a measure of average for conditional distributions.
This motivates defining a conditional variance.
Definition 2.6 (Conditional Variance). Let (X, Y) have the joint pmf p(x, y). Let μ_X(y) = E(X | Y = y). The conditional variance of X given Y = y is defined to be

Var(X | Y = y) = E[(X − μ_X(y))² | Y = y] = Σ_x (x − μ_X(y))² p(x|y).
Example 2.8 (Conditional Variance in Dice Experiment). We work out the conditional variance of the maximum of two rolls of a die given the minimum. That is, suppose a fair die is rolled twice, and X, Y are the larger and the smaller of the two rolls; we want to compute Var(X | y).

For example, if y = 3, then μ_X(y) = E(X | Y = y) = E(X | Y = 3) = 4.71 (see the previous example). Therefore,

Var(X | y) = Σ_x (x − 4.71)² p(x|3)
           = [(3 − 4.71)²·(1/36) + (4 − 4.71)²·(1/18) + (5 − 4.71)²·(1/18) + (6 − 4.71)²·(1/18)] / [1/36 + 1/18 + 1/18 + 1/18]
           = 1.06.
To summarize, given that the minimum of two rolls of a fair die is 3, the expected
value of the maximum is 4.71 and the variance of the maximum is 1.06.
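The conditional variance is obtained from the joint pmf in the same way as the conditional mean; the sketch below (our own code) reproduces the values 4.71 and 1.06 for y = 3.

    from fractions import Fraction
    from itertools import product
    from collections import defaultdict

    joint = defaultdict(Fraction)
    for r1, r2 in product(range(1, 7), repeat=2):
        joint[(max(r1, r2), min(r1, r2))] += Fraction(1, 36)

    def cond_mean_var(y):
        """Return (E(X | Y = y), Var(X | Y = y)) for the maximum X given the minimum Y."""
        row = {x: q for (x, yy), q in joint.items() if yy == y}
        p_y = sum(row.values())
        mean = sum(x * q for x, q in row.items()) / p_y
        var = sum((x - mean) ** 2 * q for x, q in row.items()) / p_y
        return mean, var

    m, v = cond_mean_var(3)
    print(float(m), float(v))     # 4.714... and 1.061..., i.e., the 4.71 and 1.06 above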
These two values, E(X | y) and Var(X | y), change as we change the given value y. Thus E(X | y) and Var(X | y) are functions of y, and for each separate y, a new calculation is needed. If X, Y happen to be independent, then of course whatever y is, E(X | y) = E(X), and Var(X | y) = Var(X).
The next result is an important one in many applications: if X ~ Poi(λ) and Y ~ Poi(μ) are independent, then the conditional distribution of X given X + Y = t is Bin(t, λ/(λ + μ)). Indeed, for 0 ≤ x ≤ t,

P(X = x | X + Y = t) = P(X = x, X + Y = t) / P(X + Y = t)
                     = P(X = x, Y = t − x) / P(X + Y = t)
                     = [e^{−λ} λ^x / x!] [e^{−μ} μ^{t−x} / (t − x)!] · t! / [e^{−(λ+μ)} (λ + μ)^t]
                     = [t! / (x!(t − x)!)] λ^x μ^{t−x} / (λ + μ)^t
                     = [t! / (x!(t − x)!)] [λ/(λ + μ)]^x [μ/(λ + μ)]^{t−x},

which is the pmf of the Bin(t, λ/(λ + μ)) distribution.
Proposition (Iterated Expectation Formula). Provided the relevant expectations exist, E(X) = E[E(X | Y)]; that is, E(X) = Σ_y E(X | Y = y) pY(y).

Proof. We prove this for the discrete case. By definition of conditional expectation,

μ_X(y) = Σ_x x p(x, y) / pY(y)

⟹ Σ_y μ_X(y) pY(y) = Σ_y Σ_x x p(x, y) = Σ_x Σ_y x p(x, y)
                    = Σ_x x Σ_y p(x, y) = Σ_x x pX(x) = E(X).
The corresponding variance calculation formula is Var(X) = E[Var(X | Y)] + Var[E(X | Y)]. The proof of this uses the iterated mean formula above, and applies it to (X − E(X))².
Remark. These two formulas for iterated expectation and iterated variance are valid
for all types of variables, not just the discrete ones. Thus, these same formulas still
hold when we discuss joint distributions for continuous random variables in the next
chapter.
Some operational formulas that one should be familiar with are summarized
below.
Let us see some applications of the two iterated expectation and iterated variance
formulas.
Example 2.9 (A Two-Stage Experiment). Suppose n fair dice are rolled. Those that
show a six are rolled again. What are the mean and the variance of the number of
sixes obtained in the second round of this experiment?
Define Y to be the number of dice in the first round that show a six, and X the
number of dice in the second round that show a six. Given Y = y, X ~ Bin(y, 1/6), and Y itself is distributed as Bin(n, 1/6). Therefore,

E(X) = E[E(X | Y)] = E_Y[Y/6] = n/36.
Also, by the iterated variance formula,

Var(X) = E[Var(X | Y)] + Var[E(X | Y)] = E[Y·(1/6)(5/6)] + Var(Y/6)
       = (5/36)(n/6) + (5/36)(n/36) = 35n/1296.

The same conditioning technique applies to the example (revisited in Example 2.12 below) of a chicken laying a Poi(λ) number of eggs, each egg being fertilized independently of the others with probability p; it gives the mean and the variance of the number X of eggs actually fertilized. Interestingly, the number of eggs actually fertilized has the same mean and variance, λp. (Can you see why?)
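Returning to the two-stage dice experiment, a quick simulation supports the iterated expectation and iterated variance calculations. The sketch below is ours (the choice n = 36 and the seed are illustrative); with n = 36 the formulas give mean n/36 = 1 and variance 35n/1296 ≈ 0.97.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 36, 200_000

    # First round: Y ~ Bin(n, 1/6) sixes; second round: X | Y = y ~ Bin(y, 1/6).
    y = rng.binomial(n, 1 / 6, size=reps)
    x = rng.binomial(y, 1 / 6)

    print(x.mean(), x.var())      # close to 1 and 0.97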
Remark. In all of these examples, it was important to choose wisely the variable Y on which to condition. The efficiency of the technique depends very crucially on this choice.
Sometimes a formal generalization of the iterated expectation formula when a
third variable Z is present is useful. It is particularly useful in hierarchical statis-
tical modeling of distributions, where an ultimate marginal distribution for some
X is constructed by first conditioning on a number of auxiliary variables, and then
gradually unconditioning them. We state the more general iterated expectation for-
mula; its proof is exactly similar to that of the usual iterated expectation formula.
2.4 Covariance and Correlation
We know that variance is additive for independent random variables; that is, if X1, X2, ..., Xn are independent random variables, then Var(X1 + X2 + ··· + Xn) = Var(X1) + ··· + Var(Xn). In particular, for two independent random variables X, Y, Var(X + Y) = Var(X) + Var(Y). However, in general, variance is not additive. Let us do the general calculation for Var(X + Y):

Var(X + Y) = E[(X + Y)²] − [E(X + Y)]²
           = E(X²) + E(Y²) + 2E(XY) − [E(X)]² − [E(Y)]² − 2E(X)E(Y)
           = Var(X) + Var(Y) + 2[E(XY) − E(X)E(Y)].
We thus have the extra term 2[E(XY) − E(X)E(Y)] in the expression for Var(X + Y); of course, when X, Y are independent, E(XY) = E(X)E(Y), and so the extra term drops out. But, in general, one has to keep the extra term. The quantity E(XY) − E(X)E(Y) is called the covariance of X and Y.
Definition 2.7 (Covariance). Let X, Y be two random variables defined on a common sample space Ω, such that E(XY), E(X), E(Y) all exist. The covariance of X and Y is defined as

Cov(X, Y) = E(XY) − E(X)E(Y),

and the correlation of X and Y is defined as

ρ_{X,Y} = Cov(X, Y) / [√Var(X) √Var(Y)].
Some important properties of covariance and correlation are put together in the next
theorem.
Theorem 2.6 (Properties of Covariance and Correlation). Provided that the required variances and covariances exist,

(a) Cov(X, c) = 0 for any X and any constant c;
(b) Cov(X, X) = Var(X) for any X;
(c) Cov(Σ_{i=1}^{n} a_i X_i, Σ_{j=1}^{m} b_j Y_j) = Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j Cov(X_i, Y_j),
    and in particular,
    Var(aX + bY) = Cov(aX + bY, aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y),
    and,
    Var(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j);
(d) if X, Y are independent, then Cov(X, Y) = 0;
(e) for any constants a, b, c, d with b, d ≠ 0, ρ_{a+bX, c+dY} = sgn(bd) ρ_{X,Y};
(f) −1 ≤ ρ_{X,Y} ≤ 1.

Proof. Parts (a) and (b) follow directly from the definition of covariance. For part (c),

Cov(Σ_{i=1}^{n} a_i X_i, Σ_{j=1}^{m} b_j Y_j) = E[(Σ_i a_i X_i)(Σ_j b_j Y_j)] − E(Σ_i a_i X_i) E(Σ_j b_j Y_j)
= Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j E(X_i Y_j) − Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j E(X_i) E(Y_j)
= Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j [E(X_i Y_j) − E(X_i) E(Y_j)]
= Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j Cov(X_i, Y_j).
Part (d) follows on noting that E(XY) = E(X)E(Y) if X, Y are independent. For part (e), first note that Cov(a + bX, c + dY) = bd Cov(X, Y), by using part (a) and part (c). Also, Var(a + bX) = b² Var(X) and Var(c + dY) = d² Var(Y),

⟹ ρ_{a+bX, c+dY} = bd Cov(X, Y) / [√(b² Var(X)) √(d² Var(Y))]
                  = bd Cov(X, Y) / [|b| √Var(X) |d| √Var(Y)]
                  = (bd/|bd|) ρ_{X,Y} = sgn(bd) ρ_{X,Y}.
The proof of part (f) uses the Cauchy–Schwarz inequality (see Chapter 1): for any two random variables U, V, [E(UV)]² ≤ E(U²)E(V²). Let

U = [X − E(X)] / √Var(X),   V = [Y − E(Y)] / √Var(Y).

Then E(U²) = E(V²) = 1 and E(UV) = ρ_{X,Y}, and so ρ²_{X,Y} ≤ 1, that is, −1 ≤ ρ_{X,Y} ≤ 1.
Example 2.11 (Correlation Between Minimum and Maximum in Dice Rolls). Consider again the experiment of rolling a fair die twice, and let X, Y be the maximum and the minimum of the two rolls. We want to find the correlation between X, Y. The joint distribution of (X, Y) was worked out in Example 2.2. From the joint distribution,

E(XY) = 1·(1/36) + 2·(1/18) + 4·(1/36) + 3·(1/18) + 6·(1/18) + 9·(1/36) + ··· + 30·(1/18) + 36·(1/36) = 49/4.
The marginal pmfs of X, Y were also worked out in Example 2.2. From the marginal pmfs, by direct calculation, E(X) = 161/36, E(Y) = 91/36, and Var(X) = Var(Y) = 2555/1296. Therefore,

ρ_{X,Y} = [E(XY) − E(X)E(Y)] / √[Var(X) Var(Y)] = [49/4 − (161/36)(91/36)] / (2555/1296) = 1225/2555 ≈ .48.
The correlation between the maximum and the minimum is in fact positive for any number of rolls of a die, although the correlation converges to zero as the number of rolls tends to infinity.
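The moments used in this example are easy to confirm by direct enumeration; the following sketch (our own code) reproduces E(XY) = 49/4 and the correlation of about .48.

    from fractions import Fraction
    from itertools import product
    from math import sqrt

    # All 36 equally likely (max, min) pairs from two rolls of a fair die.
    pairs = [(max(r1, r2), min(r1, r2)) for r1, r2 in product(range(1, 7), repeat=2)]
    w = Fraction(1, 36)

    def E(f):
        return sum(f(x, y) * w for x, y in pairs)

    EX, EY, EXY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
    VarX = E(lambda x, y: x * x) - EX**2
    VarY = E(lambda x, y: y * y) - EY**2
    cov = EXY - EX * EY

    print(EXY, cov, float(cov) / sqrt(float(VarX) * float(VarY)))
    # 49/4, 1225/1296, 0.479...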
Example 2.12 (Correlation in the Chicken–Eggs Example). Consider again the example of a chicken laying a Poisson number of eggs N with mean λ, and each egg fertilizing, independently of the others, with probability p. If X is the number of eggs actually fertilized, we want to find the correlation between the number of eggs laid and the number fertilized, that is, the correlation between X and N.

First,

Cov(X, N) = E(XN) − E(X)E(N) = E[N E(X | N)] − E(X)E(N) = E(pN²) − (λp)(λ)
          = p(λ + λ²) − pλ² = λp.

Next, from our previous calculations, E(X) = λp, E(N) = λ, Var(X) = λp, Var(N) = λ. Therefore,

ρ_{X,N} = Cov(X, N) / √[Var(X) Var(N)] = λp / √(λp · λ) = √p.

Thus, the correlation goes up with the fertility rate of the eggs.
Example 2.13 (Best Linear Predictor). Suppose X and Y are two jointly distributed
random variables, and either by necessity, or by omission, the variable Y was not
observed. But X was observed, and there may be some information in the X value
about Y . The problem is to predict Y by using X . Linear predictors, because of their
functional simplicity, are appealing. The mathematical problem is to choose the best
linear predictor a + bX of Y, where best is defined as the predictor that minimizes the mean squared error E[Y − (a + bX)]². We show that the answer has something to do with the covariance between X and Y.
Write R(a, b) = E[Y − (a + bX)]². By breaking up the square and differentiating R(a, b) with respect to a and b,

∂R/∂a = 2a + 2bE(X) − 2E(Y) = 0  ⟺  a + bE(X) = E(Y);
∂R/∂b = 2bE(X²) + 2aE(X) − 2E(XY) = 0  ⟺  aE(X) + bE(X²) = E(XY).
Solving these two equations, b = Cov(X, Y)/Var(X) and a = E(Y) − [Cov(X, Y)/Var(X)] E(X), and so the

best linear predictor of Y = E(Y) − [Cov(X, Y)/Var(X)] E(X) + [Cov(X, Y)/Var(X)] X
                           = E(Y) + [Cov(X, Y)/Var(X)] [X − E(X)].
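The two moment equations are trivial to solve numerically for any given joint pmf. As an illustration (our own code, not from the text), the sketch below finds the best linear predictor of the minimum Y from the maximum X in the dice example.

    from fractions import Fraction
    from itertools import product

    pairs = [(max(r1, r2), min(r1, r2)) for r1, r2 in product(range(1, 7), repeat=2)]
    w = Fraction(1, 36)

    def E(f):
        return sum(f(x, y) * w for x, y in pairs)

    EX, EY = E(lambda x, y: x), E(lambda x, y: y)
    EXY, EX2 = E(lambda x, y: x * y), E(lambda x, y: x * x)

    b = (EXY - EX * EY) / (EX2 - EX**2)      # slope = Cov(X, Y)/Var(X)
    a = EY - b * EX                          # intercept = E(Y) - b E(X)
    print(a, b)                              # 28/73 and 35/73 for this joint pmf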
Example 2.14 (Zero Correlation Does Not Mean Independence). If X, Y are independent, then necessarily Cov(X, Y) = 0, and hence the correlation is also zero. The converse is not true. Take a three-valued random variable X with the pmf P(X = ±1) = p, P(X = 0) = 1 − 2p, 0 < p < 1/2. Let the other variable Y be Y = X². Then, E(XY) = E(X³) = 0, and E(X)E(Y) = 0, because E(X) = 0. Therefore, Cov(X, Y) = 0. But X, Y are certainly not independent; for example, P(Y = 0 | X = 0) = 1, but P(Y = 0) = 1 − 2p ≠ 1.

Indeed, if X has a distribution symmetric around zero, and if X has three finite moments, then X and X² always have zero correlation, although they are not independent.
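A two-line computation over this three-point distribution makes the point concrete; the sketch below (our own code, with the illustrative choice p = 3/10) shows the covariance is exactly zero even though Y is a function of X.

    from fractions import Fraction

    p = Fraction(3, 10)                      # any 0 < p < 1/2 works; 3/10 is illustrative
    pmf = {-1: p, 0: 1 - 2 * p, 1: p}        # pmf of X

    EX = sum(x * q for x, q in pmf.items())
    EY = sum(x * x * q for x, q in pmf.items())       # Y = X^2
    EXY = sum(x**3 * q for x, q in pmf.items())       # XY = X^3

    print(EXY - EX * EY)   # 0: zero covariance
    # Yet P(Y = 0 | X = 0) = 1 while P(Y = 0) = 1 - 2p = 2/5, so X, Y are dependent.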
The extension of the concepts for the bivariate discrete case to the multivariate dis-
crete case is straightforward. We give the appropriate definitions and an important
example, namely that of the multinomial distribution, an extension of the binomial
distribution.
Once again, we mention that it is not convenient or interesting to work with the
CDF for discrete random variables; for discrete variables, it is preferable to work
with the pmf.
Analogous to the case of one random variable, we can define the joint mgf for several random variables. The definition is the same for all types of random variables, discrete, continuous, or mixed. As in the one-dimensional case, the joint mgf of several random variables is also a very useful tool. First, we repeat the definition of the expectation of a function of several random variables; see Chapter 1, where it was first introduced and defined. The definition below is equivalent to what was given in Chapter 1.
Remark. It is important to note that the last two theorems are not limited to discrete
random variables; they are valid for general random variables. The proofs of these
two theorems follow the same arguments as in the one-dimensional case, namely
that when an mgf exists in a nonempty open rectangle, it can be differentiated in-
finitely often with respect to each variable ti inside the expectation; that is, the order
of the derivative and the expectation can be interchanged.
One of the most important multivariate discrete distributions is the multinomial dis-
tribution. The multinomial distribution corresponds to n balls being distributed to k
cells, independently, with each ball having the probability pi of being dropped into
the ith cell. The random variables under consideration are X1, X2, ..., Xk, where X_i is the number of balls that get dropped into the ith cell. Then their joint pmf is
the multinomial pmf defined below.
Definition 2.13. A multivariate random vector (X1, X2, ..., Xk) is said to have a multinomial distribution with parameters n, p1, p2, ..., pk if it has the pmf

P(X1 = x1, X2 = x2, ..., Xk = xk) = [n!/(x1! x2! ··· xk!)] p1^{x1} p2^{x2} ··· pk^{xk},

x_i ≥ 0, Σ_{i=1}^{k} x_i = n; p_i ≥ 0, Σ_{i=1}^{k} p_i = 1.
We write (X1, X2, ..., Xk) ~ Mult(n; p1, ..., pk) to denote a random vector with a multinomial distribution.
Example 2.15 (Dice Rolls). Suppose a fair die is rolled 30 times. We want to find
the probabilities that
(i) Each face is obtained exactly five times.
(ii) The number of sixes is at least five.
If we denote the number of times face number i is obtained as Xi, then (X1, X2, ..., X6) ~ Mult(n; p1, ..., p6), where n = 30 and each pi = 1/6. Therefore,
P(X1 = 5, X2 = 5, ..., X6 = 5) = [30!/(5!)^6] (1/6)^5 ··· (1/6)^5
                               = [30!/(5!)^6] (1/6)^30
                               = .0004.
Next, each of the 30 rolls will either be a six or not, independently of the other rolls, with probability 1/6, and so X6 ~ Bin(30, 1/6). Therefore,

P(X6 ≥ 5) = 1 − P(X6 ≤ 4) = 1 − Σ_{x=0}^{4} [30!/(x!(30 − x)!)] (1/6)^x (5/6)^{30−x}
          = .5757.
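Both numbers in Example 2.15 can be checked in a few lines of Python; the sketch below is ours and uses only the standard library.

    from math import comb, factorial

    n = 30
    # (i) Each face exactly five times under Mult(30; 1/6, ..., 1/6).
    p_all_five = factorial(n) / factorial(5) ** 6 * (1 / 6) ** n
    print(round(p_all_five, 4))                # 0.0004

    # (ii) P(X6 >= 5) with X6 ~ Bin(30, 1/6).
    p_at_most_4 = sum(comb(n, x) * (1 / 6) ** x * (5 / 6) ** (n - x) for x in range(5))
    print(round(1 - p_at_most_4, 4))           # 0.5757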
Example 2.16 (Bridge). Consider a Bridge game with four players, North, South,
East, and West. We want to find the probability that North and South together
have two or more aces. Let Xi denote the number of aces in the hands of player
i, i = 1, 2, 3, 4; we let i = 1, 2 mean North and South. Then, we want to find P(X1 + X2 ≥ 2).
The joint distribution of (X1, X2, X3, X4) is Mult(4; 1/4, 1/4, 1/4, 1/4) (think of each ace as a ball, and the four players as cells). Then, (X1 + X2, X3 + X4) ~ Mult(4; 1/2, 1/2). Therefore,
P(X1 + X2 ≥ 2) = [4!/(2!2!)] (1/2)^4 + [4!/(3!1!)] (1/2)^4 + [4!/(4!0!)] (1/2)^4
               = 11/16.
Important formulas and facts about the multinomial distribution are given in the
next theorem.
Proof. Define W_{ir} as the indicator of the event that the rth ball lands in the ith cell. Note that for a given i, the variables W_{ir}, r = 1, ..., n, are independent. Then,

X_i = Σ_{r=1}^{n} W_{ir},

and therefore E(X_i) = Σ_{r=1}^{n} E[W_{ir}] = np_i, and Var(X_i) = Σ_{r=1}^{n} Var(W_{ir}) = np_i(1 − p_i). Part (b) follows from the definition of a multinomial experiment
(the trials are identical and independent, and each ball either lands in the ith cell or not). For part (c),
Cov(X_i, X_j) = Cov(Σ_{r=1}^{n} W_{ir}, Σ_{s=1}^{n} W_{js})
              = Σ_{r=1}^{n} Σ_{s=1}^{n} Cov(W_{ir}, W_{js})
              = Σ_{r=1}^{n} Cov(W_{ir}, W_{jr}),

because for r ≠ s the indicators W_{ir} and W_{js} refer to different balls and are therefore independent, so that Cov(W_{ir}, W_{js}) = 0. Finally, for i ≠ j, W_{ir} W_{jr} = 0 (the rth ball cannot land in two different cells), and so

Cov(W_{ir}, W_{jr}) = E(W_{ir} W_{jr}) − E(W_{ir}) E(W_{jr}) = 0 − p_i p_j = −p_i p_j,

which gives Cov(X_i, X_j) = −np_i p_j.
Part (d) follows immediately from part (c) and part (a). Part (e) is a calculation, and is omitted.
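A short simulation (our own code; the values n = 12 and p = (.2, .3, .5) are only illustrative) is consistent with these moment formulas, including the negative covariance −n p_i p_j.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 12, np.array([0.2, 0.3, 0.5])

    sample = rng.multinomial(n, p, size=200_000)       # each row is (X1, X2, X3)

    print(sample.mean(axis=0))                         # close to n*p = [2.4, 3.6, 6.0]
    print(sample.var(axis=0))                          # close to n*p*(1 - p)
    print(np.cov(sample[:, 0], sample[:, 1])[0, 1])    # close to -n*p1*p2 = -0.72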
Example 2.17 (MGF of the Multinomial Distribution). Let (X1, X2, ..., Xk) ~ Mult(n; p1, p2, ..., pk). Then the mgf ψ(t1, t2, ..., tk) exists at all t, and a formula follows easily. Indeed,

E[e^{t1 X1 + ··· + tk Xk}] = Σ_{x_i ≥ 0, Σ x_i = n} [n!/(x1! ··· xk!)] e^{t1 x1} e^{t2 x2} ··· e^{tk xk} p1^{x1} p2^{x2} ··· pk^{xk}
                           = Σ_{x_i ≥ 0, Σ x_i = n} [n!/(x1! ··· xk!)] (p1 e^{t1})^{x1} (p2 e^{t2})^{x2} ··· (pk e^{tk})^{xk}
                           = (p1 e^{t1} + p2 e^{t2} + ··· + pk e^{tk})^n,

by the multinomial theorem.
Suppose now that the number of balls is not fixed in advance, but is a Poisson random variable N with mean λ, and that given N = n, the cell counts (X1, X2, ..., Xk) are distributed as Mult(n; p1, ..., pk). Then, unconditionally,

P(X1 = x1, X2 = x2, ..., Xk = xk)
= Σ_{n=0}^{∞} P(X1 = x1, X2 = x2, ..., Xk = xk | N = n) e^{−λ} λ^n / n!
= Σ_{n=0}^{∞} [(x1 + x2 + ··· + xk)!/(x1! x2! ··· xk!)] p1^{x1} p2^{x2} ··· pk^{xk} [e^{−λ} λ^n / n!] I_{n = x1 + x2 + ··· + xk}
= [1/(x1! x2! ··· xk!)] e^{−λ} λ^{x1 + x2 + ··· + xk} p1^{x1} p2^{x2} ··· pk^{xk}
= [1/(x1! x2! ··· xk!)] e^{−λ} (λp1)^{x1} (λp2)^{x2} ··· (λpk)^{xk}
= Π_{i=1}^{k} e^{−λp_i} (λp_i)^{x_i} / x_i!,

where the last equality uses p1 + p2 + ··· + pk = 1, so that e^{−λ} = e^{−λp1} ··· e^{−λpk}. This establishes that the joint marginal pmf of (X1, X2, ..., Xk) is the product of k Poisson pmfs, and so X1, X2, ..., Xk must be marginally independent, with X_i ~ Poi(λp_i).
Example 2.18 (No Empty Cells). Suppose n balls are distributed independently and
at random into k cells. We want to find a formula for the probability that no cell
remains empty.
We use the Poissonization technique to solve this problem. We want a formula for P(Y1 ≠ 0, Y2 ≠ 0, ..., Yk ≠ 0), where Y_i denotes the number of balls in the ith cell under the Poissonized version of the experiment.
Exercises
Exercise 2.1. Consider the experiment of picking one word at random from the
sentence
ALL IS WELL IN THE NEWELL FAMILY
Let X be the length of the word selected and Y the number of Ls in it. Find in a
tabular form the joint pmf of X and Y , their marginal pmfs, means, and variances,
and the correlation between X and Y .
Exercise 2.2. A fair coin is tossed four times. Let X be the number of heads, Z the number of tails, and Y = |X − Z|. Find the joint pmf of (X, Y), and E(Y).
Exercise 2.3. Consider the joint pmf p(x, y) = cxy, 1 ≤ x ≤ 3, 1 ≤ y ≤ 3.
(a) Find the normalizing constant c.
(b) Are X, Y independent? Prove your claim.
(c) Find the expectations of X, Y, XY.
Exercise 2.4. Consider the joint pmf p(x, y) = cxy, 1 ≤ x ≤ y ≤ 3.
(a) Find the normalizing constant c.
(b) Are X, Y independent? Prove your claim.
(c) Find the expectations of X, Y, XY.
Exercise 2.5. A fair die is rolled twice. Let X be the maximum and Y the minimum of the two rolls. By using the joint pmf of (X, Y) worked out in the text, find the pmf of X/Y, and hence the mean of X/Y.
Exercise 2.6. A hat contains four slips of paper, numbered 1, 2, 3, and 4. Two slips
are drawn at random, without replacement. X is the number on the first slip and Y
the sum of the two numbers drawn. Write in a tabular form the joint pmf of .X; Y /.
Hence find the marginal pmfs. Are X, Y independent?
Exercise 2.8. A fair die is rolled four times. Find the probabilities that:
(a) At least one six is obtained;
(b) Exactly one six and exactly one two are obtained;
(c) Exactly one six, one two, and two fours are obtained.
Exercise 2.11. Suppose X and Y are independent Geo(p) random variables. Find P(X ≥ Y) and P(X > Y).
Exercise 2.12. * Suppose X and Y are independent Poi(λ) random variables. Find P(X ≥ Y) and P(X > Y).
Exercise 2.13. Suppose X and Y are independent and take the values 1, 2, 3, 4 with
probabilities .2, .3, .3, .2. Find the pmf of X + Y.
Exercise 2.14. Two random variables have the joint pmf p(x, x + 1) = 1/(n + 1), x = 0, 1, ..., n. Answer the following questions with as little calculation as possible.
(a) Are X, Y independent?
(b) What is the variance of Y − X?
(c) What is Var(Y | X = 1)?
Exercise 2.18. Suppose a fair die is rolled twice. Let X, Y be the two rolls. Find the following with as little calculation as possible:
(a) E(X + Y | Y = y);
(b) E(XY | Y = y);
(c) Var(X²Y | Y = y);
(d) ρ_{X+Y, X−Y}.
Exercise 2.19 (A Waiting Time Problem). In repeated throws of a fair die, let X be the throw in which the first six is obtained, and Y the throw in which the second six is obtained.
(a) Find the joint pmf of (X, Y).
(b) Find the expectation of Y − X.
(c) Find E(Y − X | X = 8).
(d) Find Var(Y − X | X = 8).
Exercise 2.20 * (Family Planning). A couple want to have a child of each sex, but
they will have at most four children. Let X be the total number of children they
will have and Y the number of girls at the second childbirth. Find the joint pmf of
(X, Y), and the conditional expectation of X given Y = y, y = 0, 2.
Exercise 2.27 (Joint MGF). Suppose a fair die is rolled four times. Let X be the
number of ones and Y the number of sixes. Find the joint mgf of X and Y , and
hence, the covariance between X and Y.
Exercise 2.29 (Joint MGF). In repeated throws of a fair die, let X be the throw in
which the first six is obtained, and Y the throw in which the second six is obtained.
Find the joint mgf of X, Y, and hence the covariance between X and Y.
Exercise 2.30 * (Poissonization). A fair die is rolled 30 times. By using the Pois-
sonization theorem, find the probability that the maximum number of times any face
appears is 9 or more.