
Chapter 2

Multivariate Discrete Distributions

We have provided a detailed overview of distributions of one discrete or one


continuous random variable in the previous chapter. But often in applications, we are
just naturally interested in two or more random variables simultaneously. We may
be interested in them simultaneously because they provide information about each
other, or because they arise simultaneously as part of the data in some scientific
experiment. For instance, on a doctor’s visit, the physician may check someone’s
blood pressure, pulse rate, blood cholesterol level, and blood sugar level, because
together they give information about the general health of the patient. In such cases,
it becomes essential to know how to operate with many random variables simul-
taneously. This is done by using joint distributions. Joint distributions naturally
lead to considerations of marginal and conditional distributions. We study joint,
marginal, and conditional distributions for discrete random variables in this chapter.
The concepts of these various distributions for continuous random variables are not
different; but the techniques are mathematically more sophisticated. The continuous
case is treated in the next chapter.

2.1 Bivariate Joint Distributions and Expectations of Functions

We present the fundamentals of joint distributions of two variables in this section.
The concepts in the multivariate case are the same, although the technicalities are
somewhat more involved. We treat the multivariate case in a later section. The idea
is that there is still an underlying experiment, with an associated sample space Ω.
But now we have two or more random variables on the sample space Ω. Random
variables being functions on the sample space, we now have multiple functions,
say X(ω), Y(ω), ..., and so on. We want to study their joint behavior.

Example 2.1 (Coin Tossing). Consider the experiment of tossing a fair coin three
times. Let X be the number of heads among the first two tosses, and Y the number
of heads among the last two tosses. If we consider X and Y individually, we
realize immediately that they are each Bin(2, .5) random variables. But the individual
distributions hide part of the full story. For example, if we knew that X was 2,

then that would imply that Y must be at least 1. Thus, their joint behavior cannot
be fully understood from their individual distributions; we must study their joint
distribution.
Here is what we mean by their joint distribution. The sample space Ω of this
experiment is

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Each sample point has an equal probability 1/8. Denoting the sample points as
ω1, ω2, ..., ω8, we see that if ω1 prevails, then X(ω1) = Y(ω1) = 2, but if ω2
prevails, then X(ω2) = 2, Y(ω2) = 1. The combinations of all possible values of
(X, Y) are

(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2).

The joint distribution of (X, Y) provides the probability p(x, y) = P(X = x, Y = y)
for each such combination of possible values (x, y). Indeed, by direct counting
using the eight equally likely sample points, we see that

p(0, 0) = 1/8, p(0, 1) = 1/8, p(0, 2) = 0, p(1, 0) = 1/8, p(1, 1) = 1/4;
p(1, 2) = 1/8, p(2, 0) = 0, p(2, 1) = 1/8, p(2, 2) = 1/8.

For example, why is p(0, 1) = 1/8? This is because the combination (X = 0, Y = 1) is
favored by only one sample point, namely TTH. It is convenient to present these
nine different probabilities in the form of a table as follows.

         Y
X        0      1      2
0       1/8    1/8     0
1       1/8    1/4    1/8
2        0     1/8    1/8

Such a layout is a convenient way to present the joint distribution of two discrete
random variables with a small number of values. The distribution itself is called the
joint pmf ; here is a formal definition.

Definition 2.1. Let X, Y be two discrete random variables with respective sets of
values x1, x2, ..., and y1, y2, ..., defined on a common sample space Ω. The joint
pmf of X, Y is defined to be the function p(xi, yj) = P(X = xi, Y = yj), i, j ≥ 1,
and p(x, y) = 0 at any other point (x, y) in R².
2.1 Bivariate Joint Distributions and Expectations of Functions 97

The requirements of a joint pmf are that

(i) p(x, y) ≥ 0 for all (x, y);
(ii) Σ_i Σ_j p(xi, yj) = 1.
Thus, if we write the joint pmf in the form of a table, then all entries should be
nonnegative, and the sum of all the entries in the table should be one.
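
The two requirements are easy to check mechanically. Here is a minimal Python sketch (my own, not part of the text) that rebuilds the joint pmf of Example 2.1 by enumerating the eight equally likely sample points and verifies (i) and (ii).

from fractions import Fraction
from itertools import product

pmf = {}
for outcome in product("HT", repeat=3):          # the 8 equally likely sample points
    x = outcome[:2].count("H")                   # heads among the first two tosses
    y = outcome[1:].count("H")                   # heads among the last two tosses
    pmf[(x, y)] = pmf.get((x, y), Fraction(0)) + Fraction(1, 8)

assert all(p >= 0 for p in pmf.values())         # requirement (i)
assert sum(pmf.values()) == 1                    # requirement (ii)
print(pmf[(0, 1)])                               # 1/8, as computed in Example 2.1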
As in the case of a single variable, we can define a CDF for more than one
variable also. For the case of two variables, here is the definition of a CDF.
Definition 2.2. Let X, Y be two discrete random variables, defined on a common
sample space Ω. The joint CDF, or simply the CDF, of (X, Y) is a function
F: R² → [0, 1] defined as F(x, y) = P(X ≤ x, Y ≤ y), x, y ∈ R.
Like the joint pmf, the CDF also characterizes the joint distribution of two dis-
crete random variables. But it is not very convenient or even interesting to work with
the CDF in the case of discrete random variables. It is much preferable to work with
the pmf when dealing with discrete random variables.
Example 2.2 (Maximum and Minimum in Dice Rolls). Suppose a fair die is rolled
twice, and let X, Y be the larger and the smaller of the two rolls (note that X can
be equal to Y). Each of X, Y takes the individual values 1, 2, ..., 6, but we have
necessarily X ≥ Y. The sample space of this experiment is

{11, 12, 13, ..., 64, 65, 66}.

By direct counting, for example, p(2, 1) = 2/36. Indeed, p(x, y) = 2/36 for each
x, y = 1, 2, ..., 6, x > y, and p(x, y) = 1/36 for x = y = 1, 2, ..., 6. Here is what
the joint pmf looks like in the form of a table:

         Y
X        1      2      3      4      5      6
1      1/36     0      0      0      0      0
2      1/18   1/36     0      0      0      0
3      1/18   1/18   1/36     0      0      0
4      1/18   1/18   1/18   1/36     0      0
5      1/18   1/18   1/18   1/18   1/36     0
6      1/18   1/18   1/18   1/18   1/18   1/36

The individual pmfs of X, Y are easily recovered from the joint distribution. For
example,

P(X = 1) = Σ_{y=1}^{6} P(X = 1, Y = y) = 1/36, and

P(X = 2) = Σ_{y=1}^{6} P(X = 2, Y = y) = 1/18 + 1/36 = 1/12,

and so on. The individual pmfs are obtained by summing the joint probabilities
over all values of the other variable. They are:

x        1      2      3      4      5      6
pX(x)   1/36   3/36   5/36   7/36   9/36  11/36

y        1      2      3      4      5      6
pY(y)  11/36   9/36   7/36   5/36   3/36   1/36

From the individual pmf of X, we can find the expectation of X. Indeed,

E(X) = 1 · (1/36) + 2 · (3/36) + ··· + 6 · (11/36) = 161/36.

Similarly, E(Y) = 91/36. The individual pmfs are called marginal pmfs, and here is
the formal definition.

Definition 2.3. Let p(x, y) be the joint pmf of (X, Y). The marginal pmf of a
function Z = g(X, Y) is defined as pZ(z) = Σ_{(x,y): g(x,y)=z} p(x, y). In particular,

pX(x) = Σ_y p(x, y);   pY(y) = Σ_x p(x, y),

and for any event A,

P(A) = Σ_{(x,y)∈A} p(x, y).
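
To make Definition 2.3 concrete, here is a minimal Python sketch (my own, not part of the text) that rebuilds the joint pmf of Example 2.2 by enumeration and recovers the marginal pmfs and the two means by summing over the other variable.

from fractions import Fraction
from itertools import product

# joint pmf of (max, min) for two rolls of a fair die, built by brute force
joint = {}
for r1, r2 in product(range(1, 7), repeat=2):
    x, y = max(r1, r2), min(r1, r2)
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 36)

# marginals: sum the joint pmf over the other coordinate
pX = {x: sum(p for (a, b), p in joint.items() if a == x) for x in range(1, 7)}
pY = {y: sum(p for (a, b), p in joint.items() if b == y) for y in range(1, 7)}
EX = sum(x * p for x, p in pX.items())
EY = sum(y * p for y, p in pY.items())
print(pX[2], EX, EY)   # 1/12, 161/36, 91/36, matching Example 2.2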

Example 2.3. Consider a joint pmf given by the formula

p(x, y) = c(x + y), 1 ≤ x, y ≤ n,

where c is a normalizing constant.
First of all, we need to evaluate c by equating

Σ_{x=1}^{n} Σ_{y=1}^{n} p(x, y) = 1
⟺ c Σ_{x=1}^{n} Σ_{y=1}^{n} (x + y) = 1
⟺ c Σ_{x=1}^{n} [nx + n(n + 1)/2] = 1
⟺ c [n²(n + 1)/2 + n²(n + 1)/2] = 1
⟺ c n²(n + 1) = 1
⟺ c = 1/(n²(n + 1)).

The joint pmf is symmetric between x and y (because x + y = y + x), and so,
X, Y have the same marginal pmf. For example, X has the pmf

pX(x) = Σ_{y=1}^{n} p(x, y) = [1/(n²(n + 1))] Σ_{y=1}^{n} (x + y)
      = [1/(n²(n + 1))] [nx + n(n + 1)/2]
      = x/(n(n + 1)) + 1/(2n), 1 ≤ x ≤ n.

Suppose now we want to compute P(X > Y). This can be found by summing
p(x, y) over all combinations for which x > y. But this longer calculation
can be avoided by using a symmetry argument that is often very useful. Note
that because the joint pmf is symmetric between x and y, we must have
P(X > Y) = P(Y > X) = p (say). But, also,

P(X > Y) + P(Y > X) + P(X = Y) = 1 ⟹ 2p + P(X = Y) = 1
⟹ p = [1 - P(X = Y)]/2.

Now,

P(X = Y) = Σ_{x=1}^{n} p(x, x) = c Σ_{x=1}^{n} 2x = [1/(n²(n + 1))] n(n + 1) = 1/n.

Therefore, P(X > Y) = p = (n - 1)/(2n) ≈ 1/2 for large n.
Example 2.4 (Dice Rolls Revisited). Consider again the example of two rolls of a
fair die, and suppose X, Y are the larger and the smaller of the two rolls. We have
worked out the joint distribution of (X, Y) in Example 2.2. Suppose we want to
find the distribution of the difference, X - Y. The possible values of X - Y are
0, 1, ..., 5, and we find P(X - Y = k) by using the joint distribution of (X, Y):

P(X - Y = 0) = p(1, 1) + p(2, 2) + ··· + p(6, 6) = 1/6;
P(X - Y = 1) = p(2, 1) + p(3, 2) + ··· + p(6, 5) = 5/18;
P(X - Y = 2) = p(3, 1) + p(4, 2) + p(5, 3) + p(6, 4) = 2/9;
P(X - Y = 3) = p(4, 1) + p(5, 2) + p(6, 3) = 1/6;
P(X - Y = 4) = p(5, 1) + p(6, 2) = 1/9;
P(X - Y = 5) = p(6, 1) = 1/18.

There is no way to find the distribution of X - Y except by using the joint distribution
of (X, Y).

Suppose now we also want to know the expected value of X - Y. Now that we
have the distribution of X - Y worked out, we can find the expectation by directly
using the definition of expectation:

E(X - Y) = Σ_{k=0}^{5} k P(X - Y = k) = 5/18 + 4/9 + 1/2 + 4/9 + 5/18 = 35/18.

But, we can also use linearity of expectations and find E(X - Y) as

E(X - Y) = E(X) - E(Y) = 161/36 - 91/36 = 35/18

(see Example 2.2 for E(X), E(Y)).


A third possible way to compute E.X  Y / is to treat X P P Y as a function of
.X; Y / and use the joint pmf of .X; Y / to find E.X  Y / as x y .x  y/p.x; y/.
In this particular example, this is an unncessarily laborious calculation, because
luckily we can find E.X  Y / by other quicker means in this example, as we just
saw. But in general, one has to resort to the joint pmf to calculate the expectation of
a function of .X; Y /. Here is the formal formula.
Theorem 2.1 (Expectation of a Function). Let (X, Y) have the joint pmf p(x, y),
and let g(X, Y) be a function of (X, Y). We say that the expectation of g(X, Y)
exists if Σ_x Σ_y |g(x, y)| p(x, y) < ∞, in which case,

E[g(X, Y)] = Σ_x Σ_y g(x, y) p(x, y).
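
As a quick illustration of Theorem 2.1, here is a short Python sketch (my own) that computes E(X - Y) for Example 2.4 in the three ways just discussed: from the pmf of the difference, by linearity, and directly from the joint pmf.

from fractions import Fraction
from itertools import product

joint = {}
for r1, r2 in product(range(1, 7), repeat=2):
    x, y = max(r1, r2), min(r1, r2)
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 36)

# (1) pmf of the difference X - Y, then its expectation
diff_pmf = {}
for (x, y), p in joint.items():
    diff_pmf[x - y] = diff_pmf.get(x - y, Fraction(0)) + p
e1 = sum(k * p for k, p in diff_pmf.items())

# (2) linearity: E(X) - E(Y)
e2 = (sum(x * p for (x, y), p in joint.items())
      - sum(y * p for (x, y), p in joint.items()))

# (3) Theorem 2.1 with g(x, y) = x - y
e3 = sum((x - y) * p for (x, y), p in joint.items())
print(e1, e2, e3)   # all three equal 35/18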

2.2 Conditional Distributions and Conditional Expectations

Sometimes we want to know what the expected value is of one of the variables,
say X , if we knew the value of the other variable Y . For example, in the die tossing
experiment above, what should we expect the larger of the two rolls to be if the
smaller roll is known to be 2?
To answer this question, we have to find the probabilities of the various values
of X , conditional on knowing that Y equals some given y, and then average by
using these conditional probabilities. Here are the formal definitions.
Definition 2.4 (Conditional Distribution). Let (X, Y) have the joint pmf p(x, y).
The conditional distribution of X given Y = y is defined to be

p(x | y) = P(X = x | Y = y) = p(x, y)/pY(y),

and the conditional expectation of X given Y = y is defined to be

E(X | Y = y) = Σ_x x p(x | y) = [Σ_x x p(x, y)]/pY(y) = [Σ_x x p(x, y)]/[Σ_x p(x, y)].

The conditional distribution of Y given X D x and the conditional expectation of


Y given X D x are defined analogously, by switching the roles of X and Y in the
above definitions.
We often casually write E(X | y) to mean E(X | Y = y).
Two easy facts that are nevertheless often useful are the following.

Proposition. Let X, Y be random variables defined on a common sample space Ω.
Then,
(a) E(g(Y) | Y = y) = g(y) for all y and for any function g;
(b) E(Xg(Y) | Y = y) = g(y) E(X | Y = y) for all y and for any function g.
Recall that in Chapter 1, we defined two random variables to be independent if
P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y) for all x, y ∈ R. This is of course a correct
definition; but in the case of discrete random variables, it is more convenient to
think of independence in terms of the pmf. The definition below puts together some
equivalent definitions of independence of two discrete random variables.

Definition 2.5 (Independence). Let (X, Y) have the joint pmf p(x, y). Then X, Y
are said to be independent if

p(x | y) = pX(x) for all x, y such that pY(y) > 0
⟺ p(y | x) = pY(y) for all x, y such that pX(x) > 0
⟺ p(x, y) = pX(x) pY(y) for all x, y
⟺ P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y) for all x, y.

The third equivalent condition in the above list is usually the most convenient one
to verify and use.
One more frequently useful fact about conditional expectations is the following.

Proposition. Suppose X, Y are independent random variables. Then, for any
function g(X) such that the expectations below exist, and for any y,

E[g(X) | Y = y] = E[g(X)].

2.2.1 Examples on Conditional Distributions and Expectations

Example 2.5 (Maximum and Minimum in Dice Rolls). In the experiment of two rolls
of a fair die, we have worked out the joint distribution of X; Y , where X is the larger

and Y the smaller of the two rolls. Using this joint distribution, we can now find the
conditional distributions. For instance,

P(Y = 1 | X = 1) = 1;   P(Y = y | X = 1) = 0 if y > 1;

P(Y = 1 | X = 2) = (1/18)/(1/18 + 1/36) = 2/3;
P(Y = 2 | X = 2) = (1/36)/(1/18 + 1/36) = 1/3;
P(Y = y | X = 2) = 0 if y > 2;

P(Y = y | X = 6) = (1/18)/(5/18 + 1/36) = 2/11 if 1 ≤ y ≤ 5;
P(Y = 6 | X = 6) = (1/36)/(5/18 + 1/36) = 1/11.
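
The same conditional pmfs can be generated mechanically; this is a brief Python sketch (my own, not from the text) that applies p(y | x) = p(x, y)/pX(x) to the joint pmf of Example 2.2.

from fractions import Fraction
from itertools import product

joint = {}
for r1, r2 in product(range(1, 7), repeat=2):
    x, y = max(r1, r2), min(r1, r2)
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 36)

def cond_y_given_x(x):
    # conditional pmf p(y | x) = p(x, y) / pX(x)
    px = sum(p for (a, b), p in joint.items() if a == x)
    return {b: p / px for (a, b), p in joint.items() if a == x}

print(cond_y_given_x(2))   # y = 1 with probability 2/3, y = 2 with probability 1/3
print(cond_y_given_x(6))   # 2/11 for y = 1, ..., 5 and 1/11 for y = 6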

Example 2.6 (Conditional Expectation in a 2 × 2 Table). Suppose X, Y are binary
variables, each taking only the values 0, 1 with the following joint distribution.

         Y
X        0      1
0        s      t
1        u      v

We want to evaluate the conditional expectation of X given Y = 0, 1, respectively.
By using the definition of conditional expectation,

E(X | Y = 0) = [0 · p(0, 0) + 1 · p(1, 0)]/[p(0, 0) + p(1, 0)] = u/(s + u);
E(X | Y = 1) = [0 · p(0, 1) + 1 · p(1, 1)]/[p(0, 1) + p(1, 1)] = v/(t + v).

Therefore,

E(X | Y = 1) - E(X | Y = 0) = v/(t + v) - u/(s + u) = (vs - ut)/[(t + v)(s + u)].

It follows that we can now have the single formula

E(X | Y = y) = u/(s + u) + {(vs - ut)/[(t + v)(s + u)]} y,

y = 0, 1. We now realize that the conditional expectation of X given Y = y is a
linear function of y in this example. This is the case whenever both X, Y are binary
variables, as they were in this example.

Example 2.7 (Conditional Expectation in Dice Experiment). Consider again the
example of the joint distribution of the maximum and the minimum of two rolls of
a fair die. Let X denote the maximum, and Y the minimum. We find E(X | Y = y)
for various values of y.
By using the definition of E(X | Y = y), we have, for example,

E(X | Y = 1) = [1 · (1/36) + (1/18)(2 + ··· + 6)]/(1/36 + 5/18) = 41/11 = 3.73;

as another example,

E(X | Y = 3) = [3 · (1/36) + (1/18)(4 + 5 + 6)]/(1/36 + 3/18) = 33/7 = 4.71;

and,

E(X | Y = 5) = [5 · (1/36) + 6 · (1/18)]/(1/36 + 1/18) = 17/3 = 5.67.

We notice that E(X | Y = 5) > E(X | Y = 3) > E(X | Y = 1); in fact, it is true that
E(X | Y = y) is increasing in y in this example. This does make intuitive sense.
Just as in the case of a distribution of a single variable, we often also want a mea-
sure of variability in addition to a measure of average for conditional distributions.
This motivates defining a conditional variance.

Definition 2.6 (Conditional Variance). Let (X, Y) have the joint pmf p(x, y).
Let μX(y) = E(X | Y = y). The conditional variance of X given Y = y is
defined to be

Var(X | Y = y) = E[(X - μX(y))² | Y = y] = Σ_x (x - μX(y))² p(x | y).

We often write casually Var(X | y) to mean Var(X | Y = y).

Example 2.8 (Conditional Variance in Dice Experiment). We work out the conditional
variance of the maximum of two rolls of a die given the minimum. That is,
suppose a fair die is rolled twice, and X, Y are the larger and the smaller of the two
rolls; we want to compute Var(X | y).
For example, if y = 3, then μX(y) = E(X | Y = y) = E(X | Y = 3) = 4.71
(see the previous example). Therefore,

Var(X | y) = Σ_x (x - 4.71)² p(x | 3)
= [(3 - 4.71)² · (1/36) + (4 - 4.71)² · (1/18) + (5 - 4.71)² · (1/18) + (6 - 4.71)² · (1/18)]
  / (1/36 + 1/18 + 1/18 + 1/18)
= 1.06.

To summarize, given that the minimum of two rolls of a fair die is 3, the expected
value of the maximum is 4.71 and the variance of the maximum is 1.06.
These two values, E(X | y) and Var(X | y), change as we change the given
value y. Thus E(X | y) and Var(X | y) are functions of y, and for each separate y, a
new calculation is needed. If X, Y happen to be independent, then of course whatever
y is, E(X | y) = E(X), and Var(X | y) = Var(X).
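
Since E(X | y) and Var(X | y) are just functions of y, it is easy to tabulate them for every y at once; here is a small Python sketch (my own, not from the text) doing so for the dice example of Examples 2.7 and 2.8.

from fractions import Fraction
from itertools import product

joint = {}
for r1, r2 in product(range(1, 7), repeat=2):
    x, y = max(r1, r2), min(r1, r2)
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 36)

for y in range(1, 7):
    py = sum(p for (a, b), p in joint.items() if b == y)
    cond = {a: p / py for (a, b), p in joint.items() if b == y}   # p(x | y)
    mean = sum(a * p for a, p in cond.items())
    var = sum((a - mean) ** 2 * p for a, p in cond.items())
    print(y, float(mean), float(var))
# y = 1 gives about 3.73, y = 3 gives 4.71 with variance about 1.06, y = 5 gives 5.67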
The next result is an important one in many applications.

Theorem 2.2 (Poisson Conditional Distribution). Let X, Y be independent
Poisson random variables, with means λ, μ. Then the conditional distribution
of X given X + Y = t is Bin(t, p), where p = λ/(λ + μ).

Proof. Clearly, P(X = x | X + Y = t) = 0 for all x > t. For x ≤ t,

P(X = x | X + Y = t) = P(X = x, X + Y = t)/P(X + Y = t)
= P(X = x, Y = t - x)/P(X + Y = t)
= [e^{-λ} λ^x/x!] [e^{-μ} μ^{t-x}/(t - x)!] · t!/[e^{-(λ+μ)} (λ + μ)^t]

(on using the fact that X + Y ~ Poi(λ + μ); see Chapter 1)

= [t!/(x!(t - x)!)] λ^x μ^{t-x}/(λ + μ)^t
= C(t, x) [λ/(λ + μ)]^x [μ/(λ + μ)]^{t-x},

which is the pmf of the Bin(t, λ/(λ + μ)) distribution. □
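
Theorem 2.2 is easy to check by simulation; the following Python sketch (my own, with arbitrary illustrative values λ = 2, μ = 3, t = 4) compares the empirical conditional distribution of X given X + Y = t with the Bin(t, λ/(λ + μ)) pmf.

import random, math

random.seed(1)

def poisson(rate):
    # simple inversion sampler: walk up the Poisson cdf until it passes a uniform draw
    u, k, prob = random.random(), 0, math.exp(-rate)
    s = prob
    while u > s:
        k += 1
        prob *= rate / k
        s += prob
    return k

lam, mu, t = 2.0, 3.0, 4
p = lam / (lam + mu)
samples = []
while len(samples) < 20000:
    x, y = poisson(lam), poisson(mu)
    if x + y == t:                       # condition on X + Y = t
        samples.append(x)

for x in range(t + 1):
    emp = samples.count(x) / len(samples)
    theo = math.comb(t, x) * p**x * (1 - p)**(t - x)
    print(x, round(emp, 3), round(theo, 3))   # empirical vs Bin(t, p) probabilities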

2.3 Using Conditioning to Evaluate Mean and Variance

Conditioning is often an extremely effective tool to calculate probabilities, means,


and variances of random variables with a complex or clumsy joint distribution. Thus,
in order to calculate the mean of a random variable X , it is sometimes greatly con-
venient to follow an iterative process, whereby we first evaluate the mean of X after
conditioning on the value y of some suitable random variable Y , and then average
over y. The random variable Y has to be chosen judiciously, but is often clear from
the context of the specific problem. Here are the precise results on how this tech-
nique works; it is important to note that the next two results hold for any kind of
random variables, not just discrete ones.

Theorem 2.3 (Iterated Expectation Formula). Let X, Y be random variables
defined on the same probability space Ω. Suppose E(X) and E(X | Y = y) exist
for each y. Then,

E(X) = E_Y[E(X | Y = y)];

thus, in the discrete case,

E(X) = Σ_y μX(y) pY(y),

where μX(y) = E(X | Y = y).

Proof. We prove this for the discrete case. By definition of conditional expectation,

μX(y) = [Σ_x x p(x, y)]/pY(y)
⟹ Σ_y μX(y) pY(y) = Σ_y Σ_x x p(x, y) = Σ_x Σ_y x p(x, y)
= Σ_x x Σ_y p(x, y) = Σ_x x pX(x) = E(X). □

The corresponding variance calculation formula is the following. The proof of
this uses the iterated mean formula above, and applies it to (X - μX)².

Theorem 2.4 (Iterated Variance Formula). Let X, Y be random variables
defined on the same probability space Ω. Suppose Var(X), Var(X | Y = y) exist for
each y. Then,

Var(X) = E_Y[Var(X | Y = y)] + Var_Y[E(X | Y = y)].

Remark. These two formulas for iterated expectation and iterated variance are valid
for all types of variables, not just the discrete ones. Thus, these same formulas still
hold when we discuss joint distributions for continuous random variables in the next
chapter.
Some operational formulas that one should be familiar with are summarized
below.

Conditional Expectation and Variance Rules.

E(g(X) | X = x) = g(x);   E(g(X)h(Y) | Y = y) = h(y) E(g(X) | Y = y);
E(g(X) | Y = y) = E(g(X)) if X, Y are independent;
Var(g(X) | X = x) = 0;   Var(g(X)h(Y) | Y = y) = h²(y) Var(g(X) | Y = y);
Var(g(X) | Y = y) = Var(g(X)) if X, Y are independent.

Let us see some applications of the two iterated expectation and iterated variance
formulas.

Example 2.9 (A Two-Stage Experiment). Suppose n fair dice are rolled. Those that
show a six are rolled again. What are the mean and the variance of the number of
sixes obtained in the second round of this experiment?
Define Y to be the number of dice in the first round that show a six, and X the
number of dice in the second round that show a six. Given Y = y, X ~ Bin(y, 1/6),
and Y itself is distributed as Bin(n, 1/6). Therefore,

E(X) = E[E(X | Y = y)] = E_Y[y/6] = n/36.

Also,

Var(X) = E_Y[Var(X | Y = y)] + Var_Y[E(X | Y = y)]
= E_Y[y · (1/6)(5/6)] + Var_Y[y/6]
= (5/36)(n/6) + (1/36) · n · (1/6)(5/6)
= 5n/216 + 5n/1296 = 35n/1296.
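
A quick simulation sketch (my own, with n = 12 dice chosen arbitrarily) confirms these two formulas; the sample mean and variance of the second-round sixes should come out close to n/36 and 35n/1296.

import random

random.seed(0)
n, reps = 12, 200000
counts = []
for _ in range(reps):
    first = sum(random.randrange(6) == 0 for _ in range(n))       # sixes in round 1
    second = sum(random.randrange(6) == 0 for _ in range(first))  # re-roll only those dice
    counts.append(second)

mean = sum(counts) / reps
var = sum((c - mean) ** 2 for c in counts) / reps
print(mean, n / 36)           # both close to 0.333
print(var, 35 * n / 1296)     # both close to 0.324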
Example 2.10. Suppose a chicken lays a Poisson number of eggs per week with
mean λ. Each egg, independently of the others, has a probability p of being fertilized.
We want to find the mean and the variance of the number of eggs fertilized in a
week.
Let N denote the number of eggs laid and X the number of eggs fertilized.
Then, N ~ Poi(λ), and given N = n, X ~ Bin(n, p). Therefore,

E(X) = E_N[E(X | N = n)] = E_N[np] = pλ,

and,

Var(X) = E_N[Var(X | N = n)] + Var_N(E(X | N = n))
= E_N[np(1 - p)] + Var_N(np) = λp(1 - p) + p²λ = pλ.

Interestingly, the number of eggs actually fertilized has the same mean and variance
pλ. (Can you see why?)
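
This Poisson thinning calculation is also easy to check numerically; here is a simulation sketch (my own, with arbitrary values λ = 6 and p = 0.3) in which both the sample mean and the sample variance of the number of fertilized eggs come out close to pλ.

import random, math

random.seed(2)

def poisson(rate):
    # inversion sampler for a Poisson draw
    u, k, prob = random.random(), 0, math.exp(-rate)
    s = prob
    while u > s:
        k += 1
        prob *= rate / k
        s += prob
    return k

lam, p, reps = 6.0, 0.3, 200000
xs = []
for _ in range(reps):
    n_eggs = poisson(lam)
    xs.append(sum(random.random() < p for _ in range(n_eggs)))   # fertilized eggs

mean = sum(xs) / reps
var = sum((x - mean) ** 2 for x in xs) / reps
print(mean, var, p * lam)   # all three close to 1.8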
Remark. In all of these examples, it was important to choose the variable Y wisely
on which one should condition. The efficiency of the technique depends on this very
crucially.
Sometimes a formal generalization of the iterated expectation formula when a
third variable Z is present is useful. It is particularly useful in hierarchical statis-
tical modeling of distributions, where an ultimate marginal distribution for some
X is constructed by first conditioning on a number of auxiliary variables, and then
gradually unconditioning them. We state the more general iterated expectation for-
mula; its proof is exactly similar to that of the usual iterated expectation formula.

Theorem 2.5 (Higher-Order Iterated Expectation). Let X, Y, Z be random
variables defined on the same sample space Ω. Assume that each conditional
expectation below and the marginal expectation E(X) exist. Then,

E(X) = E_Y[E_{Z|Y}{E(X | Y = y, Z = z)}].

2.4 Covariance and Correlation

We know that variance is additive for independent random variables; that is, if
X1, X2, ..., Xn are independent random variables, then Var(X1 + X2 + ··· + Xn) =
Var(X1) + ··· + Var(Xn). In particular, for two independent random variables
X, Y, Var(X + Y) = Var(X) + Var(Y). However, in general, variance is not additive.
Let us do the general calculation for Var(X + Y).

Var(X + Y) = E(X + Y)² - [E(X + Y)]²
= E(X² + Y² + 2XY) - [E(X) + E(Y)]²
= E(X²) + E(Y²) + 2E(XY) - [E(X)]² - [E(Y)]² - 2E(X)E(Y)
= E(X²) - [E(X)]² + E(Y²) - [E(Y)]² + 2[E(XY) - E(X)E(Y)]
= Var(X) + Var(Y) + 2[E(XY) - E(X)E(Y)].

We thus have the extra term 2[E(XY) - E(X)E(Y)] in the expression for
Var(X + Y); of course, when X, Y are independent, E(XY) = E(X)E(Y), and
so the extra term drops out. But, in general, one has to keep the extra term. The
quantity E(XY) - E(X)E(Y) is called the covariance of X and Y.
Definition 2.7 (Covariance). Let X, Y be two random variables defined on a
common sample space Ω, such that E(XY), E(X), E(Y) all exist. The covariance
of X and Y is defined as

Cov(X, Y) = E(XY) - E(X)E(Y) = E[(X - E(X))(Y - E(Y))].

Remark. Covariance is a measure of whether two random variables X; Y tend to in-


crease or decrease together. If a larger value of X generally causes an increment in
the value of Y , then often (but not always) they have a positive covariance. For ex-
ample, taller people tend to weigh more than shorter people, and height and weight
usually have a positive covariance.
Unfortunately, however, covariance can take arbitrary positive and arbitrary neg-
ative values. Therefore, by looking at its value in a particular problem, we cannot
judge whether it is a large value. We cannot compare a covariance with a standard
to judge if it is large or small. A renormalization of the covariance cures this prob-
lem, and calibrates it to a scale of -1 to +1. We can judge such a quantity as large,
small, or moderate; for example, .95 would be large positive, .5 moderate, and .1
small. The renormalized quantity is the correlation coefficient or simply the corre-
lation between X and Y .

Definition 2.8 (Correlation). Let X, Y be two random variables defined on a
common sample space Ω, such that Var(X), Var(Y) are both finite. The correlation
between X, Y is defined to be

ρ_{X,Y} = Cov(X, Y)/[√Var(X) √Var(Y)].

Some important properties of covariance and correlation are put together in the next
theorem.

Theorem 2.6 (Properties of Covariance and Correlation). Provided that the
required variances and the covariances exist,
(a) Cov(X, c) = 0 for any X and any constant c;
(b) Cov(X, X) = Var(X) for any X;
(c) Cov(Σ_{i=1}^{n} a_i X_i, Σ_{j=1}^{m} b_j Y_j) = Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j Cov(X_i, Y_j),

and in particular,

Var(aX + bY) = Cov(aX + bY, aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y),

and,

Var(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j);

(d) For any two independent random variables X, Y, Cov(X, Y) = ρ_{X,Y} = 0;
(e) ρ_{a+bX, c+dY} = sgn(bd) ρ_{X,Y}, where sgn(bd) = 1 if bd > 0, and = -1 if bd < 0;
(f) Whenever ρ_{X,Y} is defined, -1 ≤ ρ_{X,Y} ≤ 1;
(g) ρ_{X,Y} = 1 if and only if for some a, some b > 0, P(Y = a + bX) = 1;
    ρ_{X,Y} = -1 if and only if for some a, some b < 0, P(Y = a + bX) = 1.

Proof. For part (a), Cov(X, c) = E(cX) - E(c)E(X) = cE(X) - cE(X) = 0.
For part (b), Cov(X, X) = E(X²) - [E(X)]² = Var(X). For part (c),

Cov(Σ_{i=1}^{n} a_i X_i, Σ_{j=1}^{m} b_j Y_j)
= E[(Σ_{i=1}^{n} a_i X_i)(Σ_{j=1}^{m} b_j Y_j)] - E(Σ_{i=1}^{n} a_i X_i) E(Σ_{j=1}^{m} b_j Y_j)
= E(Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j X_i Y_j) - [Σ_{i=1}^{n} a_i E(X_i)][Σ_{j=1}^{m} b_j E(Y_j)]

= Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j E(X_i Y_j) - Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j E(X_i)E(Y_j)
= Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j [E(X_i Y_j) - E(X_i)E(Y_j)]
= Σ_{i=1}^{n} Σ_{j=1}^{m} a_i b_j Cov(X_i, Y_j).

Part (d) follows on noting that E(XY) = E(X)E(Y) if X, Y are independent. For
part (e), first note that Cov(a + bX, c + dY) = bd Cov(X, Y) by using part (a)
and part (c). Also, Var(a + bX) = b² Var(X), Var(c + dY) = d² Var(Y)

⟹ ρ_{a+bX, c+dY} = bd Cov(X, Y)/[√(b² Var(X)) √(d² Var(Y))]
= bd Cov(X, Y)/[|b| √Var(X) |d| √Var(Y)]
= (bd/|bd|) ρ_{X,Y} = sgn(bd) ρ_{X,Y}.

The proof of part (f) uses the Cauchy–Schwarz inequality (see Chapter 1) that for
any two random variables U, V, [E(UV)]² ≤ E(U²)E(V²). Let

U = [X - E(X)]/√Var(X),   V = [Y - E(Y)]/√Var(Y).

Then, E(U²) = E(V²) = 1, and

ρ_{X,Y} = E(UV) ≤ √(E(U²)E(V²)) = 1.

The lower bound ρ_{X,Y} ≥ -1 follows similarly.
Part (g) uses the condition for equality in the Cauchy–Schwarz inequality: in
order that ρ_{X,Y} = ±1, one must have [E(UV)]² = E(U²)E(V²) in the argument
above, which implies the statement in part (g). □

Example 2.11 (Correlation Between Minimum and Maximum in Dice Rolls). Con-
sider again the experiment of rolling a fair die twice, and let X; Y be the maximum
and the minimum of the two rolls. We want to find the correlation between X; Y .
The joint distribution of .X; Y / was worked out in Example 2.2. From the joint
distribution,

E(XY) = 1 · (1/36) + 2 · (1/18) + 4 · (1/36) + 3 · (1/18) + 6 · (1/18) + 9 · (1/36)
+ ··· + 30 · (1/18) + 36 · (1/36) = 49/4.

The marginal pmfs of X, Y were also worked out in Example 2.2. From the marginal
pmfs, by direct calculation, E(X) = 161/36, E(Y) = 91/36, Var(X) = Var(Y) =
2555/1296. Therefore,

ρ_{X,Y} = [E(XY) - E(X)E(Y)]/[√Var(X) √Var(Y)]
= [49/4 - (161/36)(91/36)]/(2555/1296) = 35/73 = .48.

The correlation between the maximum and the minimum is in fact positive for any
number of rolls of a die, although the correlation will converge to zero when the
number of rolls converges to ∞.
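
The exact value 35/73 can be reproduced with a few lines of Python; this sketch (my own) works directly from the joint pmf and uses the fact that Var(X) = Var(Y) here, so the correlation is simply Cov(X, Y)/Var(X).

from fractions import Fraction
from itertools import product

joint = {}
for r1, r2 in product(range(1, 7), repeat=2):
    x, y = max(r1, r2), min(r1, r2)
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 36)

EX  = sum(x * p for (x, y), p in joint.items())
EY  = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
EX2 = sum(x * x * p for (x, y), p in joint.items())
cov = EXY - EX * EY
rho = cov / (EX2 - EX**2)     # valid here because Var(X) = Var(Y)
print(EXY, cov, rho)          # 49/4, 1225/1296, 35/73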

Example 2.12 (Correlation in the Chicken–Eggs Example). Consider again the
example of a chicken laying a Poisson number of eggs N with mean λ, and each egg
fertilizing, independently of others, with probability p. If X is the number of eggs
actually fertilized, we want to find the correlation between the number of eggs laid
and the number fertilized, that is, the correlation between X and N.
First,

E(XN) = E_N[E(XN | N = n)] = E_N[n E(X | N = n)] = E_N[n²p] = p(λ + λ²).

Next, from our previous calculations, E(X) = pλ, E(N) = λ, Var(X) = pλ,
Var(N) = λ. Therefore,

ρ_{X,N} = [E(XN) - E(X)E(N)]/[√Var(X) √Var(N)]
= [p(λ + λ²) - pλ · λ]/[√(pλ) √λ] = √p.

Thus, the correlation goes up with the fertility rate of the eggs.

Example 2.13 (Best Linear Predictor). Suppose X and Y are two jointly distributed
random variables, and either by necessity, or by omission, the variable Y was not
observed. But X was observed, and there may be some information in the X value
about Y . The problem is to predict Y by using X . Linear predictors, because of their
functional simplicity, are appealing. The mathematical problem is to choose the best
linear predictor a C bX of Y , where best is defined as the predictor that minimizes
the mean squared error EŒY  .a C bX /2 . We show that the answer has something
to do with the covariance between X and Y .
By breaking the square, R.a; b/

DEŒY .aCbX /2 D a2 Cb 2 E.X 2 /C2abE.X /2aE.Y /2bE.X Y /CE.Y 2 /:



To minimize this with respect to a, b, we partially differentiate R(a, b) with respect
to a, b, and set the derivatives equal to zero:

∂R(a, b)/∂a = 2a + 2bE(X) - 2E(Y) = 0  ⟺  a + bE(X) = E(Y);
∂R(a, b)/∂b = 2bE(X²) + 2aE(X) - 2E(XY) = 0  ⟺  aE(X) + bE(X²) = E(XY).

Simultaneously solving these two equations, we get

b = [E(XY) - E(X)E(Y)]/Var(X),   a = E(Y) - {[E(XY) - E(X)E(Y)]/Var(X)} E(X).

These values do minimize R(a, b) by an easy application of the second derivative
test. So, the best linear predictor of Y based on X is

best linear predictor of Y = E(Y) - [Cov(X, Y)/Var(X)] E(X) + [Cov(X, Y)/Var(X)] X
= E(Y) + [Cov(X, Y)/Var(X)] [X - E(X)].

The best linear predictor is also known as the regression line of Y on X . It is of


widespread use in statistics.
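
As an illustration, the following Python sketch (my own, not from the text) computes the best linear predictor of the minimum Y from the maximum X in the dice example, using exactly the Cov/Var formulas above, and checks that perturbing the slope only increases the mean squared error.

from fractions import Fraction
from itertools import product

joint = {}
for r1, r2 in product(range(1, 7), repeat=2):
    x, y = max(r1, r2), min(r1, r2)
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 36)

EX  = sum(x * p for (x, y), p in joint.items())
EY  = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
EX2 = sum(x * x * p for (x, y), p in joint.items())

b = (EXY - EX * EY) / (EX2 - EX**2)      # Cov(X, Y) / Var(X)
a = EY - b * EX
mse = lambda a_, b_: sum((y - a_ - b_ * x) ** 2 * p for (x, y), p in joint.items())
print(a, b)                                                   # 28/73 and 35/73
print(float(mse(a, b)), float(mse(a, b + Fraction(1, 10))))   # perturbing b only hurts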

Example 2.14 (Zero Correlation Does Not Mean Independence). If X, Y are independent,
then necessarily Cov(X, Y) = 0, and hence the correlation is also zero.
The converse is not true. Take a three-valued random variable X with the pmf
P(X = ±1) = p, P(X = 0) = 1 - 2p, 0 < p < 1/2. Let the other variable
Y be Y = X². Then, E(XY) = E(X³) = 0, and E(X)E(Y) = 0, because
E(X) = 0. Therefore, Cov(X, Y) = 0. But X, Y are certainly not independent; for
example, P(Y = 0 | X = 0) = 1, but P(Y = 0) = 1 - 2p ≠ 1.
Indeed, if X has a distribution symmetric around zero, and if X has three finite
moments, then X and X² always have a zero correlation, although they are not
independent.

2.5 Multivariate Case

The extension of the concepts for the bivariate discrete case to the multivariate dis-
crete case is straightforward. We give the appropriate definitions and an important
example, namely that of the multinomial distribution, an extension of the binomial
distribution.

Definition 2.9. Let X1, X2, ..., Xn be discrete random variables defined on a
common sample space Ω, with Xi taking values in some countable set 𝒳i. The joint
pmf of (X1, X2, ..., Xn) is defined as p(x1, x2, ..., xn) = P(X1 = x1, ..., Xn = xn),
xi ∈ 𝒳i, and zero otherwise.

Definition 2.10. Let X1, X2, ..., Xn be random variables defined on a common
sample space Ω. The joint CDF of X1, X2, ..., Xn is defined as F(x1, x2, ..., xn) =
P(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn), x1, x2, ..., xn ∈ R.

The requirements of a joint pmf are the usual:
(i) p(x1, x2, ..., xn) ≥ 0 for all x1, x2, ..., xn ∈ R;
(ii) Σ_{x1∈𝒳1, ..., xn∈𝒳n} p(x1, x2, ..., xn) = 1.

The requirements of a joint CDF are somewhat more complicated. The requirements
of a CDF are that
(i) 0 ≤ F ≤ 1 for all (x1, ..., xn);
(ii) F is nondecreasing in each coordinate;
(iii) F equals zero if one or more of the xi = -∞;
(iv) F equals one if all the xi = +∞;
(v) F assigns a nonnegative probability to every n-dimensional rectangle

[a1, b1] × [a2, b2] × ··· × [an, bn].

This last condition, (v), is a notationally clumsy condition to write down. If n = 2,
it reduces to the simple inequality that

F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2) ≥ 0 for all a1 ≤ b1, a2 ≤ b2.

Once again, we mention that it is not convenient or interesting to work with the
CDF for discrete random variables; for discrete variables, it is preferable to work
with the pmf.

2.5.1 Joint MGF

Analogous to the case of one random variable, we can define the joint mgf for sev-
eral random variables. The definition is the same for all types of random variables,
discrete or continuous, or other mixed types. As in the one-dimensional case, the
joint mgf of several random variables is also a very useful tool. First, we repeat
the definition of expectation of a function of several random variables; see
Chapter 1, where it was first introduced and defined. The definition below is equiv-
alent to what was given in Chapter 1.

Definition 2.11. Let X1, X2, ..., Xn be discrete random variables defined on
a common sample space Ω, with Xi taking values in some countable set 𝒳i.

Let the joint pmf of X1, X2, ..., Xn be p(x1, ..., xn). Let g(x1, ..., xn) be
a real-valued function of n variables. We say that E[g(X1, X2, ..., Xn)] exists
if Σ_{x1∈𝒳1, ..., xn∈𝒳n} |g(x1, ..., xn)| p(x1, ..., xn) < ∞, in which case, the
expectation is defined as

E[g(X1, X2, ..., Xn)] = Σ_{x1∈𝒳1, ..., xn∈𝒳n} g(x1, ..., xn) p(x1, ..., xn).

A corresponding definition when X1 ; X2 ; : : : ; Xn are all continuous random vari-


ables is given in the next chapter.
Definition 2.12. Let X1, X2, ..., Xn be n random variables defined on a common
sample space Ω. The joint moment-generating function of X1, X2, ..., Xn is defined
to be

ψ(t1, t2, ..., tn) = E[e^{t1X1 + t2X2 + ··· + tnXn}] = E[e^{t'X}],

provided the expectation exists, and where t'X denotes the inner product of the
vectors t = (t1, ..., tn), X = (X1, ..., Xn).
Note that the joint moment-generating function (mgf) always exists at the origin,
namely, t = (0, ..., 0), and equals one at that point. It may or may not exist at other
points t. If it does exist in a nonempty rectangle containing the origin, then many
important characteristics of the joint distribution of X1 ; X2 ; : : : ; Xn can be derived
by using the joint mgf. As in the one-dimensional case, it is a very useful tool. Here
is the moment-generation property of a joint mgf.
Theorem 2.7. Suppose ψ(t1, t2, ..., tn) exists in a nonempty open rectangle containing
the origin t = 0. Then a partial derivative of ψ(t1, t2, ..., tn) of every order
with respect to each ti exists in that open rectangle, and furthermore,

E[X1^{k1} X2^{k2} ··· Xn^{kn}] = [∂^{k1+k2+···+kn}/(∂t1^{k1} ··· ∂tn^{kn})] ψ(t1, t2, ..., tn) |_{t1=0, t2=0, ..., tn=0}.

A corollary of this result is sometimes useful in determining the covariance between


two random variables.
Corollary. Let X, Y have a joint mgf ψ in some open rectangle around the origin
(0, 0). Then,

Cov(X, Y) = [∂²ψ(t1, t2)/∂t1∂t2]|_{(0,0)} - [∂ψ(t1, t2)/∂t1]|_{(0,0)} [∂ψ(t1, t2)/∂t2]|_{(0,0)}.

We also have the distribution-determining property, as in the one-dimensional case.


Theorem 2.8. Suppose (X1, X2, ..., Xn) and (Y1, Y2, ..., Yn) are two sets of
jointly distributed random variables, such that their mgfs ψ_X(t1, t2, ..., tn) and
ψ_Y(t1, t2, ..., tn) exist and coincide in some nonempty open rectangle containing
the origin. Then (X1, X2, ..., Xn) and (Y1, Y2, ..., Yn) have the same joint
distribution.

Remark. It is important to note that the last two theorems are not limited to discrete
random variables; they are valid for general random variables. The proofs of these
two theorems follow the same arguments as in the one-dimensional case, namely
that when an mgf exists in a nonempty open rectangle, it can be differentiated in-
finitely often with respect to each variable ti inside the expectation; that is, the order
of the derivative and the expectation can be interchanged.

2.5.2 Multinomial Distribution

One of the most important multivariate discrete distributions is the multinomial dis-
tribution. The multinomial distribution corresponds to n balls being distributed to k
cells, independently, with each ball having the probability pi of being dropped into
the i th cell. The random variables under consideration are X1 ; X2 ; : : : ; Xk , where
Xi is the number of balls that get dropped into the i th cell. Then their joint pmf is
the multinomial pmf defined below.
Definition 2.13. A multivariate random vector (X1, X2, ..., Xk) is said to have a
multinomial distribution with parameters n, p1, p2, ..., pk if it has the pmf

P(X1 = x1, X2 = x2, ..., Xk = xk) = [n!/(x1! x2! ··· xk!)] p1^{x1} p2^{x2} ··· pk^{xk},

xi ≥ 0, Σ_{i=1}^{k} xi = n, pi ≥ 0, Σ_{i=1}^{k} pi = 1.

We write (X1, X2, ..., Xk) ~ Mult(n, p1, ..., pk) to denote a random vector
with a multinomial distribution.
Example 2.15 (Dice Rolls). Suppose a fair die is rolled 30 times. We want to find
the probabilities that
(i) Each face is obtained exactly five times.
(ii) The number of sixes is at least five.
If we denote the number of times face number i is obtained as Xi, then
(X1, X2, ..., X6) ~ Mult(n, p1, ..., p6), where n = 30 and each pi = 1/6.
Therefore,

P(X1 = 5, X2 = 5, ..., X6 = 5) = [30!/(5!)^6] (1/6)^5 ··· (1/6)^5
= [30!/(5!)^6] (1/6)^30
= .0004.

Next, each of the 30 rolls will either be a six or not, independently of the other rolls,
with probability 1/6, and so, X6 ~ Bin(30, 1/6). Therefore,

P(X6 ≥ 5) = 1 - P(X6 ≤ 4) = 1 - Σ_{x=0}^{4} C(30, x) (1/6)^x (5/6)^{30-x}
= .5757.
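
Both numbers are easy to confirm with exact arithmetic; this is a small Python check (my own sketch).

from math import factorial, comb
from fractions import Fraction

# (i) every face exactly five times in 30 rolls
p_equal = Fraction(factorial(30), factorial(5) ** 6) * Fraction(1, 6) ** 30
print(float(p_equal))                      # about 0.0004

# (ii) at least five sixes: X6 ~ Bin(30, 1/6)
p_tail = 1 - sum(comb(30, x) * Fraction(1, 6) ** x * Fraction(5, 6) ** (30 - x)
                 for x in range(5))
print(float(p_tail))                       # about 0.5757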

Example 2.16 (Bridge). Consider a Bridge game with four players, North, South,
East, and West. We want to find the probability that North and South together
have two or more aces. Let Xi denote the number of aces in the hands of player
i, i = 1, 2, 3, 4; we let i = 1, 2 mean North and South. Then, we want to find
P(X1 + X2 ≥ 2).
The joint distribution of (X1, X2, X3, X4) is Mult(4, 1/4, 1/4, 1/4, 1/4) (think of each
ace as a ball, and the four players as cells). Then, (X1 + X2, X3 + X4) ~
Mult(4, 1/2, 1/2). Therefore,

P(X1 + X2 ≥ 2) = [4!/(2!2!)] (1/2)^4 + [4!/(3!1!)] (1/2)^4 + [4!/(4!0!)] (1/2)^4
= 11/16.
Important formulas and facts about the multinomial distribution are given in the
next theorem.

Theorem 2.9. Let (X1, X2, ..., Xk) ~ Mult(n, p1, p2, ..., pk). Then,

(a) E(Xi) = npi; Var(Xi) = npi(1 - pi);
(b) For all i, Xi ~ Bin(n, pi);
(c) Cov(Xi, Xj) = -npipj for all i ≠ j;
(d) ρ_{Xi,Xj} = -√[pipj/((1 - pi)(1 - pj))] for all i ≠ j;
(e) For all m, 1 ≤ m < k, (X1, X2, ..., Xm) | (X_{m+1} + X_{m+2} + ··· + Xk = s) ~
Mult(n - s, π1, π2, ..., πm), where πi = pi/(p1 + p2 + ··· + pm).

Proof. Define W_{ir} as the indicator of the event that the rth ball lands in the ith cell.
Note that for a given i, the variables W_{ir} are independent. Then,

Xi = Σ_{r=1}^{n} W_{ir},

and therefore, E(Xi) = Σ_{r=1}^{n} E[W_{ir}] = npi, and Var(Xi) = Σ_{r=1}^{n} Var(W_{ir}) =
npi(1 - pi). Part (b) follows from the definition of a multinomial experiment

(the trials are identical and independent, and each ball either lands or not in the
ith cell). For part (c),

Cov(Xi, Xj) = Cov(Σ_{r=1}^{n} W_{ir}, Σ_{s=1}^{n} W_{js})
= Σ_{r=1}^{n} Σ_{s=1}^{n} Cov(W_{ir}, W_{js})
= Σ_{r=1}^{n} Cov(W_{ir}, W_{jr})

(because Cov(W_{ir}, W_{js}) would be zero when s ≠ r)

= Σ_{r=1}^{n} [E(W_{ir}W_{jr}) - E(W_{ir})E(W_{jr})]
= Σ_{r=1}^{n} [0 - pipj] = -npipj.

Part (d) follows immediately from part (c) and part (a). Part (e) is a calculation, and
is omitted. □
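
Parts (a) and (c) of Theorem 2.9 can be verified exactly for a small case; the sketch below (my own, with an arbitrary choice n = 4, k = 3, p = (1/2, 1/3, 1/6)) enumerates all ball assignments and recovers E(X1) = np1 and Cov(X1, X2) = -np1p2.

from fractions import Fraction
from itertools import product

n, p = 4, (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
dist = {}
for balls in product(range(3), repeat=n):              # the cell chosen by each ball
    counts = tuple(balls.count(i) for i in range(3))
    prob = Fraction(1)
    for cell in balls:
        prob *= p[cell]
    dist[counts] = dist.get(counts, Fraction(0)) + prob

E1 = sum(c[0] * q for c, q in dist.items())
E2 = sum(c[1] * q for c, q in dist.items())
E12 = sum(c[0] * c[1] * q for c, q in dist.items())
print(E1, n * p[0])                        # E(X1) = np1 = 2
print(E12 - E1 * E2, -n * p[0] * p[1])     # Cov(X1, X2) = -np1p2 = -2/3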
Example 2.17 (MGF of the Multinomial Distribution). Let (X1, X2, ..., Xk) ~
Mult(n, p1, p2, ..., pk). Then the mgf ψ(t1, t2, ..., tk) exists at all t, and a formula
follows easily. Indeed,

E[e^{t1X1 + ··· + tkXk}] = Σ_{xi ≥ 0, Σ xi = n} [n!/(x1! ··· xk!)] e^{t1x1} e^{t2x2} ··· e^{tkxk} p1^{x1} p2^{x2} ··· pk^{xk}
= Σ_{xi ≥ 0, Σ xi = n} [n!/(x1! ··· xk!)] (p1e^{t1})^{x1} (p2e^{t2})^{x2} ··· (pke^{tk})^{xk}
= (p1e^{t1} + p2e^{t2} + ··· + pke^{tk})^n,

by the multinomial expansion identity

(a1 + a2 + ··· + ak)^n = Σ_{xi ≥ 0, Σ xi = n} [n!/(x1! ··· xk!)] a1^{x1} a2^{x2} ··· ak^{xk}.

2.6 * The Poissonization Technique

Calculation of complex multinomial probabilities often gets technically simplified


by taking the number of balls to be a random variable, specifically, a Poisson random
variable. We give the Poissonization theorem and some examples in this section.

Theorem 2.10. Let N ~ Poi(λ), and suppose given N = n, (X1, X2, ..., Xk) ~
Mult(n, p1, p2, ..., pk). Then, marginally, X1, X2, ..., Xk are independent
Poisson, with Xi ~ Poi(λpi).

Proof. By the total probability formula,

P(X1 = x1, X2 = x2, ..., Xk = xk)
= Σ_{n=0}^{∞} P(X1 = x1, X2 = x2, ..., Xk = xk | N = n) e^{-λ}λ^n/n!
= Σ_{n=0}^{∞} [(x1 + x2 + ··· + xk)!/(x1!x2! ··· xk!)] p1^{x1} p2^{x2} ··· pk^{xk} (e^{-λ}λ^n/n!) I_{n = x1+x2+···+xk}
= [1/(x1!x2! ··· xk!)] e^{-λ} λ^{x1} λ^{x2} ··· λ^{xk} p1^{x1} p2^{x2} ··· pk^{xk}
= [1/(x1!x2! ··· xk!)] e^{-λ} (λp1)^{x1} (λp2)^{x2} ··· (λpk)^{xk}
= Π_{i=1}^{k} e^{-λpi}(λpi)^{xi}/xi!,

which establishes that the joint marginal pmf of (X1, X2, ..., Xk) is the product
of k Poisson pmfs, and so X1, X2, ..., Xk must be marginally independent, with
Xi ~ Poi(λpi). □

Corollary. Let A be a set in the k-dimensional Euclidean space R^k. Let
(Y1, Y2, ..., Yk) ~ Mult(n, p1, p2, ..., pk). Then, P((Y1, Y2, ..., Yk) ∈ A)
equals n!c(n), where c(n) is the coefficient of λ^n in the power series expansion
of e^λ P((X1, X2, ..., Xk) ∈ A). Here X1, X2, ..., Xk are as above: they are
independent Poisson variables, with Xi ~ Poi(λpi).
The corollary is simply a restatement of the identity

P((X1, X2, ..., Xk) ∈ A) = Σ_{n=0}^{∞} e^{-λ} (λ^n/n!) P((Y1, Y2, ..., Yk) ∈ A).

Example 2.18 (No Empty Cells). Suppose n balls are distributed independently and
at random into k cells. We want to find a formula for the probability that no cell
remains empty.
We use the Poissonization technique to solve this problem. We want a formula
for P(Y1 ≠ 0, Y2 ≠ 0, ..., Yk ≠ 0).
Marginally, each Xi ~ Poi(λ/k), and therefore,

P(X1 > 0, X2 > 0, ..., Xk > 0) = (1 - e^{-λ/k})^k
⟹ e^λ P(X1 > 0, X2 > 0, ..., Xk > 0) = e^λ (1 - e^{-λ/k})^k
= Σ_{x=0}^{k} C(k, x) (-1)^x e^{λ(1 - x/k)}
= Σ_{x=0}^{k} C(k, x) (-1)^x Σ_{n=0}^{∞} [λ(1 - x/k)]^n/n!
= Σ_{n=0}^{∞} (λ^n/n!) [Σ_{x=0}^{k} C(k, x) (-1)^x (1 - x/k)^n].

Therefore, by the above corollary,

P(Y1 ≠ 0, Y2 ≠ 0, ..., Yk ≠ 0) = Σ_{x=0}^{k} C(k, x) (-1)^x (1 - x/k)^n.
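
The resulting inclusion-exclusion formula is simple to test; the Python sketch below (my own) compares it with brute-force enumeration over all k^n equally likely assignments of n balls to k cells, for one small case.

from math import comb
from fractions import Fraction
from itertools import product

def formula(n, k):
    # sum over x of (-1)^x C(k, x) (1 - x/k)^n
    return sum((-1) ** x * comb(k, x) * Fraction(k - x, k) ** n for x in range(k + 1))

def brute_force(n, k):
    good = sum(1 for assign in product(range(k), repeat=n) if len(set(assign)) == k)
    return Fraction(good, k ** n)

print(formula(6, 3), brute_force(6, 3))   # both equal 20/27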

Exercises

Exercise 2.1. Consider the experiment of picking one word at random from the
sentence
ALL IS WELL IN THE NEWELL FAMILY
Let X be the length of the word selected and Y the number of Ls in it. Find in a
tabular form the joint pmf of X and Y , their marginal pmfs, means, and variances,
and the correlation between X and Y .
Exercise 2.2. A fair coin is tossed four times. Let X be the number of heads, Z the
number of tails, and Y = |X - Z|. Find the joint pmf of (X, Y), and E(Y).
Exercise 2.3. Consider the joint pmf p(x, y) = cxy, 1 ≤ x ≤ 3, 1 ≤ y ≤ 3.
(a) Find the normalizing constant c.
(b) Are X, Y independent? Prove your claim.
(c) Find the expectations of X, Y, XY.
Exercise 2.4. Consider the joint pmf p(x, y) = cxy, 1 ≤ x ≤ y ≤ 3.
(a) Find the normalizing constant c.
(b) Are X, Y independent? Prove your claim.
(c) Find the expectations of X, Y, XY.
Exercise 2.5. A fair die is rolled twice. Let X be the maximum and Y the minimum
of the two rolls. By using the joint pmf of (X, Y) worked out in the text, find the pmf
of X/Y, and hence the mean of X/Y.
Exercise 2.6. A hat contains four slips of paper, numbered 1, 2, 3, and 4. Two slips
are drawn at random, without replacement. X is the number on the first slip and Y
the sum of the two numbers drawn. Write in a tabular form the joint pmf of .X; Y /.
Hence find the marginal pmfs. Are X; Y independent?

Exercise 2.7 * (Conditional Expectation in Bridge). Let X be the number of
clubs in the hand of North and Y the number of clubs in the hand of South in a
Bridge game. Write a general formula for E(X | Y = y), and compute E(X | Y = 3).
How about E(Y | X = 3)?

Exercise 2.8. A fair die is rolled four times. Find the probabilities that:
(a) At least 1 six is obtained;
(b) Exactly 1 six and exactly one two is obtained,
(c) Exactly 1 six, 1 two, and 2 fours are obtained.

Exercise 2.9 (Iterated Expectation). A household has a Poisson number of cars


with mean 1. Each car that a household possesses has, independently of the other
cars, a 20% chance of being an SUV. Find the mean number of SUVs a household
possesses.

Exercise 2.10 (Iterated Variance). Suppose N ~ Poi(λ), and given N = n, X is
distributed as a uniform on {0, 1, ..., n}. Find the variance of the marginal
distribution of X.

Exercise 2.11. Suppose X and Y are independent Geo(p) random variables. Find
P(X ≤ Y); P(X > Y).

Exercise 2.12. * Suppose X and Y are independent Poi(λ) random variables. Find
P(X ≤ Y); P(X > Y).

Hint: This involves a Bessel function of a suitable kind.

Exercise 2.13. Suppose X and Y are independent and take the values 1, 2, 3, 4 with
probabilities .2, .3, .3, .2. Find the pmf of X + Y.

Exercise 2.14. Two random variables have the joint pmf p(x, x + 1) = 1/(n + 1),
x = 0, 1, ..., n. Answer the following questions with as little calculation
as possible.
(a) Are X, Y independent?
(b) What is the variance of Y - X?
(c) What is Var(Y | X = 1)?

Exercise 2.15 (Binomial Conditional Distribution). Suppose X, Y are independent
random variables, and that X ~ Bin(m, p), Y ~ Bin(n, p). Show that the
conditional distribution of X given X + Y = t is a hypergeometric distribution;
identify the parameters of this hypergeometric distribution.

Exercise 2.16 * (Poly-Hypergeometric Distribution). A box has D1 red, D2
green, and D3 blue balls. Suppose n balls are picked at random without replacement
from the box. Let X, Y, Z be the number of red, green, and blue balls selected.
Find the joint pmf of (X, Y, Z).

Exercise 2.17 (Bivariate Poisson). Suppose U, V, W are independent Poisson
random variables, with means λ, μ, ν. Let X = U + W, Y = V + W.
(a) Find the marginal pmfs of X, Y.
(b) Find the joint pmf of (X, Y).

Exercise 2.18. Suppose a fair die is rolled twice. Let X, Y be the two rolls. Find
the following with as little calculation as possible:
(a) E(X + Y | Y = y).
(b) E(XY | Y = y).
(c) Var(X²Y | Y = y).
(d) ρ_{X+Y, X-Y}.

Exercise 2.19 (A Waiting Time Problem). In repeated throws of a fair die, let X
be the throw in which the first six is obtained, and Y the throw in which the second
six is obtained.
(a) Find the joint pmf of (X, Y).
(b) Find the expectation of Y - X.
(c) Find E(Y - X | X = 8).
(d) Find Var(Y - X | X = 8).

Exercise 2.20 * (Family Planning). A couple want to have a child of each sex, but
they will have at most four children. Let X be the total number of children they
will have and Y the number of girls at the second childbirth. Find the joint pmf of
(X, Y), and the conditional expectation of X given Y = y, y = 0, 2.

Exercise 2.21 (A Standard Deviation Inequality). Let X, Y be two random
variables. Show that σ_{X+Y} ≤ σ_X + σ_Y.

Exercise 2.22 * (A Covariance Fact). Let X, Y be two random variables. Suppose
E(X | Y = y) is nondecreasing in y. Show that ρ_{X,Y} ≥ 0, assuming the correlation
exists.

Exercise 2.23 (Another Covariance Fact). Let X, Y be two random variables.
Suppose E(X | Y = y) is a finite constant c. Show that Cov(X, Y) = 0.

Exercise 2.24 (Two-Valued Random Variables). Suppose X; Y are both two-


valued random variables. Prove that X and Y are independent if and only if they
have a zero correlation.

Exercise 2.25 * (A Correlation Inequality). Suppose X, Y each have mean 0 and
variance 1, and a correlation ρ. Show that E(max{X², Y²}) ≤ 1 + √(1 - ρ²).

Exercise 2.26 (A Covariance Inequality). Let X be any random variable, and
g(X), h(X) two functions such that they are both nondecreasing or both
nonincreasing. Show that Cov(g(X), h(X)) ≥ 0.

Exercise 2.27 (Joint MGF). Suppose a fair die is rolled four times. Let X be the
number of ones and Y the number of sixes. Find the joint mgf of X and Y , and
hence, the covariance between X; Y .

Exercise 2.28 (MGF of Bivariate Poisson). Suppose U, V, W are independent
Poisson random variables, with means λ, μ, ν. Let X = U + W, Y = V + W.
Find the joint mgf of X, Y, and hence E(XY).

Exercise 2.29 (Joint MGF). In repeated throws of a fair die, let X be the throw in
which the first six is obtained, and Y the throw in which the second six is obtained.
Find the joint mgf of X; Y , and hence the covariance between X and Y .

Exercise 2.30 * (Poissonization). A fair die is rolled 30 times. By using the Pois-
sonization theorem, find the probability that the maximum number of times any face
appears is 9 or more.

Exercise 2.31 * (Poissonization). Individuals can be of one of three genotypes in


a population. Each genotype has the same percentage of individuals. A sample of n
individuals from the population will be taken. What is the smallest n for which with
probability ≥ .9, there are at least five individuals of each genotype in the sample?
