PROBLEMS
of h, and a rough guess of 1.0 meters for the standard deviation of the samples Xi.
(a) How large should n be so that the standard deviation of Mn is at most 1 centimeter?
(b) How large should n be so that Chebyshev's inequality guarantees that the estimate is within 5 centimeters from h, with probability at least 0.99?
(c) The statistician realizes that all persons in the population have heights between 1.4 and 2.0 meters, and revises the standard deviation figure that he uses based on the bound of Example 5.3. How should the values of n obtained in parts (a) and (b) be revised?
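The following short Python sketch (an added illustration, not part of the original problem set) shows the kind of arithmetic these parts call for, using the rough guess σ = 1.0 m and, for comparison, the value σ = (2.0 − 1.4)/2 = 0.3 m implied by the range restriction, since a random variable confined to an interval [a, b] has standard deviation at most (b − a)/2.

```python
import math

def n_for_mean_std(sigma, target_std):
    # std of the sample mean Mn is sigma / sqrt(n); solve sigma / sqrt(n) <= target_std
    return math.ceil((sigma / target_std) ** 2)

def n_chebyshev(sigma, eps, delta):
    # Chebyshev: P(|Mn - h| >= eps) <= sigma^2 / (n * eps^2); require this to be <= delta
    return math.ceil(sigma ** 2 / (eps ** 2 * delta))

for sigma in (1.0, 0.3):   # rough guess, then the bound implied by heights in [1.4, 2.0]
    print(f"sigma={sigma}: part (a) n={n_for_mean_std(sigma, 0.01)}, "
          f"part (b) n={n_chebyshev(sigma, 0.05, 0.01)}")
```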
Problem 2. * The Chernoff bound. The Chernoff bound is a powerful tool that
relies on the transform associated with a random variable, and provides bounds on the
probabilities of certain tail events.
(a) Show that the inequality

P(X ≥ a) ≤ e^{-sa} M(s)

holds for every a and every s ≥ 0, where M(s) = E[e^{sX}] is the transform associated with the random variable X, assumed to be finite in a small open interval containing s = 0.
(b) Show that the inequality

P(X ≤ a) ≤ e^{-sa} M(s)

holds for every a and every s ≤ 0.

(c) Show that

P(X ≥ a) ≤ e^{-φ(a)},   where φ(a) = max_{s≥0} (sa − ln M(s)).

(d) Show that if a > E[X], then φ(a) > 0.
(e) Apply the result of part (c) to obtain a bound for P(X ≥ a), for the case where X is a standard normal random variable and a > 0.
(f) Let X1, X2, . . . be independent random variables with the same distribution as X. Show that for any a > E[X], we have

P((1/n) ∑_{i=1}^n Xi ≥ a) ≤ e^{-nφ(a)},

so that the probability that the sample mean exceeds the mean by a certain amount decreases exponentially with n.
Solution. (a) Given some a and s ≥ 0, consider the random variable Ya defined by Ya = 0 if X < a, and Ya = e^{sa} if X ≥ a. It is seen that the relation Ya ≤ e^{sX} always holds, so that E[Ya] ≤ E[e^{sX}] = M(s). On the other hand, E[Ya] = e^{sa} P(X ≥ a), from which the inequality P(X ≥ a) ≤ e^{-sa} M(s) follows.
(b) The argument is similar to the one for part (a). We define Ya by Ya = e^{sa} if X ≤ a, and Ya = 0 if X > a.
(c) Since the inequality from part (a) is valid for every s ≥ 0, we obtain

P(X ≥ a) ≤ min_{s≥0} (e^{-sa} M(s)) = min_{s≥0} e^{-(sa − ln M(s))} = e^{-max_{s≥0}(sa − ln M(s))} = e^{-φ(a)}.
(d) Since M(0) = 1 and M'(0) = E[X], the derivative of sa − ln M(s) at s = 0 is

d/ds (sa − ln M(s)) |_{s=0} = a − M'(0)/M(0) = a − 1 · E[X] > 0.

Since the function sa − ln M(s) is zero and has a positive derivative at s = 0, it must be positive when s is positive and small. It follows that the maximum φ(a) of the function sa − ln M(s) over all s ≥ 0 is also positive.
(e) For a standard normal random variable X, we have M(s) = e^{s^2/2}. Therefore, sa − ln M(s) = sa − s^2/2. To maximize this expression over all s ≥ 0, we form the derivative, which is a − s, and set it to zero, resulting in s = a. Thus, φ(a) = a^2/2, which leads to the bound

P(X ≥ a) ≤ e^{-a^2/2}.
Note: In the case where a ≤ E[X], the maximizing value of s turns out to be s = 0, resulting in φ(a) = 0 and in the uninteresting bound

P(X ≥ a) ≤ 1.
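As a quick numerical check (an added sketch; the values of a are arbitrary), one can compare the bound e^{-a^2/2} from part (e) with the exact standard normal tail, computed from the complementary error function.

```python
import math

def normal_tail(a):
    # exact P(X >= a) for a standard normal X, via the complementary error function
    return 0.5 * math.erfc(a / math.sqrt(2))

for a in (0.5, 1.0, 2.0, 3.0):
    chernoff = math.exp(-a ** 2 / 2)          # bound from part (e)
    print(f"a={a}: exact tail={normal_tail(a):.6f}, bound={chernoff:.6f}")
```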
(f) Let Y = X1 + ... + Xn. Using the result of part (c), we have

P((1/n) ∑_{i=1}^n Xi ≥ a) = P(Y ≥ na) ≤ e^{-φY(na)},

where

φY(na) = max_{s≥0} (nsa − ln MY(s)),

and

MY(s) = E[e^{sY}] = (M(s))^n

is the transform associated with Y. We have ln MY(s) = n ln M(s), from which we obtain

φY(na) = n max_{s≥0} (sa − ln M(s)) = nφ(a),
and

P((1/n) ∑_{i=1}^n Xi ≥ a) ≤ e^{-nφ(a)}.
Note that when a > E[X], part (d) asserts that φ(a) > 0, so the probability of interest decreases exponentially with n.
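To see the exponential decay concretely, the following sketch (an added illustration; the choice of Bernoulli(1/2) samples and the threshold a = 0.7 are arbitrary) computes φ(a) by a crude grid maximization and compares the bound e^{-nφ(a)} with a Monte Carlo estimate of the tail probability.

```python
import math, random

p, a, n = 0.5, 0.7, 50            # Bernoulli(p) samples, threshold a > E[X] = p

def M(s):                          # transform of a Bernoulli(p) random variable
    return 1 - p + p * math.exp(s)

# crude numerical maximization of s*a - ln M(s) over a grid of s >= 0
phi = max(s * a - math.log(M(s)) for s in (i * 0.001 for i in range(20001)))

bound = math.exp(-n * phi)
trials = 100000
hits = sum(sum(random.random() < p for _ in range(n)) >= a * n for _ in range(trials))
print(f"Chernoff bound: {bound:.4e}   Monte Carlo estimate: {hits / trials:.4e}")
```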
Problem 3. * Jensen inequality. A twice differentiable real-valued function f of a single variable is called convex if its second derivative (d^2f/dx^2)(x) is nonnegative for all x in its domain of definition.

(a) Show that the functions f(x) = e^{ax}, f(x) = −ln x, and f(x) = x^4 are all convex.
(b) Show that if f is twice differentiable and convex, then the first order Taylor approximation of f is an underestimate of the function, that is,

f(a) + (x − a)(df/dx)(a) ≤ f(x),

for every a and x.

(c) Show that if f is twice differentiable and convex, and X is a random variable, then

f(E[X]) ≤ E[f(X)].

Solution. (a) We have (d^2/dx^2) e^{ax} = a^2 e^{ax} ≥ 0, (d^2/dx^2)(−ln x) = 1/x^2 ≥ 0, and (d^2/dx^2) x^4 = 12x^2 ≥ 0, so all three functions are convex.
(b) Since the second derivative of f is nonnegative, its first derivative must be nondecreasing. Using the fundamental theorem of calculus, we obtain

f(x) = f(a) + ∫_a^x (df/dx)(t) dt ≥ f(a) + ∫_a^x (df/dx)(a) dt = f(a) + (x − a)(df/dx)(a).
(c) Since the inequality from part (b) is assumed valid for every possible value x of the random variable X, we obtain

f(a) + (X − a)(df/dx)(a) ≤ f(X).

Taking expectations of both sides, and setting a = E[X], we obtain f(E[X]) ≤ E[f(X)].
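A brief numerical illustration of the resulting inequality f(E[X]) ≤ E[f(X)] (an added sketch; the choice f(x) = x^4 with X uniform on [0, 1] is arbitrary):

```python
import random

f = lambda x: x ** 4
samples = [random.random() for _ in range(200000)]        # X uniform on [0, 1]

mean_x = sum(samples) / len(samples)
mean_fx = sum(f(x) for x in samples) / len(samples)
print(f"f(E[X]) ~ {f(mean_x):.4f}   <=   E[f(X)] ~ {mean_fx:.4f}")   # about 0.0625 vs 0.2
```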
size n to be the smallest possible number for which the Chebyshev inequality yields a guarantee that

P(|Mn − f| ≥ ε) ≤ δ,

where ε and δ are some prespecified tolerances. Determine how the value of n recommended by the Chebyshev inequality changes in the following cases.

(a) The value of ε is reduced to half its original value.

(b) The probability δ is reduced to half its original value.
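For a sketch of the scaling involved (an added illustration; the particular values σ^2 = 1/4, ε = 0.05, δ = 0.05 are arbitrary): Chebyshev gives P(|Mn − f| ≥ ε) ≤ σ^2/(nε^2), where σ^2 bounds the variance of each sample, so the recommended n is the smallest integer with σ^2/(nε^2) ≤ δ. Halving ε multiplies this n by 4, while halving δ multiplies it by 2.

```python
import math

def chebyshev_n(sigma2, eps, delta):
    # smallest n with sigma2 / (n * eps^2) <= delta
    return math.ceil(sigma2 / (eps ** 2 * delta))

sigma2, eps, delta = 0.25, 0.05, 0.05
print(chebyshev_n(sigma2, eps, delta))        # baseline recommendation
print(chebyshev_n(sigma2, eps / 2, delta))    # eps halved: four times as large
print(chebyshev_n(sigma2, eps, delta / 2))    # delta halved: twice as large
```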
some limit, and identify the limit, for each of the following cases:

(a) Yn = Xn/n.

(b) Yn = (Xn)^n.

(c) Yn = X1 · X2 · · · Xn.
cXn, Xn + Yn, max{0, Xn}, |Xn|, and XnYn all converge in probability to corresponding limits.
Solution. Let x and y be the limits of Xn and Yn, respectively. Fix some ε > 0 and a constant c. If c = 0, then cXn equals zero for all n, and convergence trivially holds. If c ≠ 0, we have P(|cXn − cx| ≥ ε) = P(|Xn − x| ≥ ε/|c|), which converges to zero, so that cXn converges to cx in probability. For the sequence Xn + Yn, we note that if |Xn + Yn − x − y| ≥ ε, then |Xn − x| ≥ ε/2 or |Yn − y| ≥ ε/2, and therefore

lim_{n→∞} P(|Xn + Yn − x − y| ≥ ε) ≤ lim_{n→∞} P(|Xn − x| ≥ ε/2) + lim_{n→∞} P(|Yn − y| ≥ ε/2) = 0,
where the last equality follows since Xn and Yn converge, in probability, to x and y ,
respectively.
By a similar argument, it is seen that the event {|max{0, Xn} − max{0, x}| ≥ ε} is contained in the event {|Xn − x| ≥ ε}, whose probability converges to zero, so that max{0, Xn} converges to max{0, x} in probability; the same argument applies to |Xn|. Finally, consider the sequence XnYn. We write XnYn − xy = (Xn − x)(Yn − y) + x(Yn − y) + y(Xn − x); the last two terms converge to zero in probability, so it suffices to bound the probability P(|(Xn − x)(Yn − y)| ≥ ε/2). To bound this probability, we note that for |(Xn − x)(Yn − y)| to be as large as ε/2, we need either |Xn − x| or |Yn − y| (or both) to be at least √(ε/2). The rest of the proof is similar to the earlier proof that Xn + Yn converges in probability.
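One can also watch these convergences empirically. The sketch below (an added illustration, with the arbitrarily chosen sequences Xn = x + Z/√n and Yn = y + Z'/√n, Z and Z' standard normal) estimates P(|XnYn − xy| ≥ ε) for growing n.

```python
import random

x, y, eps, trials = 2.0, -1.0, 0.1, 100000

def prob_product_far(n):
    # estimate P(|Xn*Yn - xy| >= eps) for Xn = x + Z/sqrt(n), Yn = y + Z'/sqrt(n)
    count = 0
    for _ in range(trials):
        xn = x + random.gauss(0, 1) / n ** 0.5
        yn = y + random.gauss(0, 1) / n ** 0.5
        count += abs(xn * yn - x * y) >= eps
    return count / trials

for n in (10, 100, 1000):
    print(n, prob_product_far(n))
```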
Problem 7. * A sequence Xn of random variables is said to converge to a number c in the mean square, if lim_{n→∞} E[(Xn − c)^2] = 0.
(a) Show that convergence in the mean square implies convergence in probability.
(b) Give an example that shows that convergence in probability does not imply con
vergence in the mean square.
Solution. (a) Suppose that Xn converges to c in the mean square. Using the Markov inequality, we have

P(|Xn − c| ≥ ε) = P((Xn − c)^2 ≥ ε^2) ≤ E[(Xn − c)^2] / ε^2,

which converges to zero as n → ∞, for every ε > 0, so that Xn converges to c in probability.
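For part (b), one standard example (stated here as an added illustration; it need not be the one intended in the omitted portion of the solution) is Xn equal to n with probability 1/n, and 0 otherwise: the probability of being away from zero vanishes, but the mean square diverges.

```python
# Xn = n with probability 1/n, and 0 otherwise
for n in (10, 100, 1000, 10000):
    prob_far_from_zero = 1 / n        # P(|Xn - 0| >= eps) for any eps in (0, n]
    mean_square = n ** 2 * (1 / n)    # E[(Xn - 0)^2] = n, which diverges
    print(n, prob_far_from_zero, mean_square)
```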
(c) Let N be the first day on which the total number of gadgets produced exceeds 1000. Calculate an approximation to the probability that N ≥ 220.
Problem 11. Let X1, Y1, X2, Y2, . . . be independent random variables, uniformly distributed in the unit interval [0, 1], and let

W = ((X1 + ... + X16) − (Y1 + ... + Y16)) / 16.

Find a numerical approximation to the quantity

P(|W − E[W]| < 0.001).
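The sketch below (an added illustration of how one might carry out the requested approximation; the trial count is arbitrary) compares a central limit theorem estimate with a direct simulation. It uses var(W) = 32 · (1/12)/16^2 = 1/96, which follows from the independence of the 32 uniform terms.

```python
import math, random

# W = ((X1+...+X16) - (Y1+...+Y16)) / 16,  Xi, Yi uniform on [0, 1]
# E[W] = 0 and var(W) = 32 * (1/12) / 16**2 = 1/96, so the CLT suggests
# P(|W - E[W]| < 0.001) ~ 2*Phi(0.001 / sqrt(1/96)) - 1.
sigma = math.sqrt(1 / 96)
clt_estimate = math.erf(0.001 / (sigma * math.sqrt(2)))   # 2*Phi(z) - 1 = erf(z / sqrt(2))
print("CLT approximation:", clt_estimate)

trials = 200000
hits = 0
for _ in range(trials):
    w = (sum(random.random() for _ in range(16)) - sum(random.random() for _ in range(16))) / 16
    hits += abs(w) < 0.001
print("Monte Carlo estimate:", hits / trials)
```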
ance and associated transform MX(s). We assume that MX(s) is finite when −d < s < d, where d is some positive number. Let

Zn = (X1 + ... + Xn) / (σ√n).
(b) Suppose that the transform MX(s) has a second order Taylor series expansion around s = 0, of the form

MX(s) = a + bs + cs^2 + o(s^2),

where o(s^2) is a function that satisfies lim_{s→0} o(s^2)/s^2 = 0. Find a, b, and c.
Note: The central limit theorem follows from the result of part (c), together with the fact (whose proof lies beyond the scope of this text) that if the transforms MZn(s) converge to the transform MZ(s) of a random variable Z whose CDF is continuous, then the CDFs FZn converge to the CDF of Z. In our case, this implies that the CDF of Zn converges to the CDF of a standard normal.
Solution. (a) We have, using the independence of the Xi,

MZn(s) = E[e^{sZn}] = E[exp{(s/(σ√n)) ∑_{i=1}^n Xi}] = ∏_{i=1}^n E[e^{sXi/(σ√n)}] = (MX(s/(σ√n)))^n,
and using the formulas for a, b, and c from part (b), it follows that

lim_{n→∞} MZn(s) = e^{s^2/2},

which is the transform associated with a standard normal random variable.
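As an added numerical check (the choice of X uniform on [−1, 1] is the editor's, not from the text), one can verify that (MX(s/(σ√n)))^n approaches e^{s^2/2} as n grows, for a zero-mean X with variance σ^2.

```python
import math

# X uniform on [-1, 1]: zero mean, variance sigma^2 = 1/3, MX(s) = sinh(s)/s for s != 0
sigma = math.sqrt(1 / 3)

def M_X(s):
    return math.sinh(s) / s if s != 0 else 1.0

s = 1.5
for n in (1, 10, 100, 1000, 10000):
    print(n, M_X(s / (sigma * math.sqrt(n))) ** n)
print("limit e^{s^2/2} =", math.exp(s ** 2 / 2))
```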
to obtain
Xn + Yn converges to a + b, with probability 1. Also, assuming that the random variables Yn cannot be equal to zero, show that Xn/Yn converges to a/b, with probability 1.
Solution. Let A (respectively, B) be the event that the sequence of values of the random variables Xn (respectively, Yn) does not converge to a (respectively, b). Let C be the event that the sequence of values of Xn + Yn does not converge to a + b, and notice that C ⊂ A ∪ B.
Since Xn and Yn converge to a and b, respectively, with probability 1, we have P(A) = 0 and P(B) = 0. Hence,

P(C) ≤ P(A ∪ B) ≤ P(A) + P(B) = 0,

so that Xn + Yn converges to a + b, with probability 1.
tributed random variables. We assume that the Xi and Yi have finite mean, and that the sequence Yn converges to a number c, with probability 1. Show that Yn also converges to c, in probability.
Solution. Let C be the event that the sequence of values of the random variables Yn converges to c. By assumption, we have P(C) = 1. Fix some ε > 0, and let Ak be the event that |Yn − c| < ε for every n ≥ k. If the sequence of values of the random variables Yn converges to c, then there must exist some k such that for every n ≥ k,
this sequence of values is within less than ε from c. Therefore, every element of C belongs to Ak for some k, or

C ⊂ ∪_{k=1}^∞ Ak.
Note also that the sequence of events Ak is monotonically increasing, in the sense that Ak ⊂ Ak+1 for all k. Finally, note that the event Ak is a subset of the event {|Yk − c| < ε}. Therefore,
lim_{k→∞} P(|Yk − c| < ε) ≥ lim_{k→∞} P(Ak) = P(∪_{k=1}^∞ Ak) ≥ P(C) = 1,
where the first equality uses the continuity property of probabilities (Problem 13 in
Chapter 1). It follows that
lim_{k→∞} P(|Yk − c| ≥ ε) = 0,
which establishes convergence in probability.
Problem 16. * Consider a sequence Yn of nonnegative random variables and suppose that

E[∑_{n=1}^∞ Yn] < ∞.

Show that Yn converges to zero, with probability 1. Note: We have

E[∑_{n=1}^∞ Yn] = ∑_{n=1}^∞ E[Yn].

The fact that the expectation and the infinite summation can be interchanged, for the case of nonnegative random variables, is known as the monotone convergence theorem, a fundamental result of probability theory, whose proof lies beyond the scope of this text.
Solution. We note that the infinite sum ∑_{n=1}^∞ Yn must be finite, with probability 1. Indeed, if it had a positive probability of being infinite, then its expectation would also be infinite. But if the sum of the values of the random variables Yn is finite, the sequence of these values must converge to zero. Since the probability of this event is equal to 1, it follows that the sequence Yn converges to zero, with probability 1.
Problem 17. * Consider a sequence of Bernoulli random variables Xn, and let pn = P(Xn = 1) be the probability of success in the nth trial. Assuming that ∑_{n=1}^∞ pn < ∞, show that the number of successes is finite, with probability 1. [Compare with Problem 48(b) in Chapter 1.]
Solution. Using the monotone convergence theorem (see above note), we have E[∑_{n=1}^∞ Xn] = ∑_{n=1}^∞ pn < ∞. It follows, as in the preceding problem, that
∑_{n=1}^∞ Xn < ∞,
with probability 1. We then note that the event {∑_{n=1}^∞ Xn < ∞} is the same as the
event that there is a finite number of successes.
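A quick simulation (an added sketch, with the arbitrary choice pn = 1/n^2, which satisfies ∑ pn < ∞) suggests the same conclusion: the total number of successes stays small even as many more trials are added.

```python
import random

def total_successes(N):
    # Bernoulli trials with pn = 1/n^2, so that the sum of the pn is finite
    return sum(random.random() < 1 / n ** 2 for n in range(1, N + 1))

for N in (100, 10000, 1000000):
    print(N, total_successes(N))
```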
Problem 18. * The strong law of large numbers. Let X1, X2, . . . be a sequence of independent identically distributed random variables and assume that E[X1^4] < ∞. Prove the strong law of large numbers.
Solution. We note that the assumption E[X1^4] < ∞ implies that the expected value of the Xi is finite. Indeed, using the inequality |x| ≤ 1 + x^4, we have

E[|Xi|] ≤ 1 + E[Xi^4] < ∞.
Let us first assume that E[Xi] = 0. We will show that

E[∑_{n=1}^∞ (X1 + ... + Xn)^4 / n^4] < ∞.
We have
E[(X1 + ... + Xn)^4 / n^4] = (1/n^4) ∑_{i1=1}^n ∑_{i2=1}^n ∑_{i3=1}^n ∑_{i4=1}^n E[Xi1 Xi2 Xi3 Xi4].
Let us consider the various terms in this sum. If one of the indices is different from all of the other indices, the corresponding term is equal to zero. For example, if i1 is different from i2, i3, and i4, the assumption E[Xi] = 0 yields

E[Xi1 Xi2 Xi3 Xi4] = E[Xi1] E[Xi2 Xi3 Xi4] = 0.
Therefore, the nonzero terms in the above sum are either of the form E[Xi^4] (there are n such terms), or of the form E[Xi^2 Xj^2], with i ≠ j. Let us count how many terms there are of this form. Such terms are obtained in three different ways: by setting i1 = i2 ≠ i3 = i4, or by setting i1 = i3 ≠ i2 = i4, or by setting i1 = i4 ≠ i2 = i3. For each one of these three ways, we have n choices for the first pair of indices, and n − 1 choices for the second pair. We conclude that there are 3n(n − 1) terms of this type.
Thus,

E[(X1 + ... + Xn)^4] = nE[X1^4] + 3n(n − 1)E[X1^2 X2^2].

Using the inequality xy ≤ (x^2 + y^2)/2, we obtain E[X1^2 X2^2] ≤ E[X1^4], and

E[(X1 + ... + Xn)^4] ≤ (n + 3n(n − 1))E[X1^4] ≤ 3n^2 E[X1^4].
It follows that

E[∑_{n=1}^∞ (X1 + ... + Xn)^4 / n^4] = ∑_{n=1}^∞ E[(X1 + ... + Xn)^4] / n^4 ≤ ∑_{n=1}^∞ 3E[X1^4] / n^2 < ∞,

where the last step uses the well known property ∑_{n=1}^∞ n^{-2} < ∞. This implies that
(X1 + ... + Xn)^4 / n^4 converges to zero with probability 1 (cf. Problem 16), and therefore, (X1 + ... + Xn)/n also converges to zero with probability 1, which is the strong law of large numbers.
For the more general case where the mean of the random variables Xi is nonzero, the preceding argument establishes that (X1 + ... + Xn − nE[X1])/n converges to zero, which is the same as (X1 + ... + Xn)/n converging to E[X1], with probability 1.
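For intuition, the following simulation sketch (an added illustration, with uniform [0, 1] samples chosen arbitrarily) shows the sample mean settling down to the true mean along a single realization, which is what convergence with probability 1 describes.

```python
import random

random.seed(0)
running_sum = 0.0                      # Xi uniform on [0, 1], so E[Xi] = 0.5
for n in range(1, 1000001):
    running_sum += random.random()
    if n in (10, 1000, 100000, 1000000):
        print(n, running_sum / n)      # sample mean along a single realization
```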