Math 20 - Inequalities of Markov and Chebyshev
Often, given a random variable X whose distribution is unknown but whose expected value
µ is known, we may want to ask how likely it is for X to be ‘far’ from µ, or how likely it is
for this random variable to be ‘very large.’ This would give us some idea of the spread of
the distribution, though perhaps not a complete picture.
Proposition 1 (Markov's Inequality). Let X be a random variable that takes only nonnegative values. Then for any positive real number a,

    P(X ≥ a) ≤ E(X)/a,

provided E(X) exists.
For example, Markov's inequality tells us that as long as X doesn't take negative values, the probability that X is at least twice as large as its expected value is at most 1/2, which we can see by setting a = 2E(X). More generally, the probability that a random variable is at least k times its expected value is at most 1/k. Notice that the only things we assumed about this random variable are that it can't be negative and has finite mean; we don't need to know anything about its variance or its probability distribution, in general.
Proof. We’ll prove this for discrete RVs, but the proof for continuous RVs is essentially the
same, replacing sums with integrals.
By definition, E(X) = Σ_x x·P(X = x). We'll split this sum into two pieces, depending on whether or not x ≥ a:

    E(X) = Σ_{x≥a} x·P(X = x) + Σ_{x<a} x·P(X = x)
         ≥ Σ_{x≥a} a·P(X = x) + 0      (since x ≥ a in the first sum, and each term of the second sum is nonnegative)
         = a · Σ_{x≥a} P(X = x)
         = a·P(X ≥ a).

Dividing both sides by a gives the result.
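Markov's inequality is also easy to check numerically. Below is a minimal Python sketch; the choice of an exponential random variable and the sample size are arbitrary, purely for illustration.

```python
# Empirical sanity check of Markov's inequality: simulate a
# nonnegative random variable and compare P(X >= a) to E(X)/a.
import random

random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]  # mean ~1
mean = sum(samples) / len(samples)

for a in [1, 2, 5, 10]:
    tail = sum(x >= a for x in samples) / len(samples)
    print(f"a = {a}: P(X >= a) ~ {tail:.5f}  <=  E(X)/a ~ {mean/a:.5f}")
```

For the exponential distribution the true tail decays exponentially in a, so the 1/a-shaped bound holds with plenty of room to spare.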
Example 2. Suppose that the average grade on the upcoming Math 20 exam is 70%. Give
an upper bound on the proportion of students who score at least 90%.
    P(X ≥ 90) ≤ E(X)/90 = 70/90 = 7/9,

so at most 77.8% of students can possibly score this high. But in order to achieve this average, we would need 7/9 of the class to score a 90 and the remaining 2/9 to score a 0...
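As a quick check of that last remark, here is the arithmetic for this hypothetical extreme class:

```python
# Extreme case from Example 2: 7/9 of the class scores 90, 2/9 scores 0.
average = (7/9) * 90 + (2/9) * 0
print(average)  # 70.0, matching the assumed class average
```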
Example 3. A coin is weighted so that its probability of landing on heads is 20%. Suppose
the coin is flipped 20 times. Find a bound for the probability it lands on heads at least 16
times.
We actually do know this distribution; it's the binomial distribution with n = 20 and p = 1/5. Its expected value is 4. Markov's inequality tells us that

    P(X ≥ 16) ≤ E(X)/16 = 4/16 = 1/4.

Let's compare this to the actual probability that this happens:

    P(X ≥ 16) = Σ_{k=16}^{20} C(20, k) · 0.2^k · 0.8^{20−k} ≈ 1.38 · 10^{−8}.
So it seems like this is not a very good estimate. We’ll see later that this distribution (at
least, for large n) is close to normal, and Markov’s inequality doesn’t get close to the true
value for such “compact” distributions.
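Here is a short Python sketch that reproduces both numbers from this example using exact binomial arithmetic (math.comb is in the standard library):

```python
# Exact tail probability vs. Markov's bound for Example 3:
# X ~ Binomial(20, 0.2), bounding P(X >= 16).
from math import comb

n, p, a = 20, 0.2, 16
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))
markov = (n * p) / a  # E(X)/a = 4/16

print(f"Markov bound: {markov}")      # 0.25
print(f"Exact tail:   {exact:.2e}")   # ~1.38e-08
```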
Example 4 (Markov's Inequality is Tight). Consider a random variable X that takes the value 0 with probability 24/25 and the value 5 with probability 1/25. Then

    E(X) = (1/25) · 5 = 1/5.

Let's use Markov's inequality to find a bound on the probability that X is at least 5:

    P(X ≥ 5) ≤ E(X)/5 = (1/5)/5 = 1/25.
But this is exactly the probability that X = 5! We've found a probability distribution for X and a positive real number k such that the bound given by Markov's inequality is exact; we say that Markov's inequality is tight in the sense that, in general, no better bound (using only E(X)) is possible.
If P(X = 0) = 1 − 1/k² and P(X = k) = 1/k² (as in the example above, in which k = 5), Markov's inequality gives the best possible bound, but as we saw earlier, there are cases for which Markov's inequality provides a terrible bound. Fortunately, we can say a bit more about a probability distribution if we know its variance σ² as well as its expected value.
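A minimal numeric check of this family of distributions, for a few arbitrary values of k:

```python
# For P(X = 0) = 1 - 1/k^2 and P(X = k) = 1/k^2, Markov's bound
# at a = k equals the true probability P(X >= k) exactly.
for k in [2, 5, 10]:
    e_x = k * (1 / k**2)   # E(X) = 1/k
    bound = e_x / k        # Markov: P(X >= k) <= E(X)/k
    actual = 1 / k**2      # true probability that X >= k
    print(f"k = {k}: bound = {bound:.4f}, actual = {actual:.4f}")
```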
Proposition 5 (Chebyshev’s Inequality). Let X be any random variable with finite expected
value and variance. Then for every positive real number a,
    P(|X − E(X)| ≥ a) ≤ Var(X)/a².
There is a direct proof of this inequality in Grinstead and Snell (p. 305), but we can also prove it using Markov's inequality! Let Y = (X − E(X))². Then Y is a nonnegative random variable with E(Y) = Var(X), so Markov's inequality gives

    P(Y ≥ a²) ≤ E(Y)/a² = Var(X)/a².

But notice that the event Y ≥ a² is the same as |X − E(X)| ≥ a, so we conclude that

    P(|X − E(X)| ≥ a) ≤ Var(X)/a².
Chebyshev's inequality gives a bound on the probability that X is far from its expected value. If we set a = kσ, where σ is the standard deviation, then the inequality takes the form

    P(|X − µ| ≥ kσ) ≤ Var(X)/(k²σ²) = 1/k².
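As with Markov's inequality, we can check this form empirically; here is a sketch using a binomial random variable (the distribution and parameters are arbitrary choices):

```python
# Empirical check of P(|X - mu| >= k*sigma) <= 1/k^2 for
# X ~ Binomial(100, 0.5), where mu = 50 and sigma = 5.
import random

random.seed(0)
mu, sigma = 50, 5
samples = [sum(random.random() < 0.5 for _ in range(100))
           for _ in range(20_000)]

for k in [1, 2, 3]:
    frac = sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)
    print(f"k = {k}: empirical {frac:.4f}  <=  bound {1/k**2:.4f}")
```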
Example 6. Suppose a fair coin is flipped 100 times. Find a bound on the probability that
the number of times the coin lands on heads is at least 60 or at most 40.
Let X be the number of times the coin lands on heads. We know X has a binomial distri-
bution with expected value 50 and variance 100 · 0.5 · 0.5 = 25. By Chebyshev, we have
    P(X ≤ 40 ∪ X ≥ 60) = P(|X − µ| ≥ 10) ≤ 25/10² = 1/4.
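For comparison, the exact probability can be computed directly from the binomial distribution:

```python
# Exact P(|X - 50| >= 10) for X ~ Binomial(100, 0.5), to compare
# with the Chebyshev bound of 1/4 from Example 6.
from math import comb

exact = sum(comb(100, k) * 0.5**100
            for k in range(101) if abs(k - 50) >= 10)
print(f"{exact:.4f}")  # roughly 0.057, well under the bound of 0.25
```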
Example 7. Let's revisit Example 3 and bound P(X ≥ 16) using Chebyshev's inequality instead. Recall that X is binomial with n = 20 and p = 0.2, so E(X) = 4 and Var(X) = 20 · 0.2 · 0.8 = 3.2. Then

    P(X ≥ 16) = P(X ≥ 16 ∪ X ≤ −8)      (since X can't be negative, the event X ≤ −8 is impossible)
              = P(|X − 4| ≥ 12)
              ≤ Var(X)/12²
              = (20 · 0.2 · 0.8)/144
              = 3.2/144 = 1/45.
This is a much better bound than the one given by Markov's inequality, but still far from the actual probability.
Exercise 8. A biased coin lands heads with probability 1/10. This coin is flipped 200 times. Use Markov's inequality to give an upper bound on the probability that the coin lands heads at least 120 times. Improve this bound using Chebyshev's inequality.
Exercise 9. The average height of a raccoon is 10 inches.

1. Give an upper bound on the probability that a certain raccoon is at least 15 inches tall.

2. The standard deviation of this height distribution is 2 inches. Find a lower bound on the probability that a certain raccoon is between 5 and 15 inches tall.
3. Now assume this distribution is normal. Use a normal CDF table to repeat the calcu-
lation from part (b). How close was your bound to the true probability?
Exercise 10. Like we did in Example 4 for Markov's inequality, prove that Chebyshev's inequality is tight: find a probability distribution for X and a value a such that P(|X − E(X)| ≥ a) = Var(X)/a². (Hint: This random variable will take only three values.)