
Math 501: a transition to mathematical statistics

David Caraballo

1 Notational assumptions
Throughout these notes, unless otherwise stated, $X$ will be a random variable on a probability space $(\Omega, \mathcal{F}, P)$, having a second moment (so that its variance is defined and finite).
As usual, we will let $\mu = E(X)$ and $\sigma^2 = VAR(X)$, so that $\sigma = STDDEV(X)$.
We will consider $X$ to be a population random variable, in that we will use it to model a population of interest.

2 Population parameters
Definition 1 By a population parameter (or simply a parameter) we mean a random variable $h(X)$ which is a function of $X$ (so that it is computed by using our population random variable $X$).

Example 2 The population mean $\mu = E(X)$, the population variance $\sigma^2 = VAR(X)$, and the population standard deviation $\sigma = STDDEV(X)$ are population parameters. They are defined and finite throughout this section since we assumed above that $X$ has a second moment.

Example 3 For each positive integer $k$ for which $X^k$ has finite expectation, the $k$th moment of $X$, $E\left(X^k\right)$, is a population parameter.

Example 4 For each positive integer $k$ for which $X^k$ has finite expectation, the $k$th central moment of $X$, $E\left((X - \mu)^k\right)$, is a population parameter.

3 Random samples and sample statistics


Definition 5 A random sample from a population having the distribution of $X$ is a collection of independent and identically distributed (i.i.d.) random variables each having the same cdf as $X$.

In particular, if $X_1, X_2, \ldots, X_n$ is a random sample from a population having the distribution of $X$, then
$$F_{X_i} = F_X \text{ for each } i,$$
$$E(X_i) = \mu \text{ for each } i,$$
$$VAR(X_i) = \sigma^2 \text{ for each } i.$$

Terminology Many authors use the terminology that $X_1, X_2, \ldots, X_n$ are observations of a random sample of size $n$ taken from a population having the distribution of $X$.

Definition 6 Whenever $X_1, X_2, \ldots, X_n$ is a random sample from a population having the distribution of $X$, we define the sample sum (or sample total) $T_n$, the sample mean $\overline{X}_n$, and (if $n > 1$) the sample variance $S_n^2$ as follows:
$$T_n = X_1 + \cdots + X_n,$$
$$\overline{X}_n = \frac{1}{n}\, T_n = \frac{X_1 + \cdots + X_n}{n},$$
$$S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \overline{X}_n\right)^2.$$

Definition 7 By a sample statistic (or simply a statistic) we mean a random variable $s(X_1, X_2, \ldots, X_n)$ which is a function of the variables in a random sample (so that it is computed by using one or more elements of the random sample).

Formally, each statistic $s(X_1, X_2, \ldots, X_n)$ is a random variable on $(\Omega, \mathcal{F}, P)$ which is itself a function of the random variables $X_1, X_2, \ldots, X_n$ on $(\Omega, \mathcal{F}, P)$. As such, it assigns a numerical value, $s(X_1, X_2, \ldots, X_n)(\omega)$, to each $\omega \in \Omega$.

Example 8 For each positive integer $n$, $T_n$ and $\overline{X}_n$ are sample statistics.

Example 9 For each integer $n > 1$, $S_n^2$ is a sample statistic.

Example 10 For each positive integer $n$, the sample minimum,
$$\min(X_1, X_2, \ldots, X_n),$$
and the sample maximum,
$$\max(X_1, X_2, \ldots, X_n),$$
are sample statistics.

Example 11 For each integer $k$ such that $1 \le k \le n$, the $k$th order statistic $X_{(k)}$ is defined as follows:
$$X_{(k)}(\omega) = \text{the } k\text{th smallest number among } \{X_1(\omega), X_2(\omega), \ldots, X_n(\omega)\}.$$
For each $k$ as above, $X_{(k)}$ is a sample statistic. Using this notation, we can rewrite the minimum and maximum as follows:
$$\min(X_1, X_2, \ldots, X_n) = X_{(1)}, \qquad \max(X_1, X_2, \ldots, X_n) = X_{(n)}.$$

Example 12 For any positive integer $n$, the range,
$$\operatorname{range}(X_1, X_2, \ldots, X_n) = \max(X_1, X_2, \ldots, X_n) - \min(X_1, X_2, \ldots, X_n) = X_{(n)} - X_{(1)},$$
is a sample statistic.

Example 13 There are many other sample statistics, such as the lower quartile $Q_1$, the upper quartile $Q_3$, the interquartile range (IQR) $Q_3 - Q_1$, the median, percentiles, and so on.

4 The main idea of inferential statistics


Generally speaking, we are interested in knowing about population parameters.
The field of inferential statistics is concerned with learning about population parameters by using sample statistics.
There are many ways to use sample statistics to estimate population parameters. Some of these methods, results, and notions apply to fixed sample sizes $n$. Others are stated in terms of limits as the sample size $n$ goes to infinity. We will consider results of both types.
For the latter type of result, we will need to consider what it means for an infinite sequence of random variables to “converge” as $n$ goes to infinity.

5 Unbiased estimators
Definition 14 A sample statistic $s(X_1, X_2, \ldots, X_n)$ is called an unbiased estimator for a population parameter $\theta$ provided
$$E(s(X_1, X_2, \ldots, X_n)) = \theta \text{ for each } n \text{ for which } s(X_1, X_2, \ldots, X_n) \text{ is defined.}$$

Unbiased estimators are useful because they correctly estimate a given population parameter “on average.” Generally
speaking, among unbiased estimators for a given population parameter, it is desirable to have as small a variance as possible,
so as to increase the probability that the estimator will take a value close to its mean (which, since the estimator is unbiased,
is the true value of the population parameter of interest).

Theorem 15 For each positive integer $n$, the sample mean $\overline{X}_n$ is an unbiased estimator for the population mean $\mu$. I.e.,
$$E\left(\overline{X}_n\right) = \mu \text{ for each positive integer } n.$$

Proof: For each positive integer $n$,
$$E\left(\overline{X}_n\right) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\, E\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu.$$
q.e.d.
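
As a quick numerical illustration of Theorem 15 (a sketch added to these notes, with an arbitrary choice of population distribution and sample size), averaging the sample means of many simulated samples gives a value close to $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 4.0, 2.0, 10, 100_000

# Draw `trials` independent samples of size n and compute each sample mean.
sample_means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

# The empirical average of the sample means should be close to mu = 4.
print(sample_means.mean())
```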

Theorem 16 The sample variance $S_n^2$ is an unbiased estimator for the population variance $\sigma^2$. I.e.,
$$E\left(S_n^2\right) = \sigma^2 \text{ for each integer } n > 1.$$

Remark 17 The $n - 1$ term in the denominator of the formula for $S_n^2$ is there precisely so that $S_n^2$ will be an unbiased estimator for $\sigma^2$.

Proof: Suppose that $n$ is an integer greater than one. We will begin by establishing three formulas, after which the proof will be easy to complete.
$$E\left(X_i^2\right) = \sigma^2 + \mu^2 \text{ for each } i = 1, 2, \ldots, n, \tag{1}$$
$$E\left(X_i \overline{X}_n\right) = \mu^2 + \sigma^2/n \text{ for each } i = 1, 2, \ldots, n, \tag{2}$$
$$E\left(\overline{X}_n^2\right) = \mu^2 + \sigma^2/n. \tag{3}$$

Suppose $i$ is an integer between $1$ and $n$. Then
$$\sigma^2 = VAR(X_i) = E\left(X_i^2\right) - E(X_i)^2 = E\left(X_i^2\right) - \mu^2,$$
and so the first formula (1) follows by adding $\mu^2$ to both sides. Also,
$$E\left(X_i \overline{X}_n\right) = E\left(X_i \cdot \frac{1}{n}\sum_{j=1}^{n} X_j\right)
= \frac{1}{n}\, E\left(X_i \sum_{j=1}^{n} X_j\right)
= \frac{1}{n}\, E\left(X_i^2 + \sum_{1 \le j \le n,\ j \ne i} X_i X_j\right)$$
$$= \frac{1}{n}\left[E\left(X_i^2\right) + E\left(\sum_{1 \le j \le n,\ j \ne i} X_i X_j\right)\right]
= \frac{1}{n}\left[\sigma^2 + \mu^2 + \sum_{1 \le j \le n,\ j \ne i} E(X_i X_j)\right]$$
$$= \frac{1}{n}\left[\sigma^2 + \mu^2 + \sum_{1 \le j \le n,\ j \ne i} E(X_i)\, E(X_j)\right]
= \frac{1}{n}\left[\sigma^2 + \mu^2 + (n-1)\,\mu^2\right]
= \mu^2 + \sigma^2/n,$$
and so the second formula (2) follows. (The step $E(X_i X_j) = E(X_i) E(X_j)$ for $j \ne i$ uses the independence of $X_i$ and $X_j$.)

Next,
$$\sigma^2/n = VAR\left(\overline{X}_n\right) = E\left(\overline{X}_n^2\right) - E\left(\overline{X}_n\right)^2 = E\left(\overline{X}_n^2\right) - \mu^2,$$
and so the third formula (3) follows by adding $\mu^2$ to both sides.
We can now complete the proof.
$$E\left(S_n^2\right) = E\left(\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}_n\right)^2\right)
= \frac{1}{n-1}\, E\left(\sum_{i=1}^{n}\left(X_i - \overline{X}_n\right)^2\right)
= \frac{1}{n-1}\sum_{i=1}^{n} E\left(\left(X_i - \overline{X}_n\right)^2\right)$$
$$= \frac{1}{n-1}\sum_{i=1}^{n} E\left(X_i^2 - 2 X_i \overline{X}_n + \overline{X}_n^2\right)
= \frac{1}{n-1}\sum_{i=1}^{n}\left[E\left(X_i^2\right) - 2\, E\left(X_i \overline{X}_n\right) + E\left(\overline{X}_n^2\right)\right].$$
We now use all three of the formulas (1), (2), and (3) to rewrite the expression in brackets. We get
$$E\left(S_n^2\right) = \frac{1}{n-1}\sum_{i=1}^{n}\left[\sigma^2 + \mu^2 - 2\left(\mu^2 + \sigma^2/n\right) + \mu^2 + \sigma^2/n\right]
= \frac{1}{n-1}\sum_{i=1}^{n} \frac{n-1}{n}\,\sigma^2
= \frac{1}{n}\sum_{i=1}^{n} \sigma^2 = \sigma^2.$$
q.e.d.

Example 18 To make the preceding results more concrete, it is helpful to consider a specific example. Suppose that $X$ is a random variable which takes each of the values $1$, $5$, and $6$ with probability $1/3$. $X$ here is modeling random selection from the set $\{1, 5, 6\}$. $X$ is a simple random variable, so it has moments and central moments of all orders. We calculate
$$\mu = E(X) = \frac{1}{3}(1) + \frac{1}{3}(5) + \frac{1}{3}(6) = 4,$$
$$\sigma^2 = \frac{1}{3}(1 - 4)^2 + \frac{1}{3}(5 - 4)^2 + \frac{1}{3}(6 - 4)^2 = \frac{14}{3}.$$
Consider random sampling with replacement using sample size $n = 2$. There are $3 \cdot 3 = 9$ possible (distinct) samples of size $2$. For each one, we will compute the sample mean and the sample variance. We will then average the sample means and the sample variances (to find $E\left(\overline{X}_2\right)$ and $E\left(S_2^2\right)$). We will see that these averages are precisely $\mu$ and $\sigma^2$.

$$\begin{array}{ccc}
\text{sample, } \omega & \overline{X}_2(\omega) & S_2^2(\omega) \\
\hline
\{1, 1\} & 1 & 0 \\
\{1, 5\} & 3 & 8 \\
\{1, 6\} & 3.5 & 12.5 \\
\{5, 1\} & 3 & 8 \\
\{5, 5\} & 5 & 0 \\
\{5, 6\} & 5.5 & 0.5 \\
\{6, 1\} & 3.5 & 12.5 \\
\{6, 5\} & 5.5 & 0.5 \\
\{6, 6\} & 6 & 0
\end{array}$$

Observe that the $\overline{X}_2(\omega)$ values add up to $36$ and hence average to $4$, which is $\mu$, while the $S_2^2(\omega)$ values add up to $42$ and hence average to $14/3$, which is $\sigma^2$.

In the preceding example, note the importance of sampling with replacement (which is what is needed to have $X_1$ and $X_2$ be independent and identically distributed). If we had considered only the samples of size two without replacement, the $S_2^2$ values would still add up to $42$ but would average instead to $7$ (there are just $6$ distinct pairs without replacement), which is not the value of $\sigma^2$.
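
The enumeration in Example 18 is easy to reproduce exactly (a sketch added to these notes, using exact rational arithmetic). It averages $\overline{X}_2$ and $S_2^2$ over the nine ordered samples with replacement, and also averages $S_2^2$ over the six ordered pairs without replacement, recovering the values $4$, $14/3$, and $7$ discussed above:

```python
from fractions import Fraction
from itertools import product

values = [1, 5, 6]

def stats(pair):
    """Return (sample mean, sample variance) for n = 2, with n - 1 = 1 denominator."""
    a, b = (Fraction(v) for v in pair)
    xbar = (a + b) / 2
    s2 = (a - xbar) ** 2 + (b - xbar) ** 2
    return xbar, s2

with_repl = list(product(values, repeat=2))             # 9 ordered samples
without_repl = [p for p in with_repl if p[0] != p[1]]   # 6 ordered pairs

print(sum(stats(p)[0] for p in with_repl) / 9)          # 4    = mu
print(sum(stats(p)[1] for p in with_repl) / 9)          # 14/3 = sigma^2
print(sum(stats(p)[1] for p in without_repl) / 6)       # 7    != sigma^2
```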

6 Consistent estimators and the Weak and Strong Laws of Large Numbers
Definition 19 A sample statistic $s(X_1, X_2, \ldots, X_n)$ is called a consistent estimator for a population parameter $\theta$ provided
$$s(X_1, X_2, \ldots, X_n) \to \theta \ \text{“in probability” as } n \to \infty,$$
by which we mean precisely that for every $\epsilon > 0$ we have
$$\lim_{n \to \infty} P\left(\left\{\omega \in \Omega : \left|s(X_1, X_2, \ldots, X_n)(\omega) - \theta\right| < \epsilon\right\}\right) = 1.$$

In other words, with probability which converges to $1$ as $n \to \infty$, a consistent estimator $s(X_1, X_2, \ldots, X_n)$ with sample size $n$ is as close as we wish (within $\epsilon$, for any $\epsilon$ as small as we wish) to the parameter $\theta$ that it is estimating.
Whereas bias is about the behavior of $s(X_1, X_2, \ldots, X_n)$ for each $n$ for which it is defined, consistency is about the long-term behavior (as $n \to \infty$) of the infinite sequence
$$(s(X_1),\ s(X_1, X_2),\ s(X_1, X_2, X_3),\ s(X_1, X_2, X_3, X_4),\ \ldots)$$
of estimators.
Unbiased estimators satisfy a given useful property for each fixed $n$ for which the estimators are defined, but consistent estimators satisfy a different, arguably more useful approximation property in the limit as $n \to \infty$.
With an unbiased estimator, our estimator is correct “on average,” but it might still be incorrect with very high probability for each $n$, no matter how large.
By contrast, consistent estimators do not need to be correct “on average” for each $n$, but they do need to be approximately correct (within $\epsilon$, where $\epsilon$ can be taken as small as we wish) with probability that converges to $1$ as $n \to \infty$, for each $\epsilon > 0$. Provided we have the ability to take large random samples, in many respects consistency is a more useful property than unbiasedness.
We have already seen that $\overline{X}_n$ and $S_n^2$ are unbiased estimators for $\mu$ and $\sigma^2$, respectively. It turns out that they are consistent estimators as well.

Theorem 20 (Weak Law of Large Numbers) The sample mean $\overline{X}_n$ is a consistent estimator for the population mean $\mu$. I.e., for every $\epsilon > 0$ we have
$$\lim_{n \to \infty} P\left(\left\{\omega \in \Omega : \left|\overline{X}_n(\omega) - \mu\right| < \epsilon\right\}\right) = 1.$$

The Weak Law of Large Numbers is often stated more succinctly as “$\overline{X}_n \to \mu$ in probability as $n \to \infty$.”
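
The Weak Law is easy to examine numerically. The following sketch (added to these notes, with an arbitrary normal population and an arbitrary $\epsilon$) estimates $P\left(\left|\overline{X}_n - \mu\right| < \epsilon\right)$ for increasing $n$ and shows it approaching $1$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, eps, trials = 4.0, 2.0, 0.1, 20_000

for n in (10, 100, 1_000, 10_000):
    sample_means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    # Fraction of samples whose mean lands within eps of mu.
    print(n, np.mean(np.abs(sample_means - mu) < eps))
```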

Theorem 21 The sample variance $S_n^2$ is a consistent estimator for the population variance $\sigma^2$. I.e., for every $\epsilon > 0$ we have
$$\lim_{n \to \infty} P\left(\left\{\omega \in \Omega : \left|S_n^2(\omega) - \sigma^2\right| < \epsilon\right\}\right) = 1.$$

The preceding theorem is often stated more succinctly as “$S_n^2 \to \sigma^2$ in probability as $n \to \infty$.”

Warning “Convergence in probability” (defined above) and “convergence with probability 1” mean entirely different things. The latter is another name for “strong convergence.”

The Weak Law of Large Numbers has the following improvement, which is called the Strong Law of Large Numbers since the form of convergence involved is called strong convergence, almost sure convergence, or convergence with probability 1 (which, as noted above, is distinct from convergence in probability). In general, strong convergence implies convergence in probability, but not vice versa.

Theorem 22 (Strong Law of Large Numbers) $\overline{X}_n \to \mu$ strongly as $n \to \infty$. I.e.,
$$P\left(\left\{\omega \in \Omega : \lim_{n \to \infty} \overline{X}_n(\omega) = \mu\right\}\right) = 1.$$

The Strong Law of Large Numbers is often stated as “$\overline{X}_n \to \mu$ almost surely as $n \to \infty$.” The term “almost surely” is often abbreviated as “a.s.,” and many authors choose to put this abbreviation above the arrow itself rather than to the right of it.
Some authors restate the Strong Law of Large Numbers by saying that $\overline{X}_n$ is a strongly consistent estimator for $\mu$, the term “strongly” referring to the fact that the convergence is strong convergence rather than convergence in probability (which is implied by strong convergence).

7 Sampling from normal populations
When our random sample is taken from a normal population (formally, when our population random variable $X$ is normally distributed), there is much more that we can say about $\overline{X}_n$ and $S_n^2$, even when $n$ is small.

Theorem 23 Suppose that $X \sim N\left(\mu, \sigma^2\right)$. Then
(1) For each positive integer $n$, $\overline{X}_n$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$. I.e.,
$$\overline{X}_n \sim N\left(\mu, \left(\sigma/\sqrt{n}\right)^2\right) \text{ for each positive integer } n.$$
(2) For each integer $n > 1$,
$$\frac{(n-1)\, S_n^2}{\sigma^2} \sim \chi^2(n-1).$$
(3) For each integer $n > 1$, the random variables $\overline{X}_n$ and $S_n^2$ are independent.

It is important to note that none of these conclusions may be assumed unless we suppose that $X$ is normal (counterexamples show that each of the conclusions can be false when $X$ is not normal).
Conclusion (1) is particularly impressive in that it represents a major improvement over the Central Limit Theorem's conclusions in this case. When $X$ is normal, each $\overline{X}_n$ is exactly normally distributed no matter how small $n$ is (even if $n = 1$ or $n = 2$).
By contrast, the Central Limit Theorem ensures that $F_{W_n}(a) \to \Phi(a)$ for each real $a$ as $n \to \infty$, where $W_n$ is the standardization of $\overline{X}_n$ (that is, $W_n = \left(\overline{X}_n - \mu\right) / \left(\sigma/\sqrt{n}\right)$) and $\Phi$ is the cdf of a $N\left(0, 1^2\right)$ random variable. Consequently, the Central Limit Theorem allows us to deduce that $\overline{X}_n$ is approximately normally distributed for large enough values of $n$ (which is a weaker conclusion than $\overline{X}_n$ being exactly normally distributed for each $n$).
Of course, when $n = 1$ we have $\overline{X}_1 = X_1$. Since $X_1$ and $X$ are identically distributed, we have $X_1 \sim N\left(\mu, \sigma^2\right)$, and hence $\overline{X}_1 \sim N\left(\mu, \sigma^2\right)$, so that conclusion (1) holds trivially when $n = 1$.
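
A simulation can make conclusion (2) of Theorem 23 plausible (a sketch added to these notes; the parameters are arbitrary). A $\chi^2(n-1)$ random variable has mean $n - 1$ and variance $2(n-1)$, and the simulated values of $(n-1)S_n^2/\sigma^2$ match:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, trials = 4.0, 2.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
s2 = samples.var(axis=1, ddof=1)   # sample variances S_n^2
w = (n - 1) * s2 / sigma**2        # should be chi-square(n - 1)

# Expect mean ~ n - 1 = 4 and variance ~ 2(n - 1) = 8.
print(w.mean(), w.var())
```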

8 Introduction to confidence intervals: estimating an unknown population mean $\mu$ when $\sigma^2$ is known
There are many important problems in inferential statistics. One of the most important, and most elementary, is that of estimating an unknown population mean $\mu$, using a sample mean $\overline{X}_n$, when $\sigma^2$ is known, finite, and positive. A different procedure is needed when $\sigma^2$ is unknown since the relevant statistic in that case will not be normally distributed, as it will be in the situation under consideration.
Under the conditions of the Central Limit Theorem (either the $X_i$s are i.i.d. and $n \ge 30$, or the $X_i$s arise from selecting a random sample of size $n$ without replacement from a population having size $N$, with $n/N \le 0.05$ and $n \ge 30$), we will be able to suppose that $\overline{X}_n$ is approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$.
The standard deviation of $\overline{X}_n$ is also called the standard error of $\overline{X}_n$, and we may write
$$SE\left(\overline{X}_n\right) = \frac{\sigma}{\sqrt{n}}.$$
Suppose that $Y \sim N\left(\mu, \left(\sigma/\sqrt{n}\right)^2\right)$, let
$$E = 1.96\, \frac{\sigma}{\sqrt{n}},$$
and let
$$J = (\mu - E,\ \mu + E).$$
Then
$$P\left(\left\{\overline{X}_n \in J\right\}\right) \text{ is approximately equal to } P(\{Y \in J\}),$$
which equals
$$P(\{Y \in (\mu - E,\ \mu + E)\})
= P\left(\left\{\mu - 1.96\frac{\sigma}{\sqrt{n}} < Y < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right\}\right)
= P(\{-1.96 < Z < 1.96\}),$$
where $Z \sim N\left(0, 1^2\right)$, since
$$\text{the } z\text{-score of } \mu - 1.96\frac{\sigma}{\sqrt{n}} \text{ is } \frac{\left(\mu - 1.96\,\sigma/\sqrt{n}\right) - \mu}{\sigma/\sqrt{n}} = -1.96$$
and
$$\text{the } z\text{-score of } \mu + 1.96\frac{\sigma}{\sqrt{n}} \text{ is } \frac{\left(\mu + 1.96\,\sigma/\sqrt{n}\right) - \mu}{\sigma/\sqrt{n}} = 1.96.$$
We then calculate
$$P(\{-1.96 < Z < 1.96\}) = \int_{-1.96}^{1.96} \frac{1}{\sqrt{2\pi}}\, e^{-(1/2)z^2}\, dz = 0.9500042097\ldots,$$
which is very close to $0.95$.
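
The value $0.9500042097\ldots$ can be checked numerically (an added sketch, assuming SciPy is available) using the standard normal cdf:

```python
from scipy.stats import norm

# P(-1.96 < Z < 1.96) for Z ~ N(0, 1)
print(norm.cdf(1.96) - norm.cdf(-1.96))   # 0.9500042097...
```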


Thus, $P\left(\left\{\overline{X}_n \in J\right\}\right)$ is approximately $0.95$. This guarantees that roughly 95% of all samples of size $n$ (the percentage converging to $95.00042097\ldots\%$ as $n \to \infty$) will yield a sample mean $\overline{X}_n$ taking a value in the interval $J$. If $\mu$ and $\sigma^2$ are known, we therefore have good probabilistic information about $\overline{X}_n$.
Note that the interval $J$ is centered at $\mu$ and gets narrower and narrower as $n$ gets larger, since $E = 1.96\,\sigma/\sqrt{n} \to 0$ as $n \to \infty$ (since $\sigma$ is known and fixed).
Thus, for large sample sizes $n$ we are assured, with fairly high probability, of obtaining a random sample whose sample mean is close to the population mean $\mu$ that we wish to estimate. This is important since we often have a single sample. Our one random sample could be terrible (i.e., it could result in an $\overline{X}_n$ which is quite far from $\mu$) even if we meet the conditions of the Central Limit Theorem, but about 95% of the samples of size $n$ (for large enough $n$) will result in an $\overline{X}_n$ within $E = 1.96\,\sigma/\sqrt{n}$ units of $\mu$, and so the probability that our one sample of size $n$, chosen at random from among all such samples, is one of these “good” ones is about $0.95$. This is extremely useful information, and it explains how large sample sizes can help us estimate unknown parameters.
Let us revisit the result above that $P\left(\left\{\overline{X}_n \in J\right\}\right)$ is approximately $0.95$. How can we use this information to explicitly estimate $\mu$ when $\mu$ is unknown? We will carry out an extremely important algebraic manipulation which will give us what we need almost immediately. The idea behind it is simple: for any two real numbers $a$ and $b$, and for any positive real number $E$, $a$ is within $E$ units of $b$ if and only if $b$ is within $E$ units of $a$.
Let
$$I_{95} = \left(\overline{X}_n - E,\ \overline{X}_n + E\right) = \left(\overline{X}_n - 1.96\frac{\sigma}{\sqrt{n}},\ \overline{X}_n + 1.96\frac{\sigma}{\sqrt{n}}\right).$$
We now have
$$\left\{\overline{X}_n \in J\right\} = \left\{\omega \in \Omega : \overline{X}_n(\omega) \in J\right\}
= \left\{\omega \in \Omega : \overline{X}_n(\omega) \text{ is within } E \text{ units of } \mu\right\}$$
$$= \left\{\omega \in \Omega : \mu \text{ is within } E \text{ units of } \overline{X}_n(\omega)\right\}
= \{\omega \in \Omega : \mu \in I_{95}\}
= \{I_{95} \ni \mu\}.$$

I wrote it this way (with $I_{95}$ on the left, where the random variables normally go, even though writing it as $\{\mu \in I_{95}\}$ would also have been correct / equivalent) for an important reason: to emphasize the fact that the random variables are part of $I_{95}$ (which depends on $\overline{X}_n$, a random variable), NOT a part of $\mu$, which is a parameter and which is a fixed constant. The reason that this distinction is important will become apparent below (see the last example of this section).
Because the events $\left\{\overline{X}_n \in J\right\}$ and $\{I_{95} \ni \mu\}$ are equal, their probabilities are equal. Thus, $P(\{I_{95} \ni \mu\})$ is approximately $0.95$. This means that about 95% of all samples of size $n$ (the percentage converging to $95.00042097\ldots\%$ as $n \to \infty$) will be such that the interval $I_{95}$ (defined using the value of $\overline{X}_n$) contains $\mu$.
Each sample of size $n$ results in an $\overline{X}_n$, which in turn results in an interval $I_{95}$, which may or may not contain $\mu$. About 95% of the samples of size $n$ (the percentage converging to $95.00042097\ldots\%$ as $n \to \infty$) will result in an $I_{95}$ which contains $\mu$ (which means that our $\overline{X}_n$ is within $E$ units of the true mean $\mu$), while about 5% of the samples of size $n$ (the percentage converging to $(100 - 95.00042097\ldots)\%$ as $n \to \infty$) will result in an $I_{95}$ which does not contain $\mu$ (which means that our $\overline{X}_n$ is not within $E$ units of the true mean $\mu$).
For the moment (this is not standard terminology), let us call our random sample “good” if the corresponding $I_{95}$ contains $\mu$, and let us call it “bad” if the corresponding $I_{95}$ does not contain $\mu$. Our $I_{95}$ has been constructed in such a way as to ensure that about 95% of our samples of size $n$ (the percentage converging to $95.00042097\ldots\%$ as $n \to \infty$) will be “good.”
Since we typically have just one random sample of size $n$, assuming all samples of size $n$ were a priori equally likely to be selected (which requires great care in our choice of sampling methodology), there is about a 95% chance that our sample will be a good one, and there is about a 5% chance that our sample will be a bad one.
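
The “good sample / bad sample” picture can be checked by simulation (a sketch added to these notes; the normal population and its parameters are arbitrary choices). Repeatedly drawing samples and counting how often $I_{95}$ contains $\mu$ gives a coverage fraction near $0.95$:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, trials = 4.0, 2.0, 50, 100_000

xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
E = 1.96 * sigma / np.sqrt(n)                 # margin of error (sigma known)

covered = (xbar - E < mu) & (mu < xbar + E)   # did I_95 contain mu?
print(covered.mean())                         # close to 0.95
```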

We call $I_{95}$ a 95% confidence interval for $\mu$ (even though the percentage, for any given finite $n$, even a large one, may be unequal to 95%), and we call $E$ the margin of error at the 95% confidence level. We say that we may be 95% confident that $\mu$ is in the interval $I_{95}$.

Terminology Regrettably, many authors use the less precise term maximum error instead of “margin of error,” which I prefer greatly. Why is the term “maximum error” somewhat inaccurate? About 5% of the samples of size $n$ will be such that the interval $I_{95}$ does not contain $\mu$, and for each such sample the error $\left|\overline{X}_n - \mu\right|$ is greater than $E$ (potentially much, much greater than $E$), and so it makes little sense and is perhaps quite misleading to call $E$ the maximum error.
Example 24 A random sample of size 80 is selected from a population of size 5000 having unknown mean $\mu$ but having known variance $\sigma^2 = 49$. The sample mean $\overline{X}_{80}$ is computed and equals $104$. Find a 95% confidence interval for $\mu$. Find the margin of error.
Solution: We have $n/N = 80/5000 \le 0.05$, and we have $n \ge 30$. Our sample is a random sample. We may therefore use the Central Limit Theorem to deduce that $\overline{X}_{80}$ is approximately normally distributed with mean $\mu$ and variance $\sigma^2/n = 49/80$.
Our margin of error at the 95% confidence level is
$$E = 1.96\, \frac{\sigma}{\sqrt{n}} = 1.96\, \frac{7}{\sqrt{80}} = 1.533942632\ldots$$
and so our 95% confidence interval is
$$I_{95} = \left(\overline{X}_n - E,\ \overline{X}_n + E\right)
= \left(104 - 1.96\frac{7}{\sqrt{80}},\ 104 + 1.96\frac{7}{\sqrt{80}}\right) \text{ (exactly)}$$
$$= (102.466057367\ldots,\ 105.533942632\ldots) \text{ (approximately)}.$$
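
The arithmetic in Example 24 is easy to reproduce (a sketch added to these notes):

```python
import math

n, sigma, xbar = 80, 7.0, 104.0

E = 1.96 * sigma / math.sqrt(n)   # margin of error
print(E)                          # 1.5339426...
print(xbar - E, xbar + E)         # (102.4660573..., 105.5339426...)
```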
Example 25 Suppose our population is as in the preceding example, and suppose that our sample size is 80, as in that example. Suppose that our first sample was as indicated in that example as well, but now suppose we select another random sample of size 80 and get $\overline{X}_n' = 90$ (which is entirely possible). (The prime symbol here is meant simply to distinguish results from this sample from those of our first sample, for which $\overline{X}_n = 104$; notice that $E$ itself depends on $\sigma$ and on $n$ but not on our sample.) This would give the 95% confidence interval
$$I_{95}' = \left(\overline{X}_n' - E,\ \overline{X}_n' + E\right)
= \left(90 - 1.96\frac{7}{\sqrt{80}},\ 90 + 1.96\frac{7}{\sqrt{80}}\right) \text{ (exactly)}$$
$$= (88.466057367\ldots,\ 91.533942632\ldots) \text{ (approximately)}.$$
Observe that these confidence intervals do not overlap. That can very easily occur. We will now see exactly why it is so important to avoid writing something like
$$P(\{\mu \in I_{95}\}) = 0.95$$
in general (which, unfortunately, many people do). Let us do that for our two samples (with $I_{95}$ for the first sample and with $I_{95}'$ for the second sample) to see what happens. We get
$$P(\{\mu \in (102.466057367\ldots,\ 105.533942632\ldots)\}) = 0.95,$$
$$P(\{\mu \in (88.466057367\ldots,\ 91.533942632\ldots)\}) = 0.95.$$
I.e.,
$$P(\{\mu \in I_{95}\}) = 0.95,$$
$$P(\{\mu \in I_{95}'\}) = 0.95.$$
Since $I_{95}$ and $I_{95}'$ are disjoint, our partition formulas would then give
$$P(\{\mu \in I_{95} \cup I_{95}'\}) = 0.95 + 0.95 = 1.9\ (!),$$
which makes no sense at all. Where is the flaw? The flaw is that $\mu$ is not the random variable we have been studying (and $\mu$ does not even depend on our sample in any way); $\overline{X}_{80}$ is our random variable, and $I_{95}$ and $I_{95}'$ are our intervals depending on it. What is true is that the probability is about $0.95$ that our random sample of size 80 is one of the ones which yields a sample mean within $E = 1.533942632\ldots$ units of $\mu$; equivalently, the probability is about $0.95$ that our random sample of size 80 is one of the ones which yields a 95% confidence interval (with margin of error $E = 1.533942632\ldots$) which contains $\mu$.

9 Sample size considerations for the Central Limit Theorem
We saw above that larger sample sizes $n$ result (for fixed $\sigma$) in smaller margins of error $E$ when estimating an unknown $\mu$ from a population with a known finite positive $\sigma^2$. Thus, larger sample sizes result in narrower, hence better, confidence intervals, for the same confidence level, 0.95.
Is it always better to use larger samples? For sampling with replacement, or for theoretical problems where one is choosing independent and identically distributed random variables, the answer is yes. However, for practical problems where one is sampling from a finite population having size $N$, the answer is no. The issue is that, on the one hand, we want $n$ to be large ($n \ge 30$ minimally, and ideally $n$ would be even larger so that $E = 1.96\,\sigma/\sqrt{n}$ will be very small, the smaller the better). On the other hand, we need $n/N$ to be small as well, and selecting larger samples will make this ratio larger, which makes our assumption that the $X_i$s are independent and identically distributed less reasonable, which makes using the Central Limit Theorem increasingly invalid. The $\overline{X}_n$s might be quite far from normally distributed when the hypotheses of the Central Limit Theorem, such as the $n/N \le 0.05$ condition, are not satisfied, and thus our confidence interval and margin of error (both computed under assumptions of normality) will be unreliable.
So, in general, we would like $n$ to be as large as possible, consistent with the requirement that it be small enough relative to $N$. What if $N < 600$? In this case, it is mathematically impossible to choose $n \ge 30$ such that $n/N \le 0.05$, for doing so would require
$$N \ge \frac{n}{0.05} \ge \frac{30}{0.05} = 600.$$
In such a situation, you could consider using a larger population (to increase $N$). You could also consider trying to find the exact probability distribution of $\overline{X}_n$, instead of using the Central Limit Theorem to approximate it. There are various other methods as well.

10 Introduction to population proportions


Throughout this section, suppose that $p \in (0, 1)$ but that its exact value is unknown. Suppose that we have a population stratified into two groups, without loss of generality called successes and failures, such that $p$ is the proportion of successes and $1 - p$ is the proportion of failures.
Define a random variable $X$ as follows: $X = 1$ if our outcome is a success, and $X = 0$ if our outcome is a failure. Then $X$ is a simple random variable, with
$$p_X(x) = \begin{cases} p, & \text{if } x = 1 \\ 1 - p, & \text{if } x = 0. \end{cases}$$
It follows¹ that $X \sim B(1, p)$, and so
$$\mu = E(X) = np = (1)(p) = p,$$
$$\sigma^2 = np(1 - p) = (1)(p)(1 - p) = p(1 - p).$$

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with the distribution of $X$. These represent observations from a random sample of size $n$ (chosen with replacement).
Let
$$T_n = X_1 + X_2 + \cdots + X_n$$
for some (arbitrary) positive integer $n$. Since the $X_i$s are independent $B(1, p)$ random variables, it follows that $T_n \sim B(n, p)$, and hence
$$E(T_n) = np,$$
$$VAR(T_n) = np(1 - p).$$
Each $X_i$ is $1$ or $0$ depending on whether the $i$th element of the sample is a success or a failure, respectively, and thus $T_n$ denotes the number of successes in the sample.
It follows that
$$\overline{X}_n = \frac{1}{n}\, T_n$$
is the proportion of successes in the sample. It is customary to use the notation $\hat{p}_n$ for this sample proportion. Thus,
$$\hat{p}_n = \overline{X}_n = \frac{1}{n}\, T_n.$$

¹ Alternatively, we can observe that $X$ is simple and so it has moments and central moments of all orders, and we can calculate its mean and variance directly from the pmf as follows:
$$\mu = E(X) = (1)(p) + (0)(1 - p) = p,$$
$$\sigma^2 = VAR(X) = E\left((X - \mu)^2\right) = (1 - p)^2 (p) + (0 - p)^2 (1 - p) = p(1 - p).$$
We now calculate
$$E(\hat{p}_n) = E\left(\frac{1}{n}\, T_n\right) = \frac{1}{n}\, E(T_n) = \frac{1}{n}(np) = p,$$
$$VAR(\hat{p}_n) = VAR\left(\frac{1}{n}\, T_n\right) = \frac{1}{n^2}\, VAR(T_n) = \frac{1}{n^2}\, np(1 - p) = \frac{p(1 - p)}{n}.$$

Because $\hat{p}_n = \overline{X}_n$, which is a mean of independent and identically distributed random variables, $\hat{p}_n$ will be approximately normally distributed for large enough values of $n$. Commonly used “cutoff” conditions for $n$ being “large enough” are that $n$ should satisfy
$$np \ge 10 \ \text{ and } \ n(1 - p) \ge 10 \quad \text{if } p \text{ is known,}$$
$$n\hat{p}_n \ge 10 \ \text{ and } \ n(1 - \hat{p}_n) \ge 10 \quad \text{if } p \text{ is unknown.}$$
Provided our sampling is done without replacement (as is the case in practice most of the time), we must also have $n/N \le 0.05$.
Thus, for large enough values of $n$, provided the $X_i$s are independent and identically distributed (as with random sampling with replacement) or provided random sampling without replacement is used and also $n/N \le 0.05$, $\hat{p}_n$ will be approximately normally distributed with mean $p$ and with variance $p(1 - p)/n$.
Because
$$E(\hat{p}_n) = p \text{ for each positive integer } n,$$
$\hat{p}_n$ is an unbiased estimator for $p$.
Once again, for large values of $n$ the quantity $p(1 - p)/n$ will be small (converging to $0$ as $n \to \infty$), and so “most” samples of size $n$ will yield a $\hat{p}_n$ close to $p$, ensuring that the probability that $\hat{p}_n$ will be very close to (its mean) $p$ will be very high.
We may use $\hat{p}_n$ to estimate an unknown population proportion $p$ by following precisely the same procedure given above for means.
Here, the standard deviation of $\hat{p}_n$ depends on $p$, which is unknown, so we need to approximate it by using $\hat{p}_n$ in place of $p$. Thus, we define our margin of error as follows:
$$E = 1.96\sqrt{\frac{\hat{p}_n (1 - \hat{p}_n)}{n}}.$$
Our 95% confidence interval for the unknown $p$ is then
$$I_{95} = (\hat{p} - E,\ \hat{p} + E).$$
As with means, about 95% of the samples of size $n$ will result in a $\hat{p}$ within $E$ units of $p$ (equivalently, will result in an $I_{95}$ which contains $p$).
It is common to want to know how large a sample size is needed to ensure that the margin of error will be no more than, say, 3%. Our margin of error formula above depends on $\hat{p}_n$, which will not be known until the sample is obtained (which requires us to find $n$ first!).
Fortunately, calculus can help. The function
$$f(x) = x(1 - x)$$
attains its absolute maximum value, $1/4$, when $x = 1/2$. This is easy to demonstrate with differential calculus (or even with algebra, after completing the square).
Therefore, it is always the case that
$$E = 1.96\sqrt{\frac{\hat{p}_n (1 - \hat{p}_n)}{n}} \le 1.96\sqrt{\frac{1/4}{n}}.$$
Therefore, if we would like to ensure that, no matter what, $E$ will not exceed $0.03$, we simply choose $n$ large enough to ensure that
$$1.96\sqrt{\frac{1/4}{n}} \le 0.03.$$
Since both quantities are positive, this inequality is true if and only if
$$(1.96)^2\, \frac{1/4}{n} \le (0.03)^2,$$
which is true if and only if
$$n \ge \frac{(1.96)^2 (1/4)}{(0.03)^2} = 1067.11\ldots.$$
Since $n$ must be an integer, we require $n \ge 1068$ (since $n = 1067$ does not satisfy the requisite inequality).
Choosing $n \ge 1068$ will ensure that
$$E = 1.96\sqrt{\frac{\hat{p}_n (1 - \hat{p}_n)}{n}} \le 1.96\sqrt{\frac{1/4}{n}} \le 0.03,$$
no matter what $\hat{p}_n$ turns out to be later.
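
The same worst-case bound works for any target margin of error. A minimal sketch (added to these notes; the helper name is chosen for this illustration only) computes the smallest acceptable $n$ in exact rational arithmetic, which avoids floating-point rounding surprises at exact boundaries like $2401$:

```python
from fractions import Fraction
import math

def min_sample_size(m, z=Fraction("1.96")):
    """Smallest integer n with z * sqrt((1/4) / n) <= m, i.e.
    n >= z^2 * (1/4) / m^2, using the worst case p-hat = 1/2."""
    m = Fraction(m)
    return math.ceil(z**2 * Fraction(1, 4) / m**2)

print(min_sample_size("0.03"))   # 1068 (from 1067.11...)
print(min_sample_size("0.02"))   # 2401 (exactly)
```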

Example 26 An election between two candidates is going to be held in a city in which there are 120,000 eligible voters. A random sample of 284 eligible voters is obtained, and it is found that 150 of the respondents favor the first candidate, Amy, while 134 of the respondents favor the second candidate, Jack. Find a 95% confidence interval for the proportion $p$ of eligible voters who favor Amy. What is the margin of error? Why is a smaller margin of error desirable? How large a sample size would be required to ensure that the margin of error does not exceed 2% no matter what?

Solution: We first check the hypotheses. We have a random sample without replacement, so we need to ensure that $n/N \le 0.05$. Here, $N$ is the number of eligible voters (the population of interest for this study), which is 120,000, which is certainly at least 20 times as large as our sample size, so $n$ is not too large.
Next, we compute $\hat{p} = 150/284 = 0.528169014\ldots$, and we check that
$$n\hat{p} = 284 \cdot \frac{150}{284} = 150 \ge 10,$$
$$n(1 - \hat{p}) = 284 \cdot \frac{134}{284} = 134 \ge 10,$$
so $n$ is large enough.
We may now use the procedure we carefully derived above. We set
$$E = 1.96\sqrt{\frac{\hat{p}_n (1 - \hat{p}_n)}{n}} = 1.96\sqrt{\frac{(150/284)(1 - (150/284))}{284}} = 0.058059940\ldots,$$
which yields the 95% confidence interval
$$I_{95} = (\hat{p} - E,\ \hat{p} + E) = (0.470109073\ldots,\ 0.586228955\ldots) \text{ (approximately)}.$$
This confidence interval is a bit problematic since part of it is below $0.5$ and part is above $0.5$; yet, perhaps more than anything we would like to know the probability that $p$ will be greater than $0.5$ (in which case Amy wins) or that $p$ will be less than $0.5$ (in which case Jack wins). Our margin of error is too large to help us make this determination with high probability. Since our $\hat{p}$ is about $0.528$, something like a 2% margin of error would be more desirable.
Since we do not know what the $\hat{p}$ will really be once we select a larger sample size, in order to estimate the smallest sample size required to ensure that the margin of error $E$ will certainly not exceed $0.02$, we use the inequality (derived above)
$$E = 1.96\sqrt{\frac{\hat{p}_n (1 - \hat{p}_n)}{n}} \le 1.96\sqrt{\frac{1/4}{n}},$$
and we select $n$ to be large enough so as to ensure that the right side of this inequality does not exceed $0.02$, which will also ensure that $E$ does not exceed $0.02$.
Since both quantities are positive, this inequality is true if and only if
$$(1.96)^2\, \frac{1/4}{n} \le (0.02)^2,$$
which is true if and only if
$$n \ge \frac{(1.96)^2 (1/4)}{(0.02)^2} = 2401 \text{ (exactly)}.$$
Since this is exact, we do not need to round up.
Choosing $n \ge 2401$ will ensure that
$$E = 1.96\sqrt{\frac{\hat{p}_n (1 - \hat{p}_n)}{n}} \le 1.96\sqrt{\frac{1/4}{n}} \le 0.02,$$
no matter what $\hat{p}_n$ turns out to be later. It is also still small enough that it satisfies the condition $n/N \le 0.05$.
If our new sample of size 2401 (or larger) yields a value of $\hat{p}$ of around $0.52$ or higher, then we'll be quite confident that Amy will win, since the entire 95% confidence interval will be above $0.5$. However, it could very well happen that our new sample will yield a result like $\hat{p} = 0.51$, and then our margin of error (our $E$, using $\hat{p} = 0.51$, will be $0.019995999\ldots$, which is just barely under 2%) would be too large to allow us to predict the winner with any great degree of certainty.
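
The computations in Example 26 can be reproduced as follows (a sketch added to these notes):

```python
import math

n, successes = 284, 150
p_hat = successes / n                           # 0.5281690...

E = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
print(E)                                        # 0.0580599...
print(p_hat - E, p_hat + E)                     # about (0.4701, 0.5862)

# Margin of error if a new sample of size 2401 yielded p-hat = 0.51:
print(1.96 * math.sqrt(0.51 * 0.49 / 2401))     # 0.0199959...
```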
