A Comparison of Bootstrap Methods For Variance Estimation
Research Report
Centre of Biostochastics
Abstract
This paper presents a comparison of the nonparametric and parametric bootstrap methods when the statistic of interest is the sample variance estimator. Conditions under which the nonparametric bootstrap method of variance estimation performs better than the parametric bootstrap method are described.
E-mail address of the corresponding author: saeid.amiri@et.slu.se
1 Introduction
There has been much theoretical and empirical research on properties of the
bootstrap method and it has become a standard tool in statistical analysis.
The idea behind the bootstrap method is that if the sample distribution is a good approximation of the population distribution, then the sampling distribution of a statistic of interest can be approximated by generating a large number of new samples from the original sample via sampling with replacement. Bootstrapping treats the sample as the actual population.
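For readers who prefer code to notation, a minimal sketch of this resampling scheme in Python follows; the sample, the statistic, and the number of replicates are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_replicates(x, statistic, B=1000):
    """Draw B samples of size n from x with replacement (treating the
    sample as the population) and evaluate the statistic on each."""
    x = np.asarray(x)
    return np.array([statistic(rng.choice(x, size=len(x), replace=True))
                     for _ in range(B)])

# Example: bootstrap estimate of the standard error of the mean.
sample = rng.normal(size=30)
reps = bootstrap_replicates(sample, np.mean, B=2000)
print(reps.std())
```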
The most important property of the bootstrap method is the ability to estimate the standard error of any well-defined function of the random variables corresponding to the sample data. Applying the bootstrap method requires fewer assumptions than are needed for conventional methods. There are many books and papers on the bootstrap method and its applications in a variety of fields; see e.g. Hall (1992), Efron and Tibshirani (1993), Shao and Tu (1996), Davison and Hinkley (1997), MacKinnon (2002), Janssen and Pauls (2003) and Athreya and Lahiri (2006). Many use the bootstrap method without focusing on the theory. However, by considering the theoretical aspects, it is possible to understand the mechanism behind the simulations.
In this paper we present a finding that helps explain the difference in performance between the nonparametric and parametric bootstrap methods. The statistic of interest is the variance. The commonly applied nonparametric bootstrap resamples the observations from an original sample, whereas the parametric bootstrap method generates bootstrap observations from a given parametric distribution. If justification cannot be provided for the use of a specific parametric distribution, then the nonparametric bootstrap can be used. This is discussed in the rest of this paper.
The nonparametric and parametric methods are considered simultaneously in some studies, but often the results of the simulations are given without explicit discussion of their different performances. For example, Efron and Tibshirani (1993) discuss the nonparametric and parametric bootstrap confidence intervals of the variance by using an example, Ostaszewski and Rempala (2000) explain how to use the bootstrap methods within the actuarial sciences, and Lee (1994) explains how to use a tuning parameter to obtain a more accurate estimate.
It is difficult to study the comparison in general, but it is possible for the main parameters such as the mean and variance. Here we will use a heuristic criterion to compare the bootstrap method with the real distribution. Although the bootstrap method is based on the sample, it is intended to approach the real distribution. According to Hall (1992), the bootstrap method may be expressed as an expectation conditional on the sample or as an integral with respect to the sample distribution function. This allows us to make direct comparisons of the nonparametric and parametric bootstrap methods and to draw conclusions from these comparisons.
We can show that the behavior of the nonparametric and parametric bootstrap methods of variance estimation is affected by the kurtosis, which is explained in Theorem 1. This can be expected because the variance of the variance depends on the fourth moment. Distributions are usually classified by how flat-topped they are relative to the normal distribution. This can be assessed via the sample kurtosis. It should be mentioned that there is no universal agreement about what kurtosis is; see Darlington (1970) and Joanes and Gill (1998). In the case of variance estimation, we show that the bootstrap estimation depends on the kurtosis, whereas there is no difference between the parametric and nonparametric bootstrap methods in the case of mean estimation. We also show that the nonparametric bootstrap method can be better than the parametric bootstrap under some conditions, regardless of whether the real distribution and the distribution of the parametric bootstrap method belong to the same distribution family.
In Section 2, we briefly outline the bootstrap approaches. In Section 3,
the main results are presented. In Section 4, the theoretical discussion is
illustrated using some examples.
2 Bootstrap method
Let us look at the bootstrap stages, which can be formulated as below:
1. Suppose $X = (X_1, \dots, X_n)$ is an i.i.d. random sample from the distribution $F$. Assume $V(X) = \sigma^2$ and $EX^4 < \infty$.
2. We are interested in $\theta(F) = \sigma^2$ and consider the plug-in estimator $\hat\theta = \theta(X_1, \dots, X_n) = \theta(F_n) = S_X^2$, where
$$S_X^2 = \frac{1}{n}\sum_{j=1}^{n} X_j^2 - (\bar X)^2. \qquad (1)$$
3. Generate the bootstrap samples. This can be done in two different ways,
the nonparametric and the parametric bootstrap, with the symbols "$*$" and "$\#$" used to distinguish the approaches.
(i) The nonparametric bootstrap method: $X_{ij}^{*} \overset{\text{iid}}{\sim} F_n(x)$, $i = 1, \dots, B$, $j = 1, \dots, n$. Note that if $Z \sim F_n(x)$ then $EZ = \bar X$ and $V(Z) = S_X^2$, where $S_X^2$ is the second central moment estimator. The kurtosis of $F_n$ is defined as:
$$K_{F_n} = \frac{\sum_{j=1}^{n}(X_j - \bar X)^4/n}{\Big(\sum_{j=1}^{n}(X_j - \bar X)^2/n\Big)^2}. \qquad (2)$$
(ii) The parametric bootstrap method: $X_{ij}^{\#} \overset{\text{iid}}{\sim} G_{\hat\lambda}$, $i = 1, \dots, B$, $j = 1, \dots, n$, where $G_{\hat\lambda} = G(\cdot|\mathcal X)$ is an element of a class $\{G_\lambda,\ \lambda \in \Lambda\}$ of distributions. The parameter $\lambda$ is estimated by statistical methods. We also have $E(X^{\#}) = \bar X$ and $V(X^{\#}) = S_X^2$. In this case, the kurtosis is
$$K_{G(\cdot|\mathcal X)} = \frac{E_{\mathcal X}(X - \bar X)^4}{\big(E_{\mathcal X}(X - \bar X)^2\big)^2}.$$

With "$\times$" standing for either "$*$" or "$\#$", the bootstrap replicates of the variance are
$$S^2(X_i^{\times}) = S^2(X_{i1}^{\times}, \dots, X_{in}^{\times}), \quad i = 1, \dots, B,$$
their average is
$$S^{2\times} = \frac{1}{B}\sum_{i=1}^{B} S^2(X_i^{\times}), \qquad (3)$$
and their spread is measured by
$$V^{\times} = \frac{1}{B}\sum_{i=1}^{B}\Big(S^2(X_i^{\times}) - S^{2\times}\Big)^2 = \frac{\sum_{i=1}^{B}\big(S^2(X_i^{\times})\big)^2}{B} - (S^{2\times})^2. \qquad (4)$$
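The stages above can be sketched in Python as follows; the normal choice of $G_{\hat\lambda}$ in the parametric branch, the default $B$, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def s2(x):
    """Plug-in variance estimator S_X^2 of (1)."""
    x = np.asarray(x)
    return np.mean(x**2) - np.mean(x)**2

def kurtosis_fn(x):
    """Sample kurtosis K_{F_n} of (2)."""
    d = np.asarray(x) - np.mean(x)
    return np.mean(d**4) / np.mean(d**2)**2

def boot_variance(x, B=1000, parametric=False):
    """Return (S^{2x}, V^x) of (3) and (4) for the nonparametric
    bootstrap (i) or a parametric bootstrap (ii) with G = N(xbar, S_X^2)."""
    x = np.asarray(x)
    n = len(x)
    if parametric:
        samples = rng.normal(x.mean(), np.sqrt(s2(x)), size=(B, n))  # (ii)
    else:
        samples = rng.choice(x, size=(B, n), replace=True)           # (i)
    reps = np.array([s2(row) for row in samples])  # S^2(X_i^x), i = 1..B
    return reps.mean(), reps.var()                 # S^{2x} and V^x of (3), (4)
```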
3 Main results
It is difficult to offer a specific guideline to compare the variances explicitly. Here a criterion $e$ is proposed: the ratio of the conditional expectations of $V^{*}$ and $V^{\#}$, i.e.
$$e = \frac{E(V^{*}|\mathcal X)}{E(V^{\#}|\mathcal X)}. \qquad (5)$$
3.1 Bias
The following theorem clarifies the properties of the nonparametric and parametric bootstrap estimators of variance. It shows explicitly how bootstrapping is affected by the kurtosis.
Theorem 1 Let $X = (X_1, \dots, X_n) \overset{\text{iid}}{\sim} F$ with $EX^4 < \infty$. Then for the bootstrap methods in (i) and (ii), presented in the previous section:
$$E(S^{2*}|\mathcal X) = E(S^{2\#}|\mathcal X) = \frac{n-1}{n} S_X^2, \qquad (6)$$
$$K_{F_n} < K_{G(\cdot|\mathcal X)} \iff E(V^{*}|\mathcal X) < E(V^{\#}|\mathcal X), \qquad (7)$$
where $K_{F_n}$ and $K_{G(\cdot|\mathcal X)}$ are the sample kurtosis and the kurtosis corresponding to the parametric distribution $G_{\hat\lambda}$ used in (ii).
Proof: For the bootstrap methods in (i) and (ii),
$$E\big(S^2(X_i^{\times})\,\big|\,\mathcal X\big) = E_{\mathcal X}\Big(\frac{1}{n}\sum_{j=1}^{n} X_{ij}^{\times 2} - (\bar X_i^{\times})^2\Big) = \frac{n-1}{n}\Big(E_{\mathcal X}(X^{\times 2}) - \big(E_{\mathcal X}(X^{\times})\big)^2\Big),$$
and according to (3),
$$E_{\mathcal X}(S^{2\times}) = E_{\mathcal X}\Big(\frac{1}{B}\sum_{i=1}^{B} S^2(X_i^{\times})\Big) = E_{\mathcal X}\big(S^2(X_i^{\times})\big). \qquad (8)$$
Therefore
$$E(S^{2*}|\mathcal X) = E(S^{2\#}|\mathcal X) = \frac{n-1}{n} S_X^2,$$
and (6) is verified. The conditional expectation of $V^{\times}$ is given by:
$$E_{\mathcal X}(V^{\times}) = \frac{1}{B}\sum_{i=1}^{B} E_{\mathcal X}\big(S^2(X_i^{\times})^2\big) - E_{\mathcal X}\big((S^{2\times})^2\big)$$
$$= E_{\mathcal X}\big(S^2(X_i^{\times})^2\big) - \Big[\frac{1}{B} E_{\mathcal X}\big(S^2(X_i^{\times})^2\big) + \frac{B-1}{B}\big(E_{\mathcal X}(S^{2\times})\big)^2\Big]$$
$$= \frac{B-1}{B}\Big[E_{\mathcal X}\big(S^2(X_i^{\times})^2\big) - \big(E_{\mathcal X}(S^{2\times})\big)^2\Big], \qquad (9)$$
where
$$E_{\mathcal X}\big(S^2(X_i^{\times})^2\big) = \frac{1}{n^2} E_{\mathcal X}\Big[\Big(\sum_{j=1}^{n} X_j^{\times 2}\Big)^2 + n^2 \bar X^{\times 4} - 2n \bar X^{\times 2}\sum_{j=1}^{n} X_j^{\times 2}\Big]$$
$$= \frac{1}{n^4}\Big[(n-1)^2 n\, E_{\mathcal X}(X^{\times 4}) + (4-4n)n(n-1)\, E_{\mathcal X}(X^{\times 3}) E_{\mathcal X}(X^{\times}) + n(n-1)(n^2+3-2n)\big(E_{\mathcal X}(X^{\times 2})\big)^2$$
$$\qquad + 3(12-4n)\binom{n}{3} E_{\mathcal X}(X^{\times 2})\big(E_{\mathcal X}(X^{\times})\big)^2 + 24\binom{n}{4}\big(E_{\mathcal X}(X^{\times})\big)^4\Big]$$
$$= \frac{n-1}{n^3}\Big[(n-1)\, E_{\mathcal X}\big(X^{\times} - E_{\mathcal X}(X^{\times})\big)^4 + (n^2-2n+3)\Big(E_{\mathcal X}\big(X^{\times} - E_{\mathcal X}(X^{\times})\big)^2\Big)^2\Big]. \qquad (10)$$
By using (8) and (10),
$$E_{\mathcal X}(V^{\times}) = \frac{B-1}{B}\Big(E_{\mathcal X}\big(S^2(X_i^{\times})^2\big) - \big(E_{\mathcal X}(S^{2\times})\big)^2\Big)$$
$$= \Big(\frac{B-1}{B}\Big)\Big(\frac{n-1}{n^3}\Big)\Big[(n-1)\, E_{\mathcal X}\big(X^{\times} - E_{\mathcal X}(X^{\times})\big)^4 - (n-3)\Big(E_{\mathcal X}\big(X^{\times} - E_{\mathcal X}(X^{\times})\big)^2\Big)^2\Big]$$
$$= \Big(\frac{B-1}{B}\Big)\Big(\frac{n-1}{n^3}\Big)\Big(E_{\mathcal X}\big(X^{\times} - E_{\mathcal X}(X^{\times})\big)^2\Big)^2\Big((n-1)K^{\times} - (n-3)\Big)$$
$$= \Big(\frac{B-1}{B}\Big)\Big(\frac{n-1}{n^3}\Big)\big(S_X^2\big)^2\Big((n-1)K^{\times} - (n-3)\Big), \qquad (11)$$
where $K^{\times}$ can be either $K_{F_n}$ or $K_{G(\cdot|\mathcal X)}$. Thus the difference between $E_{\mathcal X}(V^{*})$ and $E_{\mathcal X}(V^{\#})$ depends on the kurtosis. The ratio of the conditional expectations of $V^{*}$ and $V^{\#}$, by using (11), equals:
$$e = \frac{E(V^{*}|\mathcal X)}{E(V^{\#}|\mathcal X)} = \frac{(S_X^2)^2\big((n-1)K_{F_n} - (n-3)\big)}{(S_X^2)^2\big((n-1)K_{G(\cdot|\mathcal X)} - (n-3)\big)} = \frac{(n-1)K_{F_n} - (n-3)}{(n-1)K_{G(\cdot|\mathcal X)} - (n-3)}. \qquad (12)$$
□
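As a worked illustration of (12), take the variable x of Table 1, for which $n = 26$ and $K_{F_n} = 2.59$; with a normal parametric bootstrap ($K_{G(\cdot|\mathcal X)} = 3$),
$$e = \frac{25(2.59) - 23}{25(3) - 23} = \frac{41.75}{52} \approx 0.80,$$
so, conditionally on the sample, the nonparametric replicates of $S^2$ are about 20% less dispersed than the parametric ones.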
From relation (6), it follows that the unconditional expectations of the parametric and nonparametric bootstrap estimators equal
$$E(S^{2*}) = E(S^{2\#}) = \frac{n-1}{n}\, E(S_X^2) = \Big(\frac{n-1}{n}\Big)^2 \sigma^2. \qquad (14)$$
Thus, for a normal parametric bootstrap ($K_{G(\cdot|\mathcal X)} = 3$), (11) gives
$$E_{\mathcal X}(V^{\#}) = 2\Big(\frac{n-1}{n^2}\Big)\Big(\frac{B-1}{B}\Big) S_X^4. \qquad (15)$$
Hence in the case of the normal distribution, if $K_{F_n} < 3$ holds, then $E_{\mathcal X}(V^{*})$ is less than $E_{\mathcal X}(V^{\#})$, and the replications of the nonparametric bootstrap are less dispersed than those of the parametric bootstrap.
Figure 1: Percentage of simulations in which $K_{F_n} < 3$, as a function of sample size.
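A sketch of the kind of simulation behind Figure 1; standard normal data are assumed, and the grid of sample sizes and the number of repetitions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def kurtosis_fn(x):
    """Sample kurtosis K_{F_n} of (2)."""
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2)**2

# Estimate P(K_{F_n} < 3) under normality for several sample sizes.
for n in (10, 20, 50, 100, 500):
    k = np.array([kurtosis_fn(rng.normal(size=n)) for _ in range(5000)])
    print(n, np.mean(k < 3))
```

Because the sample kurtosis is biased downwards under normality, the proportion lies well above one half for small samples.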
Theorem 2 Let $X = (X_1, \dots, X_n) \overset{\text{iid}}{\sim} F$ with $EX^4 < \infty$. Then for the bootstrap methods in (i) and (ii) in Section 2, where $K_{G(\cdot|\mathcal X)}$ is assumed to be independent of the observations, the following relations hold for $V^{*}$ and $V^{\#}$, which are defined in (4):
$$E(V^{*}) = \frac{B-1}{B}\,\frac{(n-1)^2}{n^6}\,\sigma^4\Big(K(n-1)(n^2-4n+6) + (-n^3+11n^2-24n+18)\Big), \qquad (17)$$
$$E(V^{\#}) = \frac{B-1}{B}\,\frac{(n-1)^2}{n^6}\,\sigma^4\Big(K(n-1) + (n^2-2n+3)\Big)\Big((n-1)K_{G(\cdot|\mathcal X)} - (n-3)\Big), \qquad (18)$$
where $K$ is the kurtosis of $F$.
The proof of the theorem is completed by inserting these equations into the corresponding terms in (11). □
This theorem states that $E(V^{*})$ depends on $K$, whereas $E(V^{\#})$ depends on $K$ and $K_{G(\cdot|\mathcal X)}$. It should be noted that if $K_{G(\cdot|\mathcal X)}$ depends on the observations, as for example for the lognormal distribution, then it is impossible to present a closed-form solution. Hence in this case, studying the performance of the parametric bootstrap is rather difficult. However, for the nonparametric bootstrap, (17) always holds. It is obvious that the methods are biased. In the case of the normal distribution, the following corollary is a direct result of Theorem 2.
Corollary 1 If $X = (X_1, \dots, X_n) \overset{\text{iid}}{\sim} F = N(\mu, \sigma^2)$ and also if $G(\cdot|\mathcal X) = N(\bar X, S_X^2)$, then the following relations hold:
$$\frac{Bn^3}{(B-1)(n-1)(n^2-2n+3)}\,E(V^{*}) = \frac{Bn^3}{(B-1)(n^2-1)n}\,E(V^{\#}) = V(S_X^2). \qquad (22)$$
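Rearranging (22) gives the exact ratio of the two expectations under normality:
$$\frac{E(V^{*})}{E(V^{\#})} = \frac{(n-1)(n^2-2n+3)}{(n^2-1)n},$$
which equals $747/990 \approx 0.75$ for $n = 10$ and tends to 1 as $n \to \infty$; even when both the data and $G(\cdot|\mathcal X)$ are normal, the two bootstrap methods differ noticeably in small samples.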
Figure 2: Plots of $E(V^{\#})/V(S_X^2)$ and $E(V^{*})/V(S_X^2)$ versus kurtosis, for sample sizes of 10 and 30, respectively.
Figure 3: Plots of $E(V^{\#})/V(S_X^2)$ and $E(V^{*})/V(S_X^2)$ versus kurtosis, for sample sizes of 10 and 30, where $G(\cdot|\mathcal X) \sim N(\bar X, S_X^2)$.
Consider the exponential power distribution (EPD), with density
$$f_X(x) = \frac{1}{2p^{1/p}\,\Gamma(1+1/p)\,\sigma_p}\exp\Big(\frac{-|x-\mu|^p}{p\,\sigma_p^p}\Big), \quad p > 0,\ x \in \mathbb R, \qquad (23)$$
where $\mu$ and $\sigma_p$ are location and scale parameters and $p$ is a shape parameter. The following relations hold:
$$\mu = E(X), \qquad \sigma_p = E\big(|X-\mu|^p\big)^{1/p}, \qquad K = \frac{\Gamma(1/p)\,\Gamma(5/p)}{\big(\Gamma(3/p)\big)^2}.$$
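The EPD coincides with the generalized normal distribution, so samples can be drawn with scipy; in the sketch below the mapping scale $= p^{1/p}\sigma_p$ translates (23) into scipy's parametrization (an assumption worth verifying), and the chosen values of $p$ are illustrative.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import gennorm

def epd_kurtosis(p):
    """Closed-form EPD kurtosis given below (23)."""
    return gamma(1/p) * gamma(5/p) / gamma(3/p)**2

rng = np.random.default_rng(3)
for p in (1.0, 2.0, 4.0):
    # gennorm(beta=p) has density ~ exp(-|x|^p); scale = p**(1/p)
    # corresponds to sigma_p = 1 in the parametrization of (23).
    x = gennorm.rvs(p, scale=p**(1/p), size=200_000, random_state=rng)
    d = x - x.mean()
    print(p, epd_kurtosis(p), np.mean(d**4) / np.mean(d**2)**2)
```

For $p = 2$ the density reduces to the normal ($K = 3$) and for $p = 1$ to the Laplace distribution ($K = 6$), so $p$ controls how far the kurtosis moves from the normal value.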
Figure 4: Violin plot of the nonparametric and parametric bootstrap of the EPD. The violin plot, a combination of a box plot and a kernel density plot (see Hintze and Nelson, 1998), helps to study the results of the simulations.
3.2 MSE
The variability of an estimator can also be assessed by its MSE, defined as
$$MSE(\hat\theta) = V(\hat\theta) + (\text{Bias})^2. \qquad (24)$$
For the plug-in variance estimator,
$$MSE(S_X^2) = V(S_X^2) + \frac{1}{n^2}\,\sigma^4. \qquad (25)$$
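The second term in (25) is the squared bias of the plug-in estimator, since
$$\mathrm{Bias}(S_X^2) = E(S_X^2) - \sigma^2 = \frac{n-1}{n}\,\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}.$$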
In the following, the conditional MSE, which is the direct result of the bootstrap method, is discussed first; the unconditional MSE is a combination of the bootstrap and the frequentist approaches. The following lemma discusses the bootstrap estimation of $S^2$.
Lemma 1 Let $X = (X_1, \dots, X_n) \overset{\text{iid}}{\sim} F$ with $E(X^4) < \infty$. Then for the bootstrap methods in (i) and (ii) in Section 2:
Proof: Because the $S_i^{2\times}$ are conditionally independent, the following equation holds:
$$V(S^{2\times}|\mathcal X) = \frac{1}{B}\, V(S_i^{2\times}|\mathcal X).$$
For $B \to \infty$, this tends to zero, and therefore $MSE(S^{2\times}|\mathcal X)$ converges to the squared bias, which is given in (26). □
Lemma 2 Let $X = (X_1, \dots, X_n) \overset{\text{iid}}{\sim} F$ with $E(X^4) < \infty$. Then for the bootstrap methods explained in (i) and (ii) in Section 2:
$$\lim_{B\to\infty} MSE(S^{2*}) = \lim_{B\to\infty} MSE(S^{2\#}) = \Big(\frac{n-1}{n}\Big)^2 V(S_X^2) + \Big(\frac{1-2n}{n^2}\Big)^2 \sigma^4. \qquad (27)$$
Proof: It holds that:
$$V(V^{\times}|\mathcal X) = V\Big(\frac{1}{B}\sum_{i=1}^{B}\big(S^2(X_i^{\times}) - S^{2\times}\big)^2\,\Big|\,\mathcal X\Big)$$
$$= \frac{B-1}{B^3}\Big((B-1)\,E\big((S^2(X_i^{\times}) - S^{2\times})^4\,\big|\,\mathcal X\big) - (B-3)\Big(E\big((S^2(X_i^{\times}) - S^{2\times})^2\,\big|\,\mathcal X\big)\Big)^2\Big).$$
□
This lemma shows that the conditional $MSE(V^{\times})$ is affected by the kurtosis via $E(V^{\times}|\mathcal X)$, as expected from the discussion of Theorem 1. The following lemma is necessary for the discussion of $MSE(V^{\times})$.
Lemma 4 Let $X = (X_1, \dots, X_n) \overset{\text{iid}}{\sim} F$ and $E(X^8) < \infty$. Then
$$V\big(\hat V(S_X^2)\big) = \Big(\frac{n-1}{n^3}\Big)^2\Big(n^2\, V(\hat\mu_4) + n^2\, V(\hat\mu_2^2)\Big) + O(n^{-4}), \qquad (31)$$
where $\hat V(S_X^2)$ is the estimate of (16).
Proof: Let
$$\hat V(S_X^2) = \frac{n-1}{n^3}\big((n-1)\hat\mu_4 - (n-3)\hat\mu_2^2\big),$$
where $\hat\mu_2$ and $\hat\mu_4$ are the estimators of the second and fourth central moments. Then it can be shown that
$$V\big(\hat V(S_X^2)\big) = \Big(\frac{n-1}{n^3}\Big)^2\Big((n-1)^2\, V(\hat\mu_4) + (n-3)^2\, V(\hat\mu_2^2) - 2(n-1)(n-3)\,\mathrm{Cov}(\hat\mu_4, \hat\mu_2^2)\Big),$$
where
$$\mathrm{Cov}(\hat\mu_4, \hat\mu_2^2) = \frac{1}{n}\,V(\hat\mu_4) + \frac{2(n-1)}{n^2}\,\mathrm{Cov}\big((X_1-\bar X)^4,\ (X_1-\bar X)^2(X_2-\bar X)^2\big) + \frac{(n-1)(n-2)}{n^2}\,\mathrm{Cov}\big((X_1-\bar X)^4,\ (X_2-\bar X)^2(X_3-\bar X)^2\big).$$
Moreover, it can be shown by some algebra that:
$$\mathrm{Cov}\big((X_1-\bar X)^4,\ (X_1-\bar X)^2(X_2-\bar X)^2\big) = (\mu_6\mu_2 - \mu_4\mu_2^2) + \frac{1}{n}\big(21\mu_4\mu_2^2 - 7\mu_6\mu_2 - 6\mu_2^4\big) + O(n^{-2}),$$
$$\mathrm{Cov}\big((X_1-\bar X)^4,\ (X_2-\bar X)^2(X_3-\bar X)^2\big) = \frac{1}{n^2}\big(23\mu_2^2\mu_4 - 85\mu_2^4 + 2\mu_6\mu_2\big) + O(n^{-3}).$$
Therefore, by using these relations, the following result is obtained:
$$V\big(\hat V(S_X^2)\big) = \Big(\frac{n-1}{n^3}\Big)^2\Big(n^2\, V(\hat\mu_4) + n^2\, V(\hat\mu_2^2)\Big) + O(n^{-4}). \qquad (33)$$
□
The next theorem discusses $MSE(V^{\times})$ in general.
Theorem 3 Let $X = (X_1, \dots, X_n) \overset{\text{iid}}{\sim} F$ and $E(X^8) < \infty$. If $K_{G(\cdot|\mathcal X)}$ is independent of the observations, then by using the bootstrap estimation given in (i) and (ii) in Section 2:
$$MSE(V^{*}) = \Big(\frac{B-1}{B}\Big)^2 V\big(\hat V(S_X^2)\big) + \big(E(V^{*}) - V(S_X^2)\big)^2, \qquad (34)$$
$$MSE(V^{\#}) = \Big(\frac{B-1}{B}\Big)^2\Big(\frac{n-1}{n^3}\Big)^2\big((n-1)K_{G(\cdot|\mathcal X)} - (n-3)\big)^2\, V(S_X^4) + \big(E(V^{\#}) - V(S_X^2)\big)^2, \qquad (35)$$
where $E(V^{*})$ and $E(V^{\#})$ are given in Theorem 2 and $V(\hat V(S_X^2))$ can be found by Lemma 4.
Proof: The proof can be obtained directly by using the definition of MSE, Lemma 3 and Lemma 4. □
This theorem can be used to find $MSE(V^{\times})$ for the nonparametric and parametric bootstrap. The next corollary is an application of this theorem for the normal distribution.
Table 1: Data used to study the simulation of variance

Variable  K_{F_n}  Data
x         2.59     48 36 20 29 42 42 20 42 22 41 45 14 6 0 33 28 34 4 32 24 47 41 24 26 30 41
y         3.41     48 36 20 29 42 42 20 42 22 41 45 14 30 0 33 28 34 24 32 24 47 41 24 26 30 41
Corollary 2 Let $X = (X_1, \dots, X_n) \overset{\text{iid}}{\sim} F = N(\mu, \sigma^2)$ and let $G(\cdot|\mathcal X) = N(\bar X, S_X^2)$. Then the following relation holds asymptotically:
4 Simulations
This section presents simulations of the bootstrap methods to clarify the results obtained theoretically in Section 3.
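A sketch of such a simulation in Python, using the variable x from Table 1; the 1000 repetitions match Table 2, while B and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

x = np.array([48, 36, 20, 29, 42, 42, 20, 42, 22, 41, 45, 14, 6,
              0, 33, 28, 34, 4, 32, 24, 47, 41, 24, 26, 30, 41], float)

def s2(a):
    """Plug-in variance estimator S_X^2 of (1)."""
    return np.mean(a**2) - np.mean(a)**2

def boot(a, B=500, parametric=False):
    """One bootstrap run: return (S^{2x}, V^x) as in (3) and (4)."""
    n = len(a)
    if parametric:
        samples = rng.normal(a.mean(), np.sqrt(s2(a)), size=(B, n))
    else:
        samples = rng.choice(a, size=(B, n), replace=True)
    reps = np.array([s2(row) for row in samples])
    return reps.mean(), reps.var()

# Proportion of repetitions in which V* < V#, as reported in Table 2.
wins = 0
for _ in range(1000):
    _, v_star = boot(x)
    _, v_hash = boot(x, parametric=True)
    wins += v_star < v_hash
print(wins / 1000)
```

Since $K_{F_n} = 2.59 < 3$ for x, Theorem 1 predicts $E(V^{*}|\mathcal X) < E(V^{\#}|\mathcal X)$, which is reflected in the high proportion (0.990) reported for x in Table 2.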
Table 2: Simulation of $S^{2\times}$ and $V^{\times}$

Variable  $S^{2*}$, $S^{2\#}$   Ratio†   $V^{*}$, $V^{\#}$    Ratio‡
x         165.00, 165.00        0.517    1755.70, 2178.04     0.990
y         118.31, 118.37        0.492    1338.85, 1124.60     0.042

† The proportion of the 1000 simulations in which $S^{2*} < S^{2\#}$.
‡ The proportion of the 1000 simulations in which $V^{*} < V^{\#}$.
$$S^2 \pm t_{\alpha/2}\,\widehat{se}_B^{\times},$$
where $\widehat{se}_B^{\times}$ is the bootstrap estimate of the standard error.
where $t^{\times}_{\alpha/2}$ is the $\alpha/2$ percentile of $t^{\times} = \dfrac{S^{2\times} - S^2}{\sqrt{V(S^2)^{\times}}}$, where $S^{2\times}$ and $V(S^2)^{\times}$ are estimated by the bootstrap method.
where $\chi^{2\times}_{\alpha/2}$ is the percentile of $\chi^{2\times} = \dfrac{nS^{2\times}}{S^2}$.
Method VI This method is called the percentile CI:
$$[\hat\theta_{\%low},\ \hat\theta_{\%up}] = \big[\hat G^{-1}(\alpha/2),\ \hat G^{-1}(1-\alpha/2)\big],$$
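A sketch of Method VI for the variance, approximating $\hat G$ by the empirical distribution of the bootstrap replicates; B and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def percentile_ci_variance(x, B=500, alpha=0.05):
    """Percentile CI (Method VI) for the variance: the alpha/2 and
    1 - alpha/2 quantiles of the bootstrap replicates of S_X^2."""
    x = np.asarray(x, float)
    n = len(x)
    samples = rng.choice(x, size=(B, n), replace=True)
    reps = np.array([np.mean(r**2) - np.mean(r)**2 for r in samples])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])
```

Applied to the x data of Table 1, this should give an interval comparable to the "VI non" row of Table 3.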
Tables 3 and 4 include the bootstrap confidence intervals at the 95% level for x and y with B = 500, which are discussed in the previous example. The parametric bootstrap is done with the normal distribution. The first two lines of both tables are the standard methods for the construction of a CI of the variance, which are based on the t and $\chi^2$ distributions. Method I has a smaller length than Method II because the former is based on a symmetrical distribution, while in reality the distribution of the variance is asymmetrical. Method II is known as the exact method and can serve as a benchmark for studying the different methods.
The length of the parametric bootstrap CI for the variance of x is wider than that of the nonparametric bootstrap. In contrast, the length of the parametric bootstrap CI for the variance of y is shorter than that of the nonparametric bootstrap. Because Method III uses the square root of $V(S^2)$, which depends on the kurtosis, this method is directly affected by the kurtosis. Method IV uses the bootstrap resamples in t. Although Methods V and VI do not use $V(S^2)^{\times}$ directly, they are based on the $\alpha/2$ and $1-\alpha/2$ percentiles, and of course their spread is directly affected by the kurtosis. Method VII is similarly affected.
Table 3: Confidence intervals at 95% for x and y

                      x                              y
Method     Low       Up        Length     Low      Up        Length
I          99.018    244.049   145.031    59.050   187.094   128.043
II         118.448   305.233   186.784    84.984   218.999   134.014
III non    100.064   243.003   142.938    61.886   184.258   122.372
III par    91.483    251.584   160.101    66.084   180.060   113.976
IV non     110.249   283.828   173.578    75.743   295.122   219.379
IV par     115.847   309.760   193.912    77.494   236.850   159.356
V non      124.379   306.281   181.902    83.498   225.435   141.936
V par      120.598   311.475   190.876    85.460   219.482   134.022
VI non     99.927    233.364   133.437    65.223   183.094   117.870
VI par     96.051    248.405   152.353    69.262   175.156   103.660
VII non    119.520   258.307   138.786    79.792   227.236   147.443
VII par    113.565   289.907   176.342    84.531   217.723   133.192
5 Conclusions
This paper discusses bootstrap estimation of the variance in the nonparametric and the parametric setting and studies their behavior. It shows that the expectations of the parametric and nonparametric bootstrap estimators of the variance are equal (6), but that the bootstrap standard error depends on the kurtosis (7). If the distribution of the sample is normal and the parametric bootstrap is based on the normal distribution, then the parametric bootstrap can be expected to be better than the nonparametric bootstrap, i.e. closer to the sample distribution. If $K_{F_n} > 3$, then for small sample sizes the nonparametric bootstrap method seems more appropriate.
Moreover, Theorem 2 gives the expectations of $V^{*}$ and $V^{\#}$. In the case of the nonparametric method, this depends on $K$, but for the parametric method it depends on both $K$ and $K_{G(\cdot|\mathcal X)}$. When $K_{G(\cdot|\mathcal X)}$ depends on the observations, the given general form of the parametric bootstrap does not hold.
Figure 2 explains the expected result. It clarifies that when $K$ is between 1.4 and 2, the result of the nonparametric bootstrap is more appropriate, regardless of whether $G(\cdot|\mathcal X)$ and $F$ belong to the same distribution family.
This paper emphasizes that special care should be taken when making claims about the accuracy of the parametric bootstrap approach in applications. Figure 3, which is based on Theorem 2, clarifies how much the result is affected by a wrong choice of distribution when the distribution of the population is not normal.
Two kinds of expectations are discussed throughout: conditional and unconditional. The conditional expectation clarifies the result of the bootstrapping, whereas the unconditional expectation is a combination of the bootstrapping and a frequentist approach.
References
[1] Athreya, K.B. and Lahiri, S.N. (2006). Measure Theory and Probability Theory. Springer, New York.
[2] Chiodi, M. (1995). Generation of pseudo random variates from a normal distribution of order p. Statistica Applicata, 7(4), 401-416.
[3] Cramér, H. (1945). Mathematical Methods of Statistics. Almqvist & Wiksells, Uppsala.
[4] Darlington, R.B. (1970). Is kurtosis really peakedness? The American Statistician, 24, 19-22.
[5] Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and Their Application. Cambridge University Press, Cambridge.
[6] Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall, New York.
[7] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
[8] Hintze, J.L. and Nelson, R.D. (1998). Violin plots: a box plot-density trace synergism. The American Statistician, 52(2), 181-184.
[9] Janssen, A. and Pauls, T. (2003). How do bootstrap and permutation tests work? Annals of Statistics, 31, 768-806.
[10] Joanes, D.N. and Gill, C.A. (1998). Comparing measures of sample skewness and kurtosis. The Statistician, 47(1), 183-189.
[11] Lee, S.S. (1994). Optimal choice between parametric and non-parametric bootstrap estimates. Math. Proc. Camb. Phil. Soc., 115, 335.
[12] MacKinnon, J.G. (2002). Bootstrap inference in econometrics. Canadian Journal of Economics, 35, 615-645.
[13] Ostaszewski, K. and Rempala, G.A. (2000). Parametric and nonparametric bootstrap in actuarial practice. www.actuarialfoundation.org/research edu/parametic.pdf.
[14] Shao, J. and Tu, D. (1996). The Jackknife and Bootstrap. Springer, New York.