A general bootstrap algorithm for hypothesis testing
Article history: Received 14 December 2010; Received in revised form 2 June 2011; Accepted 5 September 2011; Available online 10 September 2011.

Keywords: Gini index; Survival model; Competing risk; Cumulative incidence function

Abstract: The bootstrap is an intensive computer-based method originally devoted mainly to estimating the standard deviations, confidence intervals and bias of the studied statistic. This technique is useful in a wide variety of statistical procedures; however, its use for hypothesis testing, when the data structure is complex, is not straightforward, and each case must be treated individually. A general bootstrap method for hypothesis testing is studied. The considered method preserves the data structure of each group independently, and the null hypothesis is used only to compute the bootstrap statistic values (not at the resampling stage, as usual). The asymptotic distribution is developed and several case studies are discussed.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction
The bootstrap method, introduced and explored in detail by Efron (1979, 1982), is a (not only but mainly) nonparametric, intensive computer-based method of statistical inference which is often used to solve real problems without needing to know the underlying mathematical formulas. In particular, the bootstrap is really useful for assigning measures of accuracy to statistical estimates.
Obviously, there exists a vast literature on it. Among others, the monographs of Efron and Tibshirani (1993), Hall (1992) or Shao and Tu (1995) addressed the problem from different approaches. Besides, bootstrap resampling plans alternative to the original one have been proposed. González-Manteiga and Prada-Sánchez (1994) provided a brief review of the smoothed, symmetrized and Bayesian bootstraps.
Originally, the bootstrap was devoted to confidence interval construction, and there exists a huge number of papers on this topic (see, e.g., Hall, 1988 or DiCiccio and Efron, 1996 and references therein). Of course, there is an intimate connection between confidence intervals and hypothesis testing. However, the two procedures can differ because of the need (for hypothesis testing) to generate the bootstrap distribution of the selected test statistic under a specific null hypothesis (Martin, 2007). Once this point is taken care of, bootstrap methods provide a creative way of building hypothesis tests without the need for restrictive parametric assumptions (see, e.g., Silverman, 1981 or Davison and Hinkley, 1997 and references therein).
Nonetheless, although authors such as Davison and Hinkley (1997) dealt with the problem of developing fully nonparametric null models from which resampling can be carried out when no simple null model exists, there is a vast variety of problems in which resampling under the null implies, in some way, the (partial) loss of the original data structure (e.g., marginal distribution comparison of k-dimensional random variables). Moreover, when the null does not necessarily imply equality among the involved cumulative distribution functions (CDFs), the usual bootstrap resampling plan may not be a good method for estimating the variability of the statistic (see, for instance, the analysis of the Gini index in Section 3 of the present paper).
Correspondence to: Oficina de Investigación Biosanitaria del Principado de Asturias, C/Rosal 7 bis, 33009 Oviedo, Spain. Tel.: +34 985109805.
E-mail addresses: pablomc@ficyt.es (P. Martínez-Camblor), norbert@uniovi.es (N. Corral).
0378-3758/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2011.09.003
590 P. Martı́nez-Camblor, N. Corral / Journal of Statistical Planning and Inference 142 (2012) 589–600
In this paper, a general resampling plan for hypothesis testing is studied. Our bootstrap algorithm, previously considered to develop a kernel density based test for the classical k-sample problem under paired designs (Martínez-Camblor, 2010a), allows us to preserve the particular structure of each (involved) group. The key of this algorithm is that the null is used in order to compute the (bootstrap) statistic values instead of at the resampling step (as usual).
The rest of the paper is organized as follows: in Section 2, the problem of general hypothesis testing is introduced, and general asymptotic distributions for the traditional and the new bootstrap procedures are developed. In Section 3, we analyze the problem of testing the equality among Gini indices from independent samples (equality among Gini indices does not imply equality among the underlying distribution functions). Section 4 is devoted to the comparison of survival curves under unequal censoring time distributions. Finally, in Section 5 we deal with the comparison of cumulative incidence functions (CIFs) in the competing risk setting.
The studied algorithm is really simple and easy to implement; in addition, it performs well (in the sense that it leads to a good approximation of the distribution of interest) in all the considered problems. Finally, we remark that, arguing as in Horváth (1991), we can assume without loss of generality that all random variables and processes are defined on the same (and adequate) probability space.
2. General hypothesis testing

With the purpose of developing a test to check whether a parameter (or function) J (J(t) for functions) is the same for k different populations (of course, J depends on the underlying distribution function, F), i.e., to contrast the hypothesis

    H_0 : J_1 = \cdots = J_k (= J),    (1)

we must first choose an adequate estimator of the target. Let \hat{J}_n be this estimator. If \hat{J}_n = (\hat{J}_{n_1}, \ldots, \hat{J}_{n_k}) stands for the k-dimensional vector where \hat{J}_{n_i} is the estimator of J_i (the value of J in the ith group, i \in 1, \ldots, k), we assume that, for a certain fixed \lambda \in (0,1), the weak convergence

    n^{\lambda}\{\hat{J}_n - J\} \longrightarrow_L \mathcal{D}_k[J]    (2)

is satisfied, with J = (J_1, \ldots, J_k), where \mathcal{D}_k[J] is a k-dimensional probability law which may depend on the real (and unknown) value of J. For different sample sizes, n^{\lambda}\{\hat{J}_n - J\} denotes the vector \{n_1^{\lambda}(\hat{J}_{n_1} - J_1), \ldots, n_k^{\lambda}(\hat{J}_{n_k} - J_k)\}. Note that (2) implies that, if \mathcal{C}_k[J] = \{G_{(1)}[J_1], \ldots, G_{(k)}[J_k]\} is a k-dimensional vector with distribution \mathcal{D}_k[J], then we also have the convergence n^{\lambda}\{\hat{J}_n - J\} \longrightarrow_L \mathcal{C}_k[J]. Moreover, for each i \in 1, \ldots, k we have that n_i^{\lambda}\{\hat{J}_{n_i} - J_i\} \longrightarrow_L G_{(i)}[J_i] (\mathcal{D}[J_i] denotes the (marginal) distribution of G_{(i)}[J_i], 1 \le i \le k).
Let X = \{X_1, \ldots, X_k\}, with X_i = \{x_{i1}, \ldots, x_{in_i}\} for i \in 1, \ldots, k, be k random samples (X could also be a random sample from a k-dimensional random variable with marginals X_i). It can be assumed (without loss of generality) that the hypothesis in (1) is rejected for large values of the statistic

    T_N = \sum_{i=1}^{k} c_N( n_i^{\lambda} \{\hat{J}_{n_i} - \hat{J}\} ),    (3)

where \{c_N\}_{N \in \mathbb{N}} is a sequence of real functions such that c_N \to_N c, \hat{J}_{n_i} is the estimation of J in the ith sample (1 \le i \le k) and \hat{J} = N^{-1} \sum_{i=1}^{k} n_i \hat{J}_{n_i} (N = \sum_{i=1}^{k} n_i). Under the null (and only under the null), it is easy to derive the equality

    T_N = \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij}(n)\, n_j^{\lambda} \{\hat{J}_{n_j} - J_j\} ),    (4)

with a_{ii}(n) = (1 - n_i/N) and a_{ij}(n) = -(n_i^{\lambda} n_j^{1-\lambda}/N) for j \ne i (1 \le i, j \le k). Under general and mild conditions (continuity is sufficient although not needed) on the functions c_N (N \in \mathbb{N}), and if a_{ij}(n) \to_N a_{ij} \in \mathbb{R} (i.e., there exist real constants c_{ij} such that n_i/n_j \to_{n_i,n_j} c_{ij} for i, j \in 1, \ldots, k), the following convergence is derived directly from (2) and (4):

    T_N \longrightarrow_L \sum_{i=1}^{k} c( \sum_{j=1}^{k} a_{ij} G_{(j)}[J_j] ),    (5)

where G_{(1)}[J], \ldots, G_{(k)}[J] are the k components of the k-dimensional random vector \mathcal{C}_k[J] which appears in statement (2) (note that the marginal distributions are \mathcal{D}[J_j], 1 \le j \le k).
The classical bootstrap method for hypothesis testing (assuming independence among the samples) replicates the original problem by drawing k independent (bootstrap) samples from the pooled sample distribution (\hat{F}_N) following the general algorithm:

B1. Compute the statistic value, T_N, from the original sample X = \{X_1, \ldots, X_k\}.
B2. Draw B samples X^{*,b} = \{X_1^{*,b}, \ldots, X_k^{*,b}\} (1 \le b \le B) from the pooled sample distribution (\hat{F}_N).
B3. For b \in 1, \ldots, B, compute T_N^{*,b}, the statistic value referred to the sample X^{*,b}.
B4. The distribution of T_N under the null is approximated by \{T_N^{*,1}, \ldots, T_N^{*,B}\}, i.e., the final P-value is computed as

    P^* = B^{-1} \sum_{b=1}^{B} I\{T_N \le T_N^{*,b}\},

where I\{A\} stands for the usual indicator function on the set A (taking value 1 if A is true and 0 otherwise).
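As an illustration, steps B1-B4 can be sketched in a few lines of code (a minimal Python sketch; the particular statistic used, a weighted comparison of group means, and all function names are illustrative choices, not the authors' implementation):

```python
import numpy as np

def pooled_bootstrap_pvalue(samples, statistic, B=2000, seed=None):
    """Classical bootstrap test (steps B1-B4): every group is resampled
    from the pooled sample, i.e., from the ECDF of the joined data."""
    rng = np.random.default_rng(seed)
    pool = np.concatenate(samples)            # pooled sample (ECDF F_N)
    t_obs = statistic(samples)                # B1: statistic on the original data
    t_star = np.empty(B)
    for b in range(B):                        # B2-B3: resample and recompute
        boot = [rng.choice(pool, size=len(s), replace=True) for s in samples]
        t_star[b] = statistic(boot)
    return np.mean(t_obs <= t_star)           # B4: bootstrap P-value

def mean_stat(samples):
    """Illustrative statistic: weighted squared deviations of group means."""
    n = np.array([len(s) for s in samples], dtype=float)
    means = np.array([np.mean(s) for s in samples])
    grand = np.sum(n * means) / n.sum()
    return np.sum(n * (means - grand) ** 2)
```

For instance, `pooled_bootstrap_pvalue([x1, x2, x3], mean_stat)` returns a bootstrap P-value for a k-sample comparison of means.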
If T_N^* denotes the bootstrap version of T_N (the underlying distribution function is the ECDF computed from the pooled sample, \hat{F}_N), since for each i \in 1, \ldots, k, n_i^{\lambda}\{\hat{J}_{n_i} - J_i\} \longrightarrow_L G_{(i)}[J_i], we have that, for each u \in \mathbb{R},

    ( P_X\{T_N^* \le u\} - P\{ \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij} G_{(j)}[\hat{J}_N] ) \le u \} ) \longrightarrow_N 0  a.s.,    (6)

where G_{(1)}[\hat{J}_N], \ldots, G_{(k)}[\hat{J}_N] are k independent random variables with distribution \mathcal{D}[\hat{J}_N] and P_X denotes probability conditional on the sample X.
In a vast variety of situations, the traditional bootstrap works correctly and provides a valuable tool for computing the distribution of the statistic under the null. Moreover, all the (bootstrap) samples generated by the above algorithm come from the same distribution, \hat{F}_N, and, therefore, the distribution derived from B1-B4 is always a distribution under the null. However, the null does not necessarily imply equality among the k distribution functions and, in some particular problems, the distribution derived from the above algorithm can produce unsatisfactory critical regions.

Note that the origin of the error lies in assuming equality among the underlying distribution functions when this is untrue. If we resample from each sample separately (in particular, from \hat{F}_{n_i} with i \in 1, \ldots, k), Eq. (4) (which only uses the relevant information contained in the null) suggests the following bootstrap estimator:
    T_N^{**} = \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij}(n)\, n_j^{\lambda} \{\hat{J}_{n_j}^{**} - \hat{J}_{n_j}\} ) = \sum_{i=1}^{k} c_N( n_i^{\lambda} [ (\hat{J}_{n_i}^{**} - \hat{J}^{**}) - (\hat{J}_{n_i} - \hat{J}) ] ),    (7)

where, for i \in 1, \ldots, k, \hat{J}_{n_i}^{**} is the value of the statistic in a sample drawn from \hat{F}_{n_i} and \hat{J}^{**} = N^{-1} \sum_{i=1}^{k} n_i \hat{J}_{n_i}^{**}.
This expression allows us to define the following (simple) algorithm:

N1. Compute the statistic value, T_N = \sum_{i=1}^{k} c_N( n_i^{\lambda} \{\hat{J}_{n_i} - \hat{J}\} ), from the original sample X = \{X_1, \ldots, X_k\}.
N2. For each i \in 1, \ldots, k, draw B samples (of size n_i) X_i^{**,b} from \hat{F}_{n_i} to build X^{**,b} = \{X_1^{**,b}, \ldots, X_k^{**,b}\} (1 \le b \le B).
N3. For b \in 1, \ldots, B, compute T_N^{**,b} = \sum_{i=1}^{k} c_N( n_i^{\lambda} [ (\hat{J}_{n_i}^{**,b} - \hat{J}^{**,b}) - (\hat{J}_{n_i} - \hat{J}) ] ), the statistic value referred to the sample X^{**,b} (\hat{J}_{n_i} and \hat{J} are still the estimations from the original sample).
N4. The distribution of T_N under the null is approximated by \{T_N^{**,1}, \ldots, T_N^{**,B}\}, i.e., the final P-value is computed as

    P^{**} = B^{-1} \sum_{b=1}^{B} I\{T_N \le T_N^{**,b}\}.
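The steps N1-N4 above can be sketched as follows (an illustrative Python sketch, assuming the usual rate \lambda = 1/2 and a generic pair (J, c); it is not the authors' code). Note that the null enters only at step N3, through the centring term of Eq. (7):

```python
import numpy as np

def new_bootstrap_pvalue(samples, J, c, B=2000, seed=None):
    """New bootstrap test (steps N1-N4): each group is resampled from its
    OWN ECDF; the null is used only when the bootstrap statistic is built."""
    rng = np.random.default_rng(seed)
    lam = 0.5                                        # assumed rate, lambda = 1/2
    n = np.array([len(s) for s in samples], dtype=float)
    N = n.sum()
    J_hat = np.array([J(s) for s in samples])        # J-hat_{n_i}
    J_bar = np.sum(n * J_hat) / N                    # pooled estimate J-hat
    t_obs = np.sum(c(n ** lam * (J_hat - J_bar)))    # N1
    t_star = np.empty(B)
    for b in range(B):
        # N2: resample each group from its own empirical distribution
        boot = [rng.choice(s, size=len(s), replace=True) for s in samples]
        Jb = np.array([J(s) for s in boot])
        Jb_bar = np.sum(n * Jb) / N
        # N3: Eq. (7) -- subtracting (J_hat - J_bar) imposes the null here
        t_star[b] = np.sum(c(n ** lam * ((Jb - Jb_bar) - (J_hat - J_bar))))
    return np.mean(t_obs <= t_star)                  # N4
```

For example, `new_bootstrap_pvalue([x1, x2], J=np.mean, c=np.square)` tests the equality of two group means while resampling each group from its own data.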
If T_N^{**} stands for the new bootstrap version of T_N (following the above algorithm), Theorem 1 guarantees that, under the null hypothesis together with usual and mild conditions, the distribution of T_N^{**} approximates the T_N distribution.

Theorem 1. Under the conditions and notations previously introduced, if \mathcal{D}_k[J + \delta] \to_{\delta \to 0} \mathcal{D}_k[J] (\mathcal{D}_k being the law involved in (2)), then for each u \in \mathbb{R} we have the convergence

    ( P_X\{T_N^{**} \le u\} - P\{T_N \le u\} ) \longrightarrow_N 0  a.s.,    (8)

where P_X denotes probability conditional on the sample X.

Proof. Arguing as for (6), but resampling each group from its own ECDF, for each u \in \mathbb{R} we obtain

    ( P_X\{T_N^{**} \le u\} - P\{ \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij}(n) G_{(j)}[\hat{J}_{n_j}] ) \le u \} ) \longrightarrow_N 0  a.s.,    (9)

where G_{(1)}[\hat{J}_{n_1}], \ldots, G_{(k)}[\hat{J}_{n_k}] are the k components of the k-dimensional random vector \mathcal{C}_k[\hat{J}_n] (note that the marginal distributions are \mathcal{D}[\hat{J}_{n_j}] (1 \le j \le k) and that independence among them is not needed).

On the other hand, under the null, \hat{J}_{n_i} \to_{n_i} J (1 \le i \le k); hence, the properties required of \{c_N\}_{N \in \mathbb{N}} and of \mathcal{D}_k allow us to write

    \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij}(n) G_{(j)}[\hat{J}_{n_j}] ) \longrightarrow_L \sum_{i=1}^{k} c( \sum_{j=1}^{k} a_{ij} G_{(j)}[J] )  a.s.    (10)

Obviously, for all u \in \mathbb{R},

    ( P_X\{T_N^{**} \le u\} - P\{T_N \le u\} )
        = ( P_X\{T_N^{**} \le u\} - P\{ \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij}(n) G_{(j)}[\hat{J}_{n_j}] ) \le u \} )
        + ( P\{ \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij}(n) G_{(j)}[\hat{J}_{n_j}] ) \le u \} - P\{ \sum_{i=1}^{k} c( \sum_{j=1}^{k} a_{ij} G_{(j)}[J] ) \le u \} )
        + ( P\{ \sum_{i=1}^{k} c( \sum_{j=1}^{k} a_{ij} G_{(j)}[J] ) \le u \} - P\{T_N \le u\} ).    (11)

The convergence in (8) is derived from (11), (9), (10) and (5). □
The proposed algorithm (N1-N4) behaves well in a variety of problems. It always resamples from the original data (without taking the null hypothesis into account at this stage). This point, which is the main particularity of the method (step N3), allows it to preserve the whole original data structure (variance-covariance matrix), and even the (possible) dependence (resampling on the individual in paired designs). The null (and only the null) is incorporated into the algorithm through the bootstrap estimator definition (Eq. (7)) and is used at the moment of computing the (bootstrap) statistic values. No other assumptions (for instance, other kinds of equalities among the groups) are necessary. In addition, under the null, arguing as in Theorem 1 and from (6) and (8), for each u \in \mathbb{R} the convergence ( P_X\{T_N^* \le u\} - P_X\{T_N^{**} \le u\} ) \to_N 0 a.s. is derived. However, under the alternative, this statement is, in general, untrue.

Although our theoretical developments have focused on the k-sample problem, the proposed algorithm can be applied to general hypothesis testing; in particular, to the one-sample problem (H_0: J_1 = J for a fixed J). However, in this case steps N2 and B2 (the resampling procedures) are equal, and the traditional and new bootstraps are equivalent. Note that, in this setting, both resampling plans preserve the original data structure.
The hypothesis test in which the traditional bootstrap method reaches its best and most direct application is, obviously, the k-sample problem for independent samples, i.e., checking the null

    H_0 : F_1 = \cdots = F_k (= F).

There exist a number of statistics for this goal; probably one of the most popular is the k-sample version of the traditional Cramér-von Mises test proposed by Kiefer (1959). Let X_i = \{x_{i1}, \ldots, x_{in_i}\} (1 \le i \le k) be k independent samples; it is defined by

    C_N^2(k) = \sum_{i=1}^{k} n_i \int \{\hat{F}_{n_i}(X_i, t) - \hat{F}_N(X, t)\}^2 \, d\hat{F}_N(X, t),    (12)

where \hat{F}_{n_i}(X_i, t) (1 \le i \le k) and \hat{F}_N(X, t) (N = \sum_{i=1}^{k} n_i) denote the empirical cumulative distribution functions (ECDFs) referred to the ith sample and to the pooled sample, respectively.
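For concreteness, the statistic (12) can be computed by noting that integration against d\hat{F}_N reduces to an average over the pooled observations (an illustrative Python sketch, not the authors' implementation):

```python
import numpy as np

def cramer_von_mises_k(samples):
    """k-sample Cramer-von Mises statistic C_N^2(k) of Eq. (12): the
    integral against dF_N becomes an average over the pooled points."""
    pool = np.sort(np.concatenate(samples))
    N = len(pool)
    # pooled ECDF evaluated at every pooled observation (ties handled)
    FN = np.searchsorted(pool, pool, side="right") / N
    stat = 0.0
    for s in samples:
        Fi = np.searchsorted(np.sort(s), pool, side="right") / len(s)
        stat += len(s) * np.mean((Fi - FN) ** 2)   # n_i * (1/N) * sum(...)
    return stat
```

When the samples coincide, the group ECDFs equal the pooled ECDF and the statistic is exactly zero.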
It is well known (see, for instance, Van der Vaart, 1998) that, under general (and mild) conditions, we have the convergence

    \sqrt{n}\{\hat{F}_n(X, t) - F(t)\} \longrightarrow_L W^0\{F(t)\},

where W^0\{t\} (0 \le t \le 1) stands for a standard Brownian bridge. Following the above exposition, we also know that, under the null,

    C_N^2(k) \longrightarrow_L \sum_{i=1}^{k} \int ( \sum_{j=1}^{k} a_{ij} W_{(j)}^0\{F(t)\} )^2 \, dF(t),

where W_{(1)}^0\{F(t)\}, \ldots, W_{(k)}^0\{F(t)\} are k independent Brownian bridges. If C_N^{2,*}(k) and C_N^{2,**}(k) denote, respectively, the bootstrap (where the underlying distribution function is always \hat{F}_N) and the new bootstrap (the resampling is done following step N2 and the statistic value is computed using N3) versions of C_N^2(k), we have (particular versions of (6) and (8)) that, for each u \in \mathbb{R}, the convergences

    ( P_X\{C_N^{2,*}(k) \le u\} - P\{ \sum_{i=1}^{k} \int ( \sum_{j=1}^{k} a_{ij} W_{(j)}^0\{\hat{F}_N(t)\} )^2 \, d\hat{F}_N(t) \le u \} ) \longrightarrow_N 0  a.s.

and

    ( P_X\{C_N^{2,**}(k) \le u\} - P\{ \sum_{i=1}^{k} \int ( \sum_{j=1}^{k} a_{ij} W_{(j)}^0\{\hat{F}_{n_j}(t)\} )^2 \, d\hat{F}_N(t) \le u \} ) \longrightarrow_N 0  a.s.
It is worth noting that the traditional bootstrap resamples from the pooled sample; therefore, the ECDF estimation is made from a sample of size N (N = \sum_{i=1}^{k} n_i). However, in the considered new bootstrap, the ECDF estimations are always made from each particular group (sizes n_j, 1 \le j \le k). In practice, under the null, the effects are almost negligible (although, in this case, for very small sample sizes the new bootstrap could have some problems with the nominal level) but, under the alternative, the estimations can be different (see Fig. 1).
Let us consider a homogeneous (equal sample size) two-sample problem. Let X_1 and X_2 be two independent random samples (of size n) drawn from the distributions F_1 and F_2, respectively, and let F = (1/2)F_1 + (1/2)F_2. The C_N^2(k) distribution can then be approximated from

    D_n = 2 \int [ W_{(1)}^0\{F_1(t)\} - W_{(2)}^0\{F_2(t)\} + \sqrt{n}\,\varepsilon(t) ]^2 \, dF(t),

where W_{(1)}^0\{F_1(t)\} and W_{(2)}^0\{F_2(t)\} are two independent Brownian bridges and \varepsilon(t) = [F_2(t) - F_1(t)]. It is easy to compute the corresponding expected value:

    E[D_n] = 2 \int F_1(t)(1 - F_1(t)) \, dF(t) + 2 \int F_2(t)(1 - F_2(t)) \, dF(t) + 2n \int [F_2(t) - F_1(t)]^2 \, dF(t).

The bootstrap approximation is based on resampling from the ECDF assuming that the null hypothesis is true, i.e., on resampling from \hat{F}_N(X, t) with X = \{X_1, X_2\} (N = 2n). Therefore, the C_N^{2,*}(k) distribution can be approximated by

    D_n^* = 2 \int [ W_{(1)}^0\{F(t)\} - W_{(2)}^0\{F(t)\} ]^2 \, dF(t).
Fig. 1. Density estimations for the two studied procedures (N-Boots. and Boots.) under H_0 (n_1 = n_2 = 100, at left) and H_1 (n_1 = n_2 = 100, at right). The histograms depict the real distribution under the null computed from 10,000 Monte Carlo replications.
The studied resampling plan (algorithm N1-N4) always resamples from the original data (without any additional assumption); therefore, it preserves the original data structure. The null is taken into account (only) in order to compute the bootstrap statistic values (following step N3). The C_N^{2,**}(k) distribution can be approximated by

    D_n^{**} = 2 \int [ W_{(1)}^0\{F_1(t)\} - W_{(2)}^0\{F_2(t)\} ]^2 \, dF(t).

Obviously, E[D_n^{**}] \le E[D_n] and, although under the null the asymptotic difference is zero (for finite samples the difference is almost negligible), under the alternative this difference can lead to changes in the obtained statistical power. The new bootstrap procedure estimates the original distribution assuming that the null is true, i.e., \varepsilon(t) = 0. Fig. 1 depicts the traditional and the new bootstrap approximations when F_1 is a standard normal distribution (N(0,1)) and F_2 is a N(2,1). At left, the null is true and both samples are drawn from F (the observed rejection proportions (\alpha = 0.05) were 0.051 and 0.054 for the bootstrap and the new bootstrap methods, respectively (n = 50), and 0.049 and 0.051 for n = 100). The differences between the densities are negligible for the two considered sample sizes (n = 50, 100). At right (the histograms with the real distributions used as reference are the same), the simulations are from the alternative hypothesis (the observed statistical power was 1; the distributions are clearly different and \varepsilon(t) is relevant even for small sample sizes). The density estimation for the new bootstrap is slightly sharper than the one for the traditional bootstrap.
For a fixed nominal level \alpha, the statistical power of the statistic C_N^2(k) is P_{H_1}\{C_N^2(k) > t_\alpha\} (P_{H_1} denotes probability conditional on the alternative hypothesis), where t_\alpha = \inf\{u \in \mathbb{R} : P_{H_0}\{C_N^2(k) > u\} \le \alpha\} (P_{H_0} denotes probability conditional on the null). Of course, one method (of approximating the distribution of the statistic) will be more powerful when its corresponding t_\alpha estimation is smaller.
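In a bootstrap implementation, t_\alpha is estimated by the empirical (1 - \alpha)-quantile of the B bootstrap statistic values; a one-line sketch (illustrative names):

```python
import numpy as np

def critical_value(boot_stats, alpha=0.05):
    """Estimate t_alpha, the smallest u with P_H0{T_N > u} <= alpha,
    by the empirical (1 - alpha)-quantile of the bootstrap values."""
    return np.quantile(boot_stats, 1.0 - alpha)
```

The comparison of the two resampling plans then reduces to comparing the `critical_value` of their respective bootstrap replicates.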
Arguing as in the previous scheme, we consider a two-sample problem where F_1 follows a N(0,1) and F_2 a N(\mu,1) (F = (1/2)F_1 + (1/2)F_2). Under the null, both samples are generated from F and, under the alternative, one is drawn from F_1 and the other one from F_2. Table 1 reports the observed rejection proportions (at nominal level \alpha = 0.05) in 10,000 Monte Carlo simulations, for different values of \mu and different sample sizes (although always n_1 = n_2 = n). The P-values were approximated with B = 2000. Although the new bootstrap method always rejected slightly more often than the traditional one, the differences between the two methods are negligible.

Fig. 2 depicts the mean of the t_\alpha estimations from the two methods (t_\alpha^* and t_\alpha^{**} for the traditional and proposed algorithms, respectively) for the previously considered problem (n = 50). Note that, although the observed statistical power is almost the same, when resampling from the alternative hypothesis the t_\alpha^{**} values are smaller than the t_\alpha^* ones.
3. Testing the equality among Gini indices
The Gini concentration index (Gini, 1995) is often used to study distributional inequality (mainly, although not only, in economic contexts). Let X be a non-negative random variable with cumulative distribution function (CDF) F; the Gini index can be defined as

    G(F) = \frac{1}{E[X]} \int F(u)(1 - F(u)) \, du.

By replacing the CDF with the empirical cumulative distribution function (ECDF), the typical nonparametric Gini index estimator, G(\hat{F}_N), is obtained. This estimator has been widely studied (see, e.g., Martínez-Camblor, 2007 and references therein).
Table 1
Observed rejection proportions in 10,000 Monte Carlo simulations for the bootstrap and new bootstrap algorithms. The P-values were approximated from 2000 iterations.

                 Bootstrap                        New bootstrap
    n      μ=0    μ=1/2   μ=1    μ=3/2      μ=0    μ=1/2   μ=1    μ=3/2
H0
    50    0.052  0.050  0.051  0.049      0.056  0.053  0.054  0.053
    75    0.045  0.050  0.048  0.052      0.047  0.052  0.051  0.055
    100   0.049  0.051  0.048  0.052      0.050  0.053  0.049  0.052
    150   0.050  0.046  0.050  0.051      0.051  0.049  0.050  0.051
H1
    50    0.052  0.648  0.996  1.000      0.055  0.659  0.997  1.000
    75    0.054  0.824  1.000  1.000      0.056  0.831  1.000  1.000
    100   0.049  0.919  1.000  1.000      0.051  0.920  1.000  1.000
    150   0.047  0.980  1.000  1.000      0.048  0.980  1.000  1.000
Fig. 2. Continuous and dotted lines stand for the means of t_\alpha^{**} and t_\alpha^* (\alpha = 0.05), respectively. At left, the samples are from the null; at right, the samples are from the alternative (n = 50). Grey areas stand for 95% confidence intervals.
Following the general scheme, under the null the statistic T_N converges in law to \sum_{i=1}^{k} c( \sum_{j=1}^{k} a_{ij} N_{(j)}(0, V_G(F_j)) ), where, for 1 \le j \le k, the N_{(j)}(0, V_G(F_j)) are random variables with distribution N(0, V_G(F_j)). For each u \in \mathbb{R}, the bootstrap version of T_N (T_N^*, associated with algorithm B1-B4) satisfies the convergence

    ( P_X\{T_N^* \le u\} - P\{ \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij} N_{(j)}(0, V_G(\hat{F}_N)) ) \le u \} ) \longrightarrow_N 0  a.s.,

where, for 1 \le j \le k, the N_{(j)}(0, V_G(\hat{F}_N)) are independent random variables with distribution N(0, V_G(\hat{F}_N)). The new bootstrap version of T_N (T_N^{**}, associated with algorithm N1-N4) satisfies, for each u \in \mathbb{R}, the convergence

    ( P_X\{T_N^{**} \le u\} - P\{ \sum_{i=1}^{k} c_N( \sum_{j=1}^{k} a_{ij} N_{(j)}(0, V_G(\hat{F}_{n_j})) ) \le u \} ) \longrightarrow_N 0  a.s.,

where N = \{N_{(1)}(0, V_G(\hat{F}_{n_1})), \ldots, N_{(k)}(0, V_G(\hat{F}_{n_k}))\} is a k-dimensional random vector (which preserves the original data structure) whose marginal random variables follow the distributions N(0, V_G(\hat{F}_{n_j})) (1 \le j \le k).
Even for independent samples, there exist some issues which prevent guaranteeing the correct convergence of the traditional bootstrap method in general. On one hand, there exist different cumulative distribution functions which lead to the same Gini index and, on the other hand, given two CDFs, F_1 and F_2, the equality G((1/2)(F_1 + F_2)) = (1/2)(G(F_1) + G(F_2)) is usually untrue.
For instance, consider one sample drawn from the distribution F_1(t) = t\,I_{[0,1)}(t) + I_{[1,\infty)}(t) (I_A stands for the usual indicator function) and another one independently drawn from F_2(t) = (1/2)F_1(t) + (1/2)I_{[2,\infty)}(t) (Fig. 3, at left, shows the respective Lorenz curves). It is easy to check that G(F_1) = G(F_2) = 1/3 (with V_G^2(F_1) = 8/135 (≈0.059) and V_G^2(F_2) = 304/3375 (≈0.090)). Obviously, the null is true; however, G((1/2)F_1 + (1/2)F_2) = 3/7 (V_G^2((1/2)F_1 + (1/2)F_2) = 64/1135 (≈0.056)). In this case, assuming equal sample sizes, the asymptotic distribution for T_N is (168/1125)·\chi_1^2. Since, for j \in 1, \ldots, k, \hat{F}_{n_j} \to_{n_j} F_j (almost surely), we have that T_N^{**} \longrightarrow_L (168/1125)·\chi_1^2 while T_N^{*} \longrightarrow_L (128/1135)·\chi_1^2.

Fig. 3. At left, Lorenz curves for the considered distributions F_1 and F_2. At right, histogram of the real distribution (based on 10,000 Monte Carlo replications) and density estimations for the bootstrap and new bootstrap approximations.
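These figures are easy to reproduce numerically. The sketch below (illustrative Python, not the authors' code) computes the plug-in estimator G(\hat{F}_n) through the equivalent order-statistics formula and checks, on simulated samples from F_1 and F_2, that each group index stays near 1/3 while the pooled index moves towards 3/7:

```python
import numpy as np

def gini(x):
    """Plug-in Gini estimator G(F_hat). Using int F(1-F)du = E|X - X'|/2,
    it equals sum_i (2i - n - 1) x_(i) / (n^2 * mean(x))."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n ** 2 * x.mean())

rng = np.random.default_rng(0)
a = rng.uniform(0, 1, 5000)                 # sample from F1 (uniform on [0,1))
b = np.concatenate([rng.uniform(0, 1, 2500),
                    np.full(2500, 2.0)])    # sample from F2 (half mass at 2)
g_pool = gini(np.concatenate([a, b]))       # pooled index, close to 3/7
```

Although G(F_1) = G(F_2) = 1/3, the pooled index is close to 3/7, which is why resampling from the pooled sample misrepresents the variability under this null.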
Fig. 3 shows, at right, the two considered approximations of the sample distribution of the statistic T_N (sample sizes were n_1 = n_2 = 100) for the problem described above: one based on the usual bootstrap algorithm (B1-B4) and one based on the studied resampling plan (algorithm N1-N4). The real sample distribution is approximated by 10,000 Monte Carlo replications. The observed rejection percentages (nominal level \alpha = 0.05) were 11.1% and 5.3% for the traditional and new bootstrap, respectively.
4. Survival curves comparison

Conventionally, survival studies are concerned with the estimation of the involved survival functions (1 − CDF). The data are often randomly right censored. Specifically, let T_n = \{t_1, \ldots, t_n\} be a sample of n independent and identically distributed (iid) lifetime (non-negative) observations with cumulative distribution function F, and let C_n = \{c_1, \ldots, c_n\} be iid censoring time (also non-negative) observations with CDF G. We observe z_i = \min\{t_i, c_i\} and know the pairs (z_i, \delta_i), where \delta_i = I\{t_i < c_i\} (1 \le i \le n). Clearly, Z = \{z_1, \ldots, z_n\} are iid with CDF H, where (1 − H) = (1 − F)(1 − G). In this context, the Kaplan-Meier (KM) or product-limit estimator (Kaplan and Meier, 1958) plays the role that the ECDF plays for complete information. The KM estimator has been widely studied and there exists a vast literature on it. We would like to highlight here the paper by Csörgő (1996), in which the author established universal Gaussian approximations for the empirical cumulative hazard and product-limit processes. In particular, it is known that, if d(t) is the variance function of the KM estimator and D(t) = d(t)/[1 + d(t)], then for t \in [0, \tau_H) (\tau_H = \inf\{x : H(x) = 1\}),

    \sqrt{n}\{\hat{F}_n^{KM}(t) - F(t)\} \longrightarrow_L [1 - F(t)][1 + d(t)]\, W^0\{D(t)\}.
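For completeness, the product-limit estimator itself can be sketched in a few lines (illustrative Python, assuming no tied observation times; real analyses would rely on a dedicated survival library):

```python
import numpy as np

def kaplan_meier(z, delta):
    """Kaplan-Meier estimate of S(t) = 1 - F(t) at the observed times.
    z: observed times min(t_i, c_i); delta: 1 = event, 0 = censored.
    Assumes no ties among the observed times."""
    order = np.argsort(z)
    z = np.asarray(z, dtype=float)[order]
    delta = np.asarray(delta, dtype=float)[order]
    at_risk = len(z) - np.arange(len(z))        # subjects at risk at each z_(i)
    surv = np.cumprod(1.0 - delta / at_risk)    # censored points give factor 1
    return z, surv
```

For example, `kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])` returns the survival values 0.75, 0.50, 0.50 and 0.00 at times 1 through 4.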
Fig. 4. Histogram of the real distribution of the statistic (based on 10,000 Monte Carlo replications) and density estimations for the bootstrap and new bootstrap approximations under the null (at left) and under the alternative described in the text (at right).
Consider a three-sample problem where the (real) underlying survival function is S(t) = (1/3)\{S_1(t) + S_2(t) + S_3(t)\} with S_i(t) = e^{-(2i)^{-1} t}\, I_{[0,\infty)}(t) (1 \le i \le 3), and where the respective censoring times are drawn from the distributions C_i(t) = (i+5)^{-1} t\, I_{[0,\,i+5]}(t) + I_{(i+5,\,\infty)}(t) (the expected censoring percentages are 32.5%, 26.7% and 22.1% for the first, second and third sample, respectively). Although the null hypothesis is obviously true, in a Monte Carlo simulation study (the considered sample sizes were n_1 = n_2 = n_3 = 50 and 10,000 Monte Carlo replications were run) the observed rejection percentage (\alpha = 0.05) for the traditional bootstrap was 9.5% (the P-values were estimated from 1000 bootstrap resamples). The proposed resampling procedure preserves (again) the internal structure within each group, and the observed rejection proportion (on the same Monte Carlo replications) was 4.3% (1000 bootstrap samples were also used to estimate the P-values).
Fig. 4 depicts the real distribution of the statistic (estimated by 10,000 Monte Carlo replications), the traditional bootstrap approximation (resampling from the pooled sample) and the proposed new bootstrap distribution under the null (at left). At right, the distribution under the null is computed in a problem in which the underlying survival functions are S_1, S_2 and S_3 for the first, second and third sample, respectively (the observed statistical power (nominal level \alpha = 0.05) is close to 0.95). The histogram of the real distribution is the same in both cases.

The problem with resampling from the pooled sample is the estimation of the failure times t_i. As usual, since the censoring and the survival times are independent (i.e., the censoring mechanism is noninformative), the problem can also be solved by using a little modification of the usual bootstrap algorithm (see Martínez-Camblor, 2011). In this case, simulations (not shown here) suggest that the traditional bootstrap results (under the null) are quite similar to the ones obtained by using the proposed resampling algorithm.
5. Comparing cumulative incidence functions

The above problem is compounded when the studied event may be precluded by the occurrence of other events which alter the probability of experiencing the event of interest. Such events are known as competing risk events.

To be precise, in the considered competing risks setting we suppose there are k independent groups of subjects. Let T_{n_i} = \{t_{i1}, \ldots, t_{in_i}\} (1 \le i \le k) be the failure times of the ith group and let \delta'_{ij} (1 \le j \le n_i) be the indicator of the observed event (1 if the event of interest occurs and 2 if another (competing) event occurs). For each i \in 1, \ldots, k, the pairs (t_{ij}, \delta'_{ij}) (1 \le j \le n_i) from different subjects within the same group are assumed to be iid. Conventionally, there also exist independent censoring times, C_{n_i} = \{c_{i1}, \ldots, c_{in_i}\}, which are iid with CDF G_i. We observe z_{ij} = \min\{t_{ij}, c_{ij}\} and, for i \in 1, \ldots, k, the pairs (z_{ij}, \delta_{ij})_{j=1}^{n_i}, where \delta_{ij} = \delta'_{ij}\, I\{t_{ij} \le c_{ij}\}, are known. There exist different functions of interest related to this problem; we will focus on the cumulative incidence function (CIF). Therefore, our goal is to test the hypothesis

    H_0 : F_{11}(t) = \cdots = F_{1k}(t) (= F_1(t)),    (15)

where, for j \in 1, \ldots, k, F_{1j}(t) = P\{t_{j1} \le t, \delta'_{j1} = 1\}. Although the methods for estimating this subdistribution function are not new (see Kalbfleisch and Prentice, 1980 for an early reference), misuse of the Kaplan-Meier estimator in this setting is still common in the biomedical literature (Gooley et al., 1999). Of course, several different criteria have been proposed in order to test (15). Gray (1988) developed log-rank type tests for the comparison of CIFs. Pepe (1991) used the integrated difference of CIF estimates (for two-sample problems). Lin (1997) proposed using the Kolmogorov-Smirnov criterion for
Fig. 5. Histogram of the real distribution of the statistic (based on 10,000 Monte Carlo replications) and density estimation for the new bootstrap approximation under the null (at left) and under the alternative described in the text (at right).
this goal. The asymptotic distribution (the theory of counting processes (Aalen, 1978) is usually employed to derive it) is typically used to approximate the distribution of the considered statistic.
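A nonparametric CIF estimate for the event of interest can be sketched as follows (an illustrative Python sketch of Aalen-Johansen type, assuming no tied times; it is not the implementation used in the simulations discussed in this section):

```python
import numpy as np

def cumulative_incidence(z, delta):
    """Nonparametric CIF estimate for the event of interest.
    z: observed times; delta: 1 = event of interest, 2 = competing event,
    0 = censored. F1_hat jumps by S_hat(t-)/Y(t) at cause-1 events, where
    S_hat is the all-cause Kaplan-Meier estimate and Y(t) the risk set size."""
    order = np.argsort(z)
    z = np.asarray(z, dtype=float)[order]
    delta = np.asarray(delta)[order]
    n = len(z)
    at_risk = n - np.arange(n)                        # Y(t) at each z_(i)
    any_event = (delta > 0).astype(float)
    surv_all = np.cumprod(1.0 - any_event / at_risk)  # all-cause KM, S_hat(t)
    surv_left = np.concatenate([[1.0], surv_all[:-1]])  # S_hat(t-)
    jumps = np.where(delta == 1, surv_left / at_risk, 0.0)
    return z, np.cumsum(jumps)
```

Without censoring and without competing events the estimate reduces to the ECDF of the failure times, as it should.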
Let us consider the Kolmogorov–Smirnov type test suggested by Lin (1997). It is known (Kraus, 2007) that its convergence to the asymptotic distribution is slow and that the fixed nominal level is usually underestimated. The studied resampling plan not only provides a useful tool for approximating the null distribution of general statistics, as the usual bootstrap does; it can also increase the convergence speed for small samples. Suppose we have two independent samples in which the real failure time distribution of the studied event (labeled 1) is, for both samples, $F_{10}(t) = \{1 - (1/2)e^{-t} - (1/4)e^{-(1/2)t}\}\, I_{[0,\infty)}(t)$, and this event happens with probability 1/2. The other competing event (labeled 2) follows $F_{20}(t) = \{1 - e^{-t}\}\, I_{[0,\infty)}(t)$; obviously, it also happens with probability 1/2. Once the failure times are generated, the censoring times are drawn, independently, from a uniform distribution on [0,10] (about 15% and 10% of censored observations are expected for $F_{10}$ and $F_{20}$, respectively). In this situation, in a Monte Carlo simulation study (sample sizes $n_1 = n_2 = 50$; 10,000 replications), the observed rejection percentage ($\alpha = 0.05$) for the asymptotic approximation (Lin, 1997) was (only) 1.6%. The proposed bootstrap improves this approximation, obtaining a rejection percentage of 5.9% (the P-values were estimated from 1000 bootstrap resamples). This improvement has a direct impact on the statistical power. Under the alternative (the studied event, 1, is drawn from $F_{11}(t) = \{1 - e^{-t}\}\, I_{[0,\infty)}(t)$ and $F_{12}(t) = \{1 - (1/2)e^{-(1/2)t}\}\, I_{[0,\infty)}(t)$ for the first and second sample, respectively, with probability 1/2; the rest of the problem conditions remain as before), the observed rejection percentages (from 10,000 Monte Carlo replications) were 10.2% for the asymptotic distribution and 22.1% for the new bootstrap.
Fig. 5 shows density estimations for the real distribution of the statistic (estimated from 10,000 Monte Carlo replications) and for the new bootstrap algorithm, under the null (left) and under the alternative hypothesis (right). In this case, both distributions are almost equal.
6. Main conclusions
In order to construct a resampling algorithm useful to hypothesis testing, it looks reasonable that, the null should be
taken into account. Most of the authors match that the resampling under the null is critical to the proper construction of
bootstrap test (see, for example, Fisher and Hall, 1990; Hall and Wilson, 1991 or Westfall and Young, 1993). Despite that
the resampling under the null can be developed in a wide range of situations, it is true that, in many common practical
problems sampling under the null could be complicated. The problems appear when the null hypothesis restrictions are
not easily reflected, in adequate way, by the pool sample (Bickel and Ren, 2001). In particular, if the studied parameter, J(F)
P P
(F stands for the underlying distribution function) is such that Jð ki ¼ 1 gi Fi Þa ki ¼ 1 gi JðFi Þ (gi 2 R for i 2 1, . . . k), resampling
from the pool sample could drive to mistaken critical regions.
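A toy numerical check makes the point concrete, with the variance playing the role of a nonlinear functional $J$ (an illustration of ours, not an example from the text):

```python
import numpy as np

# Two degenerate groups: each has zero variance on its own.
x1 = np.array([0.0, 0.0])
x2 = np.array([1.0, 1.0])

# J applied to each group, then averaged (gamma_1 = gamma_2 = 1/2).
avg_of_J = 0.5 * x1.var() + 0.5 * x2.var()      # -> 0.0

# J applied to the pooled (mixture) sample: a very different quantity.
J_of_mixture = np.concatenate([x1, x2]).var()   # -> 0.25

print(avg_of_J, J_of_mixture)
```

Since the pooled sample carries between-group spread that neither group has on its own, any bootstrap that resamples from the pool implicitly targets `J_of_mixture` rather than the group-wise values.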
There have been a number of papers in the statistical literature which deal with particular problems and propose modifications of the bootstrap for its correct use in testing (see, for instance, Romano, 1988, 1989 and references therein). In this report we study a general, simple and useful bootstrap algorithm which allows fully nonparametric tests to be developed. The main difference from the usual bootstrap plan is that the proposed bootstrap does not resample from the null; instead, the relevant information contained in the null (and only this information) is used to define the bootstrap estimator (i.e., to compute the bootstrap statistic values). The idea lies in step N3, as opposed to step B3 of the traditional bootstrap. Note that both steps coincide for one-sample problems; therefore, in this case, the two methods are the same.
This device allows resampling from the alternative (in order to preserve the data structure within each of the studied groups) without making any additional assumptions.
When the traditional bootstrap works (the samples are independent and the null hypothesis implies the equality of the involved CDFs), the distributions provided by the two methodologies under the null are asymptotically equivalent. However, the two procedures are different and the obtained distributions under the alternative can be really different; hence the obtained statistical power could also differ depending on the particular problem studied. For instance, in the considered two-sample problem based on the Cramér–von Mises statistic, the differences between the algorithms do not produce relevant changes in the observed statistical powers.
The developed method has many practical applications. Since the studied algorithm preserves the internal covariance structure of the data, perhaps the most direct one is the comparison of the marginal distributions of a multivariate random variable (see Martı́nez-Camblor, 2010a; Martı́nez-Camblor et al., 2011a, 2011b for applications to the k-sample problem with paired designs, and Martı́nez-Camblor and Corral (2011c) for the generalization of the repeated measures problem to functional data). However, it is also really useful when the implications of the null hypothesis are not clear (see Martı́nez-Camblor et al., 2011b for a practical application of this case). Competing risks and multistate models are especially interesting settings: the different functions involved in these problems, which can be different even under the null, complicate the use of the usual resampling plans; the studied procedure replicates the complexity of the problem and allows the hypothesis of interest to be checked without any additional assumption.
Of course, the considered algorithm is compatible with, and can be easily adapted to, other bootstrap resampling alternatives such as the smoothed, symmetrized or Bayesian ones.
Acknowledgements
The authors are grateful to the anonymous reviewers, whose comments and suggestions have helped to improve the paper.
References
Aalen, O.O., 1978. Nonparametric inference for a family of counting processes. Annals of Statistics 6, 701–726.
Akritas, M.G., 1986. Bootstrapping the Kaplan–Meier estimator. Journal of the American Statistical Association 81 (396), 1032–1038.
Bickel, P.J., Ren, J.J., 2001. The Bootstrap in Hypothesis Testing. Lecture Notes-Monograph Series, vol. 36, pp. 91–112.
Csörgő, S., 1996. Universal Gaussian approximations under random censorship. Annals of Statistics 24 (6), 2744–2778.
Davison, A.C., Hinkley, D.V., 1997. Bootstrap Methods and their Application. Cambridge University Press, Cambridge.
DiCiccio, T.J., Efron, B., 1996. Bootstrap confidence intervals (with discussion). Statistical Science 11, 189–228.
Efron, B., 1979. Bootstrap methods: another look at the jackknife. Annals of Statistics 7 (1), 1–26.
Efron, B., 1981. Censored data and the bootstrap. Journal of the American Statistical Association 76, 312–319.
Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. In: Regional Conference Series in Applied Mathematics, CBMS-NSF.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Fisher, N.I., Hall, P., 1990. On bootstrap hypothesis testing. Australian & New Zealand Journal of Statistics 32 (2), 177–190.
Gray, R.J., 1988. A class of k-sample tests for comparing the cumulative incidence of a competing risk. Annals of Statistics 16, 1141–1154.
Gini, C., 1912. Variabilità e mutabilità. Reprinted in: Pizetti, E., Salvemini, T. (Eds.), 1955. Memorie di Metodologia Statistica. Libreria Eredi Virgilio Veschi, Rome.
González-Manteiga, W., Prada-Sánchez, J.M., 1994. The bootstrap—a review. Computational Statistics 9, 165–205.
Gooley, T.A., Leisenring, W., Crowley, J., Storer, B.E., 1999. Estimation of failure probabilities in the presence of competing risks: new representations of
old estimators. Statistics in Medicine 18, 695–706.
Hall, P., 1988. Theoretical comparison of bootstrap confidence intervals (with discussion). Annals of Statistics 16, 927–953.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer, New York.
Hall, P., Wilson, S.R., 1991. Two guidelines for bootstrap hypothesis testing. Biometrics 47, 757–762.
Harrington, D.P., Fleming, T.R., 1982. A class of rank test procedures for censored survival data. Biometrika 69 (3), 553–566.
Horváth, L., 1991. On Lp-norms of multivariate density estimators. Annals of Statistics 19 (4), 1933–1949.
Kalbfleish, J.D., Prentice, R.L., 1980. The Statistical Analysis of Failure Time Data. John Wiley, New York.
Kaplan, E.L., Meier, P., 1958. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457–481.
Kiefer, J., 1959. k-Sample analogues of the Kolmogorov–Smirnov and Cramér–von Mises tests. Annals of Mathematical Statistics 30, 420–447.
Kraus, D., 2007. Smooth Tests of Equality of Cumulative Incidence Functions in Two Samples. Institute of Information Theory and Automation, Prague, Research Report 2197, pp. 1–12.
Lin, D.Y., 1997. Nonparametric inference for cumulative incidence functions in competing risks studies. Statistics in Medicine 16, 901–910.
Martin, M.A., 2007. Bootstrap hypothesis testing for some common statistical problems: a critical evaluation of size and power properties. Computational
Statistics & Data Analysis 51 (12), 6321–6342.
Martı́nez-Camblor, P., 2007. Central limit theorems for S-Gini and Theil inequality coefficients. Revista Colombiana de Estadı́stica 2, 287–300.
Martı́nez-Camblor, P., 2010a. Nonparametric k-sample test based on kernel density estimator for paired design. Computational Statistics & Data Analysis
54, 2035–2045.
Martı́nez-Camblor, P., 2010b. Comparing k-independent and right censored samples based on the likelihood ratio. Computational Statistics 25, 363–374.
Martı́nez-Camblor, P., 2011. Testing the equality among distribution functions from independent and right censored samples via Cramér–von Mises
criterion. Journal of Applied Statistics 38 (6), 1117–1131.
Martı́nez-Camblor, P., Carleos, C., Corral, N., 2011a. Cramér–von Mises statistic for paired samples, unpublished manuscript.
Martı́nez-Camblor, P., Corral, N., Vicente, D., 2011b. Statistical comparison of the genetic sequence type diversity of invasive Neisseria meningitidis isolates
in northern Spain (1997–2008). Ecological Informatics, in press. doi:10.1016/j.ecoinf.2011.06.001.
Martı́nez-Camblor, P., Corral, N., 2011c. Repeated measures analysis for functional data. Computational Statistics & Data Analysis 55 (12), 3244–3256.
Pepe, M.S., 1991. Inference for events with dependent risks in multiple endpoint studies. Journal of the American Statistical Association 86, 770–778.
Reid, N., 1981. Estimating the median survival time. Biometrika 68, 601–608.
Romano, J.P., 1988. A bootstrap revival of some nonparametric distance tests. Journal of the American Statistical Association 83, 698–708.
Romano, J.P., 1989. Bootstrap and randomization tests of some nonparametric hypotheses. Annals of Statistics 17, 141–159.
Shao, J., Tu, D., 1995. The Jackknife and the Bootstrap. Springer, New York.
Silverman, B.W., 1981. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society, Series B 43 (1), 97–99.
Van der Vaart, A.W., 1998. Asymptotic Statistics. Cambridge University Press, Cambridge.
Westfall, P.H., Young, S.S., 1993. Resampling-based Multiple Testing: Examples and Methods for p-value Adjustment. Wiley, New York.