
The Annals of Statistics

2005, Vol. 33, No. 5, 2022–2041


DOI: 10.1214/009053605000000390
c Institute of Mathematical Statistics, 2005

ESTIMATION OF SUMS OF RANDOM VARIABLES: EXAMPLES AND INFORMATION BOUNDS^1

arXiv:math/0602214v1 [math.ST] 10 Feb 2006

By Cun-Hui Zhang
Rutgers University
This paper concerns the estimation of sums of functions of observable and unobservable variables. Lower bounds for the asymptotic variance and a convolution theorem are derived in general finite- and infinite-dimensional models. An explicit relationship is established between efficient influence functions for the estimation of sums of variables and the estimation of their means. Certain “plug-in” estimators are proved to be asymptotically efficient in finite-dimensional models, while “u, v” estimators of Robbins are proved to be efficient in infinite-dimensional mixture models. Examples include certain species, network and data confidentiality problems.

1. Introduction. Given a pool of n motorists, how do we estimate the total intensity of those in the pool who have a prespecified number of traffic accidents in a given time period? This is an example of a broad class of
problems involving the estimation of sums of random variables

(1.1)  S_n ≡ Σ_{j=1}^n u(X_j, θ_j)

[24], where X_j are observable variables, θ_j are unobservable variables or constants, and u(·, ·) is a certain utility function. The estimation of (1.1) has numerous important applications. In the motorist example, X_j is the number of traffic accidents and θ_j the intensity of the jth individual in the pool, and u(x, ϑ) = ϑ I{x = a} for a prespecified integer a. In Sections 3, 4 and 5 we consider applications in certain species, network and data confidentiality problems.

Received June 2001; revised October 2004.


^1 Supported in part by the National Science Foundation.
AMS 2000 subject classifications. Primary 62F10, 62F12, 62G05, 62G20; secondary
62F15.
Key words and phrases. Empirical Bayes, sum of variables, utility, efficient estimation,
information bound, influence function, species problem, networks, node degree, data con-
fidentiality, disclosure risk.

This is an electronic reprint of the original article published by the


Institute of Mathematical Statistics in The Annals of Statistics,
2005, Vol. 33, No. 5, 2022–2041. This reprint differs from the original in
pagination and typographic detail.

The estimation of (1.1) is a nonstandard problem in statistics, since the sums, involving observables as well as unobservables, are not parameters. Without a theory of efficient estimation, the performance of different estimators can only be measured against each other in terms of relative efficiency. For the specific motorist example with u(x, ϑ) = ϑ I{x = a}, Robbins and Zhang [28] proved that, in a Poisson mixture model, the efficient estimation of (1.1) is equivalent to the efficient estimation of E(θ|X = a), so that the usual information bounds can be used. In this paper we provide a general theory for the efficient estimation of sums of variables.
Let (X, θ), (X_j, θ_j), j = 1, …, n, be i.i.d. vectors with an unknown common joint distribution F. Our general theory covers asymptotic efficiency for the estimation of

(1.2)  S_n ≡ S_n(F) ≡ Σ_{j=1}^n u(X_j, θ_j; F)

based on X_1, …, X_n, where the utility u(x, ϑ; F) is also allowed to depend on F. This provides a unified asymptotic theory for the estimation of (1.1) and conventional parameters u(F), since the utility is allowed to depend on F only. Our problem is closely related to the estimation of the mean

(1.3)  µ(F) ≡ E_F u(X, θ; F).

If E_F u²(X, θ; F) < ∞ and 1/2 ≤ α < 1, an estimator is n^α-consistent for the estimation of S_n(F) iff it is n^α-consistent for the estimation of its mean nµ(F) = E_F S_n(F). But an efficient estimator of nµ(F) is not necessarily an efficient estimator of S_n(F), since the two estimation problems may have different efficient influence functions, as we demonstrate below in (1.4)–(1.6) and in simple examples in Sections 2.3 and 2.4. The asymptotic theory for the estimation of µ(F) is well understood; see [3, 17, 31].
Suppose that F belongs to a known class F. Let F_0 ∈ F. An estimator µ̂_n of (1.3) is (locally) asymptotically efficient in contiguous neighborhoods of P_{F_0} iff

(1.4)  µ̂_n = µ(F_0) + (1/n) Σ_{j=1}^n ψ_*(X_j) + o_{P_{F_0}}(n^{−1/2}),

where ψ_*(x) ≡ ψ_*(x; F_0) is the efficient influence function at F_0 for the estimation of µ(F). In Section 6 we show that, under mild regularity conditions on the utility functions {u(x, ϑ; F), F ∈ F}, an estimator Ŝ_n of (1.2) is (locally) asymptotically efficient in contiguous neighborhoods of P_{F_0} iff

(1.5)  Ŝ_n/n = µ(F_0) + (1/n) Σ_{j=1}^n φ_*(X_j) + o_{P_{F_0}}(n^{−1/2}),

where φ_*(x) ≡ φ_*(x; F_0) is the efficient influence function at F_0 for the estimation of S_n(F). Furthermore, the following relationship holds between the two efficient influence functions in (1.4) and (1.5):

(1.6)  φ_*(x) = ψ_*(x) + u(x; F_0) − µ(F_0) − u_*(x),

where u(x; F) ≡ E_F[u(X, θ; F)|X = x] and u_*(x) ≡ u_*(x; F_0) is the projection of u(x; F_0) to the tangent space of the family of distributions {F^X, F ∈ F} at F_0^X. Here F^X is the marginal distribution of X under the joint distribution F of (X, θ). It follows clearly from (1.6) that asymptotically efficient estimations of S_n(F)/n and µ(F) are equivalent in contiguous neighborhoods of P_{F_0} iff u(·; F_0) − µ(F_0) is in the tangent space, that is, u(·; F_0) − µ(F_0) = u_*(·; F_0).
We will derive more explicit results in finite-dimensional models and infinite-dimensional mixture models. In finite-dimensional models F = {F_τ, τ ∈ T} with a Euclidean τ, it will be shown that “plug-in” estimators of the form Σ_{j=1}^n u(X_j; F_{τ̂_n}) are asymptotically efficient for the estimation of (1.2) if τ̂_n is an efficient estimator of τ. In infinite-dimensional mixture models, certain “u, v” estimators of Robbins [24] will be shown to be efficient for the estimation of (1.1). We shall consider estimation of (1.1) with known f(x|ϑ) in Section 2 and provide the general theory in Section 6. Section 7 contains proofs of all theorems.

2. Mixture models. Suppose (X, θ) ∼ F(dx, dϑ) = f(x|ϑ)ν(dx)G(dϑ), that is,

(2.1)  X|θ ∼ f(x|θ),  θ ∼ G.

In this section we state our results for the estimation of (1.1) with known f(·|·).

2.1. Finite-dimensional mixture models. Let {G_τ, τ ∈ T} be a parametric family of distributions with an open T in a Euclidean space. Suppose (2.1) holds with G = G_τ for an unknown vector τ ∈ T. Suppose that, for certain functions ρ̃_τ,

(2.2)  ∫ (√g_{τ,∆} − 1 − ∆^t ρ̃_τ/2)² dG_τ = o(‖∆‖²),  ∫ g_{τ,∆} dG_τ = 1 + o(‖∆‖²),  as ∆ → 0,

where g_{τ,∆} is the Radon–Nikodym derivative of the absolutely continuous part of G_{τ+∆} with respect to G_τ. Let E_τ denote the expectation under G_τ. The Fisher information matrix for the estimation of τ based on a single X is

(2.3)  I_τ ≡ Cov_τ(ρ_τ(X)),  ρ_τ(x) ≡ E_τ[ρ̃_τ(θ)|X = x].

Define u_τ(x) ≡ E_τ[u(X, θ)|X = x] and µ_τ ≡ E_τ u(X, θ).

Theorem 2.1. Suppose (2.2) holds, E_τ u²(X, θ) is locally bounded and I_τ is of full rank for all τ ∈ T. Then {Ŝ_n, n ≥ 1} is an asymptotically efficient estimator of (1.1) iff (1.5) holds with µ(F_0) = µ_τ, P = P_τ, and the efficient influence function

(2.4)  φ_* = φ_{*,τ} ≡ u_τ − µ_τ + ρ_τ^t I_τ^{−1} γ_τ,

where γ_τ ≡ E_τ Cov_τ(u(X, θ), ρ̃_τ(θ)|X) = E_τ{u(X, θ)ρ̃_τ(θ) − u_τ(X)ρ_τ(X)}.

Remark 2.1. Since κ_{*,τ} ≡ I_τ^{−1} ρ_τ is the efficient influence function for the estimation of τ and ∂µ_τ/∂τ = E_τ u(X, θ)ρ̃_τ(θ), ψ_{*,τ} ≡ ρ_τ^t I_τ^{−1} E_τ u(X, θ)ρ̃_τ(θ) is the efficient influence function for the estimation of µ_τ. Moreover, u_{*,τ} ≡ ρ_τ^t I_τ^{−1} E_τ u_τ(X)ρ_τ(X) is the projection of u_τ to the tangent space generated by the scores ρ_τ(X) under E_τ. Thus, Theorem 2.1 asserts that (1.5) and (1.6) hold under (2.2).

Our next theorem provides the asymptotic theory for plug-in estimators

(2.5)  Ŝ_n ≡ Σ_{j=1}^n u_{τ̂_n}(X_j)

of (1.1), where u_τ(x) ≡ E_τ[u(X, θ)|X = x] as in Theorem 2.1. An estimator τ̂_n of the vector τ is asymptotically linear with influence function κ_τ under E_τ if

(2.6)  τ̂_n = τ + (1/n) Σ_{j=1}^n κ_τ(X_j) + o_{P_τ}(n^{−1/2}),

with E_τ κ_τ(X)ρ_τ^t(X) being the identity matrix.

Theorem 2.2. Let Ŝ_n be as in (2.5) with an asymptotically linear estimator τ̂_n as in (2.6). Suppose the conditions of Theorem 2.1 hold, E_τ u²_{τ+∆}(X) = O(1) as ∆ → 0 for every τ ∈ T, and for all τ ∈ T and c > 0,

(2.7)  sup_{‖∆‖≤c/√n} | Σ_{j=1}^n [u_{τ+∆}(X_j) − u_τ(X_j) − {E_τ u_{τ+∆}(X) − µ_τ}] | = o_{P_τ}(n^{1/2}).

Let φ_{*,τ} and γ_τ be as in Theorem 2.1 and κ_{*,τ} = I_τ^{−1} ρ_τ. Then

(2.8)  (Ŝ_n − S_n)/n^{1/2} →_D N(0, σ_τ²),  σ_τ² = σ_{*,τ}² + Var_τ({κ_τ(X) − κ_{*,τ}(X)}^t γ_τ)

under E_τ, where σ_{*,τ}² ≡ Var_τ(φ_{*,τ}(X) − u(X, θ)). Consequently, Ŝ_n is an asymptotically efficient estimator of (1.1) at E_{τ_0} iff γ_{τ_0}^t τ̂_n is an asymptotically efficient estimator of γ_{τ_0}^t τ in contiguous neighborhoods of E_{τ_0}.
ESTIMATING SUMS OF RANDOM VARIABLES 5

Remark 2.2. It follows from (2.8) that the interval Ŝ_n ± 1.96 σ_{τ̂_n} n^{1/2} provides an approximate 95% confidence interval for (1.1), provided that σ_τ is continuous in τ.

Remark 2.3. Condition (2.7) holds if {u_{τ+∆} : τ + ∆ ∈ T, ‖∆‖ ≤ δ_τ} is a Donsker class under E_τ for some δ_τ > 0 and E_τ u²_{τ+∆}(X) is continuous at ∆ = 0.

2.2. General mixtures. Let G be a convex class of distributions. Suppose (2.1) holds with an unknown G ∈ G. Let E_G be the expectation under (2.1). Suppose E_G u²(X, θ) < ∞ for all G ∈ G. Define

(2.9)  G_{G_0} ≡ {G : E_{G_0}(f_G(X)/f_{G_0}(X))² < ∞, ∫ f_G I{f_{G_0} > 0} dν = 1},

where f_G(x) ≡ ∫ f(x|ϑ)G(dϑ), and define

(2.10)  V_{G_0} ≡ {v(x) : E_G v(X) = E_G u(X, θ) ∀ G ∈ G_{G_0}}.

Theorem 2.3. (i) If V_{G_0} is nonempty, then {Ŝ_n, n ≥ 1} is an asymptotically efficient estimator of (1.1) at E_{G_0} iff Ŝ_n = Σ_{j=1}^n v_{G_0}(X_j) + o_{P_{G_0}}(n^{1/2}) with

(2.11)  v_{G_0} ≡ arg min{E_{G_0}(v(X) − u(X, θ))² : v ∈ V_{G_0}}.

(ii) If V_{G_0} is empty, then there does not exist any regular n^{−1/2}-consistent estimator of E_G u(X, θ) or S_n/n in contiguous neighborhoods of E_{G_0}.

The definition of regular estimators of (1.1) is given in Section 6. Suppose that for certain G_* ⊆ G the collection

(2.12)  V_* ≡ {v(x) : E_G v(X) = E_G u(X, θ), E_G v²(X) < ∞ ∀ G ∈ G_*}

is nonempty, for example, certain V_{G_0} as in Theorem 2.3(i). Let ‖h‖_G ≡ {E_G h²(X)}^{1/2}.

Theorem 2.4. Let v_{G_0} be as in (2.11). Suppose v_{G_0} ∈ V_* and, as (ε, n) → (0, ∞),

sup{ Σ_{j=1}^n (v_G(X_j) − v_{G_0}(X_j))/n^{1/2} : ‖v_G − v_{G_0}‖_{G_0} ≤ ε, G ∈ G_* } → 0 in P_{G_0}

for all G_0 ∈ G_*. Let Ĝ be an estimator of G such that P_{G_0}(Ĝ ∈ G_*) → 1 and ‖v_{Ĝ} − v_{G_0}‖_{G_0} → 0 in P_{G_0} for all G_0 ∈ G_*. Then

(2.13)  V̂_n ≡ Σ_{j=1}^n v_{Ĝ}(X_j)

is an asymptotically efficient estimator of (1.1) at PG0 for all G0 ∈ G∗ .

If f(x|ϑ) belongs to certain exponential families, there exists a unique function v such that V_{G_0} ≠ ∅ implies V_{G_0} = {v}, so that v_{G_0} = v for all G_0 and V_* = {v}. The following theorem is a variation of Theorem 2.4 for such distributions.

Theorem 2.5. Suppose f(x|ϑ) ∝ exp(x^t λ(ϑ)), λ(ϑ) ∈ Λ, is an exponential family with an open Λ in a Euclidean space, and that the conditional distribution of θ given λ(θ) is known. Suppose G contains distributions G ≡ G_c with E_G|λ(θ) − c| = 0 for all c ∈ Λ. If V_{G_0} ≠ ∅ for certain G_0, then there exists a function v(x) such that

(2.14)  E_G[v(X)|λ(θ) = c] = E_G[u(X, θ)|λ(θ) = c]  ∀ c ∈ Λ, G ∈ G,

and such that the following V_n is an efficient estimator of S_n under {E_G : E_G v²(X) < ∞}:

(2.15)  V_n ≡ Σ_{j=1}^n v(X_j).

Remark 2.4. Robbins [24] called (2.15) “u, v” estimators, provided that (2.14) holds. The V̂_n in (2.13) can be viewed as a “u, v” estimator with an estimated optimal v. Theorems 2.4 and 2.5 provide conditions under which these two types of “u, v” estimators are asymptotically efficient.
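The “u, v” identity (2.14) can be made concrete in the Poisson family: for u(x, ϑ) = ϑu₀(x) the choice v(x) = x u₀(x − 1) satisfies E[v(X)|λ] = E[λu₀(X)|λ] under X|λ ∼ Poisson(λ). A minimal numerical check of this identity (the helper names are ours, not the paper's):

```python
import math

def poisson_pmf(x, lam):
    # computed iteratively to avoid huge factorials
    p = math.exp(-lam)
    for k in range(1, x + 1):
        p *= lam / k
    return p

def uv_gap(lam, u0, xmax=200):
    """|E[v(X)|lam] - E[lam*u0(X)|lam]| with v(x) = x*u0(x-1)."""
    lhs = sum(x * u0(x - 1) * poisson_pmf(x, lam) for x in range(1, xmax))
    rhs = lam * sum(u0(x) * poisson_pmf(x, lam) for x in range(xmax))
    return abs(lhs - rhs)

# u0(x) = I{x <= 2}, in the spirit of the motorist example of Section 1
print(uv_gap(3.7, lambda x: 1.0 if x <= 2 else 0.0))  # ~0 (identity holds)
```

The gap is zero up to floating-point error for any bounded u₀, which is exactly why V_n in (2.15) is unbiased for S_n in this model.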

2.3. The Poisson example. Let (X, Y, λ) ≡ (X, θ) with

(2.16)  E[Y|X, λ] = λ,  f(x|λ) ≡ P(X = x|λ) = e^{−λ}λ^x/x!,  x = 0, 1, … .

Robbins [22, 24] and Robbins and Zhang [25, 26, 27] considered the estimation of S_n′ ≡ Σ_{j=1}^n λ_j u(X_j) and S_n″ ≡ Σ_{j=1}^n Y_j u(X_j), and several related problems. Both S_n′ and S_n″ are special cases of (1.1). For u(x) = I{x ≤ a}, S_n″ could be the total number of accidents next year for those motorists with no more than a accidents this year in the motorist example.
Suppose the λ_j have a common exponential density τe^{−λτ} dλ with unknown τ. The marginal distribution of X is f_τ(x) = τ(1 + τ)^{−x−1}, and the marginal and conditional expectations of λu(X) and Y u(X) are

u_τ(x) = (x + 1)u(x)/(1 + τ),   µ_τ = Σ_{x=0}^∞ f_τ(x) x u(x − 1).
Let X̄ ≡ Σ_{j=1}^n X_j/n. Define τ̂_n ≡ (β + n)/(α + Σ_{j=1}^n X_j) and

(2.17)  Ŝ_n ≡ Σ_{j=1}^n u_{τ̂_n}(X_j) = Σ_{j=1}^n (α/n + X̄)(X_j + 1)u(X_j)/{(α + β)/n + 1 + X̄}.

It follows from Theorem 2.2 that the plug-in estimators in (2.17) are asymp-
totically efficient for both Sn′ and Sn′′ . For α = β = 0, (2.17) gives the plug-in
estimator corresponding to the maximum likelihood estimator (MLE) of τ .
For general positive α and β, (2.17) gives the Bayes estimator of Sn′ and Sn′′
with a beta prior on τ/(1 + τ). Clearly, µ̂_n ≡ Σ_{x=1}^∞ {τ̂_n x u(x − 1)}/(1 + τ̂_n)^{x+1} is efficient for the estimation of the mean µ_τ ≡ E_τ u(X, θ), but not for S_n′/n
or Sn′′ /n. Similar results can be obtained for λ with the gamma distribution;
see [23].
In the case of completely unknown G(dλ), the “u, v” estimator (2.15) with v(x) = xu(x − 1) is asymptotically efficient for the estimation of S_n′ and S_n″ for all G with finite E_G{v(X) − λu(X)}².
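The behavior of the plug-in estimator (2.17) and the “u, v” estimator with v(x) = xu(x − 1) can be seen in a small simulation. The sketch below is our illustration, not from the paper: it draws λ_j from the exponential density τe^{−λτ}, X_j|λ_j from Poisson(λ_j), and compares both estimators of S_n′ = Σ λ_j u(X_j) with u(x) = I{x ≤ 1}:

```python
import math, random

random.seed(0)

def rpoisson(lam):
    # Knuth's product method; adequate for the small lam used here
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

n, tau = 20000, 1.0
u = lambda x: 1.0 if x <= 1 else 0.0
lam = [random.expovariate(tau) for _ in range(n)]   # G(dlam) = tau * exp(-lam*tau) dlam
X = [rpoisson(l) for l in lam]

S_true = sum(l * u(x) for l, x in zip(lam, X))      # S_n' (needs the latent lam_j)
V_n = sum(x * u(x - 1) for x in X)                  # "u, v" estimator (2.15)
Xbar = sum(X) / n
S_hat = sum(Xbar * (x + 1) * u(x) / (1 + Xbar) for x in X)  # plug-in (2.17), alpha=beta=0

print(S_true, V_n, S_hat)
```

With n this large all three quantities agree to within a few percent, consistent with the n^{1/2}-rate in (2.8).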

2.4. More examples.

Example 2.1. Let X ∼ N(τ, σ²). The number of “above average” individuals, Ŝ_n ≡ #{j ≤ n : X_j > X̄}, is an efficient estimator of the number of above-mean individuals S_n(τ) ≡ #{j ≤ n : X_j > τ}. The estimator S̃_n ≡ n/2 is efficient for the estimation of E_τ S_n(τ) = n/2, but not for S_n(τ).
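A quick simulation illustrating Example 2.1 (our sketch, with arbitrary τ and σ): the count #{j : X_j > X̄} tracks the unobservable target S_n(τ) = #{j : X_j > τ} to within O(√n), even though both are random around n/2.

```python
import random

random.seed(1)

n, tau, sigma = 10000, 2.0, 1.5
X = [random.gauss(tau, sigma) for _ in range(n)]
Xbar = sum(X) / n
S_hat = sum(1 for x in X if x > Xbar)   # efficient estimator of S_n(tau)
S_true = sum(1 for x in X if x > tau)   # unobservable target
print(S_hat, S_true)                     # both near n/2, difference O(sqrt(n))
```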

Example 2.2. Let f(x|ϑ) ∼ N(ϑ, σ²). An efficient estimator for the number of “above mean” individuals, S_n ≡ #{j ≤ n : X_j > θ_j}, is Ŝ_n ≡ n/2; compare with Example 2.1. This is true even under the condition n^{−1} Σ_{j=1}^n θ_j² = O(1), that is, in contiguous neighborhoods of P_0 with P_0{θ_j = 0} = 1.

Example 2.3. Ŝ_n ≡ 0 is efficient for the estimation of S_n(τ) ≡ Σ_{j=1}^n ρ_τ(X_j).

3. A species problem. An interesting example of our problem is estimating the total number of species in a population of plants or animals. Suppose a random sample of size N is drawn (with replacement) from a population of d species. Let n_k be the number of species represented k times in the sample. A species problem is to estimate d based on {n_k, k ≥ 1}. The problem dates back to [13] and [14] and has many important applications [4]. We consider a network application in Section 4.

3.1. Finite-dimensional models. Let X_j be the frequency of the jth species in the sample, so that, for certain p_j > 0,

(3.1)  n_k = Σ_{j=1}^d I{X_j = k},  (X_1, …, X_d) ∼ multinomial(N, p_1, …, p_d).

We will confine our discussion to the case of (N, N/d) → (∞, µ), 0 < µ < ∞, since E(d − Σ_{k=1}^∞ n_k) = Σ_{j=1}^d (1 − p_j)^N → 0 as N → ∞ for fixed d. Let {G_τ, τ ∈ T} be a parametric family of distributions on (0, ∞), where τ is an unknown parameter with a scale component, G_τ(y/c) = G_{τ_{c′}}(y). Let P_τ be probability measures under which (3.1) holds conditionally on N and certain i.i.d. variables θ_j > 0, and
(3.2)  p_j = θ_j / Σ_{i=1}^d θ_i,  N|{θ_j} ∼ Poisson(c Σ_{j=1}^d θ_j),  θ_j ∼ G,

with G = G_τ. Under P_τ, the X_j are i.i.d. with P_τ{X_j = k} = ∫ e^{−y}(y^k/k!) G_{τ_{c′}}(dy). Assume c = 1 due to scale invariance. Since n_0 is unobservable, the MLE of (d, τ) is

(3.3)  d̂ ≡ Σ_{k=1}^N n_k / ∫(1 − e^{−y}) G_{τ̂}(dy),   τ̂ ≡ arg max_{τ∈T} Π_{k=1}^∞ [ ∫ e^{−y} y^k G_τ(dy) / ∫(1 − e^{−y}) G_τ(dy) ]^{n_k}.
In the next two paragraphs we derive the influence function for the MLE (3.3) and prove its asymptotic efficiency. If (2.2) holds and the MLE τ̂ of τ is asymptotically efficient, then

(3.4)  τ̂ = τ + (1/d) Σ_{j=1}^d κ_{*,τ}(X_j) + o_P(d^{−1/2})

with κ_{*,τ} ≡ {Cov_τ(ρ̄_τ(X))}^{−1} ρ̄_τ and ρ̄_τ(x) ≡ I{x>0}(ρ_τ(x) − γ_τ), where ρ_τ is as in (2.3) and γ_τ ≡ E_τ[ρ_τ(X)|X > 0]. Thus, by the Taylor expansion of the d̂ in (3.3),

(3.5)  d̂ = d + Σ_{j=1}^d φ_{*,τ}(X_j) + o_P(d^{1/2}),

where φ_{*,τ}(x) ≡ I{x>0}/P_τ(X > 0) − 1 − κ_{*,τ}^t(x)γ_τ. In this case, as d → ∞,

(3.6)  (d̂ − d)/d^{1/2} →_D N(0, P_τ(X = 0)/P_τ(X > 0) + γ_τ^t {Cov_τ(ρ̄_τ(X))}^{−1} γ_τ).
For the gamma G(dy; τ) ∝ y^{α−1} exp(−y/β) dy, the MLE τ̂ ≡ (α̂, β̂) satisfies

(3.7)  Σ_{k=1}^∞ (Σ_{ℓ=k}^∞ n_ℓ)/(α̂ + k − 1) = d̃ log(1 + β̂)/{1 − (1 + β̂)^{−α̂}},   d̃ α̂β̂/{1 − (1 + β̂)^{−α̂}} = N,
with d̃ = Σ_{k=1}^∞ n_k, and (3.4) holds [29]. Rao [19] called (3.3) with (3.7) a pseudo MLE in a different (gamma) model, but the efficiency of the d̂ was not clear [11].
The species problem is a special case of estimating (1.1) when d is viewed as the number of species represented in the population out of a total of n species. Specifically, letting p_j = 0 if the jth species is not represented in the population, estimating

(3.8)  d = Σ_{j=1}^n I{p_j > 0} = Σ_{j=1}^n I{X_j = 0, p_j > 0} + Σ_{k=1}^N n_k

is equivalent to estimating (1.1) with u(x, p) = I{p > 0} or u(x, p) = I{x = 0, p > 0}, based on observations {X_j, j ≤ n}. Under (3.1) and (3.2) with d replaced by n,
(3.9)  P_{p_*,τ}{X_j = k} = (1 − p_*) I{k = 0} + p_* [ ∫ e^{−y}(y^k/k!) G_τ(dy) / ∫(1 − e^{−y}) G_τ(dy) ] I{k > 0}

with certain p_* < ∫(1 − e^{−y}) G_τ(dy). Under (3.9), the τ̂ in (3.3) is the conditional MLE of τ given {n_k, k ≥ 1}. Since (Σ_{k=1}^∞ n_k, d, n − d) is a trinomial vector, the τ̂ in (3.3) equals the MLE of τ based on a sample {X_j, j ≤ n} from (3.9), provided that the d̂ in (3.3) is no greater than n. Since P_{p_*,τ}{d̂ ≤ n} → 1 under (3.9), by Theorem 2.1, the (conditional) MLE (3.3) is asymptotically efficient in the empirical Bayes model (3.2) under conditions (2.2), (3.4) and (3.5).

3.2. General mixture. Now suppose the distribution G in (3.2) is completely unknown. The nonparametric MLE of (d, G) is given by

(3.10)  d̂ ≡ d̃ ∫_{y>0} Ĝ(dy) / ∫(1 − e^{−y}) Ĝ(dy),   Ĝ ≡ arg max_G Π_{k=1}^∞ [ ∫ e^{−y} y^k G(dy) / ∫(1 − e^{−y}) G(dy) ]^{n_k},

with d̃ ≡ Σ_{k=1}^N n_k, but its asymptotic distribution is unclear. Since there is no solution v to the equation Σ_{x=0}^∞ v(x) e^{−ϑ}ϑ^x/x! = I{ϑ > 0} for 0 ≤ ϑ < ∞, by Theorems 2.3 and 2.5, the estimation of d with completely unknown G is an ill-posed problem.
Among many choices, a compromise between (3.3) and (3.10) is to fit E_τ n_k ∝ P_τ(X = k) = ∫ e^{−y}(y^k/k!) G_τ(dy) for 1 ≤ k ≤ m. For gamma G with E n_{k+1}/E n_k = (k + α)β/{(k + 1)(1 + β)}, fitting the negative binomial distribution yields

(3.11)  d̂ ≡ d̃ + max(τ̂_1, 0) n_1,   d̃ ≡ Σ_{k=1}^N n_k,

where τb1 is the (weighted) least squares estimate of τ1 ≡ (β + 1)/(αβ) based


on
nk = τ1 nk+1 + τ2 (knk ) + error, k = 1, . . . , m − 1, τ2 ≡ −1/α,
with nk being a response variable and (nk+1 , knk ) being covariates for each
k. For small θj (large nk for small k), (3.11) has high efficiency for gamma G
and small bias for G(y) = c1 y α + (c2 + o(1))y α+1 at y ≈ 0. Chao [5] proposed
de + n21 /(2n2 ) as a low estimate of d. Another possibility is to estimate d by
correcting the bias of the estimator d/(1e − n1 /N ) of Darroch and Ratcliff
[9] as in [6].
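The simple frequency-count estimators just mentioned are easy to state in code (a sketch; the function name is ours):

```python
def species_estimates(freq_of_freq, N):
    """freq_of_freq: {k: n_k}; N: sample size. Returns (d_tilde, Chao, Darroch-Ratcliff)."""
    d_tilde = sum(freq_of_freq.values())          # observed number of species
    n1 = freq_of_freq.get(1, 0)
    n2 = freq_of_freq.get(2, 0)
    chao = d_tilde + n1 * n1 / (2.0 * n2) if n2 > 0 else float("inf")
    dr = d_tilde / (1.0 - n1 / N) if n1 < N else float("inf")   # Darroch-Ratcliff
    return d_tilde, chao, dr

print(species_estimates({1: 10, 2: 5, 3: 2}, N=26))  # (17, 27.0, 27.625)
```

Here N = Σ_k k n_k = 26; the Chao value d̃ + n_1²/(2n_2) is a low estimate of d, as noted above.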

4. Networks: estimation of node degrees based on source-destination data. Source-destination (SD) data in networks are generated by sending probes (e.g., traceroute queries in the Internet) through networks from certain source nodes to certain destination nodes; see [8, 32]. We shall treat SD data as a collection of random vectors W_j, j = 1, …, N, generated from a sample of SD pairs and make statistical inference based on U-processes of {W_j}, for example,
(4.1)  Σ_{j=1}^N h_1(W_j)/N,   Σ_{1≤j_1≠j_2≤N} h_2(W_{j_1}, W_{j_2})/{N(N − 1)},

indexed by Borel h1 and h2 , where Wj are the observations from the jth SD
pair in the sample. We focus here on the estimation of node degrees, although
the approach based on (4.1) could be useful in other network problems.
The topology of a deterministic network can be described with a routing
table: a list r1 , . . . , rJ of directed paths representing connections between
pairs of source and destination nodes, with each path being composed of a
set of directed links. For example, the path 4 → 2 → 3 → 8 has source node
4, destination node 8, and links 4 → 2, 2 → 3 and 3 → 8. Consider a network
with nodes {1, . . . , K}. The link degree D(k, ℓ) is defined as the number of
paths using the link k → ℓ,
(4.2) D(k, ℓ) ≡ #{j ≤ J : link k → ℓ is used in rj },
with D(k, ℓ) = 0 if k → ℓ is nonexistent or never used. The node degree,
defined as

(4.3)  d_k = Σ_{ℓ=1}^K I{D(k, ℓ) > 0},

is the number of outgoing links from k to other nodes. This is also called the out-degree. The in-degree, Σ_ℓ I{D(ℓ, k) > 0}, is the number of incoming links
to k. The node degrees dk and their (empirical) distributions are important
characteristics of networks; see [12, 15, 30].

For a given sample size N , let R1 , . . . , RN be a sample of SD pairs from the


routing table {r1 , . . . , rJ }. Suppose we observe the paths of Rj , so that the
vectors Wj ≡ (W1j , . . . , WKj )′ are given by Wkj ≡ ℓ if link k → ℓ is used in Rj
for some 1 ≤ ℓ ≤ K and Wkj = 0 otherwise. The observed link frequencies
are
(4.4)  X_{kℓ} ≡ #{j ≤ N : link k → ℓ is used in R_j} = Σ_{j=1}^N I{W_{kj} = ℓ}.

Since X_{kℓ} = 0 whenever D(k, ℓ) = 0, by (4.2) and (4.4), the node degree d_k is a sum

(4.5)  d_k = d̃_k + s_k,   d̃_k ≡ Σ_{ℓ=1}^K I{X_{kℓ} > 0},

where d̃_k is the observed degree and s_k is the unobserved degree given by

(4.6)  s_k ≡ Σ_{ℓ=1}^K I{X_{kℓ} = 0, D(k, ℓ) > 0}.
Lakhina, Byers, Crovella and Xie [16] and Clauset and Moore [7] pointed out that the observed degrees d̃_k may grossly underestimate the true node degrees d_k.
It follows from (4.5), (4.6) and (3.8) that the problem of estimating the node degree (4.3) is a species problem. From this point of view, we may directly use the estimators in Section 3 and references therein, for example, (3.11). However, in network problems, we are typically interested in the simultaneous estimation of many node degrees. Thus, information from {X_{kℓ}, ℓ ≤ K} can be pooled across different nodes k. Let K ⊆ {1, …, K} be a collection of “similar” and/or “independent” nodes. Let G be a family of distributions, for example, gamma with unit scale. Suppose the G in (3.2) for the different nodes are identical to a member of G up to scale parameters β_k. Then, as
in (3.10), the (pseudo) MLE for {dk , βk , k ∈ K, G} is given by
(4.7)  d̂_k ≡ Σ_{j=1}^N n_{kj} ∫_{y>0} Ĝ(dy) / ∫(1 − e^{−β̂_k y}) Ĝ(dy),
   (β̂, Ĝ) ≡ arg max_{β,G} Π_{k∈K} Π_{j=1}^N [ ∫ e^{−β_k y} y^j G(dy) / ∫(1 − e^{−β_k y}) G(dy) ]^{n_{kj}},

where β ≡ (β_1, …, β_K) and the maximum is taken over all β_k > 0 and G ∈ G. This type of estimator is expected to perform well for self-similar networks.

In the nonparametric case of completely unknown G, the MLE (β̂, Ĝ) in (4.7) can be computed via the following EM algorithm:

β_k^{(m+1)} ← [ Σ_{j=1}^N n_{kj} { p(j + 1; β_k^{(m)}, G^{(m)})/p(j; β_k^{(m)}, G^{(m)}) + p(1; β_k^{(m)}, G^{(m)})/(1 − p(0; β_k^{(m)}, G^{(m)})) } ]^{−1} Σ_{j=1}^N j n_{kj},

with p(j; β_k, G) ≡ ∫ e^{−β_k y} y^j G(dy), and

G^{(m+1)}(dϑ) ← G^{(m)}(dϑ) [ Σ_{k∈K} Σ_{j=1}^N n_{kj}/{1 − p(0; β_k^{(m+1)}, G^{(m)})} ]^{−1}
   × Σ_{k∈K} Σ_{j=1}^N n_{kj} { exp(−β_k^{(m+1)}ϑ)ϑ^j / p(j; β_k^{(m+1)}, G^{(m)}) + exp(−β_k^{(m+1)}ϑ)/(1 − p(0; β_k^{(m+1)}, G^{(m)})) }.
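The EM updates above can be sketched with G discretized on a finite grid; for readability the sketch handles a single node k, so the sums over k ∈ K collapse. The grid, weights, and input counts are our illustrative assumptions:

```python
import math

def p(j, beta, grid, w):
    """p(j; beta, G) = integral of exp(-beta*y) y^j G(dy), G a discrete measure on grid."""
    return sum(wi * math.exp(-beta * y) * y ** j for y, wi in zip(grid, w))

def em_step(nk, beta, grid, w):
    """One EM iteration for one node; nk maps j >= 1 to the count n_{kj}."""
    denom = sum(n * (p(j + 1, beta, grid, w) / p(j, beta, grid, w)
                     + p(1, beta, grid, w) / (1.0 - p(0, beta, grid, w)))
                for j, n in nk.items())
    beta_new = sum(j * n for j, n in nk.items()) / denom
    norm = sum(n / (1.0 - p(0, beta_new, grid, w)) for n in nk.values())
    w_new = [wi / norm * sum(n * (math.exp(-beta_new * y) * y ** j / p(j, beta_new, grid, w)
                                  + math.exp(-beta_new * y) / (1.0 - p(0, beta_new, grid, w)))
                             for j, n in nk.items())
             for y, wi in zip(grid, w)]
    return beta_new, w_new

grid, w = [0.5, 1.0, 2.0], [0.3, 0.4, 0.3]
beta_new, w_new = em_step({1: 5, 2: 3, 3: 1}, 1.0, grid, w)
print(beta_new, sum(w_new))  # updated beta_k; weights remain a probability measure
```

Note that the normalizing constant in the G-update guarantees Σ w_new = 1, mirroring the displayed formula.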

5. Data confidentiality: estimation of risk in statistical disclosure. A major concern in releasing microdata sets is protecting the privacy of individuals in the sample. Consider a data set in the form of a high-dimensional contingency table. If an individual belongs to a cell with small frequency, an intruder with certain knowledge about the individual may identify him and learn sensitive information about him in the data. Statistical models and methods concerning the risk of such a breach of confidentiality have been considered by many; see [10] and the proceedings of the joint ECE/EUROSTAT work sessions on statistical data confidentiality. For multi-way contingency tables, Polettini and Seri [18] and Rinott [21] studied the estimation of global disclosure risks of the form
(5.1)  S_J ≡ Σ_{j=1}^J u(X_j, Y_j)

based on {X_j, j ≤ J}, where X_j and Y_j are the sample and population frequencies in the jth cell, J is the total number of cells, and u(x, y) is a loss function of the form u(x, y) = u(x)/y, for example, u(x, y) = y^{−1} I{x = 1}. Let N = Σ_{j=1}^J Y_j be the population size. Suppose

(5.2)  N ∼ Poisson(λ),  {Y_j}|N ∼ multinomial(N, {π_j}),  X_j|({Y_j}, N) ∼ binomial(Y_j, p_j),

for certain π_j > 0 with Σ_{j=1}^J π_j = 1, 0 ≤ p_j ≤ 1 and λ > 0. For known {p_j, π_j, λ}, the Bayes estimator of S_J in (5.1) is

(5.3)  S_J^* ≡ E(S_J|{X_j}) = Σ_{j=1}^J u_j(X_j),   u_j(x) ≡ E u(x, Y_j − X_j + x),

with Y_j − X_j ∼ Poisson((1 − p_j)π_j λ) (independent of X_j). For u(x, y) = y^{−1} I{x = 1},

(5.4)  u_j(x) = {(1 − p_j)π_j λ}^{−1} [1 − exp{−(1 − p_j)π_j λ}] I{x = 1}.
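Formula (5.4) is simply the Poisson identity E{1/(Z + 1)} = (1 − e^{−m})/m for Z ∼ Poisson(m), with m = (1 − p_j)π_j λ, applied to cells with X_j = 1. A quick numerical check (ours):

```python
import math

def u_j(p, pi, lam):
    """(5.4): expected 1/Y_j given X_j = 1, with Y_j - X_j ~ Poisson((1-p)*pi*lam)."""
    m = (1.0 - p) * pi * lam
    return (1.0 - math.exp(-m)) / m

def u_j_brute(p, pi, lam, kmax=200):
    m = (1.0 - p) * pi * lam
    term, s = math.exp(-m), 0.0   # term = e^{-m} m^k / k!
    for k in range(kmax):
        s += term / (k + 1)       # contribution of Z = k to E[1/(Z+1)]
        term *= m / (k + 1)
    return s

print(abs(u_j(0.5, 0.1, 30.0) - u_j_brute(0.5, 0.1, 30.0)))  # ~0
```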
In general, the parameters (1 − pj )πj λ cannot be completely identified
from the data Xj ∼ Poisson(pj πj λ), so that it is necessary to further model
the parameters. This can be achieved by setting {pj , πj , λ} to known tractable

functions of an unknown vector τ and certain covariates z_j characterizing the cells j, and by incorporating all available knowledge about the parameters, for example, λ ≈ N and Σ_{j=1}^J p_j π_j ≈ n/N, where n = Σ_{j=1}^J X_j is the sample size. Consequently, the conditional expectation u_j(x) in (5.4) can be written as u_j(x) = u(x, z_j; τ). This suggests

(5.5)  Ŝ_J ≡ Σ_{j=1}^J u(X_j, z_j; τ̂_J)

as an estimator of the global risk (5.1) and its conditional expectation (5.3),
where τbJ is a suitable (e.g., the maximum likelihood or method of moments)
estimator of τ. For example, in a two-way table with cells labelled by j ∼ (i, k) and known π_{i,k} and λ, we may assume a regression model p_{i,k} = ψ_0(τ_1 + τ_2′ z_{i,k}) for a certain known (e.g., logit or probit) function ψ_0. In the case of unknown π_{i,k}, we may consider the independence model π_{i,k} = π_{i·}π_{·k} with unknown π_{i·} and known or unknown π_{·k}. If τ has fixed dimensionality and τ̂_J is asymptotically efficient, (5.5) is efficient by Theorem 2.2. Theorem 2.2 also suggests that (5.5) is highly efficient if dim(τ)/J → 0.
Alternatively, we may consider the negative binomial model N ∼ NB(α, 1/(1 + β)), that is, P(N = k) = Γ(k + α){Γ(α)k!}^{−1} β^k/(1 + β)^{k+α}. As in [21], we have in this case Y_j ∼ NB(α, 1/(1 + β_j)) with β_j = βπ_j, X_j ∼ NB(α, 1/(1 + p_j β_j)), and (Y_j − X_j)|{X_j = x} ∼ NB(x + α, (1 + p_j β_j)/(1 + β_j)). Consequently,

(5.6)  u_j(x) = [(1 + p_j β_j)/{(1 − p_j)β_j}] ∫_{(1+p_j β_j)/(1+β_j)}^1 t^{α_j −1} dt · I{x = 1}

in (5.3) for u(x, y) = y^{−1} I{x = 1}. Bethlehem, Keller and Pannekoek [2] studied this negative binomial model with constant π_j = 1/J and p_j = En/EN ≈ n/N. For (α_j, β_j) → (0, ∞), (Y_j − X_j)|{X_j = x} converges in distribution to NB(x, p_j), resulting in the µ-ARGUS estimator [1] with u_j(x) = p_j(1 − p_j)^{−1}(−log p_j) I{x = 1} in (5.6), as pointed out by Rinott [21].
Compared with the Poisson model in which λ ≈ N , estimates of both EN
and Var(N ) are required in the negative binomial model. The µ-ARGUS
model essentially assumes Var(N )/(EN )2 ≥ 1/α → ∞, which may not be
suitable in some applications.

6. General information bounds. We provide a lower bound for the asymptotic variance and a convolution theorem for (locally asymptotically) regular estimators of the sum in (1.2). To facilitate the statements of our results, we first briefly describe certain terminologies and concepts in general asymptotic theory.

6.1. Scores and tangent spaces. Suppose (X, θ) ∼ F with F ∈ F, where F is a family of joint distributions. Let C ≡ C(F_0) be a collection of mappings {F_t, 0 ≤ t ≤ 1} from [0, 1] to F satisfying

(6.1)  E_{F_0}(√f_t(X) − 1 − tρ(X)/2)² = o(t²),  E_{F_0} f_t(X) = 1 + o(t²),

for certain score functions ρ(x) ≡ ρ(x; {F_t}) depending on the mappings {F_t}, where f_t ≡ dF_t^X/dF_0^X is the Radon–Nikodym derivative of the absolutely continuous part of the marginal distribution F_t^X of X under F_t with respect to the marginal distribution F_0^X. Let C_* ≡ C_*(F_0) be the collection of score functions ρ(X) generated by C. The tangent space H_* ≡ H_*(F_0) is the closure of the linear span [C_*] of C_* in L_2(F_0); that is,

(6.2)  H_* ≡ \overline{[C_*]},  C_* ≡ {ρ(·; {F_t}) : {F_t} ∈ C}.

For further discussion of scores and tangent spaces, see [3], pages 48–57. The second part of (6.1) holds in regular parametric models; see [3], page 459.

6.2. Smoothness of random variables and their distributions. Let L(U; F) be the distribution of U under P_F. Suppose that, for all {F_t} ∈ C, the random variables u_{F_t} ≡ u(X, θ; F_t) and ū_{F_t} ≡ E_{F_t}[u_{F_t}|X] satisfy the continuity conditions

(6.3)  lim_{t→0+} Var_{F_0}(u_{F_t} − u_{F_0}) = 0,

(6.4)  L(w_{F_t}; F_t) →_D L(w_{F_0}; F_0),  E_{F_t} w_{F_t}² → E_{F_0} w_{F_0}²,

as t → 0+, with w_F ≡ u_F − ū_F, and also satisfy the differentiability condition

(6.5)  lim_{t→0+} E_{F_0}(ū_{F_t} − ū_{F_0})/t = E_{F_0} φ(X)ρ(X)

for certain φ(X) ≡ φ(X; F_0) ∈ L_2(F_0). The usual smoothness condition for µ(F), see [3], pages 57–58, is that, for a certain influence function ψ(X) ≡ ψ(X; F_0) ∈ L_2(F_0),

(6.6)  lim_{t→0+} {µ(F_t) − µ(F_0)}/t = E_{F_0} ψ(X)ρ(X).

6.3. Regular estimators. An estimator µ̃_n ≡ µ̃_n(X_1, …, X_n) of µ(F) is (locally asymptotically) regular at F_0 if there exists a random variable ζ_0 such that

(6.7)  lim_{n→∞} L(n^{1/2}{µ̃_n − µ(F_{c/√n})}; F_{c/√n}) = L(ζ_0; F_0)

for all c > 0 and {F_t} ∈ C ([3], page 21). Likewise, for the estimation of the sum S_n(F) in (1.2), we say that an estimator S̃_n ≡ S̃_n(X_1, …, X_n) is regular at F_0 if there exists a random variable ξ_0 such that, for all c > 0 and {F_t} ∈ C,

(6.8)  lim_{n→∞} L(n^{−1/2}{S̃_n − S_n(F_{c/√n})}; F_{c/√n}) = L(ξ_0; F_0).

6.4. Efficient influence functions and information bounds. Let ψ_* be the projection of the ψ in (6.6) to the tangent space H_* in (6.2). The standard convolution theorem ([3], page 63) asserts that, for a certain variable ζ_0′,

L(ζ_0; F_0) = N(0, Eψ_*²(X)) ⋆ L(ζ_0′; F_0)

for the ζ_0 in (6.7), and that efficient estimators are characterized by (1.4). For h ∈ L_2(F_0), let A_n(h) ≡ Σ_{j=1}^n h(X_j, θ_j)/n and Z_n(h) ≡ √n{A_n(h) − E_{F_0} h}.

Theorem 6.1. Suppose (6.3), (6.4) and (6.5) hold at F_0. Let φ_{*,0} be the projection of the φ in (6.5) into the tangent space H_* in (6.2), and let φ_* ≡ ū_{F_0} − µ(F_0) + φ_{*,0}.

(i) If (6.8) holds, then Var_{F_0}(ξ_0) ≥ Var_{F_0}(φ_* − u_{F_0}). Moreover, the lower bound is reached without bias, that is, E_{F_0}ξ_0² = Var_{F_0}(φ_* − u_{F_0}), iff (1.5) holds.

(ii) If (6.8) holds and the L_2(F_0) closure C̄_* of C_* in (6.2) is convex, then there exist a random variable ξ̃_0 and certain normal variables Z(h) ∼ N(0, Var_{F_0}(h)) such that

L( (√n{S̃_n/n − A_n(φ_*) − µ(F_0)}, Z_n(ū_{F_0} + h − u_{F_0})); F_0 ) →_D L( (ξ̃_0, Z(ū_{F_0} + h − u_{F_0})); F_0 )

and ξ̃_0 is independent of Z(ū_{F_0} + h − u_{F_0}) for all h ∈ H_*. In particular, for h = φ_{*,0},

L(ξ_0; F_0) = L(Z(φ_* − u_{F_0}); F_0) ⋆ L(ξ̃_0; F_0).

(iii) Suppose E_{F_t} ū²(X; F_t) is bounded for all {F_t} ∈ C. Then ψ_* = φ_{*,0} + ū_* is the efficient influence function for the estimation of µ(F), that is, (6.6) holds with ψ = ψ_*, where ū_* is the projection of ū_{F_0} to H_*. Consequently, (1.6) holds.

Remark 6.1. Based on Theorem 6.1(i) and (ii), Ŝ_n is said to be locally asymptotically efficient if (1.5) holds. Note that in Theorem 6.1(ii), ξ̃_0 = 0 iff (1.5) holds.

Remark 6.2. In the proof of Theorem 6.1(iii), we show that (6.5) and (6.6) are equivalent under the condition that E_{F_t} ū²(X; F_t) = O(1) for all {F_t} ∈ C.

Remark 6.3. For the estimation of µ(F), that is, u(x, ϑ, F) ≡ µ(F)
as a special case of Theorem 6.1(ii), a standard proof of the convolution
theorem uses analytic continuation along lines passing through the origin in
the tangent space, and as a result, C̄∗ is often assumed to be a linear space.
In the proof of Theorem 6.1(ii), analytic continuation is used along arbitrary
lines across C̄∗, so that only the convexity of C̄∗ is needed, as in [31], pages
366–367. Rieder [20] showed that, in the case of convex C̄∗, the projections
of scores to C̄∗ (not to H∗) are useful in the context of one-sided confidence.

6.5. Finite-dimensional models. Let F = {Fτ, τ ∈ T} with an open Euclidean
parameter space T. We shall extend the results in Section 2.1 to
general sums (1.2). Suppose dF_τ^X = f_τ^X dν exists and is differentiable in the
sense of (6.1), that is,

(6.9) ∫ (f_{τ+∆}^{1/2} − f_τ^{1/2} − ∆^t ρτ)² dν = o(‖∆‖²), τ ∈ T.

Let Eτ ≡ EFτ, Iτ ≡ Covτ(ρτ(X)), uτ ≡ u(X, θ; Fτ) and ūτ ≡ ū(X; Fτ).

Theorem 6.2. (i) Suppose (6.9) holds, Iτ is of full rank, L(uτ; Fτ) is
continuous in τ in the weak topology, Eτ ū²τ is continuous, Eτ{ūτ+∆ − ūτ}² →
0 as ∆ → 0, Eτ u²τ is locally bounded, and µ′(τ) exists. Then (2.4) gives the
efficient influence function for the estimation of (1.2) with γτ = µ′(τ) −
Eτ ūτ ρτ, and (1.5) and (1.6) hold.

(ii) Suppose (2.6), (2.7) and the conditions of (i) hold. Then (2.8) holds for
the plug-in estimator (2.5) with the γτ in (i). In particular, (2.5) is asymptotically
efficient under Pτ iff γτ κτ = γτ I_τ^{−1} ρτ.

Remark 6.4. Comparing Theorem 6.2 with Theorems 2.1 and 2.2, we
see that (6.9) is weaker than (2.2) and (1.2) is more general than (1.1), while
stronger conditions are imposed on uτ in Theorem 6.2.

7. Proofs. We prove Theorems 6.1, 2.1, 2.2, 6.2, and 2.3–2.5 in this
section.

Lemma 7.1. Suppose (2.2) holds. Let (X, θ) ∼ Ft under P_{τ+at} and ρ =
a^t ρτ for a vector a, where ρτ is as in (2.3). Then (6.1) holds with PF0 = Pτ.

Proof. Let gt ≡ g_{τ+at} and ∆ = at. The lemma follows from the expansion

(f_t^{1/2} − 1)/t − ρ/2 = (f_t^{1/2} + 1)^{−1} E0[{(g_t^{1/2} − 1)/t}(g_t^{1/2} + 1) | X = x] − E0[a^t ρ̃τ/2 | X = x].

The uniform integrability of the square of the right-hand side (i.e., the first
term) under f0(x) follows from the inequality E0[gt|X] ≤ ft(X)I{f0(X) >
0}. We omit the details. 
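The algebra behind this expansion is brief and may be worth writing out. The steps below assume, as the surrounding mixture setup suggests, that f_t(x) = E_0[g_t | X = x] and that ρ_τ(x) = E_0[ρ̃_τ | X = x]; both identities are recalled from the earlier sections, not proved here.

```latex
% Identity behind the expansion in the proof of Lemma 7.1,
% assuming f_t(x) = E_0[g_t | X = x]:
\frac{f_t^{1/2}-1}{t}
  \;=\; \frac{f_t-1}{t\,(f_t^{1/2}+1)}
  \;=\; \frac{1}{f_t^{1/2}+1}\,
        E_0\!\left[\left.\frac{g_t-1}{t}\,\right|\,X=x\right]
  \;=\; \frac{1}{f_t^{1/2}+1}\,
        E_0\!\left[\left.\frac{g_t^{1/2}-1}{t}\,\bigl(g_t^{1/2}+1\bigr)\,\right|\,X=x\right].
```

Subtracting ρ/2 = E0[a^t ρ̃τ/2 | X = x] gives the displayed expansion. As t → 0, (g_t^{1/2} − 1)/t → a^t ρ̃τ/2 by (2.2), while g_t^{1/2} + 1 and f_t^{1/2} + 1 both tend to 2, so the two terms cancel in the limit.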

Lemma 7.2. Suppose (6.1) holds and X ∼ F_t^X under Pt, 0 ≤ t ≤ 1. Let
µt ≡ Et ht(X) for a certain Borel function ht. If Et h²t(X) = O(1) and ht → h0 in
L2(P0), then

µt − µ0 = E0{ht(X) − h0(X)} + tE0 ρ(X)h0(X) + o(t) as t → 0.

Proof. Let Bt be the support sets of dPt(X) − ft(X) dP0(X). By (6.1)
and the boundedness of Et h²t, Et ht − E0 ft ht = Et ht I_{Bt} = O(1)(Et h²t)^{1/2} ×
P_t^{1/2}(Bt) = o(t). Thus,

(7.1) µt − µ0 = Et ht − E0 h0 = E0(ft − 1)ht + E0(ht − h0) + o(t)

as t → 0+. Since (f_t^{1/2} − 1)/t → ρ/2 in L2(P0) and E0{(f_t^{1/2} + 1)ht}² = O(1),

E0(ft − 1)ht/t = E0[t^{−1}(f_t^{1/2} − 1)(f_t^{1/2} + 1)ht] → E0 h0 ρ.

This and (7.1) complete the proof. 

Proof of Theorem 6.1. Let Fn ≡ F_{c/√n}, ξn ≡ √n{S̃n/n − Sn(Fn)/n},
ξn′ ≡ √n{S̃n/n − An(ūFn)}, ξn′′ ≡ √n An(wFn) and Z′′ = Z(wF0). Then ξn =
ξn′ + ξn′′, and ξn′ depends on {Xj} only. By (6.4), w²Fn under PFn are uniformly
integrable and L(wFn; Fn) →D L(wF0; F0) as n → ∞. Thus, by the Lindeberg
central limit theorem and the weak law of large numbers,

(7.2) EFn[exp(itξn′′)|{Xj}] → EF0 exp(itZ′′)

in probability for all t. Since ξn′ depends on {Xj} only, this and (6.8) imply

EFn exp(itξn′) E exp(itZ′′) = EFn exp(itξn′) exp(itξn′′) + o(1) → EF0 exp(itξ0).

Thus, since E exp(itZ′′) ≠ 0 for all t,

(7.3) L(n^{−1/2}{S̃n − Σ_{j=1}^n ū(Xj; F_{c/√n})}; F_{c/√n}) = L(ξn′; Fn) →D L(ξ0′; F0)

for a certain variable ξ0′ independent of c > 0 and the curve {Ft} ∈ C.
Define ξ′n,0 ≡ √n{S̃n/n − An(ūF0)}. By (6.3) and (6.5), ξ′n,0 − ξn′ = √n An ×
(ūFn − ūF0) = √n EF0(ūFn − ūF0) + oP(1) → cEφ(X)ρ(X) in probability under
PF0. Thus, as in [3], pages 24–26, by (7.3) and the LAN from (6.1) and (6.2),

(7.4) EF0 exp(itξ0′ + zZ(ρ)) = exp[itz EF0 φρ + z² EF0 ρ²/2] EF0 exp(itξ0′)

for all ρ ∈ C∗ and complex z. Here the Z(h) are constructed so that (ξ′n,0, Zn(h))
converges jointly in distribution to (ξ0′, Z(h)) for all h ∈ L2(F0). Differentiating
(7.4) in t at t = 0 and then in z at z = 0, we find

(7.5) EF0 ξ0′ Z(h) = EF0 φ(X)h(X) = EF0 Z(φ∗,0)Z(h)
for all scores h = ρ, ρ ∈ C∗, and then for all h ∈ H∗ by (6.2). Since φ∗,0 ∈ H∗,
ξ0′ − Z(φ∗,0) and Z(φ∗,0) are orthogonal in L2(F0). This proves (i), since
ξ0′ and Z(φ∗,0) are both independent of Z′′ by (7.2) and Z(φ∗,0) + Z′′ =
Z(φ∗ − uF0).
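The two differentiations that take (7.4) to (7.5) are routine but may be worth sketching; Λ(t, z) below is a temporary label, not notation from the paper.

```latex
% Sketch of the derivation of (7.5) from (7.4):
\Lambda(t,z) \;\equiv\; E_{F_0}\exp\{it\xi_0' + zZ(\rho)\}
  \;=\; \exp\{itz\,E_{F_0}\phi\rho + z^2 E_{F_0}\rho^2/2\}\,E_{F_0}\exp(it\xi_0').
% Differentiating in t at t = 0:
\partial_t\Lambda(0,z) \;=\; E_{F_0}\bigl[i\xi_0'\,e^{zZ(\rho)}\bigr]
  \;=\; \bigl(iz\,E_{F_0}\phi\rho\bigr)e^{z^2E_{F_0}\rho^2/2}
        \;+\; e^{z^2E_{F_0}\rho^2/2}\,E_{F_0}\bigl[i\xi_0'\bigr],
% then in z at z = 0, using E\,Z(\rho) = 0:
E_{F_0}\bigl[i\xi_0'\,Z(\rho)\bigr] \;=\; i\,E_{F_0}\phi\rho,
\qquad\text{i.e.}\qquad
E_{F_0}\,\xi_0' Z(\rho) \;=\; E_{F_0}\,\phi\rho .
```

The second equality in (7.5) then holds for h ∈ H∗ because φ∗,0 is the projection of φ onto H∗ and EF0 Z(g)Z(h) = CovF0(g, h).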
Now, suppose C̄∗ is convex in L2(F0). By continuity extension, (7.4) holds
for all ρ ∈ C̄∗ and complex z. Let ρj ∈ C̄∗. Since (7.4) holds for ρ = sρ1 + (1 −
s)ρ2, 0 ≤ s ≤ 1, with both sides being analytic in s, by analytic continuation
it holds for ρ = sρ1 + (1 − s)ρ2 for all real s. Thus, (7.4) holds for all complex z
and

(7.6) ρ ∈ H0 ≡ {sρ1 + (1 − s)ρ2 : ρj ∈ C̄∗, −∞ < s < ∞}.

Let H̃ be the linear span of a set of finitely many members of C̄∗. Let ρ1
be a fixed interior point of H̃ ∩ C̄∗ and ρ2 ∈ H̃ with ‖ρ2 − ρ1‖ = δ0. For
sufficiently small δ0 > 0, ρ2 ∈ C̄∗ for all such ρ2, so that H̃ ⊆ H0. Thus, H0 is
a linear space and H∗ is the closure of H0. It follows that (7.4) holds for all
ρ ∈ H∗ and complex z. As in [3], pages 25–26, this implies the independence
of ξ0′ − Z(φ∗,0) and {Z(h) : h ∈ H∗}. Since {ξ0′, Z(h), h ∈ H∗} is independent
of Z′′ = Z(ūF0 − uF0) by (7.2), the conclusions of part (ii) hold with ξ̃0 =
ξ0′ − Z(φ∗,0).
The proof of part (iii) follows easily from Lemma 7.2 with ht = ūFt, which
gives

{µ(Ft) − µ(F0)}/t − EF0{ūFt − ūF0}/t → EF0 ūF0 ρ = EF0 u∗ρ.

It follows that (6.5) and (6.6) are equivalent under EFt ū²(X; Ft) = O(1),
with ψ = ψ∗ = u∗ + φ∗,0, by (1.6) and the definition of φ∗. The proof is
complete. 

Proof of Theorem 2.1. The proof is similar to that of Theorem 6.1(i),
so we omit certain details. By (2.2), ξ0 is independent of Z(ρ̃τ) under Pτ.
Since Eτ u² < ∞, (7.2) holds for fixed Fn = Fτ, so that ξ0 = ξ0′ + Z(ūτ − u)
as a sum of independent variables. Let Z(hτ) be the projection of ξ0′ to
{Z(h), h ∈ L2(Fτ)} in L2(Pτ) and vτ = hτ + ūτ. Then Varτ(ξ0) ≥ Eτ(vτ − u)²
and Eτ(vτ − u)ρ̃τ = 0. Since ξ0′ is the limit of variables dependent on {Xj}
only, hτ and vτ depend on X only.

Since Eτ u²gτ,∆(θ) ≤ Eτ+∆ u² = O(1), by (2.2) and Lemma 7.2 with ht =
h0 = u(x, ϑ), µτ+∆ − µτ ≈ ∆^t Eτ uρ̃τ = ∆^t Eτ ψ∗,τ(X)ρτ(X), where ψ∗,τ ≡
ρ^t_τ I_τ^{−1} Eτ uρ̃τ. It follows that 0 = Eτ(vτ − u)ρ̃τ = Eτ(vτ ρ̃τ − ψ∗,τ ρτ) = Eτ(vτ −
ψ∗,τ)ρτ. Thus, Eτ(vτ − ūτ)ρτ = Eτ(ψ∗,τ − u∗,τ)ρτ with u∗,τ ≡ ρ^t_τ I_τ^{−1} Eτ ūτ ρτ.
Since ψ∗,τ − u∗,τ is linear in ρτ, Z(vτ − ūτ − (ψ∗,τ − u∗,τ)) is independent
of Z(ψ∗,τ − u∗,τ). Thus, Varτ(vτ − ūτ) ≥ Varτ(ψ∗,τ − u∗,τ) and Varτ(ξ0) ≥
Varτ(vτ − ūτ) + Varτ(ūτ − u) ≥ Varτ(φ∗,τ − u) by (2.4). The proof is complete.


Proofs of Theorems 2.2 and 6.2. Theorem 6.2(i) follows from Theorem
6.1 and Remark 6.2. Let µ(t; τ) = Eτ ūt(X). By Lemma 7.2, µ′ = Eτ uρ̃τ
in Theorem 2.2 and γτ = (∂/∂t)µ(τ; τ) in both theorems. Simple expansion
of (2.5) via (2.7) yields

Ŝn/n = An(ūτ) + {µ(τ̂n; τ) − µ(τ; τ)} + oPτ(n^{−1/2})
     = An(ūτ + γτ κτ) + oPτ(n^{−1/2}),

which implies (2.8). Note that γτ(κτ − κ∗,τ) is orthogonal to ūτ − uτ + γτ κ∗,τ.
The proof is complete. 
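The orthogonality noted at the end is what converts (2.8) into the efficiency criterion of Theorem 6.2(ii). Spelled out, with κ∗,τ the efficient component from Section 2 (its definition is not repeated in this section), the variance of the influence function splits as

```latex
% Orthogonal decomposition underlying the efficiency criterion:
\operatorname{Var}_\tau\bigl(\bar u_\tau + \gamma_\tau\kappa_\tau - u_\tau\bigr)
 \;=\; \operatorname{Var}_\tau\bigl(\bar u_\tau + \gamma_\tau\kappa_{*,\tau} - u_\tau\bigr)
 \;+\; \operatorname{Var}_\tau\bigl(\gamma_\tau(\kappa_\tau - \kappa_{*,\tau})\bigr),
```

so the plug-in estimator (2.5) attains the information bound iff γτ κτ = γτ κ∗,τ, which matches the condition γτ κτ = γτ I_τ^{−1} ρτ in Theorem 6.2(ii) if κ∗,τ = I_τ^{−1} ρτ, as that statement indicates.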

Proofs of Theorems 2.3, 2.4 and 2.5. Let Gt ≡ (1 − t)G0 + tG,
ft ≡ fGt and Et ≡ EGt, t > 0. By (2.9), (6.1) holds with ρ = fG/f0 − 1. Since
EG u² < ∞, u² are uniformly integrable under Pt, so that (6.4) holds. Since
f0/ft ≤ 1/(1 − t), {u²t, 0 ≤ t ≤ 1/2} are uniformly integrable under E0, so
that (6.3) holds. Moreover,

(7.7) t^{−1} E0{ut − u0} = E0[(uG − u0)(fG/ft)] → E0[(uG − u0)(fG/f0)].
Suppose there exists a regular estimator of (1.1). Let ξ0′ be as in (7.5) and
let Z(v − u0) be the projection of ξ0′ to {Z(h), h ∈ L2(f0)} as in the proof of
Theorem 2.1. It follows from (7.7) and the argument leading to (7.5) that

E0[(v − u0)(fG/f0 − 1)] = E0 Z(v − u0)Z(ρ) = E0[(uG − u0)(fG/f0)],

which implies EG v − E0 v + E0 u = EG u. Since ξ0′ does not depend on the
choice of G ∈ GG0, v ∈ VG0. By the Lindeberg central limit theorem, EG0 v² <
∞ and v ∈ VG0 imply L(Zn(v − u); P_{c/√n}) → L(Z(v − u); P0), so that Vn in
(2.15) is regular at G0 for all v ∈ VG0. If v is a limit point of VG0 in L2(f0),
Vn is also a regular estimator of Sn at P0, so that VG0 is closed in L2(f0).
This completes the proof of Theorem 2.3.
The proof of Theorem 2.4 is similar to those of Theorems 2.2 and 6.2
but simpler. We note that EG0 (vG − vG0 ) = 0. Finally, Theorem 2.5 follows
from the fact that VG contains a single function v due to the completeness
of exponential families. The proofs are complete. 
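The step from the display to the identity EG v − E0 v + E0 u = EG u is a change of measure; a sketch, using E0[h·fG/f0] = EG h (valid when fG vanishes wherever f0 does, as the mixture setup requires) and writing u0, uG for the conditional means of u under G0, G, so that E0 u0 = E0 u and EG uG = EG u:

```latex
% Change of measure applied to each side of the display:
E_0\bigl[(v-u_0)(f_G/f_0-1)\bigr] \;=\; E_G(v-u_0) - E_0(v-u_0),
\qquad
E_0\bigl[(u_G-u_0)\,f_G/f_0\bigr] \;=\; E_G(u_G-u_0).
% Equating the two and cancelling the common -E_G u_0 term:
E_G v - E_0 v + E_0 u_0 \;=\; E_G u_G,
\quad\text{i.e.}\quad
E_G v - E_0 v + E_0 u \;=\; E_G u .
```

This is exactly the unbiasedness relation defining the class VG0 of "u, v" estimating functions.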

REFERENCES
[1] Benedetti, R. and Franconi, L. (1998). Statistical and technological solutions
for controlled data dissemination. In Pre-proceedings of New Techniques and
Technologies for Statistics, Sorrento 1 225–232.
[2] Bethlehem, J., Keller, W. and Pannekoek, J. (1990). Disclosure control of mi-
crodata. J. Amer. Statist. Assoc. 85 38–45.
[3] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Effi-
cient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ.
Press, Baltimore. MR1245941
[4] Bunge, J. and Fitzpatrick, M. (1993). Estimating the number of species: A review.
J. Amer. Statist. Assoc. 88 364–373.
[5] Chao, A. (1984). Nonparametric estimation of the number of classes in a population.
Scand. J. Statist. 11 265–270. MR0793175
[6] Chao, A. and Bunge, J. (2002). Estimating the number of species in a stochastic
abundance model. Biometrics 58 531–539. MR1925550
[7] Clauset, A. and Moore, C. (2003). Traceroute sampling makes random graphs
appear to have power law degree. Preprint.
[8] Coates, A., Hero, A., Nowak, R. and Yu, B. (2002). Internet tomography. IEEE
Signal Processing Magazine 19(3) 47–65.
[9] Darroch, J. N. and Ratcliff, D. (1980). A note on capture–recapture estimation.
Biometrics 36 149–153. MR0672144
[10] Duncan, G. T. and Pearson, R. W. (1991). Enhancing access to microdata while
protecting confidentiality: Prospects for the future (with discussion). Statist.
Sci. 6 219–239.
[11] Engen, S. (1974). On species frequency models. Biometrika 61 263–270. MR0373217
[12] Faloutsos, M., Faloutsos, P. and Faloutsos, C. (1999). On power-law relation-
ships of the Internet topology. In Proc. ACM SIGCOMM 1999 251–262. ACM
Press, New York.
[13] Fisher, R. A., Corbet, A. S. and Williams, C. B. (1943). The relation between
the number of species and the number of individuals in a random sample of an
animal population. J. Animal Ecology 12 42–58.
[14] Good, I. J. (1953). The population frequencies of species and the estimation of
population parameters. Biometrika 40 237–264. MR0061330
[15] Govindan, R. and Tangmunarunkit, H. (2000). Heuristics for Internet map dis-
covery. In Proc. IEEE INFOCOM 2000 3 1371–1380. IEEE Press, New York.
[16] Lakhina, A., Byers, J., Crovella, M. and Xie, P. (2003). Sampling biases in
IP topology measurements. In Proc. IEEE INFOCOM 2003 1 332–341. IEEE
Press, New York.
[17] Pfanzagl, J. (with the assistance of W. Wefelmeyer) (1982). Contributions to a
General Asymptotic Statistical Theory. Lecture Notes in Statist. 13. Springer,
New York. MR0675954
[18] Polettini, S. and Seri, G. (2003). Guidelines for the protection of so-
cial micro-data using individual risk methodology. Application within µ-
Argus version 3.2, CASC Project Deliverable No. 1.2-D3. Available at
neon.vb.cbs.nl/casc/deliv/12D3_guidelines.pdf.
[19] Rao, C. R. (1971). Some comments on the logarithmic series distribution in the
analysis of insect trap data. In Statistical Ecology (G. P. Patil, E. C. Pielou and
W. E. Waters, eds.) 1 131–142. Pennsylvania State Univ. Press, University Park.
MR0375600
[20] Rieder, H. (2000). One-sided confidence about
functionals over tangent cones. Available at
www.uni-bayreuth.de/departments/math/org/mathe7/RIEDER/pubs/cc.pdf.
[21] Rinott, Y. (2003). On models for statistical disclosure risk esti-
mation. Working paper no. 16, Joint ECE/Eurostat Work Ses-
sion on Data Confidentiality, Luxemburg, 2003. Available at
www.unece.org/stats/documents/2003/04/confidentiality/wp.16.e.pdf.
[22] Robbins, H. (1977). Prediction and estimation for the compound Poisson distribu-
tion. Proc. Natl. Acad. Sci. U.S.A. 74 2670–2671. MR0451479
[23] Robbins, H. (1980). An empirical Bayes estimation problem. Proc. Natl. Acad. Sci.
U.S.A. 77 6988–6989. MR0603064
[24] Robbins, H. (1988). The u, v method of estimation. In Statistical Decision Theory
and Related Topics IV (S. S. Gupta and J. O. Berger, eds.) 1 265–270. Springer,
New York. MR0927106
[25] Robbins, H. and Zhang, C.-H. (1988). Estimating a treatment effect under biased
sampling. Proc. Natl. Acad. Sci. U.S.A. 85 3670–3672. MR0946190
[26] Robbins, H. and Zhang, C.-H. (1989). Estimating the superiority of a drug to a
placebo when all and only those patients at risk are treated with the drug. Proc.
Natl. Acad. Sci. U.S.A. 86 3003–3005. MR0995401
[27] Robbins, H. and Zhang, C.-H. (1991). Estimating a multiplicative treatment effect
under biased allocation. Biometrika 78 349–354. MR1131168
[28] Robbins, H. and Zhang, C.-H. (2000). Efficiency of the u, v method of estimation.
Proc. Natl. Acad. Sci. U.S.A. 97 12,976–12,979. MR1795617
[29] Sampford, M. R. (1955). The truncated negative binomial distribution. Biometrika
42 58–69. MR0072401
[30] Spring, N., Mahajan, R. and Wetherall, D. (2002). Measuring ISP topologies
with rocketfuel. In Proc. ACM SIGCOMM 2002 133–145. ACM Press, New
York.
[31] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press.
MR1652247
[32] Vardi, Y. (1996). Network tomography: Estimating source-destination traffic inten-
sities from link data. J. Amer. Statist. Assoc. 91 365–377. MR1394093

Department of Statistics
Rutgers University
Hill Center
Busch Campus
Piscataway, New Jersey 08854-8019
USA
e-mail: czhang@stat.rutgers.edu
