APS 4 Report
Shuvayan Banerjee
Mathematics Department
IITB-Monash Research Academy
Contents
1 Introduction
5 Conclusion
6 Appendix
  6.1 Proofs of Lemmas and Theorems of Chapter 2
    6.1.1 Proof of Theorem 1
    6.1.2 Proof of Theorem 2
    6.1.3 Proof of Theorem 3
    6.1.4 Proof of Lemma 1
    6.1.5 Convex conjugates
  6.2 Proofs of Theorems and Lemmas of Chapter 3
    6.2.1 Proofs of Theorems and Lemmas on Robust Lasso
    6.2.2 Proofs of Theorems and Lemmas on Debiased Lasso
    6.2.3 Lemma on properties of A
    6.2.4 Proofs of Simultaneous Confidence Intervals
    6.2.5 Some useful lemmas for Drlt
  6.3 Proofs of Theorems and Lemmas of Chapter 4
    6.3.1 Proof of Theorem 10
    6.3.2 Some useful Lemmas
Chapter 1
Introduction
In high-dimensional sparse regression, where the number of predictors significantly exceeds the number
of observations, the Lasso (Least Absolute Shrinkage and Selection Operator) is a widely used method
for variable selection and estimation. By incorporating an ℓ1 regularization term, Lasso promotes
sparsity in the estimated coefficients, enabling effective performance for sparse signal vectors even if
the number of predictors far exceeds the number of samples. The Lasso estimator has well-established
theoretical guarantees for signal and support recovery [25]. Despite its strengths, a well-recognized
limitation of Lasso is its tendency to produce biased estimates. This bias arises from the shrinkage
imposed by the ℓ1 penalty. Consequently, the bias compromises estimation accuracy and impedes
statistical inference tasks such as construction of confidence intervals or hypothesis tests. These
challenges are especially pronounced in high-dimensional regimes, where traditional inference tools break down.
[31] introduced a simple yet powerful approach that constructs debiased Lasso estimates using an
“approximate inverse” of the sample covariance matrix. Their method avoids direct precision matrix
estimation and instead employs an optimization framework to compute a debiasing matrix M that
corrects for bias while ensuring asymptotic normality of the debiased estimates.
In Chapter 2, we build upon the technique of [31], addressing one of its primary computational
bottlenecks: the optimization step required to compute the approximate inverse M . In Chapters 3
and 4, we attempt to identify and correct group membership specification errors in high-dimensional sparse regression, specifically in the field of group testing.
Notations: Throughout this paper, In denotes the identity matrix of size n × n. We use the notation [n] ≜ {1, 2, · · · , n} for n ∈ Z+. Given a matrix A, its ith row is denoted as ai., its jth column is denoted as a.j and the (i, j)th element is denoted by aij. The ith column of the identity matrix will be denoted as ei. For any vector z ∈ Rn and index set S ⊆ [n], we define zS ∈ Rn such that ∀i ∈ S, (zS)i = zi and ∀i ∉ S, (zS)i = 0. Sᶜ denotes the complement of set S. We define the entrywise ℓ∞ norm of a matrix A as |A|∞ ≜ max_{i,j} |aij|. Consider two real-valued random sequences xn and rn. Then, we say that xn is oP(rn) if xn/rn → 0 in probability, i.e., lim_{n→∞} P(|xn/rn| ≥ ϵ) = 0 for any ϵ > 0. Also, we say that xn is OP(rn) if xn/rn is bounded in probability, i.e., for any ϵ > 0 there exist m, n0 > 0 such that P(|xn/rn| < m) ≥ 1 − ϵ for all n > n0. For a positive integer p, we use the shorthand [p] = {1, 2, . . . , p}. For a vector w ∈ Rm, we denote the ℓq-norm by ∥w∥q := (∑_{i=1}^m |wi|^q)^{1/q} if 1 ≤ q < ∞ and the ℓ∞-norm by ∥w∥∞ := max_{i∈[m]} |wi|.
Chapter 2
To address the limitations of inference in high dimensional sparse regression, several methods have
been developed to “debias” the Lasso estimator, allowing for valid statistical inference even in
high-dimensional settings. Notably, [56] introduced a decorrelated score-based approach, leveraging
the Karush–Kuhn–Tucker (KKT) conditions of the Lasso optimization problem to construct bias-
corrected estimators. Their framework relies on precise estimation of the precision matrix (inverse
covariance matrix), which can be computationally challenging and sensitive to regularization choices.
Similarly, [51] proposed a methodology rooted in node-wise regression, where each variable is regressed
on the remaining variables to estimate the precision matrix. While effective, this method is computa-
tionally intensive. This may limit its applicability, particularly in scenarios where the design matrix
lacks favorable properties like sparsity of the rows of the precision matrix.
[31] introduced a simple yet powerful approach that constructs debiased Lasso estimates using an
“approximate inverse” of the sample covariance matrix. Their method avoids direct precision matrix
estimation and instead employs an optimization framework to compute a debiasing matrix M that
corrects for bias while ensuring asymptotic normality of the debiased estimates. A key advantage of
this method is its applicability for random sub-Gaussian sensing matrices, enabling valid inference
across a broad range of high-dimensional applications.
In this chapter, we build upon the technique of [31], addressing one of its primary computational
bottlenecks: the optimization step required to compute the approximate inverse M . By reformulating
the problem to work directly with the “weight matrix” W := AM ⊤ , we entirely eliminate the need
to solve this optimization problem in many practical cases. Our proposed reformulation leverages the
insight that the theoretical guarantees of the debiased Lasso estimator depend on the product AM ⊤
rather than the individual debiasing matrix M . By shifting the focus to the “weight matrix” W :=
AM ⊤ , we simplify the optimization problem while retaining all theoretical properties of the original
framework. Under certain deterministic assumptions on the coherence of A, we provide a simple,
exact, closed form optimal solution for the optimization problem to obtain W . We show that these
assumptions are satisfied with high probability for the ensembles of sensing matrices considered in [31],
under the additional condition that the elements of the rows of A are uncorrelated. In practice, sensing
matrices with uncorrelated entries are commonly used in many applications [17,37] and are also widely
used in many theoretical results in sparse regression [25]. This closed form solution eliminates the
computationally intensive optimization step required to compute M , significantly improving runtime
efficiency. It is applicable in many natural situations, including sensing matrices with i.i.d. isotropic
sub-Gaussian rows (such as i.i.d. Gaussian, or i.i.d. Rademacher entries).
We consider noisy linear measurements of the form

y = Aβ* + η,    (2.1)

where A ∈ Rn×p is the sensing matrix, β* ∈ Rp is the unknown sparse signal, and η ∈ Rn is a noise vector that consists of independent and identically distributed elements drawn from N(0, σ²), where σ² is the noise variance.
The Lasso estimate β̂λ of the sparse signal β ∗ is defined as the solution to the following opti-
mization problem:
β̂λ := arg min_β (1/(2n)) ∥y − Aβ∥₂² + λ∥β∥₁,    (2.2)
where λ > 0 is a regularization parameter chosen appropriately. The Lasso estimator is known to be
a consistent estimator of the sparse signal β ∗ under the condition that the sensing matrix A satisfies
the Restricted Eigenvalue Condition (REC) [25, Chapter 11].
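As an illustration of (2.2), the following Python sketch computes a Lasso estimate with scikit-learn, whose objective (1/(2n))∥y − Aβ∥₂² + α∥β∥₁ matches (2.2) with α = λ. The problem sizes and the value of λ below are illustrative assumptions, not settings taken from this report.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s, sigma = 200, 500, 10, 0.05          # illustrative sizes, not the report's settings

A = rng.standard_normal((n, p))              # sensing matrix with i.i.d. Gaussian entries
beta_star = np.zeros(p)
beta_star[rng.choice(p, s, replace=False)] = rng.uniform(1.0, 2.0, s)
y = A @ beta_star + sigma * rng.standard_normal(n)

lam = 4 * sigma * np.sqrt(np.log(p) / n)     # a regularization level of order sqrt(log p / n)
# sklearn's Lasso minimizes (1/(2n))||y - A beta||_2^2 + alpha*||beta||_1, matching (2.2).
beta_lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=50000).fit(A, y).coef_
```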
The debiased Lasso estimate of [31] is β̂d := β̂λ + (1/n) M A⊤(y − Aβ̂λ). Here M is an approximate inverse of the rank-deficient matrix Σ̂ := A⊤A/n, computed by solving the convex optimization problem given in Algorithm 2. The theoretical properties of β̂d from [31] are applicable to a sensing matrix A with the following properties:
D1: The rows a1., a2., . . . , an. of matrix A are independent and identically distributed zero-mean sub-Gaussian random vectors with covariance Σ := E[ai. ai.⊤]. Furthermore, the sub-Gaussian norm κ := ∥Σ^(−1/2) ai.∥ψ2 is a finite positive constant.
D2: There exist positive constants 0 < Cmin ≤ Cmax, such that the minimum and maximum eigenvalues σmin(Σ), σmax(Σ) of Σ satisfy 0 < Cmin ≤ σmin(Σ) ≤ σmax(Σ) ≤ Cmax < ∞.
Theorem 7(b) of [31] shows that the optimization problem in (2.4) to obtain M is feasible with high probability, for sensing matrices satisfying properties D1 and D2, as long as µ > 4√(3e) κ² √((Cmax log p)/(Cmin n)). If µ is O(√(log p/n)) and n is ω((s log p)²), then Theorem 8 of [31] shows that ∀j ∈ [p], √n(β̂d,j − βj*) is asymptotically zero-mean Gaussian with variance σ² m.j⊤ Σ̂ m.j.

minimize  m.j⊤ Σ̂ m.j
subject to  ∥Σ̂ m.j − ej∥∞ ≤ µ,    (2.4)
¹ The sub-Gaussian norm of a random variable x, denoted by ∥x∥ψ2, is defined as ∥x∥ψ2 := sup_{q≥1} q^(−1/2) (E|x|^q)^(1/q). For a random vector x ∈ Rn, its sub-Gaussian norm is defined as ∥x∥ψ2 := sup_{y∈S^(n−1)} ∥y⊤x∥ψ2, where S^(n−1) denotes the unit sphere in Rn.
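For concreteness, the per-column problem (2.4) can be solved with a generic convex solver; the cvxpy sketch below is one possible implementation (the report itself does not prescribe cvxpy), writing the objective m.j⊤ Σ̂ m.j as ∥Am.j∥₂²/n to keep it explicitly convex.

```python
import numpy as np
import cvxpy as cp

def debias_column_M(A, j, mu):
    """Solve (2.4) for column j: minimize m^T Sigma_hat m  s.t.  ||Sigma_hat m - e_j||_inf <= mu."""
    n, p = A.shape
    Sigma_hat = A.T @ A / n
    e_j = np.zeros(p)
    e_j[j] = 1.0
    m = cp.Variable(p)
    objective = cp.sum_squares(A @ m) / n          # equals m^T Sigma_hat m
    constraints = [cp.norm(Sigma_hat @ m - e_j, "inf") <= mu]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return m.value
```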
2.2 Re-parameterization of the Debiased LASSO
The debiased Lasso estimator in (3.10) can be rewritten in terms of the variable W := AM ⊤ as:
β̂d = β̂λ + (1/n) W⊤(y − Aβ̂λ).    (2.5)
The re-parameterization does not affect the debiasing procedure introduced in [31]. Thus, any theo-
retical guarantees established using M extend to those using W .
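The following short numpy sketch makes the re-parameterization explicit: with W = AM⊤, the W-based update in (2.5) returns the same vector as the M-based update, up to floating-point round-off. The variable names are illustrative and the inputs are assumed to come from the earlier steps.

```python
import numpy as np

def debias_with_M(beta_lasso, M, A, y):
    n = A.shape[0]
    return beta_lasso + (M @ (A.T @ (y - A @ beta_lasso))) / n

def debias_with_W(beta_lasso, W, A, y):
    n = A.shape[0]
    return beta_lasso + (W.T @ (y - A @ beta_lasso)) / n

# With W = A @ M.T, the two updates coincide:
# np.allclose(debias_with_M(b, M, A, y), debias_with_W(b, A @ M.T, A, y))  -> True
```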
We now produce a reformulated problem in (2.6) using W , and show that it is equivalent to the
original optimization problem in Algorithm 2. Using the relationship W = AM ⊤ , we can rewrite
m.j as w.j := Am.j. Making this substitution, the objective in (2.4) becomes m.j⊤ Σ̂ m.j = (1/n) w.j⊤ w.j, and the constraint ∥Σ̂m.j − ej∥∞ ≤ µ (where ej is the jth column of the identity matrix) becomes ∥(1/n) A⊤w.j − ej∥∞ ≤ µ. This change of variables suggests the following reformulated optimization problem (2.6) for the jth column of W:

Pj :=  minimize  (1/n) w.j⊤ w.j
       subject to  ∥(1/n) A⊤w.j − ej∥∞ ≤ µ.    (2.6)

In fact, the jth reformulated problem (2.6) and the jth original problem (2.4) are equivalent in the following sense: If m.j is feasible for (2.4) then w.j := Am.j is feasible for (2.6) and (1/n) w.j⊤ w.j = m.j⊤ Σ̂ m.j. Conversely, suppose that w.j is feasible for (2.6). If A† is a pseudo-inverse of A, then m.j := A†w.j is feasible for (2.4) since Σ̂m.j = (1/n) A⊤Am.j = (1/n) A⊤w.j. Moreover, (1/n) w.j⊤ w.j = m.j⊤ Σ̂ m.j, so both have the same objective values, establishing that (2.4) and (2.6) are equivalent.
This reformulation provides an equivalent separable problem for each column of W , maintaining all
theoretical guarantees while simplifying the representation of the debiasing procedure.
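A cvxpy sketch of the reformulated per-column problem (2.6) is given below; the variable now lives in Rⁿ rather than Rᵖ. The solver and tolerances are left at their defaults and are assumptions of this sketch.

```python
import numpy as np
import cvxpy as cp

def weight_column(A, j, mu):
    """Solve P_j in (2.6): minimize (1/n) w^T w  s.t.  ||(1/n) A^T w - e_j||_inf <= mu."""
    n, p = A.shape
    e_j = np.zeros(p)
    e_j[j] = 1.0
    w = cp.Variable(n)
    objective = cp.sum_squares(w) / n
    constraints = [cp.norm(A.T @ w / n - e_j, "inf") <= mu]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value
```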
The reformulated problem (2.6) has a unique optimal solution because the objective function is
strongly convex with convex constraints. In contrast, the original problem (2.4) does not have a unique
solution. Indeed if m.j is any solution to (2.4), then we can add to it any element of the nullspace of
A to obtain another solution to (2.4).
Remarks:
1. This theorem eliminates the requirement to execute an iterative optimization algorithm to obtain
W (or an iterative optimization algorithm to obtain M ). This is because given A, one can
directly implement the optimal solution of Alg. 2 in the form (2.7) for all j ∈ [p]. This speeds up
the implementation of the debiasing of Lasso for the ensemble of sensing matrices that satisfy
the conditions of Theorem 1.
2. The condition ν(A)/(L(A)+ν(A)) ≤ µ < 1 can be satisfied whenever L is strictly positive, i.e., the column norms of A are strictly positive. Given the sensing matrix A, the quantity ν/(ν + L) can be computed exactly.
3. For sensing matrices whose column norms are equal to n, such as random Rademacher, row sub-sampled DFT or row sub-sampled Hadamard, we clearly have L = 1. Hence, the condition for Theorem 1 becomes ν/(1+ν) ≤ µ < 1.
4. For sensing matrices with (almost surely) unequal column norms such as random sub-Gaussian matrices, we need a condition that L is strictly positive, i.e., P(L ≥ c Cmin) ≥ 1 − 2/p for some constant c > 0. This will be shown in Theorem 3. This implies ν/(L+ν) ≤ ν/(c Cmin) with high probability. Hence, in this case, it is sufficient to choose µ ≥ ν/(c Cmin).
5. For sensing matrices with equal column norms, the condition on µ given by ν(A)/(L(A)+ν(A)) ≤ µ < 1 is a necessary and sufficient condition for the closed form expression in (2.7) to be optimal. However, for sensing matrices with unequal column norms such as Gaussian, ν(A)/(L(A)+ν(A)) ≤ µ < 1 is only a sufficient condition. This is empirically illustrated in Subsection 2.3.2.
Recall that as per Theorem 8 of [31], if µ is O(√(log p/n)) and n is ω((s log p)²), then ∀j ∈ [p], √n(β̂d,j − βj*) is asymptotically zero-mean Gaussian when the elements of η are drawn from N(0, σ²). For specific classes of random matrices, we now show, in Theorem 2, that ν/(ν+L) ≤ c0√(log p/n) with high probability for some constant c0. This implies that for these random sensing matrices, the choice µ := O(√(log p/n)) ensures both the following: (i) asymptotic debiasing for β̂d from (2.5) when n is ω((s log p)²) (see Theorem 8 of [31]), and (ii) fulfillment of the sufficient condition ν/(L+ν) ≤ µ for the debiasing matrix W to be computed in closed form. If the relation ν/(ν+L) ≤ c0√(log p/n) is to be satisfied with high probability, we need an additional (mild) assumption on A as defined below:

D3: Σ, as defined in D1, is a diagonal matrix, i.e., the elements of the rows of A are uncorrelated.
Theorem 2 Let A be a n × p dimensional matrix with independent and identically distributed zero-mean sub-Gaussian rows with uncorrelated entries and sub-Gaussian norm κ := ∥Σ^(−1/2) ai.∥ψ2, where n < p and Σ := E[ai. ai.⊤]. Let L and ν be as defined in Theorem 1. For any constant c ∈ (√2/(1 + √2), 1), if A obeys properties D1, D2 and D3 and n ≥ (4 Cmax² κ⁴)/(Cmin² (1 − c)²) log p, then

P( ν/(ν+L) ≤ 2√2 (κ² Cmax)/(c Cmin) √(log p/n) ) ≥ 1 − (2/p + 1/p²).    (2.8)

Furthermore, the choice µ := 2√2 (κ² Cmax)/(c Cmin) √(log p/n) ensures that the optimal debiasing matrix W is given by (2.7) with high probability.
Remarks:
1. The condition that c ∈ (√2/(1 + √2), 1) ensures that when n ≥ (4 Cmax² κ⁴)/(Cmin² (1 − c)²) log p and µ := 2√2 (κ² Cmax)/(c Cmin) √(log p/n), then we have µ < 1.
2. For the choice of µ := 2√2 (κ² Cmax)/(c Cmin) √(log p/n), the optimization problem in (2.6) is feasible with high probability under the assumptions D1 and D2 (as per Theorem 7b of [31]). Additionally, if A satisfies assumption D3, then W has the closed form solution as given in (2.7) with high probability.
The proof of Theorem 2 is given in Appendix 6.1.2. The proof utilizes the following results: Theorem 3
and Lemma 1. In Theorem 3, we show that for an ensemble of sensing matrices satisfying assumptions
D1, D2 and D3, the parameter L is greater than c Cmin with high probability for some constant c.
Theorem 3 Let A be a n × p matrix with independently and identically distributed sub-Gaussian rows, where n < p. Consider L as defined in Theorem 1. For any constant c ∈ (0, 1) and κ := ∥Σ^(−1/2) ai.∥ψ2, if A satisfies properties D1 and D2 and n ≥ (4 Cmax² κ⁴)/(Cmin² (1 − c)²) log p, then

P(L ≥ c Cmin) ≥ 1 − 2/p.    (2.9)
The proof of Theorem 3 is given in Appendix 6.1.3. In the upcoming Lemma we provide a high
probability upper bound on ν for sensing matrices with independent and identically distributed zero-
mean sub-Gaussian rows with uncorrelated entries.
Lemma 1 Let A be a n × p dimensional matrix satisfying assumptions D1, D2 and D3 and sub-Gaussian norm κ := ∥Σ^(−1/2) ai.∥ψ2. Define ν as in Theorem 1. Then

P( ν(A) ≤ 2√2 Cmax κ² √(log p/n) ) ≥ 1 − 1/p².    (2.10)
Signal Generation: For our simulations, we chose our design matrix A to have elements drawn
independently from the standard Gaussian distribution. We set the size of the signal to be p = 500. We
synthetically generated signals (i.e., β ∗ ) with p = 500 elements in each. The non-zero values of β ∗ were
drawn i.i.d. from U(50, 1000) and placed at randomly chosen indices. We set s := ∥β*∥0 = 10 and the noise standard deviation σ := 0.05 ∑_{i=1}^n |ai. β*|/n. We varied n ∈ {200, 250, 300, 350, 400, 450, 500}.
We choose µ = ν/(ν + L) where ν and L are computed exactly given the sensing matrix A.
Sensitivity and Specificity Computation: Let us denote the debiased Lasso estimates obtained using a matrix W by β̂d,W. We know that asymptotically β̂d,W(j) ∼ N(βj*, σ² W.j⊤ W.j) for all j ∈ [p]. Using this result, β̂d,W was binarized to create a vector b̂W in the following way: For all j ∈ [p], we set b̂W(j) := 1 if the value of β̂d,W(j) was such that the hypothesis H0,j : βj* = 0 was rejected against the alternative H1,j : βj* ≠ 0; b̂W(j) was set to 0 otherwise. Note that for the purpose of our simulation, we either have W = Wo or W = We. The binary vectors corresponding to these choices of W are respectively denoted by b̂Wo and b̂We.
A ground truth binary vector b∗ was created such that b∗j := 1 at all locations j where βj∗ ̸= 0
and b∗j := 0 otherwise. Sensitivity and specificity values were computed by comparing corresponding
          sensitivity          specificity          time (in s)                       ∥Wo − We∥F / ∥We∥F
 n        Wo       We          Wo       We          Wo            We
 200      0.6411   0.6411      0.8455   0.8455      3.62 × 10²    1.14 × 10⁻³         5.37 × 10⁻¹⁰
 250      0.7047   0.7047      0.8942   0.8942      4.33 × 10²    1.87 × 10⁻³         5.57 × 10⁻⁸
 300      0.7988   0.7988      0.9452   0.9452      5.52 × 10²    2.75 × 10⁻³         4.40 × 10⁻⁷
 350      0.8602   0.8602      0.9773   0.9773      5.95 × 10²    5.06 × 10⁻³         2.43 × 10⁻⁷
 400      0.9342   0.9342      0.9892   0.9892      6.28 × 10²    9.44 × 10⁻³         3.23 × 10⁻⁷
 450      0.9874   0.9874      0.9924   0.9924      6.74 × 10²    2.31 × 10⁻²         3.01 × 10⁻⁷
 500      0.9991   0.9991      1        1           7.31 × 10²    8.55 × 10⁻²         4.09 × 10⁻⁷
Table 2.1: Sensitivity and specificity of hypothesis tests using debiased estimates obtained from Wo (optimization method) and We (closed-form expression from (2.7)), with corresponding runtimes in seconds, for varying numbers of measurements. The fixed parameters are p = 500, s = 10, σ := 0.05 ∑_{i=1}^n |ai. β*|/n. We set µ = ν/(ν + L), where ν and L are computed exactly given the sensing matrix A.
entries of b∗ to those in b̂Wo and b̂We . Considering the matrix W, we declared an element to be a
true defective if b∗j = 1 and b̂W,j = 1, and a false defective if b∗j = 0 but b̂W,j ̸= 0. We declare it to
be a false non-defective if b∗j = 1 but b̂W,j = 0, and a true non-defective if b∗j = 0 and b̂W,j = 0. The
sensitivity for β ∗ is defined as (# true defectives)/(# true defectives + # false non-defectives) and
specificity for β ∗ is defined as (# true non-defectives)/(# true non-defectives + # false defectives).
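The binarization and the sensitivity/specificity computation described above can be summarized in the following Python sketch. It uses the asymptotic distribution quoted earlier, β̂d,W(j) ∼ N(βj*, σ² W.j⊤ W.j); the function and variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def sensitivity_specificity(beta_debiased, W, sigma, beta_star, alpha=0.01):
    std = sigma * np.sqrt(np.sum(W ** 2, axis=0))                    # std of each beta_debiased[j]
    b_hat = (np.abs(beta_debiased) / std > norm.ppf(1 - alpha / 2)).astype(int)  # reject H0,j?
    b_star = (beta_star != 0).astype(int)
    tp = np.sum((b_star == 1) & (b_hat == 1))                        # true defectives
    fn = np.sum((b_star == 1) & (b_hat == 0))                        # false non-defectives
    tn = np.sum((b_star == 0) & (b_hat == 0))                        # true non-defectives
    fp = np.sum((b_star == 0) & (b_hat == 1))                        # false defectives
    return tp / (tp + fn), tn / (tn + fp)
```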
Results: For obtaining Wo, the optimization routine was executed using the lsqlin package in MATLAB. The sensitivity and specificity are averaged over 25 runs with independent noise instances.
In Table 2.1, we can see that the sensitivity as well as the specificity of the hypothesis tests for Wo
and We are equal. We further report the relative difference between Wo and We in the Frobenius
norm. We can clearly see that the difference is negligible, consistent with Theorem 1. Furthermore,
we see that using the closed-form expression in (2.7) saves significantly on time (by a factor of nearly
104 ). While the computational efficiency of the iterative approach can be improved by developing
a specialized solver for problems of the form (2.6), no iterative method is expected to outperform
directly computing the simple closed-form expression (2.7).
Sensing matrix properties: For this experiment, we fix n = 80, p = 100. We run this for two different n × p sensing matrices A with elements drawn from: (1) i.i.d. Gaussian and (2) i.i.d. Rademacher. In Figure 2.1 we plot µ vs. log(∥Wo − We∥F / ∥We∥F) for both of these matrices. The exact value of ν/(L+ν) is shown by a black vertical line in each case.
Observation: We see that for both the plots in Figure 2.1, the relative error decreases with increasing µ for µ < ν/(L+ν). For µ ≥ ν/(L+ν), the relative error is very small, with fluctuations primarily due to the solver tolerances in lsqlin when computing Wo. Furthermore, for the Rademacher A, the decrease is sharp after the value of µ crosses ν/(L+ν). (Note that for Rademacher A we have L = 1.) However, for the Gaussian sensing matrix A, log(∥Wo − We∥F / ∥We∥F) decreases sharply even before µ reaches ν/(L+ν).
Figure 2.1: Line plot of µ vs. relative error log10(∥Wo − We∥F / ∥We∥F) for two 80 × 100 dimensional sensing matrices, (left) i.i.d. Gaussian and (right) i.i.d. Rademacher. The exact value of ν/(L + ν) is given by the black vertical line; it is 0.45 for the Gaussian sensing matrix (left) and 0.298 for the Rademacher sensing matrix (right). Here, Wo is the solution of the optimization problem in (2.6) and We is computed as in (2.7).
Chapter 3
Group testing is a well-studied area of data science, information theory and signal processing, dating
back to the classical work of Dorfman in [16]. Consider p samples, one per subject, where each
sample is either defective or non-defective. In the case of defective samples, additional quantitative
information regarding the extent or severity of the defect in the sample may be available. Group
testing typically replaces individual testing of these p samples by testing of n < p ‘groups’ of samples,
thereby saving on the number of tests. Each group (also called a ‘pool’) consists of a mixture of small,
equal portions taken from a subset of the p samples. Let the (perhaps noisy) test results on the n
groups be arranged in an n-dimensional vector z. Let the true status of each of the p samples be
arranged in a p-dimensional vector β ∗ . The aim of group testing is to infer β ∗ from z given accurate
knowledge of the group memberships. We encode group memberships in an n × p-dimensional binary
matrix B (called the ‘pooling matrix’) where Bij = 1 if the j th sample is a member of the ith group,
and Bij = 0 otherwise. If the overall status of a group is the sum of the status values of each of the
samples that participated in the group, we have:
z = Bβ ∗ + η̃, (3.1)
where η̃ ∈ Rn is a noise vector. In a large body of the literature on group testing (e.g., [5,12,16]), z and
β ∗ are modeled as binary vectors, leading to the forward model z = N(Bβ ∗ ), where the matrix-vector
‘multiplication’ Bβ ∗ involves binary OR, AND operations instead of sums and products, and N is a
noise operator that could at random flip some of the bits in z. In this work, however, we consider
z and β ∗ to be vectors in Rn and Rp respectively, as also done in [22, 26, 49], and adopt the linear
model (3.1). This enables encoding of quantitative information in z, β ∗ , and Bβ ∗ now involves the
usual matrix-vector multiplication.
In commonly considered situations in group testing, the number of non-zero samples, i.e., defective
samples, s ≜ ∥β ∗ ∥0 is much less than p, and βj∗ = 0 indicates that the j th sample is non-defective
where 1 ≤ j ≤ p. In such cases, group testing algorithms have shown excellent results for the recovery
of β ∗ from z, B. These algorithms are surveyed in detail in [2] and can be classified into two broad
categories: adaptive and non-adaptive. Adaptive algorithms [16, 26, 28] process the measurements
(i.e., the results of pooled tests available in z) in two or more stages of testing, where the output
of each stage determines the choice of pools in the subsequent testing stage. Non-adaptive algo-
rithms [7, 22, 23, 49], on the other hand, process the measurements with only a single stage of testing.
Non-adaptive algorithms are known to be more efficient in terms of time as well as the number of tests
required, at the cost of somewhat higher recovery errors, as compared to adaptive algorithms [22, 33].
In this work, we focus on non-adaptive algorithms.
Problem Motivation: In the recent COVID-19 pandemic, RT-PCR (reverse transcription poly-
merase chain reaction) has been the primary method of testing a person for this disease. Due to
widespread shortage of various resources for testing, group testing algorithms were widely employed
in many countries [1]. Many of these approaches used Dorfman testing [16] (an adaptive algorithm),
but non-adaptive algorithms have also been recommended or used for this task [7,22,49]. In this appli-
cation, the vectors β ∗ and z refer to the real-valued viral loads in the individual samples and the pools
respectively, and B is again a binary pooling matrix. In a pandemic situation, there is heavy demand
on testing labs. This leads to practical challenges for the technicians to implement pooling due to
factors such as (i) a heavy workload, (ii) differences in pooling protocols across different labs, and (iii)
the fact that pooling is inherently more complicated than individual sample testing [54], [18, ‘Results’].
Due to this, there is the possibility of a small number of inadvertent errors in creating the pools. This
causes a difference between a few entries of the pre-specified matrix B and the actual matrix B̂ used
for pooling. Note that B is known whereas B̂ is unknown in practice. The sparsity of the difference
between B and B̂ is a reasonable assumption, if the technicians are competent. Hence only a small
number of group membership specifications contain errors. This issue of errors during pool creation is
well documented in several independent sources such as [54], [18, ‘Results’], [15, Page 2], [46], [55, Sec.
3.1], [14, ‘Discussion’], [24, ‘Specific consideration related to SARS-CoV-2’] and [3, ‘Laboratory infras-
tructure’]. However the vast majority of group testing algorithms — adaptive as well as non-adaptive
— do not account for these errors. To the best of our knowledge, this is the first piece of work on
the problem of a mismatched pooling matrix (i.e., a pooling matrix that contains errors in group
membership specifications) for non-adaptive group testing with real-valued β ∗ and (possibly) noisy z.
We emphasize that besides pooled RT-PCR testing, faulty specification of pooling matrices may also
naturally occur in group testing in many other scenarios, for example when applied to verification of
electronic circuits [32]. Another scenario is in epidemiology [13], for identifying infected individuals
who come in contact with agents who are sent to mix with individuals in the population. The health
status of various individuals is inferred from the health status of the agents. However, sometimes an
agent may remain uninfected even upon coming in contact with an infected individual, which can be
interpreted as an error in the pooling matrix.
Related Work: We now comment on two related pieces of work which deal with group testing
with errors in pooling matrices via non-adaptive techniques. The work in [13] considers probabilistic
and structured errors in the pooling matrix, where an entry bij with a value of 1 could flip to 0 with
a probability 0.5, but not vice versa, i.e., a genuinely zero-valued bij never flips to 1. The work in [39]
considers a small number of ‘pretenders’ in the unknown binary vector β ∗ , i.e., there exist elements
in β ∗ which flip from 1 to 0 with probability 0.5, but not vice versa. Both these techniques consider
binary valued vectors z and β ∗ , unlike real-valued vectors as considered in this work. They also
do not consider noise in z in addition to the errors in B. Furthermore, we present a method
to identify the errors in B, unlike the techniques in [13, 39]. Due to these differences between our
work and [13,39], a direct numerical comparison between our results and theirs will not be meaningful.
Sensing Matrix Perturbation in Compressed Sensing: There exists a nice relationship be-
tween the group testing problem and the problem of signal reconstruction in compressed sensing (CS),
as explored in [22, 23]. Likewise, there is literature in the CS community which deals with perturba-
tions in sensing matrices [4, 21, 27, 30, 43, 44, 58]. However, these works either consider dense random
perturbations (i.e., perturbations in every entry) [4, 21, 27, 30, 44, 58] or perturbations in specifications
of Fourier frequencies [29,43]. These perturbation models are vastly different from the sparse set of er-
rors in binary matrices as considered in this work. Furthermore, apart from [29, 43], these techniques
just perform robust signal estimation, without any attempt to identify rows of the sensing matrix
which contained those errors.
Our proposed approach, which we call the Debiased Robust Lasso Test Method or Drlt, extends existing work
on ‘debiasing’ the well-known Lasso estimator in statistics [31], to also handle errors in B. In this
approach, we present a principled method to identify which measurements in z correspond to rows
with errors in B, using hypothesis testing. We also present an algorithm for direct estimation of β ∗
and a hypothesis test for identification of the defective samples in β ∗ , given errors in B. We estab-
lish the desirable properties of these statistical tests such as consistency. Though our approach was
initially motivated by pooling errors during preparation of pools of COVID-19 samples, it is broadly
applicable to any group-testing problem where the pool membership specifications contain errors.
Given a sufficient number of measurements, the Lasso is known to be consistent for sparse β ∗ [25,
Chapter 11] if the penalty parameter λ > 0 is chosen appropriately and if B satisfies the Restricted
Eigenvalue Condition (REC)1 . Certain deterministic binary pooling matrices can also be used as
in [22, 49] for a consistent estimator of β ∗ . However, we focus on the chosen random pooling matrix
in this paper.
It is more convenient for analysis via the REC, and more closely related to the theory in [31], if
the elements of the pooling matrix have mean 0. Since the elements of B are drawn independently
from Bernoulli(0.5), it does not obey the mean-zero property. Hence, we transform the random binary
matrix B to a random Rademacher matrix A ≜ 2B − 1n×p , which is a simple one-one transformation
similar to that adopted in [45] for Poisson compressive measurements. (Note that 1n×p refers to a
matrix of size n × p containing all ones.) We also transform the measurements in z to equivalent
measurements y associated with Rademacher matrix A.
The expression for each measurement in y is now given by:

yi = ai. β* + ηi, for i ∈ [n],    (3.3)

where ηi ∼ N(0, σ²), σ² ≜ 4σ̃². We will henceforth consider y, A for the Lasso estimates in the following manner: The Lasso estimator β̂, used to estimate β*, is now defined as

β̂ = arg min_β (1/(2n)) ∥y − Aβ∥₂² + λ∥β∥₁.    (3.4)
We refer to errors in the group membership specifications as ‘bit-flips’. For example, suppose that the ith pool is specified to consist
of samples j1 , j2 , j3 ∈ [p]. But due to errors during pool creation, the ith pool is generated using
samples j1 , j2 , j5 . In this specific instance, ai,j3 ̸= âi,j3 and ai,j5 ̸= âi,j5 .
Note that A is known whereas  is unknown. Moreover, the locations of the bit-flips are unknown.
Hence they induce signal-dependent and possibly large ‘model mismatch errors’ δi∗ ≜ (âi. − ai. )β ∗
in the ith measurement. In the presence of bit-flips, the model in (3.3) can be expressed as:
yi = ai. β* + δi* + ηi, for i ∈ [n],   =⇒   y = Aβ* + δ* + η = (A | In) (β*⊤, δ*⊤)⊤ + η.    (3.5)
We assume δ ∗ , which we call the ‘model mismatch error’ (MME) vector in Rn , to be sparse, and
r ≜ ∥δ ∗ ∥0 ≪ n. The sparsity assumption on δ ∗ is reasonable in many applications (e.g., given a
competent technician performing pooling).
Suppose for a fixed i ∈ [n], âi. contains a bit-flip at index j. If βj∗ is 0 then δi∗ would remain 0
despite the presence of a bit-flip in âi. . Furthermore, such a bit-flip has no effect on the measurements
and is not identifiable from the measurements. However, if βj∗ is non-zero then δi∗ is also non-zero.
Such a bit-flip adversely affects the measurement and we henceforth refer to it as an effective bit-flip.
Effective bit-flips lead to non-zero elements in the MME vector δ ∗ . We refer to the non-zero elements
of δ ∗ as effective MMEs. Without loss of generality, we consider the identification of effective MMEs
in this paper.
Aim (i): Estimation of β ∗ under model mismatch and development of a statistical test to determine
whether or not the j th sample, j ∈ [p], is defective/diseased.
Aim (ii): Development of a statistical test to determine whether or not the ith measurement (i ∈ [n])
contains an effective MME i.e., δi∗ is non-zero.
A measurement containing an effective MME will appear like an outlier in comparison to other mea-
surements due to the non-zero values in δ ∗ . Therefore identification of measurements containing
effective MMEs is equivalent to determining the non-zero entries of δ ∗ . This idea is inspired by the
concept of ‘Studentised residuals’ which is widely used in the statistics literature to identify outliers
in full-rank regression models [40]. Since our model operates in a compressive regime where n < p,
the distributional property of studentized residuals may not hold. Therefore, we develop our Drlt
method which is tailored for the compressive regime.
Our basic estimator for β* and δ* from y and A is given as

(β̂λ1, δ̂λ2) = arg min_{β,δ} (1/(2n)) ∥y − Aβ − δ∥₂² + λ1∥β∥₁ + λ2∥δ∥₁,    (3.6)
where λ1 , λ2 are appropriately chosen regularization parameters. This estimator is a robust version
of the Lasso regression [42]. The robust Lasso, just like the Lasso, will incur a bias due to the ℓ1
penalty terms.
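A cvxpy sketch of the robust Lasso in (3.6) is shown below, jointly estimating β and δ. The regularization values passed in are up to the caller; the theoretical analysis later in this chapter uses values of order σ√(log p/n) and σ√(log n/n) respectively. This is an illustration, not the report's own implementation.

```python
import numpy as np
import cvxpy as cp

def robust_lasso(y, A, lam1, lam2):
    """Jointly estimate (beta, delta) as in (3.6)."""
    n, p = A.shape
    beta = cp.Variable(p)
    delta = cp.Variable(n)
    objective = (cp.sum_squares(y - A @ beta - delta) / (2 * n)
                 + lam1 * cp.norm1(beta) + lam2 * cp.norm1(delta))
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value, delta.value
```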
The work in [31] provides a method to mitigate the bias in the Lasso estimate and produces a
‘debiased’ signal estimate whose distribution turns out to be approximately Gaussian with specific
observable parameters in the compressive regime (for details, see [31] and Sec. 3.2.2 below). However,
the work in [31] does not take into account errors in sensing matrix specification. We non-trivially
adapt the techniques of [31] to our specific application which considers bit-flips in the pooling matrix,
and we also develop novel procedures to realize Aims (i ) and (ii ) mentioned above.
We now first review important concepts which are used to develop our method for the specified
aims. We subsequently develop our method in the rest of this section. However, before that, we
present error bounds on the estimates β̂λ1 and δ̂λ2 from (3.6), which are non-trivial extensions of
results in [42]. These bounds will be essential in developing hypothesis tests to achieve Aims (i) and
(ii).
In Lemma 6 of Sec. 6.2.1, we show that the chosen random Rademacher sensing matrix A satisfies the
EREC with κ = 1/16 if λ1 and λ2 are chosen as in Theorem 4. Furthermore, |A|∞ = 1. Therefore,
the sufficient conditions for Theorem 4 are satisfied with high probability for a random Rademacher
sensing matrix.
Remarks on Theorem 4:
1. From Result (1), we see that ∥β̂λ1 − β*∥₁ = OP((s + r)√(log p/n)).
2. From Result (2), we see that ∥δ̂λ2 − δ*∥₁ = OP(√(r log n/n)).
3. The upper bounds of errors given in Theorem 4 increase with σ, as well as s and r, which is quite intuitive. They also decrease with n.
The Lasso estimate β̂λ in (3.9) is computed for a given value of λ. Though the Lasso provides excellent theoretical guarantees [25, Chapter 11], it is well known that it produces biased estimates, i.e., E(β̂λ) ≠ β*, where the expectation is taken over different instances of η. The work in [31] replaces β̂λ by a ‘debiased’ estimate β̂d given by:

β̂d = β̂λ + (1/n) M A⊤(y − Aβ̂λ),    (3.10)

where M is an approximate inverse (defined as in Alg. 2) of Σ̂ ≜ A⊤A/n. Substituting y = Aβ* + η into (3.10) and treating (1/n) M A⊤A as approximately equal to the identity matrix yields:

β̂d = β̂λ + (1/n) M A⊤(Aβ* + η − Aβ̂λ) ≈ β* + (1/n) M A⊤ η,    (3.11)
which is referred to as a debiased estimate, as E(β̂d ) ≈ β ∗ . Note that Σ̂ is not an invertible matrix as
n < p. Hence, the approximate inverse is obtained by solving a convex optimization problem as given
by Alg. 2, where the minimization of the diagonal elements of M Σ̂M is motivated by minimizing the
variance of β̂d , as proved in [31, Sec. 2.1]. Furthermore, as proved in [31, Theorem 7], the convex
problem in Alg. 2 is feasible with high probability if Σ ≜ E[ai. (ai. )⊤ ] (where the expectation is taken
over the rows of A) obeys some specific statistical properties (see later in this section).
For each i ∈ [p], Alg. 2 computes the ith row mi of M by solving:

minimize  mi⊤ Σ̂ mi
subject to  ∥Σ̂ mi − ei∥∞ ≤ µ,    (3.12)

where ei ∈ Rp is the ith column of the identity matrix I and µ = O(√(log p/n)). Alg. 2 then sets M = (m1 | . . . | mp)⊤; if any of the above problems is not feasible, it sets M = Ip.
The debiased estimate β̂d in (3.10) obtained via an approximate inverse M of Σ̂ using µ = O(√((log p)/n)) satisfies the following statistical properties [31, Theorem 8]:

√n(β̂d − β*) = M A⊤η/√n + √n(M Σ̂ − Ip)(β* − β̂λ).    (3.13)

Here the second term on the RHS is referred to as the bias vector. Moreover, it is proved in [31, Theorem 8] that for sufficiently large n, an appropriate choice of λ in (3.9) and under appropriate statistical assumptions on Σ, the maximum absolute value of the bias vector is OP(σ s log p/√n). Thus, if n grows faster than (s log p)², the largest absolute value of the bias vector will be negligible, and thus the debiasing effect is achieved since E(β̂d) ≈ β*.
Our debiasing approach is motivated along similar lines as in Alg. 2, but with MMEs in the
sensing matrix which the earlier method cannot handle. Moreover, we demonstrate via simulations
that ignoring MMEs may lead to larger estimation errors—see Table 3.1 of Sec. 3.5.1.
To produce a debiased estimate of β* in the presence of MMEs in the pooling matrix, we adopt a different approach than the one in [31]. We define a linear combination of the residual error vectors obtained by running the robust Lasso estimator from (3.6) via a carefully chosen set of weights, in order to debias the robust Lasso estimates β̂λ1, δ̂λ2. The weights of the linear combination are represented in the form of an appropriately designed matrix W ∈ Rn×p for debiasing β̂λ1 and a derived weights matrix In − (1/n)W A⊤ for debiasing δ̂λ2. We later provide a procedure (Alg. 3 in Sec. 3.3) to design an optimal W.
In our work, the matrix W does not play the role of M from Alg. 2, but instead plays the role of
AM ⊤ (comparing (3.14) and (3.10)). In Theorem 5 below, we show that these estimates are debiased
in nature for the choice W ≜ A. Thereafter, in Sec. 3.3 and Theorem 8, using a different choice for
W via Alg. 3, we show that the resultant tests are superior in comparison to W = A.
Theorem 5 Let β̂λ1, δ̂λ2 be as in (3.6), β̂W, δ̂W be as in (3.14), (3.15) respectively, and set λ1 ≜ 4σ√(log p/n), λ2 ≜ 4σ√(log n/n). Suppose that n is ω[((s + r) log p)²], A is a Rademacher matrix and W ≜ A.
Here →L denotes the convergence in law/distribution. ■
Remarks on Theorem 5
1. The asymptotic distributions of the LHS terms in (3.16) and (3.17) do not depend on A. These
distributions are asymptotically Gaussian because the noise vector η is normally distributed.
2. Theorem 5 provides the key result to develop a testing procedure corresponding to Aims (i) and
(ii).
3. If n is ω[((s + r) log p)2 ] then Lemma 6 implies that the Rademacher matrix A satisfies EREC.
4. The condition n < p in Result (1) emerges from (6.163) and (6.160), which are based on proba-
bilistic bounds on the singular values of random Rademacher matrices [41]. For the special case
where n = p (which is no longer a compressive regime), these bounds are no longer applicable,
and instead results such as [47, Thm. 1.2] can be used.
² Given functions f(n) and g(n) of n ∈ R, we say that f(n) is ω(g(n)) if lim_{n→∞} f(n)/g(n) = ∞, i.e., f(n) asymptotically ‘dominates’ g(n).
Drlt for β ∗ : In Aim (i), we intended to develop a statistical test to determine whether a sample is
defective or not. Given the significance level α ∈ [0, 1], for each j ∈ [p], we reject the null hypothesis
G0,j : βj∗ = 0 in favor of G1,j : βj∗ ̸= 0 when
√n |β̂W,j| / σ > zα/2,    (3.19)
where zα/2 is the upper (α/2)th quantile of a standard normal random variable.
Drlt for δ ∗ : In Aim (ii), we intended to develop a statistical test to determine whether or not a
pooled measurement is affected by MMEs. Given the significance level α ∈ [0, 1], for each i ∈ [n], we
reject the null hypothesis H0,i : δi∗ = 0 in favor of H1,i : δi∗ ̸= 0 when
|δ̂W,i| / (σ√(ΣA,ii)) > zα/2.    (3.20)
A desirable property of a statistical test is that the probability of rejecting the null hypothesis when
the alternate is true converges to 1 as n → ∞ (referred to as a consistent test). Theorem 5 ensures
that the proposed Drlts are consistent. Additionally, Theorem 5 shows that the probability of rejecting
the null hypothesis when the null is true converges to α (referred to as an asymptotically unbiased
test). Further, the sensitivity and specificity (as defined in Sec. 3.5) of both these tests approach 1 as
n, p → ∞.
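The two coordinate-wise Drlt decisions in (3.19) and (3.20) amount to simple z-tests, as in the sketch below. The inputs (the debiased estimates, σ, and the diagonal terms ΣA,ii appearing in (3.20)) are assumed to be available from the preceding steps; this is an illustration, not the report's implementation.

```python
import numpy as np
from scipy.stats import norm

def drlt_beta_test(beta_W, sigma, n, alpha=0.01):
    # (3.19): reject G0,j when sqrt(n) * |beta_W[j]| / sigma exceeds z_{alpha/2}.
    return np.sqrt(n) * np.abs(beta_W) / sigma > norm.ppf(1 - alpha / 2)

def drlt_delta_test(delta_W, sigma, Sigma_A_diag, alpha=0.01):
    # (3.20): reject H0,i when |delta_W[i]| / (sigma * sqrt(Sigma_A_diag[i])) exceeds z_{alpha/2}.
    return np.abs(delta_W) / (sigma * np.sqrt(Sigma_A_diag)) > norm.ppf(1 - alpha / 2)
```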
Note that the first term on the RHS of both (3.35) and (3.36) is zero-mean Gaussian. The remaining
two terms in both equations are bias terms. In order to develop an optimal hypothesis test for the
debiased robust Lasso, we show that (i ) the variances of the first term on the RHS of (3.35) and
(3.36) are bounded with appropriate scaling as n, p → ∞; and (ii ) the two bias terms in (3.35) and
(3.36) go to 0 in probability as n, p → ∞. In such a situation, the sum of the asymptotic variances of the elements of β̂W will be (σ²/n²) ∑_{j=1}^p w.j⊤ w.j.
Algorithm 3 Design of W
Input: A, µ1, µ2 and µ3
Output: W
1: We solve the following optimisation problem:

minimize_W  ∑_{j=1}^p w.j⊤ w.j
subject to the constraints C0, C1, C2, C3 (described in the text below),

where µ1 ≜ 2√(2 log(p)/n), µ2 ≜ 2√(log(2np)/(np)) + 1/n and µ3 ≜ 2√(2 log(n)/p) / √(1 − n/p).
2: If the above problem is not feasible, then set W = A.
Theorem 6 (given below) establishes that the second and third terms on the RHS of both (3.35) and (3.36) go to 0 in probability. We design W to minimize the expression ∑_{j=1}^p w.j⊤ w.j subject to constraints C0, C1, C2, C3 on W, as summarized in Alg. 3. The values of µ1, µ2, µ3 are selected in
such a way that each of the constraints C1, C2, C3 in Alg. 3 holds with high probability for the choice
W ≜ A, as will be formally established in Lemma 10. These constraints are derived from Theorem 6
and ensure that the bias terms go to 0. In particular, the constraint C1 (via µ1 ) controls the rate
of convergence of bias terms on the RHS of (3.35), whereas the constraint C2 (via µ2 ) controls the
rate of convergence of bias terms on the RHS of (3.36). Furthermore, the constraint C3 allows us to
control the asymptotic variance of the first term on RHS of (3.36) (as will be shown via Theorem 7).
Essentially, the choice W ≜ A helps us establish that the set of all possible W matrices which satisfy
the constraints in Alg. 3 is non-empty with high probability. Finally, Theorem 7 establishes that the
variances of the first term on the RHS of (3.35) and (3.36) converge. These theorems play a vital role
in deriving Theorem 8 that leads to developing the optimal debiased robust Lasso tests.
Theorem 6 Let β̂λ1, δ̂λ2 be as in (3.6), β̂W, δ̂W be as in (3.14), (3.15) respectively and set λ1 ≜ 4σ√(log p/n), λ2 ≜ 4σ√(log n/n). Let A be a random Rademacher matrix and let W be obtained from Alg. 3. Then if n is o(p) and n is ω[((s + r) log p)²], as p, n → ∞, we have:

1. ∥√n (Ip − (1/n) W⊤A)(β* − β̂λ1)∥∞ = oP(1).    (3.23)

2. ∥(1/√n) W⊤(δ* − δ̂λ2)∥∞ = oP(1).    (3.24)

3. ∥ (n/(p√(1 − n/p))) (In − (1/n) W A⊤) A(β* − β̂λ1) ∥∞ = oP(1).    (3.25)

4. ∥ (n/(p√(1 − n/p))) (1/n) W A⊤(δ* − δ̂λ2) ∥∞ = oP(1).    (3.26)
■
Note that Σβ/n and Σδ are the variance-covariance matrices of the first terms of the RHS of (3.35) and (3.36), respectively.
Theorem 7 shows that when W is chosen as per Alg. 3, the element-wise variances of the first term of the RHS of (3.35) (diagonal elements of Σβ) approach 1 in probability. The constraints C0 and C1 of Alg. 3 are mainly used to establish this theorem. Further, for the optimal choice of W as in Alg. 3, we show that the element-wise variances of the first term of the RHS of (3.36) (diagonal elements of Σδ) go to 1 in probability. To establish this, we use the constraint C3 of Alg. 3.
Theorem 7 Let A be a Rademacher matrix. Suppose W is obtained from Alg. 3 and Σβ and Σδ are
defined as in (3.37) and (3.38), respectively. If n log n is o(p) and n is ω[((s + r) log p)2 ], as n, p → ∞,
we have the following:
When we choose an optimal W as per the Alg. 3, the equations (3.35) and (3.36) along with Theorem 6
and Theorem 7 can be used to derive the asymptotic distribution of β̂W and δ̂W . This is accomplished
in Theorem 8, which can be viewed as a non-trivial extension of Theorem 5 for such an optimal choice
of W .
Theorem 8 Let β̂λ1, δ̂λ2 be as in (3.6), β̂W, δ̂W be as in (3.14), (3.15) respectively and set λ1 ≜ 4σ√(log p/n), λ2 ≜ 4σ√(log n/n). Let A be a random Rademacher matrix and W be the debiasing matrix obtained from Alg. 3. If n is ω[((s + r) log p)²] and n log n is o(p), then we have:
where Σβjj and Σδii are the j th and ith diagonal elements of matrices Σβ (as in (3.37)) and Σδ
(as in (3.38)), respectively.
Theorem 8 paves the way to develop an optimal Drlt for Aim (i) and (ii) of this work along a
similar line of development as the Drlt.
Optimal Drlt for β ∗ : As in Drlt for β ∗ , we now present a hypothesis testing procedure for an
optimally designed W to determine defective samples based on Theorem 8. As before, given α > 0
we reject the null hypothesis G0,j : βj∗ = 0 in favor of G1,j : βj∗ ̸= 0, for each j ∈ [p] when
√n β̂W,j / √(Σβ,jj) > zα/2.    (3.33)

Note that we assume σ² to be known. Now we will provide a theorem that gives the joint distributions of √n(β̂W − β*)K of (3.35) and (δ̂W − δ*)L of (3.36), which will aid us in creating the joint tests and their corresponding confidence intervals.
This theorem provides us with the following simultaneous hypothesis test.
Simultaneous test for β: Let K ⊂ [p] such that ∥K∥0 = k. We reject the null hypothesis G0 : β*K = 0 vs. the alternate G1 : β*K ≠ 0 at α% level of significance if:

{√n (β̂W)K}⊤ ΣβK⁻¹ {√n (β̂W)K} > χ²_{k,1−α}.    (3.41)
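The simultaneous test in (3.41) reduces to a quadratic-form statistic compared against a chi-square quantile, as in this sketch; ΣβK denotes the k × k block from (3.37) and is assumed to be supplied by the caller.

```python
import numpy as np
from scipy.stats import chi2

def simultaneous_beta_test(beta_W_K, Sigma_beta_K, n, alpha=0.01):
    """Reject G0: beta*_K = 0 when the quadratic form in (3.41) exceeds chi2_{k, 1-alpha}."""
    v = np.sqrt(n) * np.asarray(beta_W_K)
    statistic = v @ np.linalg.solve(Sigma_beta_K, v)   # v^T Sigma_beta_K^{-1} v
    k = v.size
    return statistic > chi2.ppf(1 - alpha, df=k)
```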
Lemma 2 Given A is n × p dimensional Rademacher matrix and W be the optimal solution of Alg. 3. Let K ⊂ [p] such that ∥K∥0 = k and L ⊂ [n] such that ∥L∥0 = l with both k, l being fixed as n, p → ∞. Furthermore, let (β̂W − β*)K be as defined in (3.35) and ΣβK be as defined in (3.37). If n log n is o(p) and n is ω(((s + r) log(p))²) then, we have:

1. {√n (Ip − (1/n)W⊤A)K (β* − β̂λ1)}⊤ ΣβK⁻¹ {√n (Ip − (1/n)W⊤A)K (β* − β̂λ1)} → 0 in probability.

2. {(1/√n) WK⊤ (δ* − δ̂λ2)}⊤ ΣβK⁻¹ {(1/√n) WK⊤ (δ* − δ̂λ2)} → 0 in probability.

3. 2 {(1/√n) WK⊤ η}⊤ ΣβK⁻¹ {√n (Ip − (1/n)W⊤A)K (β* − β̂λ1)} → 0 in probability.

4. 2 {(1/√n) WK⊤ η}⊤ ΣβK⁻¹ {(1/√n) WK⊤ (δ* − δ̂λ2)} → 0 in probability.

5. 2 {√n (Ip − (1/n)W⊤A)K (β* − β̂λ1)}⊤ ΣβK⁻¹ {(1/√n) WK⊤ (δ* − δ̂λ2)} → 0 in probability.
Lemma 3 Given A is n × p dimensional Rademacher matrix and W be the optimal solution of Alg. 3. Let K ⊂ [p] such that ∥K∥0 = k and L ⊂ [n] such that ∥L∥0 = l with l being fixed as n, p → ∞. Furthermore, let (δ̂W − δ*)L be as defined in (3.36) and ΣδL be as defined in (3.38). If n log n is o(p) and n is ω(((s + r) log(p))²) then, we have:

1. {(In − (1/n)WA⊤)L A(β* − β̂λ1)}⊤ ΣδL⁻¹ {(In − (1/n)WA⊤)L A(β* − β̂λ1)} → 0 in probability.

2. {(1/n) WL A⊤ (δ* − δ̂λ2)}⊤ ΣδL⁻¹ {(1/n) WL A⊤ (δ* − δ̂λ2)} → 0 in probability.

3. 2 {(In − (1/n)WA⊤)L η}⊤ ΣδL⁻¹ {(In − (1/n)WA⊤)L A(β* − β̂λ1)} → 0 in probability.

4. 2 {(In − (1/n)WA⊤)L η}⊤ ΣδL⁻¹ {(1/n) WL A⊤ (δ* − δ̂λ2)} → 0 in probability.

5. 2 {(In − (1/n)WA⊤)L A(β* − β̂λ1)}⊤ ΣδL⁻¹ {(1/n) WL A⊤ (δ* − δ̂λ2)} → 0 in probability.
In the upcoming lemma, we show that the inverses of the covariance matrices of β̂W and δ̂W are strictly positive and converge to a constant asymptotically.
Lemma 4 Given A is n × p dimensional Rademacher matrix and W be the optimal solution of Alg. 3. Let K ⊂ [p] such that ∥K∥0 = k and L ⊂ [n] such that ∥L∥0 = l with both k, l being fixed as n, p → ∞. Furthermore, let ΣβK and ΣδL be as defined in (3.37) and (3.38) respectively. Then, we have:

1. P( ∑_{j∈K} ∑_{l∈K} [ΣβK⁻¹]_{lj} ≤ √( c6² k² / (ψ² copt² (1 − (k−1)/n)²) ) ) ≥ 1 − (ψ^{n−k+1} + c5^p).    (3.43)

2. P( (n²/(p²(1 − n/p))) ∑_{i∈L} ∑_{k∈L} [ΣδL⁻¹]_{ik} ≤ √( l²(1 − n/p) / (copt ϵ1²(1 − n/p)² − n/p)² ) ) ≥ 1 − {(c6 ϵ1)^{n−k+1} + c5^p}.    (3.44)
Choice of Model Mismatch Error: In our work, all effective MMEs were generated in the following
manner: In our convention, a bit-flipped pool (measurement as described in (3.5)) contains exactly
one bit-flip at a randomly chosen index. Suppose that the ith pool (measurement) contains a bit-flip.
Then exactly one of the following two can happen: (1) some j th sample that was intended to be in
the pool (as defined in A) is excluded, or (2) some j th sample that was not intended to be part of the
pool (as defined in A) is included. These two cases lead to the following changes in the ith row of Â
(as compared to the ith row of A), and in both cases the choice of j ∈ [p] is uniformly random: Case 1: âij = −1 but aij = 1; Case 2: âij = 1 but aij = −1. Note that under this scheme, the generated
MMEs may not be effective. Hence MMEs were applied in an adversarial setting by inducing bit-flips
only at those entries in any row of  corresponding to indices with non-zero values of β ∗ .
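The adversarial MME generation described above can be summarized in the following sketch: each corrupted row of A receives exactly one sign flip, placed at an index in the support of β* so that the resulting MME is effective. The number of corrupted rows r and the random seed are illustrative parameters.

```python
import numpy as np

def add_effective_bitflips(A, beta_star, r, seed=0):
    """Return a perturbed copy of A with one effective bit-flip in each of r rows."""
    rng = np.random.default_rng(seed)
    A_hat = A.copy()
    support = np.flatnonzero(beta_star)               # indices j with beta*_j != 0
    rows = rng.choice(A.shape[0], size=r, replace=False)
    for i in rows:
        j = rng.choice(support)
        A_hat[i, j] = -A_hat[i, j]                    # +1 <-> -1 flip in a Rademacher row
    delta_star = (A_hat - A) @ beta_star              # resulting model mismatch errors
    return A_hat, delta_star
```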
Evaluation Measures of Hypothesis Tests: Many different variants of the Lasso estimator
were compared empirically against each other as will be described in subsequent subsections. Each of
them were implemented using the CVX package in MATLAB. Results for the hypothesis tests (given
in (3.19),(3.20),(3.33) and (3.34)) are reported in terms of sensitivity and specificity (defined below).
The significance level of these tests is chosen at 1%. Consider a binary signal b̂β with p elements. In
our simulations, a sample at index j in β̂W is declared to be defective if the hypothesis test G0,j is
rejected, in which case we set b̂β,j = 1. In all other cases, we set b̂β,j = 0. We declare an element to
be a true defective if βj∗ ̸= 0 and b̂β,j ̸= 0, and a false defective if βj∗ = 0 but b̂β,j ̸= 0. We declare it to
be a false non-defective if βj∗ ≠ 0 but b̂β,j = 0, and a true non-defective if βj∗ = 0 and b̂β,j = 0. The
sensitivity for β ∗ is defined as (# true defectives)/(# true defectives + # false non-defectives) and
specificity for β ∗ is defined as (# true non-defectives)/(# true non-defectives + # false defectives).
We report the results of testing for the debiased tests using: (i ) W ≜ A corresponding to Drlt
(see (3.19) and (3.20)), and (ii ) the optimal W using Alg. 3 corresponding to Odrlt (see (3.33) and
(3.34)).
1. Baseline ignoring MMEs: (Baseline-1) This approach computes the following ‘debiased’ estimate of β* as given in Equation (5) of [31]:

β̂b ≜ β̂λ,b + (1/n) M A⊤(y − Aβ̂λ,b),    (3.45)

where β̂λ,b ≜ argmin_β ∥y − Aβ∥₂² + λ∥β∥₁, and M is the approximate inverse of A obtained from Alg. 2. In this baseline approach, we reject the null hypothesis G0,j : βj* = 0 in favor of G1,j : βj* ≠ 0, for each j ∈ [p] when √n β̂b,j / √(σ²[M A⊤AM⊤]jj/n) > zα/2.
 n      Sens-B-1   Sens-B-2   Sens-Odrlt   Spec-B-1   Spec-B-2   Spec-Odrlt
 100    0.522      0.602      0.647        0.678      0.702      0.771
 200    0.597      0.682      0.704        0.832      0.895      0.931
 300    0.698      0.802      0.878        0.884      0.915      0.963
 400    0.791      0.834      0.951        0.902      0.927      0.999
 500    0.858      0.894      0.984        0.923      0.956      1
Table 3.1: Comparison of average Sensitivity (Sens) and Specificity (Spec) (based on 100 independent noise runs) for the tests Baseline-1 (B-1), Baseline-2 (B-2) and Odrlt for determining defectives in β* from their respective debiased estimates in the presence of MMEs induced in A (see Sec. 3.5.1 for detailed definitions).
2. Baseline considering MMEs: (Baseline-2) In this approach, we account for the MMEs by considering the augmented sensing matrix (A|In) and signal vector x* = (β*⊤, δ*⊤)⊤. The ‘debiased’ estimate of x* in this approach is given as:

x̃b ≜ x̃λ + (1/n) M̃ (A|In)⊤(y − (A|In)x̃λ),    (3.46)

where x̃λ ≜ argmin_x ∥y − (A|In)x∥₂² + λ∥x∥₁ and M̃ is the approximate inverse of (A|In) obtained from Alg. 2. Then β̃b is obtained by extracting the first p elements of x̃b. In this approach, we reject the null hypothesis G0,j : βj* = 0 in favor of G1,j : βj* ≠ 0, for each j ∈ [p] when √n β̃b,j / √(σ²[M̃(A|In)⊤(A|In)M̃⊤]jj/n) > zα/2.
Note that the theoretical results established in [31] hold for completely random or purely deterministic
sensing matrices, whereas the sensing matrix corresponding to the MME model, i.e., (A|In ), is partly
random and partly deterministic. Nonetheless, the second baseline test, i.e. Baseline-2 with the
augmented matrix, is useful as a numerical benchmark. For both baseline approaches, the regular-
ization parameter λ was chosen using cross validation. We chose the λ value which minimized the
validation error with 90% of the measurements used for reconstruction and the remaining 10% used
for cross-validation. In Table 3.1, we compare the average values (over 100 instances of measurement
noise) of Sensitivity and Specificity of Baseline-1, Baseline-2 and Odrlt for different values of n
varying in {100, 200, 300, 400, 500} and p = 500. It is clear from Table 3.1, that for all the values of
n, the Sensitivity and Specificity value of Odrlt is higher as compared to that of Baseline-1 and
Baseline-2. The performance of Baseline-2 dominates Baseline-1 which indicates that ignoring
MMEs may lead to misleading inferences in small sample scenarios. Furthermore, the Sensitivity and
Specificity of Odrlt approaches 1 as n increases. This highlights the superiority of our proposed
technique and its associated hypothesis tests over two carefully chosen baselines. Note that there is
no prior literature on debiasing in the presence of MMEs, and hence these two baselines are the only
possible competitors for our technique.
For each j ∈ [p], we define the test statistic TG,j ≜ √n(β̂W,j − βj*)/√([Σβ]jj), and for each i ∈ [n], TH,i ≜ (δ̂W,i − δi*)/√([Σδ]ii), for the optimal weight matrix W, with asymptotic distribution N(0, 1) as derived in Theorem 8. We chose p = 500, n = 400, fadv = 0.01, fsp = 0.01 and fσ = 0.01. The measurement vector y was generated with a perturbed matrix Â containing effective MMEs using the procedure described earlier. Here, TG,j and TH,i were computed for 100 runs across different noise instances in η.
The left sub-figure of Fig. 3.1 shows plots of the quantiles of a standard normal random variable
versus the quantiles of TG,j computed over 100 runs for each j ∈ [p]. For the quantiles, each plot is
presented in a different color. A 45◦ straight line passing through the origin is also plotted (black solid
line) as a reference. These p different quantile-quantile (QQ) plots corresponding to j ∈ [p], all super-
imposed on one another, indicate that the quantiles of the {TG,j }pj=1 are close to that of a standard
Figure 3.1: Left: Quantile-Quantile plots of N(0, 1) vs. TG,j (defined at the beginning of Sec. 3.5.2)
using 100 independent noise runs for all j ∈ [p] (one plot per index j with different colors). Right:
Quantile-Quantile plots of N(0, 1) vs. TH,i (defined at the beginning of Sec. 3.5.2) using 100 indepen-
dent noise runs for all i ∈ [n] (one plot per index i with different colors). For both plots, the pooling
matrix contained MMEs.
normal distribution in the range of [−2, 2] (thus covering 95% range of the area under the standard
bell curve) for defective as well as non-defective samples. This confirms that the distribution of the
TG,j values is each approximately N(0, 1), even in this chosen finite sample scenario. Similarly, the
right sub-figure of Fig. 3.1 shows the QQ-plot corresponding to TH,i for each i ∈ [n] in different colors.
As before, these n different QQ-plots, one for each i ∈ [n], all super-imposed on one another, indicate
that the {TH,i }ni=1 values are also each approximately standard normal, with or without MMEs.
Figure 3.2: Average Sensitivity and Specificity plots (over 100 independent noise runs) for detecting
measurements containing MMEs (i.e. detecting non-zero values of δ ∗ ) using Drlt, Odrlt and Robust
Lasso (Rl). The experimental parameters are p = 500, fσ = 0.1, fadv = 0.01, fsp = 0.1, n = 400. Left
to right, top to bottom: results for experiments E1, E2, E3, E4 (see Sec. 3.5.3 for details).
Figure 3.3: Average Sensitivity and Specificity plots (over 100 independent noise runs) for
detecting defective samples (i.e., non-zero values of β ∗ ) using Drlt, Odrlt, Robust Lasso and
Baseline 3. Left to right, top to bottom: results for experiments (EA), (EB), (EC), (ED). The
experimental parameters are p = 500, fσ = 0.1, fadv = 0.01, fsp = 0.1, n = 400. See Sec. 3.5.4 for more
details.
2. Lasso (referred to as L2) based on minimizing ∥y − Aβ∥22 + λ∥β∥1 with respect to β. Note
that this ignores MMEs.
3. An inherently outlier-resistant version of Lasso which uses the ℓ1 data fidelity (referred to as
L1), based on minimizing ∥y − Aβ∥1 + λ∥β∥1 with respect to β.
4. Variants of L1 and L2 combined with the well-known Ransac (Random Sample Consensus)
framework [19] (described below in more detail). The combined estimators are referred to as
Rl1 and Rl2 respectively.
Ransac is a popular randomized robust regression algorithm, widely used in computer vision [20,
Chap. 10]. We apply it here to the signal reconstruction problem considered in this paper. In Ransac,
multiple small subsets of measurements from y are randomly chosen. Let the total number of subsets
be NS, and let the set of chosen subsets be denoted by {Zi}, i = 1, . . . , NS. From each subset Zi, the
vector β̂(i) is estimated, using either L2 or L1. Every measurement is made to ‘cast a vote’ for one of
the models from the set {β̂(i)}, i = 1, . . . , NS. We say that measurement yl (where l ∈ [n]) casts a vote
for model β̂(j) (where j ∈ [NS]) if |yl − al. β̂(j)| ≤ |yl − al. β̂(k)| for all k ∈ [NS], k ≠ j. Let the model
which garners the largest number of votes be denoted by β̂(js), where js ∈ [NS]. The set of measurements
which voted for this model is called the consensus set. Ransac combined with L2 and L1 is respectively
called Rl2 and Rl1. In Rl2, the estimator L2 is used to determine β∗ using measurements only from
the consensus set. Likewise, in Rl1, the estimator L1 is used to determine β∗ using measurements only
from the consensus set. A minimal sketch of this voting scheme is given below.

Figure 3.4: Average RRMSE comparison (over 100 independent noise runs) using Odrlt, Drlt, L1
(L1 Lasso), L2 (L2 Lasso), RL1 (L1 Lasso with Ransac), RL2 (L2 Lasso with Ransac), and
robust Lasso (Rl) w.r.t. variation in the following parameters, keeping others fixed: bit-flip
proportion fadv as in setup (EA) (top left), number of measurements n as in setup (EB) (top right),
noise level fσ as in setup (EC) (bottom left) and sparsity fsp as in setup (ED) (bottom right). The
fixed parameters are p = 500, fσ = 0.1, fadv = 0.01, fsp = 0.01, n = 400. See Sec. 3.5.5 for more
details.
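The following is a minimal Python sketch of the Ransac voting scheme described above, not the
implementation used for the reported experiments; lasso_fit is a hypothetical stand-in for either the
L1 or the L2 estimator.

```python
import numpy as np

def ransac_vote(y, A, lasso_fit, n_subsets=500, subset_frac=0.9, rng=None):
    rng = np.random.default_rng(rng)
    n = y.shape[0]
    m = int(subset_frac * n)
    models = []
    for _ in range(n_subsets):
        idx = rng.choice(n, size=m, replace=False)       # random subset Z_i
        models.append(lasso_fit(y[idx], A[idx, :]))      # candidate estimate beta^(i)
    residuals = np.abs(y[:, None] - A @ np.column_stack(models))  # shape (n, n_subsets)
    winners = residuals.argmin(axis=1)                   # each measurement votes for its best model
    votes = np.bincount(winners, minlength=n_subsets)
    best = votes.argmax()                                # model with the most votes
    consensus = np.where(winners == best)[0]             # consensus set
    return lasso_fit(y[consensus], A[consensus, :]), consensus
```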
Our experiments in this section were performed for signal and sensing matrix settings identical
to those described in Sec. 3.5.4. The performance in all experiments was measured using RRMSE,
averaged over reconstructions from 100 independent noise runs. For all techniques, the regularization
parameters were chosen using cross-validation following the procedure in [57]. The maximum number
of subsets for finding the consensus set in Ransac was set to NS = 500 with 0.9n measurements in
each subset. RRMSE plots for all competing algorithms are presented in Fig. 3.4, where we see that
Odrlt and Drlt outperformed all other algorithms for all parameter ranges considered here. We also
observe that Odrlt produces lower RRMSE than Drlt, particularly in the regime involving higher
fadv .
Algorithm Implementation: A MATLAB implementation of the algorithms in this paper can be
found at https://github.com/Shuvayan21/DRLT-for-MMEs.
Chapter 4
All algorithms for Group Testing or Compressed Sensing assume that A is known accurately. However,
a technician may make errors while implementing the pooling procedure [3, 18, 24, 55]. That is, we
consider the case where, due to errors in mixing of the samples, the pools are generated using an
unknown matrix Â (say) instead of the pre-specified matrix A. The elements of the matrices Â and A
are equal everywhere except for the misspecified samples in each pool. We refer to these errors in
group membership specifications as ‘bit-flips’. For example, suppose that the ith pool is specified to
consist of samples j1 , j2 , j3 ∈ [p]. But due to errors during pool creation, the ith pool is generated
using samples j1 , j2 , j5 . In this specific instance, ai,j3 ̸= âi,j3 and ai,j5 ̸= âi,j5 .
Previously, we proposed a method for determining health status values that is resilient to a limited
number of bit-flips (Chapter 3). This method uses a ‘debiased’ version of the robust Lasso estimator,
through which we designed hypothesis tests to achieve two objectives: (i ) identifying the unhealthy
subjects by detecting non-zero entries in the health status vector, and (ii ) identifying rows in the
design matrix affected by Model Mismatch Errors (MMEs). In this chapter, we present algorithms
aimed at correcting MMEs within pooled tests and subsequently reconstructing the signal vector based
on the corrected sensing matrix A.
We first address the problem of correction of Permutation Noise in Compressed Sensing.
swapped. This means that the pooled results in y were obtained via an ‘actual’ pooling matrix Â
(unknown) which is different from the known (pre-specified) pooling matrix A with ∆A ≜ Â − A. If
yi1 , yi2 were swapped, we have âi1 ,. = ai2 ,. ̸= ai1 ,. and âi2 ,. = ai1 ,. ̸= ai2 ,. . Accounting for permutation
noise in y, we have the forward model y = Âβ∗ + η = Aβ∗ + δ∗ + η,
where δ∗ ≜ ∆Aβ∗ is the signal-dependent noise owing to permutation errors (PEs). In practice, one
expects to have very few PEs given a competent technician. Therefore, we assume that δ ∗ is sparse,
i.e., r ≜ ∥δ ∗ ∥0 , r ≪ n.
In this procedure, we go through each pair of indices (i1, i2), i1 ≠ i2, in the set of estimated permuted
measurements P and swap the values of yi1 , yi2 . For a fixed i1 , we retain the swap which yields the
highest p-value as per the hypothesis test TH . After correction of the measurements, we re-estimate
β ∗ using (3.6) (cf. Sec. 3.2). Compared to Drlt-D, we note that Drlt-C makes better use of the
available measurements. We also note that the correction procedure does not alter A, but alters only
y.
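The correction step just described amounts to a search over candidate swaps, scored by the p-value of
the test TH. The sketch below illustrates this search in Python; robust_lasso and p_value_TH are
hypothetical helpers wrapping the estimator in (3.6) and the debiased test for measurement i, and are
not part of the released code.

```python
import numpy as np

def correct_permutations(y, A, P, robust_lasso, p_value_TH):
    """For each i1 in the estimated permuted set P, try swapping y[i1] with every other
    index i2 in P and retain the swap yielding the highest p-value of the test T_H."""
    y = np.array(y, dtype=float, copy=True)
    for i1 in P:
        best_p, best_i2 = p_value_TH(y, A, i1), None
        for i2 in P:
            if i2 == i1:
                continue
            y_try = y.copy()
            y_try[i1], y_try[i2] = y_try[i2], y_try[i1]   # candidate un-swap
            p = p_value_TH(y_try, A, i1)
            if p > best_p:
                best_p, best_i2 = p, i2
        if best_i2 is not None:
            y[i1], y[best_i2] = y[best_i2], y[i1]         # keep the best swap
    # re-estimate beta* on the corrected measurements via the robust Lasso, cf. (3.6)
    beta_hat, delta_hat = robust_lasso(y, A)
    return y, beta_hat, delta_hat
```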
Choice of Regularization Parameters: The parameters λ1 , λ2 were chosen such that log(λ1 ) ∈
Rλ , log(λ2 ) ∈ Rλ where Rλ ≜ [1, 1.25, 1.5, . . . , 7], in the following manner: We first identified values
of λ1, λ2 with log(λ1), log(λ2) ∈ Rλ such that the Lilliefors test [36] confirmed the Gaussian distribution
for both TG,j ≜ √n β̂W,j /(σ√Σβ,jj), j ∈ [p], and TH,i ≜ δ̂W,i /(σ√Σδ,ii), i ∈ [n] (cf. TG, TH in Sec. ??) at
the 1% significance level, for at least 70% of the coordinates of β∗ and δ∗. Out of these chosen values, we
determined the values λ1 , λ2 that minimized the average cross-validation error (CVE) over 10 folds. In
each fold, 90% of the n measurements (denoted by a sub-vector yr corresponding to sub-matrix Ar )
were used to obtain (β̂λ1 , δ̂λ2 ) via the robust Lasso, and the remaining 10% of the measurements
(denoted by a sub-vector ycv corresponding to sub-matrix Acv ) were used to estimate the CVE
∥ycv − Acv β̂λ1 − Icv δ̂λ2 ∥22 . Note that Icv is a sub-matrix of the identity matrix which samples only
some elements of y, δ̂. The CVE is chosen for parameter selection because it is a data-driven proxy
for the non-computable mean-squared error [57].
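The parameter search just described combines a normality screen with cross-validation. The following
Python sketch illustrates one way such a grid search could be organised; lilliefors_ok and robust_lasso
are hypothetical helpers, and the handling of the held-out entries of δ̂ is simplified relative to the CVE
expression above.

```python
import numpy as np
from itertools import product

def select_lambdas(y, A, robust_lasso, lilliefors_ok, n_folds=10, rng=None):
    """Grid search over (lambda1, lambda2) with log(lambda) on the grid R_lambda."""
    rng = np.random.default_rng(rng)
    grid = np.exp(np.arange(1.0, 7.0 + 1e-9, 0.25))      # log(lambda) in {1, 1.25, ..., 7}
    n = y.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    best, best_cve = None, np.inf
    for lam1, lam2 in product(grid, grid):
        if not lilliefors_ok(y, A, lam1, lam2):          # Gaussianity screen on T_G, T_H
            continue
        cve = 0.0
        for cv_idx in folds:
            tr_idx = np.setdiff1d(np.arange(n), cv_idx)
            beta_hat, delta_hat = robust_lasso(y[tr_idx], A[tr_idx, :], lam1, lam2)
            # simplified CVE: held-out entries of delta are not re-estimated here
            resid = y[cv_idx] - A[cv_idx, :] @ beta_hat
            cve += np.sum(resid ** 2)
        if cve / n_folds < best_cve:
            best, best_cve = (lam1, lam2), cve / n_folds
    return best
```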
Evaluation Measures: In our simulations, a sample at index j in β̂W was declared to be defective if
the hypothesis test G0,j was rejected at 5% significance level, and was declared non-defective otherwise.
Results are reported in terms of sensitivity and specificity. The sensitivity (SE) is defined as (# true
defectives)/(# true defectives + # false non-defectives) and specificity (SP) is defined as (# true
non-defectives)/(# true non-defectives + # false defectives).
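For concreteness, the sensitivity and specificity defined above can be computed as in the following
minimal sketch; declared_defective would come from the outcomes of the tests G0,j at the 5% level.

```python
import numpy as np

def sensitivity_specificity(declared_defective, truly_defective):
    """Boolean arrays of length p; returns (SE, SP)."""
    d = np.asarray(declared_defective, bool)
    t = np.asarray(truly_defective, bool)
    tp = np.sum(d & t)             # true defectives detected
    fn = np.sum(~d & t)            # false non-defectives
    tn = np.sum(~d & ~t)           # true non-defectives detected
    fp = np.sum(d & ~t)            # false defectives
    se = tp / (tp + fn) if (tp + fn) else 1.0   # convention: 1 when denominator is empty
    sp = tn / (tn + fp) if (tn + fp) else 1.0
    return se, sp
```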
Sensitivity and Specificity of Drlt-D and Drlt-C for β ∗ : Here, we first examined the effec-
tiveness of Drlt and Drlt-C to detect the non-zero elements in β ∗ in the presence of permutations
in y. We compared the performance of Drlt, Drlt-D and Drlt-C to two other related algorithms
to enable performance calibration: (1) Robust Lasso (Rl) from (??) without debiasing; (2) A hy-
pothesis testing mechanism on a pooling matrix without model mismatch, which we refer to as Ub
(upper baseline). In Ub, we generated measurements with the correct matrix A and obtained a
debiased Lasso estimate as given by Eqn. 7 of [31]. In the case of Rl and Drlt-C, the decision
regarding whether a sample is defective or not was taken based on a threshold τss that was cho-
sen to maximize the SE+SP on a training set of signals from the same distribution. The tuning
was done separately for every choice of parameters fperm , fσ , fsp and n. We examined the varia-
tion in SE and SP with regard to change in the following parameters, keeping all other parameters
fixed: (EA) number of pools n; (EB) fsp for sparsity of β ∗ ; (EC) fσ for noise standard deviation;
and (ED) fraction fperm for number of permutations in A to generate Â. Note that the fractions
fperm , fσ , fsp are defined earlier in this section. For the measurements experiment (EA), n was varied
over {200, 150, . . . , 500} with fsp = 0.01, fperm = 0.01, fσ = 0.1. For the sparsity experiment (EB),
fsp was varied in {0.01, 0.02, . . . , 0.1} with n = 400, fperm = 0.01, fσ = 0.1. For the noise experiment
(EC), we varied fσ in {0, 0.05, . . . , 0.5} with n = 400, fsp = 0.01, fperm = 0.01. For the permutation
experiment (ED), fperm was varied in {0.01, 0.02, . . . , 0.1} with n = 400, fsp = 0.01, fσ = 0.1. SE and
SP values, averaged over 100 noise instances, for all four experiments are plotted in Fig. 4.1. The plots
demonstrate the superior performance of Drlt-C over Rl and Drlt, with Drlt coming second. In
all regimes, Ub performs the best as it assumes and uses an error-free sensing matrix. But we observe
that for large n, small fσ and small fsp , the SE and SP for Drlt-C is on par with that of Ub.
RRMSE Comparison of Drlt-D and Drlt-C to Baselines: We computed the relative root
mean squared error (RRMSE) for an estimate β̂ by the formula ∥β ∗ − β̂∥2 /∥β ∗ ∥2 . We compared
the RRMSE of Drlt, Rl and Drlt-C to that of the following algorithms for signal and sensing
matrix settings identical to those described earlier: (1) Lasso (referred to as L2) based on minimizing
∥y − Aβ∥22 + λ∥β∥1 which ignores the presence of MMEs, and (2) an outlier-resistant version of
Lasso (referred to as L1), based on minimizing ∥y − Aβ∥1 + λ∥β∥1 . Besides this, we also compared
our algorithms with L1 and L2 combined with the well-known Ransac (Random Sample Consensus)
framework [19], producing estimators Rl1 and Rl2. The performance in all experiments was measured
using average RRMSE values over reconstructions from 50 independent noise runs. For all algorithms,
the threshold to binarize the estimate of β∗ was chosen to maximize the sum of SE and SP over
a training set of signals from the same distribution. For all techniques, the regularization parameters
were chosen using cross-validation following the procedure in [57]. The maximum number of subsets
for finding the consensus set in Ransac was set to NS = 500 with 0.9n measurements in each subset.
RRMSE plots for various algorithms in the permutation-error setting are presented in Fig. 4.2, where
we see that Drlt-C and Drlt-D significantly outperform all the other algorithms for all parameter
ranges, and that Drlt-C produces lower RRMSE than Drlt-D, particularly in the regime involving
higher fσ.

Figure 4.1: Sensitivity and Specificity comparison for Drlt, Drlt-C, Robust Lasso and Ub for
experiments (EA) (top left), (EB) (bottom left), (EC) (bottom right), (ED) (top right). Note that
for (ED), Ub uses fperm = 0 always. See Sec. 4.1.2 for more details.

Figure 4.2: RRMSE comparison for Perm using Drlt-C, Drlt-D, L1 (L1 Lasso), L2 (L2 Lasso),
RL1 (L1 Lasso with Ransac), RL2 (L2 Lasso with Ransac), and robust Lasso (Rl) w.r.t.
variation in the following parameters, keeping others fixed: proportion of permutations fperm (top left),
number of measurements n (top right), noise level fσ (bottom left) and sparsity fsp (bottom right).
The fixed parameters are p = 500, fσ = 0.1, fperm = 0.01, fsp = 0.01, n = 400.
Multiple stages of Correction of Permutation Errors: We have noticed that after the first stage
of correction in Drlt-C, a small fraction of PEs remain uncorrected, and a small number of new PEs
are falsely created. Both of these effects are caused by small but inevitable Type-I and Type-II errors in the
hypothesis test TH. For this set of experiments, we take p = 500, n = 450, s = 5, fσ = 0.01 and r = 8
PEs. After the first stage of correction in Drlt-C, we execute the following three steps iteratively:
(i ) We re-estimate β ∗ and the permutation noise vector based on the corrected measurements using
Robust Lasso Rl. (ii ) We then perform debiasing of these estimates as shown in (3.14) and (3.15).
(iii ) Based on the new set of detected permuted measurements, we perform correction given by Alg. 4.
After each stage of correction, we report the average number of measurements correctly detected to
have PEs, the average number incorrectly detected to have PEs and the average actual number of PEs
(over 20 noise runs). These results are presented in Table ?? which shows that after the fifth stage, the
test only falsely detects a small number of permutations in the model and there are no permutations
left in the sixth stage.
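The three iterated steps above can be organised as a simple loop. The sketch below is a schematic
Python rendering under the assumption that robust_lasso, debias, detect_permuted and
correct_permutations wrap the corresponding operations of this chapter; it is not the implementation
used for the reported numbers.

```python
def multistage_correction(y, A, n_stages, robust_lasso, debias, detect_permuted,
                          correct_permutations):
    """Schematic multi-stage correction loop (n_stages >= 1 assumed)."""
    beta_hat = delta_hat = None
    for stage in range(n_stages):
        beta_hat, delta_hat = robust_lasso(y, A)                  # step (i): re-estimate
        beta_deb, delta_deb = debias(y, A, beta_hat, delta_hat)   # step (ii): debias, cf. (3.14)-(3.15)
        P = detect_permuted(delta_deb)                            # measurements flagged by T_H
        if len(P) == 0:
            break                                                 # nothing left to correct
        y = correct_permutations(y, A, P)                         # step (iii): correction as in Alg. 4
    return y, beta_hat, delta_hat
```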
1. Single Switch Model (SSM): In this model, a bit-flipped pool (measurement, as described in
Eqn. (5) of [6]) contains exactly one bit-flip at a randomly chosen index. Suppose that the ith
pool (measurement) contains a bit-flip. Under the SSM scheme, exactly one of the following
two can happen: (1) some jth sample that was intended to be in the pool (as defined in A) is
excluded, or (2) some jth sample that was not intended to be part of the pool (as defined in
A) is included. These two cases lead to the following changes in the ith row of Â, and in both
cases the choice of j ∈ [p] is uniformly random: Case 1: Âij = −1 but Aij = 1, Case 2: Âij = 1
but Aij = −1.
2. Adjacent Switch Model (ASM): In ASM, a bit-flipped pool contains bit-flips at two adjacent
indices. Suppose the ith pool contains bit-flips. Then under the ASM scheme, either (1) the
jth sample that was not intended to be in the pool is included and the j ′ th sample where
j ′ ≜ mod(j + 1, p) that was intended to be in the pool is excluded, or (2) the jth sample
that is intended to be in the pool is excluded and the j ′ th sample where j ′ ≜ mod(j + 1, p)
that is not intended to be in the pool, is included. This leads to the following
changes in the ith row of Â, and in both cases the choice of j is uniformly random: Case 1:
Âij′ = −1, Âij = 1 and Aij′ = 1, Aij = −1, Case 2: Âij′ = 1, Âij = −1 and Aij′ = −1, Aij = 1.
3. Random Switch Model (RSM): In RSM, a pool that contains bit-flips will necessarily contain two
bit-flips at random locations. Suppose the ith pool has bit-flips. Then under the RSM scheme,
for two distinct samples k ∈ [p] and l ∈ [p], either (1) the kth sample that is not intended to
be in the pool is mistakenly included and the lth sample that is intended to be in the pool
is mistakenly excluded, or (2) the kth sample that is intended to be in the pool is mistakenly
excluded and the lth sample that is not intended to be in the pool is mistakenly included. This
leads to the following changes in the ith row of Â, for l ≠ k ∈ [p], and in both cases the
choice of k, l is uniformly random: Case 1: Âik = −1, Âil = 1 and Aik = 1, Ail = −1, Case 2:
Âik = 1, Âil = −1 and Aik = −1, Ail = 1. Note that ASM is a special case of RSM, with the
second index fixed to mod(j + 1, p) when the first index is j. A sketch illustrating how a row of
A is perturbed under these three models is given after this list.
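The sketch below illustrates how a single row of A can be perturbed under the three switch models,
assuming the ±1 encoding used above; it is meant only to make the error models concrete and assumes
the row contains both +1 and −1 entries.

```python
import numpy as np

def flip_row(A_row, model, rng=None):
    """Return a perturbed copy of one row of A under SSM, ASM or RSM."""
    rng = np.random.default_rng(rng)
    p = A_row.shape[0]
    row = A_row.copy()
    if model == "SSM":                        # one bit-flip at a random index
        j = rng.integers(p)
        row[j] = -row[j]
    elif model == "ASM":                      # flips at adjacent indices j and (j+1) mod p
        j = rng.integers(p); jp = (j + 1) % p
        while row[j] == row[jp]:              # the two entries must be unequal before swapping
            j = rng.integers(p); jp = (j + 1) % p
        row[j], row[jp] = row[jp], row[j]
    elif model == "RSM":                      # flips at two distinct random indices k, l
        k, l = rng.choice(p, size=2, replace=False)
        while row[k] == row[l]:
            k, l = rng.choice(p, size=2, replace=False)
        row[k], row[l] = row[l], row[k]
    return row
```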
Instead of discarding the measurements in J, we provide algorithms to correct for errors in the
corresponding rows of matrix Â, again making use of the key principles of the Odrlt technique, as
well as exploiting the particular statistical model for mismatch.
We first provide an algorithm for the RSM for bit-flips – see Alg. 5. For RSM, we do the following:
In any row of A, we check all pairs of unequal entries Ai,j1 and Ai,j2 with j1 ≠ j2 and swap their
values. Then we recompute δ̂W using Alg. 3. We check whether the new estimate δ̂W,i satisfies H0,i as
per the test described in [6]. If H0,i is no longer rejected, we have been successful in identifying the
bit-flip in the ith row at locations j1 and j2 with probability 1 − α, where α is the level of significance
of the test. Otherwise, the signs of other entries of the ith row need to be swapped until the bit-flip
is found. Note that, as per RSM, a given row can contain bit-flips in exactly two entries. The procedure
for correction of bit-flips in ASM is quite similar to the one in Alg. 5. Here, instead of toggling Ai,j1
and Ai,j2 with j1 ≠ j2 in Alg. 5, we do as follows: if Ai,j ≠ Ai,j′, then swap the values of
Ai,j and Ai,j′ where j′ = mod(j + 1, p). The remaining steps are exactly as in Alg. 5. In SSM, for
every measurement i in J, we flip the sign of element Aij of A where j ∈ [p]. The rest of the steps
are exactly as in Alg. 5. The matrix thus obtained from the correction algorithm can then be used to
re-estimate β∗ using Lasso.
Algorithm 5 Correction for bit-flips following the Random Switch Model (RSM) using W = copt A
Input: Measurement vector y, pooling matrix A, Lasso estimate β̂λ1, λ and the set J of corrupted
measurements estimated by the Odrlt method
Output: Bit-flip corrected matrix Ã
1: for every i ∈ J do
2:   Set bf1 := −1, bf2 := −1 (bit-flip flags), max-p-value := 0.01.
3:   for every j ∈ [p − 1] do
4:     for l ∈ [p] do
5:       if {Aij == −1 and Ail == 1} or {Aij == 1 and Ail == −1} then
6:         if Aij == 1 then
7:           Aij = −1, Ail = 1.
8:         else if Aij == −1 then
9:           Aij = 1, Ail = −1.
10:        end if
11:        Find the solution β̂λ1, δ̂λ2 of the convex program given in Eqn. (6) of [6].
12:        Calculate the debiased Lasso estimate δ̂W given by Eqn. (15) of [6] using W = copt A.
13:        Set pval = 1 − Φ(TH,i), where TH,i = [δ̂W]i /√([Σδ]ii) and Σδ is defined in Eqn. (28) of [6].
14:        if pval ≥ max-p-value then
15:          Set bf1 := j, bf2 := l, max-p-value := pval.
16:        end if
17:        if bf1 != −1 and bf2 != −1 then {bit-flip not detected at Aij}
18:          Aij = −Aij, Ail = −Ail {reverse the induced bit-flip in Aij}
19:        end if
20:      end if
21:    end for
22:  end for
23: end for
24: return Ã = A.
To rectify this, we show in Thm. 10 that an optimal solution of the optimisation problem in Alg. 3 is
of the form W = copt A for a random Rademacher sensing matrix A, where copt = 1 − µ3√(1 − n/p).
This cuts down on the runtime of the correction algorithm significantly and further allows us to run
multi-stage correction to correct all effective bit-flips.

Theorem 10 Let A be an n × p Rademacher matrix, µ1 = 2√(2 log(p)/n), µ2 = 2√(log(2np)/(np)) + 1/n
and µ3 = (2/√(1 − n/p))√(2 log(n)/p). Given the optimisation problem in Alg. 3, if n < p, then one of
the solutions is W = (1 − µ3√(1 − n/p))A. ■

The proof is given in the Appendix.
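Using the expression for µ3 stated in Theorem 10 (taken here as an assumption of this sketch), the
closed-form weight matrix can be computed directly; note that copt = 1 − µ3√(1 − n/p) simplifies to
1 − 2√(2 log(n)/p).

```python
# Small numeric sketch of the closed-form weight matrix of Theorem 10.
import numpy as np

def closed_form_W(A):
    n, p = A.shape
    assert n < p, "Theorem 10 assumes n < p"
    mu3 = 2.0 * np.sqrt(2.0 * np.log(n) / p) / np.sqrt(1.0 - n / p)
    c_opt = 1.0 - mu3 * np.sqrt(1.0 - n / p)   # = 1 - 2*sqrt(2*log(n)/p)
    return c_opt * A, c_opt
```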
Here, ∆1A is the error matrix representing the MMEs remaining post-correction, and Ã denotes the
corrected pooling matrix. Hence, from (3.3), we have,

y = Âβ∗ + η = Ãβ∗ + (∆A − ∆1A)β∗ + η = Ãβ∗ + δ̃∗ + η,   (4.2)

where δ̃∗ ≜ (∆A − ∆1A)β∗ is the post-correction bit-flip vector. Here, δ̃∗ is also a sparse vector;
however, its sparsity level is a random quantity given by r̃ = ∥δ̃∗∥1 evaluated as an ℓ0 count, i.e.,
r̃ = ∥δ̃∗∥0. Here also, we estimate β∗ and δ̃∗ using the Robust Lasso estimator given as follows:

(β̃λ1, δ̃λ2) = arg min β,δ (1/2n)∥y − Ãβ − δ∥₂² + λ1∥β∥1 + λ2∥δ∥1,   (4.3)
(n/(p√(1 − n/p))) |δi∗| / ((n/(p√(1 − n/p))) σ√Σδ,ii) → ∞ as n, p → ∞. This implies we need
(n/(p√(1 − n/p))) |δi∗| / σ → ∞ as n, p → ∞. Hence, under the condition min_{i∈R} |δi∗| =
ω((p√(1 − n/p)/n) σ), we have that,

lim_{n,p→∞} γn,p = lim_{n,p→∞} 1 − { Φ( zα/2 − (n δi∗/(p√(1 − n/p))) / ((n/(p√(1 − n/p))) σ√Σδ,ii) )
− Φ( −zα/2 − (n δi∗/(p√(1 − n/p))) / ((n/(p√(1 − n/p))) σ√Σδ,ii) ) }
= 1 − {Φ(−∞) − Φ(−∞)} = 1.   (4.5)

Hence, we have that the power of the test for δ∗ goes to 1 as n, p → ∞ under the condition
min_{i∈R} |δi∗| = ω((p√(1 − n/p)/n) σ).

Now, recall that, by construction, δ∗ = ∆Aβ∗. Here, ∆A, being the error matrix, is sparse. In fact,
based on our assumption that there can be at most one pair of bit-flips per row, min_{i∈R} |δi∗| =
O(min_{j∈S} |βj∗|). Hence, the assumption required for the power to go to 1 is

min_{j∈S} |βj∗| = ω((p√(1 − n/p)/n) σ).
We will now evaluate the probability that the set of indices with effective bit-flips post-correction
is a subset of the set of indices with effective bit-flips initially. This event is represented by R̃ ⊆ R.
To find the probability of R̃ ⊆ R, we condition it on the event that all the effective bit-flips were
detected in the detection stage by the marginal tests, i.e., we condition R̃ ⊆ R on R ⊆ J. Hence,
using the theorem of total probability, we have,

P(R̃ ⊆ R) ≥ P(R̃ ⊆ R | R ⊆ J) P(R ⊆ J).   (4.6)

We will now evaluate both the probabilities on the R.H.S. of (4.6) separately. Note that {R ⊆ J}
implies that for all the indices that belong to R, the test based on |TH,i| rejects. Hence, the event
{R ⊆ J} is equivalent to the event {R ∩ Jᶜ = ∅}, which means that none of the elements of R belongs
to Jᶜ. This implies that the event {R ∩ Jᶜ = ∅} is equivalent to
[ ∪_{i∈R} { |TH,i| ≤ zα/2 | δi∗ ≠ 0 } ]ᶜ = ∩_{i∈R} { |TH,i| > zα/2 | δi∗ ≠ 0 }. Hence, we have,

P(R ⊆ J) = P( ∩_{i∈R} { |TH,i| > zα/2 | δi∗ ≠ 0 } ) = 1 − P( ∪_{i∈R} { |TH,i| ≤ zα/2 | δi∗ ≠ 0 } )
≥ 1 − Σ_{i∈R} P( |TH,i| ≤ zα/2 | δi∗ ≠ 0 )
= 1 − r(1 − γn,p).   (4.7)

Here, γn,p is the power of the test for a given n, p. Now, we evaluate P(R̃ ⊆ R | R ⊆ J). Using
Lemma 13, we have,

P(R̃ ⊆ R | R ⊆ J) = P(R̃ ∩ Rᶜ ∩ J = ∅).   (4.8)

We will now define a few notations. Recall that in the correction algorithm Alg. 5, for each i ∈ J,
for all j1 ∈ [p − 1], j2 = j1 + 1, . . . , p, we swap the elements aij1 and aij2. Let us denote the new
bit-flip error at this location as δi∗(j1, j2). Then we perform Lasso followed by debiasing to obtain
the debiased Lasso estimate and its corresponding test statistic, denoted by TH,i(j1, j2). Note that
here we assume that a wrong swap does create a non-zero bit-flip error δi∗(j1, j2). Lastly, let us define
the set of tuples K = {(j1, j2), j1 ∈ [p − 1], j2 ∈ {j1 + 1, . . . , p} : {aij1 ≠ aij2} ∩ {{βj1∗ ≠ 0} ∪ {βj2∗ ≠ 0}}}.
Note that the event {R̃ ∩ Rᶜ ∩ J = ∅} implies that none of the elements of J \ R are in R̃. Hence,
the event {R̃ ∩ Rᶜ ∩ J = ∅} implies that for all i ∈ J \ R, the hypothesis test w.r.t. TH,i(j1, j2) is
rejected for all j1, j2. Hence, we have,

P(R̃ ∩ Rᶜ ∩ J = ∅) = P( ∩_{i∈J\R} ∩_{(j1,j2)∈K} { |TH,i(j1, j2)| > zα/2 | δi∗(j1, j2) ≠ 0 } )
= 1 − P( ∪_{i∈J\R} ∪_{(j1,j2)∈K} { |TH,i(j1, j2)| ≤ zα/2 | δi∗(j1, j2) ≠ 0 } )
≥ 1 − Σ_{i∈J\R} Σ_{(j1,j2)∈K} P( |TH,i(j1, j2)| ≤ zα/2 | δi∗(j1, j2) ≠ 0 )
= 1 − Σ_{i∈J\R} Σ_{(j1,j2)∈K} (1 − γn,p)
≥ 1 − ((p − 1)(p − 2)/2) (k − r)(1 − γn,p).   (4.9)

Joining (4.6), (4.7) and (4.9), we get,

P(R̃ ⊆ R) ≥ {1 − r(1 − γn,p)} { 1 − ((p − 1)(p − 2)/2)(k − r)(1 − γn,p) }.   (4.10)
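As a rough numeric illustration (not part of the formal analysis), the bound (4.10) is informative only
when the per-test Type-II error 1 − γn,p is very small, because the second factor scales with
(p − 1)(p − 2)/2. The following sketch evaluates the bound for representative values.

```python
def bound_4_10(p, r, k, gamma):
    """Right-hand side of (4.10); k = |J|, r = |R|, gamma = power of the test."""
    term1 = 1.0 - r * (1.0 - gamma)
    term2 = 1.0 - 0.5 * (p - 1) * (p - 2) * (k - r) * (1.0 - gamma)
    return term1 * term2

# e.g. with p = 500, r = 6, k = 9: 1 - gamma = 1e-8 gives a bound close to 1,
# whereas 1 - gamma = 1e-4 already makes the bound vacuous (negative).
print(bound_4_10(500, 6, 9, 1 - 1e-8), bound_4_10(500, 6, 9, 1 - 1e-4))
```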
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (2,67,72,219,361,392) (2,13,44,67,72,145,219,276,361) (2,67,145,219,361)
2 (72,145,392) (13,72,145,276,392) (72,392)
3 (145) (13,145) (145)
4 ϕ (13) ϕ
Table 4.1: First β, first run: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered
bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are p =
500, n = 400, fσ = 0.01, s = 10, r = 6

Table 4.2: fσ = 0.01: Pre-correction and post-correction RRMSE, Sensitivity and Specificity and
p-value for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM
errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6

Table 4.3: First β, second run: Set of true bit-flips (B), set of detected bit-flips (J) and set of
altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 6

Table 4.4: First β, second run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity,
after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ =
0.01, s = 10, r = 6

Table 4.5: First β, third run: Set of true bit-flips (B), set of detected bit-flips (J) and set of
altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 6

#   RRMSE               Sens                Spec                p-value simul
    Pre-corr Post-corr  Pre-corr Post-corr  Pre-corr Post-corr
1 0.0615 0.0479 1 1 0.982 0.989 1e-14
2 0.0479 0.0477 1 1 0.989 0.995 1e-9
3 0.0477 0.0416 1 1 0.995 0.995 1e-7
4 0.0477 0.0416 1 1 0.995 0.995 0.00542
Table 4.6: First β, third run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity,
after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ =
0.01, s = 10, r = 6
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (96,117,272,329,346) (23,96,117,156,213,272,329,336,346) (96,117,213,329,346)
2 (213,272) (23,96,213,276) (213,272)
3 ϕ (23) ϕ
Table 4.7: Second β, first run: Set of true bit-flips (B), set of detected bit-flips (J) and set of
altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 5
#   RRMSE               Sens                Spec                p-value simul
    Pre-corr Post-corr  Pre-corr Post-corr  Pre-corr Post-corr
1 0.0447 0.0372 0.8 1 0.992 0.995 1e-7
2 0.0372 0.0346 1 1 0.995 0.998 0.00098
3 0.0372 0.0346 1 1 0.995 0.998 0.00928
Table 4.8: Second β, first run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity,
after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ =
0.01, s = 10, r = 5
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (96,117,272,329,346) (55 ,72,96,142,202,291,329,346) (96,202,329,346)
2 (117,202,272) (55,117,202,272,291) (117,202)
3 (272) (55,272) (272)
4 ϕ (55) ϕ
Table 4.9: Second β, second run: Set of true bit-flips (B), set of detected bit-flips (J) and set of
altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 5
#   RRMSE               Sens                Spec                p-value simul
    Pre-corr Post-corr  Pre-corr Post-corr  Pre-corr Post-corr
1 0.0552 0.0513 0.6 0.8 0.992 0.995 1e-6
2 0.0513 0.0474 0.8 1 0.995 0.997 0.0021
3 0.0474 0.0436 1 1 0.997 0.998 0.0362
Table 4.10: Second β, second run: Pre-correction and post-correction RRMSE, Sensitivity and Speci-
ficity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, s = 10, r = 5
#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (96,117,272,329,346) (82,96,117,172,209,252,329,346,397) (96,117,329,346)
2 (272) (82,172,209,272) (272)
3 ϕ (82,209) ϕ
Table 4.11: Second β, third run: Set of true bit-flips (B), set of detected bit-flips (J) and set of
altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 5
#   RRMSE               Sens                Spec                p-value simul
    Pre-corr Post-corr  Pre-corr Post-corr  Pre-corr Post-corr
1 0.0602 0.0452 0.8 1 0.992 0.995 1e-13
2 0.0452 0.0427 1 1 0.995 0.997 1e-8
3 0.0452 0.0427 1 1 0.995 0.997 0.00044
Table 4.12: Second β, third run: Pre-correction and post-correction RRMSE, Sensitivity and Speci-
ficity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, s = 10, r = 5

Table 4.13: Third β, first run: Set of true bit-flips (B), set of detected bit-flips (J) and set of
altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 7

Table 4.14: Third β, first run: Pre-correction and post-correction RRMSE, Sensitivity and Specificity,
after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ =
0.01, s = 10, r = 7

Table 4.15: Third β, second run: Set of true bit-flips (B), set of detected bit-flips (J) and set of
altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 7

Table 4.16: Third β, second run: Pre-correction and post-correction RRMSE, Sensitivity and Speci-
ficity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, s = 10, r = 7

#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (2,67,72,219,361,392) (2,13,44,67,72,145,219,276,361) (2,67,145,219,361)
2 (72,145,392) (13,72,145,276,392) (72,392)
3 (145) (13,145) (145)
4 ϕ (13) ϕ
Table 4.17: fσ = 0.01: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips
(C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, s = 10, r = 6

Table 4.18: fσ = 0.01: Pre-correction and post-correction RRMSE, Sensitivity and Specificity and
p-value for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM
errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6

In the first table, we provide the set of true bit-flips (B), the set of detected bit-flips (J) and the set
of altered bit-flips (C), after each stage of correction in Drlt-C for RSM errors. In the second table,
we provide pre-correction and post-correction RRMSE, Sensitivity and Specificity and the p-value for
the simultaneous test given in (3.42), after each stage of correction in Drlt-C for RSM errors. The
fixed set of parameters is p = 500, n = 400, fσ = 0.01, s = 10. Note that the true number of bit-flips
varies as the RSM errors are induced in a non-adversarial manner with fer = 0.2.

Table 4.19: fσ = 0.03: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips
(C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, s = 10, r = 6
# RRMSE Sens Spec p-value simul
Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0822 0.0757 0.67 0.8 0.974 0.989 1e-16
2 0.0757 0.0661 0.8 1 0.989 0.992 1e-11
3 0.0661 0.0632 1 1 0.992 0.995 1e-7
4 0.0632 0.0617 1 1 0.995 0.997 0.00049
5 0.0617 0.0611 1 1 0.997 0.998 0.00225
6 0.0617 0.0611 1 1 0.997 0.998 0.0319
Table 4.20: fσ = 0.03: Pre-correction and post-correction RRMSE, Sensitivity and Specificity and
p-value for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM
errors. The parameters are p = 500, n = 400, s = 10, r = 6

Table 4.21: fσ = 0.05: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips
(C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, s = 10, r = 6

Table 4.22: fσ = 0.05: Pre-correction and post-correction RRMSE, Sensitivity and Specificity and
p-value for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM
errors. The parameters are p = 500, n = 400, s = 10, r = 6

Table 4.23: s = 5: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips
(C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, r = 2

Table 4.24: s = 5: Pre-correction and post-correction RRMSE, Sensitivity and Specificity and p-value
for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM errors. The
parameters are p = 500, n = 400, fσ = 0.01, r = 2

#   True effective bit-flips (B)   Effective bit-flips detected (J)   Bit-flips altered (C)
1 (143,184,239,278,282) (69,127,143,177,239,275,278,282,299,375) (69,143,239,278,282)
2 (69,184) (69,177,184,299,375) (69,177,184)
3 (177) (177) (177)
4 ϕ ϕ ϕ
Table 4.25: s = 10: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips
(C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, r = 5

Table 4.26: s = 10: Pre-correction and post-correction RRMSE, Sensitivity and Specificity and p-value
for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM errors. The
parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5

Table 4.27: s = 15: Set of true bit-flips (B), set of detected bit-flips (J) and set of altered bit-flips
(C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, r = 7

Table 4.28: s = 15: Pre-correction and post-correction RRMSE, Sensitivity and Specificity and p-value
for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM errors. The
parameters are p = 500, n = 400, fσ = 0.01, r = 7

Table 4.29: Effective bit-flips correctly altered: the first column represents the indices that have true
effective bit-flips. Columns 2, 3 and 4 report the proportion of times that particular bit-flip has been
correctly altered by the RSM correction algorithm for fσ = 0.01, fσ = 0.03 and fσ = 0.05 respectively.
The fixed parameters of the experiments are p = 500, n = 400, s = 10, r = 6. In brackets are the
proportions of the number of times that bit-flip was detected in the first stage of RSM correction.
Chapter 5
Conclusion
In this report, we target three objectives, in three different chapters, connected to the identification
and correction of Group Membership Specification Errors (MMEs) in Group Testing.
In Chapter 2, we reformulate the optimization problem to obtain M (the approximate inverse of
the covariance matrix of the rows of the sensing matrix A) in [31] and further provide an exact, closed-
form optimal solution to the reformulated problem under assumptions on the coherence of A. For
sensing matrices with i.i.d. zero-mean sub-Gaussian rows that have diagonal covariance, the debiased
Lasso estimator, based on this closed-form solution, has entries that are asymptotically zero-mean
and Gaussian. The exact solution significantly improves the time efficiency for debiasing the Lasso
estimator, as shown in the numerical results. Our method is particularly useful for debiasing in
streaming settings where new measurements arrive on the fly.
In Chapter 3, we have presented a technique for determining the sparse vector β ∗ of health status
values from noisy pooled measurements in y, with the additional feature that our technique is designed
to handle bit-flip errors in the pooling matrix. These bit-flip errors can occur at a small number of
unknown locations, due to which the pre-specified matrix A (known) and the actual pooling matrix
 (unknown) via which pooled measurements are acquired, differ from each other. We use the theory
of Lasso debiasing as our basic scaffolding to identify the defective samples in β ∗ , but with extensive
and non-trivial theoretical and algorithmic innovations to (i ) make the debiasing robust to model
mismatch errors (MMEs), and also to (ii ) enable identification of the pooled measurements that were
affected by the MMEs. Our approach is also validated by an extensive set of simulation results, where
the proposed method outperforms intuitive baseline techniques. To our best knowledge, there is no
prior literature on using Lasso debiasing to identify measurements with MMEs.
In Chapter 4, we present algorithms to correct Permutation noise as well as three different types
of Bit-flip errors in Group Testing. We provide rigorous empirical results supporting the algorithms
and their capability of correcting the MMEs. We also provide a closed-form solution to the optimisation
problem used to obtain W in Drlt. The theoretical guarantees for the given correction algorithm require
rather stringent conditions to hold. Therefore, we want to modify the correction algorithms so that they
not only retain the capability of the existing algorithms but also come with sensible theoretical
guarantees. This is the work that remains to be done before the Pre-synopsis Seminar.
Chapter 6
Appendix
Primal objective function value: The primal objective function value is given by (1/n)∥w.j∥₂² =
((1 − µ)² / (∥a.j∥₂²/n)²) · (∥a.j∥₂²/n) = (1 − µ)² / (∥a.j∥₂²/n).
The Fenchel dual problem: Consider an optimization problem of the form, for a fixed j ∈ [p]:

inf_w f(w) + gj((1/n)A⊤w),   (6.2)

where f and gj are extended real-valued convex functions. The Fenchel dual (see Chapter 3 of [8]) is

sup_u −f∗((1/n)Au) − gj∗(−u),   (6.3)

where f∗ and gj∗ are the convex conjugates of f and gj respectively. The Fenchel dual satisfies weak
duality (see Chapter 3 of [8]), i.e., for any w and u,

f(w) + gj((1/n)A⊤w) ≥ −f∗((1/n)Au) − gj∗(−u).

In our setting, for a fixed j, we consider

f(w) := (1/n)∥w∥₂²  and  gj(w) := 0 if ∥w − ej∥∞ ≤ µ, and ∞ otherwise.   (6.4)

Then, for the same j, we have their convex conjugates from Lemma 5:

f∗(u) = sup_w { u⊤w − f(w) } = (n/4)∥u∥₂²,   (6.5)
gj∗(u) = sup_w { u⊤w − gj(w) } = sup_{∥w−ej∥∞ ≤ µ} u⊤w = uj + µ∥u∥1.   (6.6)

This gives a dual problem of the form sup_u −(1/(4n)) u⊤A⊤Au + uj − µ∥u∥1.
The point u = (2(1 − µ)/(∥a.j∥₂²/n)) ej is (trivially) feasible for the dual.
Dual objective function value: Plugging in u = (2(1 − µ)/(∥a.j∥₂²/n)) ej, the corresponding dual
objective function value is

−(1/(4n)) u⊤A⊤Au + uj − µ∥u∥1
= −(1/(4n)) (4(1 − µ)²/(∥a.j∥₂²/n)²) ∥a.j∥₂² + 2(1 − µ)/(∥a.j∥₂²/n) − µ · 2(1 − µ)/(∥a.j∥₂²/n)
= −(1 − µ)²/(∥a.j∥₂²/n) + 2(1 − µ)²/(∥a.j∥₂²/n) = (1 − µ)²/(∥a.j∥₂²/n).

Since the primal and dual objective function values are equal, it follows that an optimal solution for
the primal is ((1 − µ)/(∥a.j∥₂²/n)) a.j, and that an optimal solution to the dual is
(2(1 − µ)/(∥a.j∥₂²/n)) ej. This completes the proof.
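As a quick numeric sanity check (not part of the proof), the following sketch verifies that the primal
value attained by w = ((1 − µ)/(∥a.j∥₂²/n)) a.j and the dual value attained by
u = (2(1 − µ)/(∥a.j∥₂²/n)) ej coincide for a random Rademacher A.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, j, mu = 200, 400, 5, 0.1
A = rng.choice([-1.0, 1.0], size=(n, p))
a_j = A[:, j]
s = a_j @ a_j / n                          # ||a_.j||_2^2 / n (equals 1 for Rademacher A)
w = (1 - mu) / s * a_j                     # candidate primal point
u = np.zeros(p); u[j] = 2 * (1 - mu) / s   # candidate dual point
primal = w @ w / n                         # f(w) = (1/n)||w||_2^2
dual = -(u @ (A.T @ (A @ u))) / (4 * n) + u[j] - mu * np.abs(u).sum()
print(primal, dual)                        # both equal (1 - mu)^2 / s
```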
= sup_{v∈S^{p−1}} ∥Σ^{1/2}v∥₂ · ∥ (1/∥Σ^{1/2}v∥₂)(Σ^{1/2}v)⊤ Σ^{−1/2} ai. ∥ψ2
≤ sup_{v∈S^{p−1}} ∥Σ^{1/2}v∥₂ · sup_{z∈S^{p−1}} ∥ (1/∥Σ^{1/2}z∥₂)(Σ^{1/2}z)⊤ Σ^{−1/2} ai. ∥ψ2
≤ σmax(Σ^{1/2}) ∥Σ^{−1/2} ai.∥ψ2
≤ √Cmax κ,   (6.9)

where Cmax is defined in property D2. Therefore, we obtain E[aij²] ≤ 2∥ai.∥ψ2² ≤ 2Cmax κ². From
the definition of eigenvalues, for any x ∈ Rp, x⊤Σx ≥ σmin(Σ)∥x∥₂² ≥ Cmin∥x∥₂². Putting x = ej,
where ej is the jth column of Ip, we have Σjj ≥ Cmin. Since E[aij²] = Σjj ≥ Cmin, we have
E[(1/n) Σ_{i=1}^n aij²] ≥ Cmin.
For a given j ∈ [p], the variables aij² are independent for all i ∈ [n]. Hence, using the concentration
inequality of Theorem 3.1.1 and Equation (3.3) of [52], we have for t > 0¹,

P( | ∥a.j∥₂²/n − E[∥a.j∥₂²/n] | ≥ t ) ≤ 2 exp( −nt² / (2Cmax² κ⁴) ).   (6.10)

¹ We have set c = 1/2, δ := t and K := √(2Cmax) κ in Equation (3.3) and the equation immediately preceding it in [52].
Using the left-sided inequality of (6.10), we have,

P( ∥a.j∥₂²/n ≤ E[∥a.j∥₂²/n] − t ) ≤ 2 exp( −nt² / (2Cmax² κ⁴) ).   (6.11)

Using E[∥a.j∥₂²/n] ≥ Cmin, (6.11) can be rewritten as follows for t > 0:

P( ∥a.j∥₂²/n ≤ Cmin − t ) ≤ 2 exp( −nt² / (2Cmax² κ⁴) ).   (6.12)

Using the union bound on (6.12) over j ∈ [p], we obtain the following lower tail bound on L:

P( L ≤ Cmin − t ) ≤ 2p exp( −nt² / (2Cmax² κ⁴) ).   (6.13)

Putting t := 2Cmax κ² √(log p / n) in (6.13), we obtain:

P( L ≤ Cmin ( 1 − 2 (Cmax/Cmin) κ² √(log p / n) ) ) ≤ 2/p.   (6.14)

For some constant c ∈ (0, 1), if n ≥ (4Cmax² κ⁴ / (Cmin² (1 − c)²)) log p, then (6.14) becomes:

P( L ≤ c Cmin ) ≤ 2/p  =⇒  P( L ≥ c Cmin ) ≥ 1 − 2/p.   (6.15)

This completes the proof.
Hence, using the symmetry of the inner product and a union bound, we have

P( max_{l≠j} (1/n)|a.l⊤ a.j| ≥ t ) = P( max_{l<j} (1/n)|a.l⊤ a.j| ≥ t ) ≤ 2 (p(p − 1)/2) exp( −nt² / (2Cmax² κ⁴) ).   (6.17)

Taking t = 2√2 Cmax κ² √(log p / n), we have,

P( ν(A) ≥ 2√2 Cmax κ² √(log p / n) ) ≤ 1/p².   (6.18)
The following result gives the convex conjugates of the functions needed in the proof of Theorem 1.
Lemma 5 1. If f(w) = (1/n)∥w∥₂², then its convex conjugate is f∗(u) = (n/4)∥u∥₂².

Proof 1 1. We can write f(w) = (1/2)w⊤Qw where Q := (2/n)Ip is positive definite (and has size p × p).
From Example 3.2.2 of [9], the convex conjugate of a positive definite quadratic form is

f∗(u) = (1/2) u⊤Q⁻¹u = (1/2) u⊤ ((2/n)Ip)⁻¹ u = (n/4)∥u∥₂².

where C = {w ∈ Rp | ∥w − ej∥∞ ≤ µ}. This implies that wi ∈ [eji − µ, eji + µ] for all i. (Note that
eji = 1 if i = j and 0 otherwise.) To maximize u⊤w = Σ_{i=1}^p ui wi, the optimal wi can be chosen as

wi = eji + µ if ui ≥ 0,  and  wi = eji − µ if ui < 0.   (6.21)

Substituting into u⊤w, we obtain u⊤w = Σ_{i=1}^p ui (eji + µ sign(ui)), where sign(ui) is the sign
y = [A | √n In] (β∗; δ∗/√n) + η = Aβ∗ + √n e∗ + η,   (6.23)

where e∗ ≜ δ∗/√n. Note that the optimization problem (3.6) is

(β̂λ1; δ̂λ̃2) = arg min_{β,δ} (1/2n) ∥y − (A | √n In)(β; δ/√n)∥₂² + λ1∥β∥1 + λ̃2 ∥δ/√n∥1,

where λ̃2 = √n λ2. The equivalent robust Lasso optimization problem for the model (6.23) is given by:

(β̂λ1; êλ̃2) = arg min_{β,e} (1/2n) ∥y − (A | √n In)(β; e)∥₂² + λ1∥β∥1 + λ̃2∥e∥1,   (6.24)

where êλ̃2 = δ̂λ2/√n. In order to prove Theorem 4, we first recall the Extended Restricted Eigenvalue
Condition (EREC) for a sensing matrix from [42]. Given β∗ and δ∗, let us define the sets
Note that s ≜ |S|, r ≜ |R|.
Extended Restricted Eigenvalue Condition (EREC) [42]: Given S, R as defined in (6.25),
and λ1, λ̃2 > 0, an n × p matrix A is said to satisfy the EREC if there exists a κ > 0 such that

(1/√n) ∥Ahβ + √n hδ∥₂ ≥ κ(∥hβ∥₂ + ∥hδ∥₂),   (6.26)

for all (hβ, hδ) ∈ C(S, R, λ) with λ := λ̃2/λ1, where C is defined as follows:
Here, (hβ )S and (hδ )R are s and r dimensional vectors extracted from hβ and hδ respectively,
restricted to the set S and R as defined in (6.25). ■ In Lemma. 6, we extend Lemma 1 from [42] to
random Rademacher matrices. In this lemma we show that a random Rademacher matrix A satisfies
EREC with high probability for κ = 1/16.
Lemma 6 Let A be an n × p matrix with i.i.d. Rademacher entries. There exist positive constants
C1, C2, c3, c4 such that if n ≥ C1 s log p and r ≤ min{C2 n/log n, s log p/log n}, then

P( ∀ (hβ, hδ) ∈ C(S, R, λ), (1/√n) ∥Ahβ + √n hδ∥₂ ≥ (1/16)(∥hβ∥₂ + ∥hδ∥₂) ) ≥ 1 − c3 exp{−c4 n},

where λ = √(log n / log p) and C is as in (6.27). ■
Proof of Lemma 6: Using a similar line of argument as in the proof of Lemma 1 of [42], it is enough
to show the following two properties of the sensing matrix A to complete the proof.
1. Lower bound on (1/n)∥Ahβ∥₂² + ∥hδ∥₂²: for some κ1 > 0, with high probability,

(1/n)∥Ahβ∥₂² + ∥hδ∥₂² ≥ κ1 (∥hβ∥₂ + ∥hδ∥₂)²  ∀ (hβ, hδ) ∈ C(S, R, √(log n / log p)).   (6.28)

2. Mutual Incoherence: The column space of the matrix A is incoherent with the column space
of the identity matrix. For some κ2 > 0, with high probability,

(1/√n) |⟨Ahβ, hδ⟩| ≤ κ2 (∥hβ∥₂ + ∥hδ∥₂)²  ∀ (hβ, hδ) ∈ C(S, R, √(log n / log p)).   (6.29)
The proof is completed if κ1 > 2κ2 . We now show that (6.28) and (6.29) hold together with κ1 > 2κ2
for a Rademacher sensing matrix A.
We now state two important facts on the Rademacher matrix A which will be used in proving (6.28)
and (6.29) respectively.
(1) We use a result following Lemma 1 [34] (see the equation immediately following Lemma 1 in [34],
and set D̄ in that equation to the identity matrix, since we are concerned with signals that are
sparse in the canonical basis). Using this result, there exist positive constants c2 , c′3 , c′4 , such that
with probability at least 1 − c′3 exp {−c′4 n}:
(1/√n) ∥Ahβ∥₂ ≥ ∥hβ∥₂/4 − c2 √(log p / n) ∥hβ∥1  ∀ hβ ∈ Rp.   (6.30)
(2) From Theorem 4.4.5 of [52], for a s × r′ dimensional Rademacher matrix ARiSj, there exists a
constant c1 > 0 such that, for any τ′ > 0, with probability at least 1 − 2 exp{−nτ′²} we have

(1/√n) ∥ARiSj∥₂ = (1/√n) σmax(ARiSj) ≤ c1 ( √(s/n) + √(r′/n) + τ′ ).   (6.31)

Throughout this proof, we take the constants C2 ≜ 7²/(24c2)² and C1 ≜ max{32² c2², 4(51200c1)²}, where
c1, c2 are as defined in (6.30) and (6.31) respectively.
Proof of (6.28): We first obtain a lower bound on (1/n)∥Ahβ∥₂² using (6.30). For every (hβ, hδ) ∈
C(S, R, √(log n / log p)), we have:

∥hβ∥1 ≤ 4∥(hβ)S∥1 + 3√(log n / log p) ∥(hδ)R∥1 ≤ 4√s ∥hβ∥₂ + 3√(log n / log p) √r ∥hδ∥₂.   (6.32)

Substituting (6.32) in (6.30), we obtain that, with probability at least 1 − c′3 exp{−c′4 n}, for every
(hβ, hδ) ∈ C(S, R, √(log n / log p)):

(1/√n)∥Ahβ∥₂ ≥ ( 1/4 − 4c2 √(s log p / n) ) ∥hβ∥₂ − 3c2 √(log n / log p) √(r log p / n) ∥hδ∥₂,
∴ (1/√n)∥Ahβ∥₂ + ∥hδ∥₂ ≥ ( 1/4 − 4c2 √(s log p / n) ) ∥hβ∥₂ + ( 1 − 3c2 √(log n / log p) √(r log p / n) ) ∥hδ∥₂.   (6.33)

Under the assumption n ≥ C1 s log p, the first term in the brackets of (6.33) is greater than 1/8. Again,
under the assumption r ≤ C2 n/log n, the second term is greater than 1/8. Thus we have (1/√n)∥Ahβ∥₂ +
∥hδ∥₂ ≥ (1/8)(∥hβ∥₂ + ∥hδ∥₂). Squaring both sides, we have (1/n)∥Ahβ∥₂² + ∥hδ∥₂² + (2/√n)∥Ahβ∥₂∥hδ∥₂ ≥
(1/64)(∥hβ∥₂ + ∥hδ∥₂)². Using the fact that ∥a∥₂² + ∥b∥₂² ≥ 2∥a∥₂∥b∥₂ for any vectors a, b, we have
2[(1/n)∥Ahβ∥₂² + ∥hδ∥₂²] ≥ (1/n)∥Ahβ∥₂² + ∥hδ∥₂² + (2/√n)∥Ahβ∥₂∥hδ∥₂ ≥ (1/64)(∥hβ∥₂ + ∥hδ∥₂)². Hence we
have, with probability at least 1 − c′3 exp{−c′4 n}, for every (hβ, hδ) ∈ C(S, R, √(log n / log p)),

(1/n)∥Ahβ∥₂² + ∥hδ∥₂² ≥ (1/128)(∥hβ∥₂ + ∥hδ∥₂)².   (6.34)

Therefore, we have κ1 = 1/128, completing the proof of (6.28).
Proof of (6.29): This part of the proof directly follows the proof of Lemma 2 in [42], with a few minor
differences in constant factors. Nevertheless, we include it here to make the paper self-contained.
Divide the set {1, 2, . . . , p} into subsets S1, S2, . . . , Sq of size s each, such that the set S1
contains the s largest absolute value entries of hβ indexed by S, the set S2 contains the s largest
absolute value entries of the vector (hβ)Sc, S3 contains the second largest s absolute value entries of
(hβ)Sc, and so on. By the same strategy, we also divide the set {1, 2, . . . , n} into subsets R1, R2, . . . , Rk
such that the first set R1 contains the r entries of hδ indexed by R and the sets R2, R3, . . . are of size
r′ ≥ r. We have, for every (hβ, hδ) ∈ C(S, R, √(log n / log p)),
(1/√n) |⟨Ahβ, hδ⟩| ≤ (1/√n) Σ_{i,j} |⟨ARiSj (hβ)Sj, (hδ)Ri⟩| ≤ max_{i,j} (1/√n)∥ARiSj∥₂ Σ_{i′,j′} ∥(hβ)Sj′∥₂ ∥(hδ)Ri′∥₂   (6.35)
= max_{i,j} (1/√n)∥ARiSj∥₂ ( Σ_{j′} ∥(hβ)Sj′∥₂ ) ( Σ_{i′} ∥(hδ)Ri′∥₂ ).   (6.36)
Note that ARiSj (a submatrix of A containing rows belonging to Ri and columns belonging to Sj) is
itself a Rademacher matrix with i.i.d. entries. Taking the union bound over all possible values of Sj
and Ri, we have that the inequality in (6.31) holds with probability at least
1 − 2 (n choose r′)(p choose s) exp(−nτ′²). If n ≥ 4τ′⁻² s log(p), we obtain (p choose s) ≤ pˢ ≤ exp(τ′²n/4).
Furthermore, if we assume n ≥ 4τ′⁻² r′ log(n), we have (n choose r′) ≤ n^{r′} ≤ exp(τ′²n/4). Later we will
give a choice of τ′ which ensures that these conditions are satisfied. Therefore, we obtain, with probability
at least 1 − 2 exp{−nτ′²/2},

max_{i,j} (1/√n) ∥ARiSj∥₂ ≤ c1 ( √(s/n) + √(r′/n) + τ′ ).   (6.37)
Using the first inequality in the last equation of Section 2.1 of [10], we obtain Σ_{i=3}^q ∥(hβ)Si∥₂ ≤
(1/√s) ∥(hβ)Sc∥1. Furthermore, for every (hβ, hδ) ∈ C(S, R, √(log n / log p)), we have ∥(hβ)Sc∥1 ≤
3√s ∥hβ∥₂ + 3√(log n / log p) √r ∥hδ∥₂. Hence,

Σ_{i=1}^q ∥(hβ)Si∥₂ = ∥(hβ)S1∥₂ + ∥(hβ)S2∥₂ + Σ_{i=3}^q ∥(hβ)Si∥₂ ≤ 2∥hβ∥₂ + Σ_{i=3}^q ∥(hβ)Si∥₂
≤ 5∥hβ∥₂ + 3√(log n / log p) √(r/s) ∥hδ∥₂.

Following a similar process, we obtain Σ_{i=3}^k ∥(hδ)Ri∥₂ ≤ (1/√r′) ∥(hδ)Rc∥1. Furthermore, for every
(hβ, hδ) ∈ C(S, R, √(log n / log p)), we have (1/√r′) ∥(hδ)Rc∥1 ≤ 3 (√s/(√r′ √(log n / log p))) ∥hβ∥₂ +
3√(r/r′) ∥hδ∥₂. Since r′ ≥ r,

Σ_{i=1}^k ∥(hδ)Ri∥₂ = ∥(hδ)R1∥₂ + ∥(hδ)R2∥₂ + Σ_{i=3}^k ∥(hδ)Ri∥₂ ≤ 2∥hδ∥₂ + Σ_{i=3}^k ∥(hδ)Ri∥₂
≤ 5∥hδ∥₂ + (3√s/(√(log n / log p) √r′)) ∥hβ∥₂.
Hence, putting (6.37) and (6.31) into (6.35), we obtain, with probability at least 1 − 2 exp{−nτ′²/2},
for every (hβ, hδ) ∈ C(S, R, √(log n / log p)),

(1/√n) |⟨Ahβ, hδ⟩| ≤ c1 ( √(s/n) + √(r′/n) + τ′ ) × ( 5∥hβ∥₂ + 3√(log n / log p) √(r/s) ∥hδ∥₂ )
× ( 5∥hδ∥₂ + (3√s/(√(log n / log p) √r′)) ∥hβ∥₂ ).   (6.38)

Recall that r ≤ s log p / log n, by assumption. Taking r′ = s log p / log n leads to
√(log n / log p) √(r/s) ≤ √(log n / log p) √(r′/s) = 1 and √s/(√(log n / log p) √r′) = 1. Thus, we obtain,
with probability at least 1 − 2 exp(−nτ′²/2), for every (hβ, hδ) ∈ C(S, R, √(log n / log p)),

(1/√n) |⟨Ahβ, hδ⟩| ≤ 25 c1 ( √(s/n) + √(r′/n) + τ′ ) × (∥hβ∥₂ + ∥hδ∥₂)².   (6.39)
Let τ′ ≜ 1/(51200c1). Recall that C1 ≜ max{32² c2², 4(51200c1)²}. Then n ≥ C1 s log p implies
n ≥ 4τ′⁻² s log p = 4τ′⁻² r′ log n. Furthermore,

√(s/n) ≤ √(r′/n) = √(s log p / (n log n)) ≤ τ′/2.   (6.40)

Therefore, we have, with probability at least 1 − 2 exp(−nτ′²/2), for every (hβ, hδ) ∈ C(S, R, √(log n / log p)),

(1/√n) |⟨Ahβ, hδ⟩| ≤ 25 c1 × 2τ′ (∥hβ∥₂ + ∥hδ∥₂)² ≤ (1/512)(∥hβ∥₂ + ∥hδ∥₂)².   (6.41)
Now, from (6.34) and (6.41), using a union bound, we obtain, with probability at least
1 − (c′3 exp(−c′4 n) + 2 exp(−nτ′²/2)),

(1/n) ∥Ahβ + √n hδ∥₂² ≥ (κ1 − 2κ2)(∥hβ∥₂ + ∥hδ∥₂)² = κ²(∥hβ∥₂ + ∥hδ∥₂)².   (6.42)

Taking c3 = c′3 + 2 and c4 = min{c′4, τ′²/2}, we have 1 − (c′3 exp(−c′4 n) + 2 exp(−τ′²n/2)) ≥ 1 −
c3 exp(−c4 n). Note that we have κ = √(κ1 − 2κ2) = 1/16. Taking the square root in (6.42), we obtain,
with probability at least 1 − c3 exp(−c4 n),

(1/√n) ∥Ahβ + √n hδ∥₂ ≥ (1/16)(∥hβ∥₂ + ∥hδ∥₂)  ∀ (hβ, hδ) ∈ C(S, R, √(log n / log p)).

This completes the proof of the lemma. ■
Proof of Theorem 4
Proof of (3.7): We now derive a bound for the ℓ1 norm of the error β̂λ1 − β∗ of the robust Lasso
estimate given by the optimisation problem (3.6). Recall that we have λ1 = 4σ√(log p)/√n and
λ2 = 4σ√(log n)/n. We choose λ̃2 ≜ √n λ2 = 4σ√(log n)/√n. We use λ̃2 to define the cone constraint
in (6.27). Note that, in the proof of Theorem 1 of [42], it is shown that hβ ≜ β̂λ1 − β∗ and
hδ ≜ (1/√n)(δ̂λ2 − δ∗) satisfy the cone constraint given in (6.27). Therefore, we have

∥(hβ)Sc∥1 + (λ̃2/λ1) ∥(hδ)Rc∥1 ≤ 3( ∥(hβ)S∥1 + (λ̃2/λ1) ∥(hδ)R∥1 ).   (6.43)
Now, by using Eqn. (6.43), we have

∥hβ∥1 = ∥(hβ)S∥1 + ∥(hβ)Sc∥1 ≤ 4∥(hβ)S∥1 + 3(λ̃2/λ1)∥(hδ)R∥1 ≤ 4√s ∥hβ∥₂ + 3(λ̃2/λ1)√r ∥hδ∥₂.   (6.44)

Here, the last inequality of Eqn. (6.44) holds since ∥(hβ)S∥1 ≤ √s ∥hβ∥₂ and ∥(hδ)R∥1 ≤ √r ∥hδ∥₂.
Note that max{√s, √r} ≤ √(s + r). Based on the values of λ1, λ̃2, we have λ̃2 < λ1 since n < p.
Hence, by using Eqn. (6.44), we have

∥hβ∥1 ≤ 4√s ∥hβ∥₂ + 3√r ∥hδ∥₂ ≤ 4√(s + r) (∥hβ∥₂ + ∥hδ∥₂).   (6.45)

Recall that e∗ = δ∗/√n and êλ̃2 = δ̂λ̃2/√n in Theorem 1 of [42]. Therefore, by the equivalence of the
model given in (6.23) and the optimisation problem in (3.6) with that of [42], we have

∥β̂λ1 − β∗∥₂ + ∥(1/√n)(δ̂λ̃2 − δ∗)∥₂ ≤ 3κ⁻² max{λ1√s, λ̃2√r},   (6.46)

as long as

2∥A⊤η∥∞/n ≤ λ1,  and  2∥η∥∞/√n ≤ λ̃2.   (6.47)

Therefore, when (6.47) holds, then by using (6.45) (recall hβ = β̂λ1 − β∗) and (6.46), we have

∥β̂λ1 − β∗∥1 ≤ 4√(s + r) · 3κ⁻² max{λ1√s, λ̃2√r} ≤ 12κ⁻²(s + r) max{λ1, λ̃2} ≤ 12κ⁻²(s + r)λ1.   (6.48)

Using (6.49), (6.50) with Bonferroni's inequality in (6.48), we have:

P( ∥β̂λ1 − β∗∥1 ≤ 48κ⁻²(s + r)σ√(log(p)/n) ) ≥ 1 − 1/n − 1/p.   (6.51)
Given the optimal solutions β̂λ1 and δ̂λ2 of (3.6), δ̂λ2 can also be viewed as

δ̂λ2 = arg min_δ (1/2n) Σ_{i=1}^n (yi − ai. β̂λ1 − δi)² + λ2∥δ∥1.   (6.53)

Thus (6.53) can also be viewed as a Lasso estimator for z = In δ∗ + ϱ, where z ≜ y − Aβ̂λ1
and ϱ ≜ A(β∗ − β̂λ1) + η, with δ∗ being r-sparse. By using Theorem 11.1(b) of [25], we have that if
λ2 ≥ 2∥ϱ∥∞/n, then

∥δ̂λ2 − δ∗∥₂ ≤ (3√r λ2)/(2γr),   (6.54)

where γr is the restricted eigenvalue constant of order r, which equals one for In. Now, using the
result in Lemma 11.1 of [25], when λ2 ≥ 2∥ϱ∥∞/n, then
6.2.2 Proofs of Theorems and Lemmas on Debiased Lasso
Proof of Theorem 5
Note that we have chosen W = A. Now, recalling the expression for β̂W from (3.14) and the model as
given in (3.5), we have

β̂W − β∗ = (1/n)A⊤η + (Ip − (1/n)A⊤A)(β̂λ1 − β∗) + (1/n)A⊤(δ∗ − δ̂λ2).   (6.60)

In Lemma 7, (6.64) and (6.65) show that the second and third terms on the RHS of (6.60) are negligible
in probability as n, p increase. Therefore, in view of Lemma 7, we have

√n (β̂W,j − βj∗) = (1/√n) a.j⊤η + oP(1),   (6.61)

where a.j denotes the jth column of matrix A. Given a.j, by using the Gaussianity of η, the first
term on the RHS of (6.61) is a Gaussian random variable with mean 0 and variance σ² (a.j⊤a.j/n).
Since a.j⊤a.j/n = 1, the first term on the RHS is N(0, σ²). This completes the proof of result (1) of
the theorem.
We now turn to result (2) of the theorem. By using a similar decomposition argument as in the
case of β̂W in (6.60), and using the expression for δ̂W in (3.15), we have

δ̂W − δ∗ = (In − (1/n)AA⊤)η + (In − (1/n)AA⊤)A(β∗ − β̂λ1) − (1/n)AA⊤(δ∗ − δ̂λ2).   (6.62)

We have ΣA = (In − (1/n)AA⊤)(In − (1/n)AA⊤)⊤. From (6.66) and (6.67) of Lemma 7, the second and
third terms on the RHS of (6.62) are both oP( p√(1 − n/p)/n ). Therefore, using Lemma 7, we have,
for any i ∈ [n],

(δ̂W,i − δi∗)/√ΣA,ii = ( (In − (1/n)AA⊤)i. η )/√ΣA,ii + oP( (1/√ΣA,ii) (p√(1 − n/p)/n) ).   (6.63)

As η is Gaussian, the first term on the RHS of (6.63) is a Gaussian random variable with mean 0 and
variance σ². In Lemma 8, we show that (n²/(p(p − n))) ΣA,ii converges to 1 in probability if A is a
Rademacher matrix. This implies that the second term on the RHS of (6.63) is oP(1). This completes
the proof of result (2). ■
Lemma 7 Let β̂λ1, δ̂λ2 be as in (3.6) and set λ1 ≜ 4σ√(log p)/√n, λ2 ≜ 4σ√(log n)/n. Given that A is a
Rademacher matrix, if n is o(p) and n is ω[((s + r) log p)²], then as n, p → ∞ we have the following:

∥ √n (Ip − (1/n)A⊤A)(β∗ − β̂λ1) ∥∞ = oP(1)   (6.64)
∥ (1/√n) A⊤(δ∗ − δ̂λ2) ∥∞ = oP(1)   (6.65)
∥ (n/(p√(1 − n/p))) (In − (1/n)AA⊤) A(β∗ − β̂λ1) ∥∞ = oP(1)   (6.66)
∥ (n/(p√(1 − n/p))) (1/n) AA⊤(δ∗ − δ̂λ2) ∥∞ = oP(1)   (6.67)
Proof of Lemma 7:
When n is ω[((s+r) log p)2 ], the assumptions of Lemma 6 are satisfied. Hence, the Rademacher matrix
A satisfies the assumptions of Theorem 4 with probability that goes to 1 as n, p → ∞. Therefore, to
prove the results, it suffices to condition on the event that the conclusion of Theorem 4 holds.
Proof of (6.64): Using result (4) of Lemma 11, we have:
√ √
1 ⊤
n Ip − A A (β ∗ − β̂λ1 ) ≤ n|A⊤ A/n − Ip |∞ ∥β ∗ − β̂λ1 ∥1 . (6.68)
n ∞
From result (1) of Lemma 10, result (1) of Theorem 4, and result (5) of Lemma 11 , we have,
√
1 ⊤ ∗ log p
n Ip − A A (β − β̂λ1 ) = OP (s + r) √ . (6.69)
n ∞ n
Since A is a Rademacher matrix, we have |(1/√n)A⊤|∞ = 1/√n. From result (2) of Theorem 4 and result
(5) of Lemma 11, we have
√
1 r log n
√ A⊤ (δ ∗ − δ̂λ2 ) = OP . (6.71)
n ∞ n3/2
As n, p → ∞, we have
1
√ A⊤ (δ ∗ − δ̂λ2 ) = oP (1). (6.72)
n ∞
Proof of (6.66): Again using result (4) of Lemma 11, we have,
n 1 ⊤ ∗ n 1 1 ⊤
p In − AA A(β − β̂λ1 ) ≤p × In − AA A ∥β ∗ − β̂λ1 ∥(6.73)
1.
p 1 − n/p n 1 − n/p p n ∞
∞
By using result (5) of Lemma 11, result (1) of Theorem 4 and result (2) of Lemma 10, we have
s !r !
n 1 ⊤ ∗ n log(pn) 1 log p
p In − AA A(β − β̂λ1 ) ≤ OP (s + r) p +
p 1 − n/p n 1 − n/p pn n n
∞
s r r !
(s + r)n log(pn) log p (s + r)n 1 log p
= OP p +p
1 − n/p pn n 1 − n/p n n
s r !
(s + r) log(np) log(p) (s + r) log p
= OP p +p (6.74)
.
1 − n/p p 1 − n/p n
n 1 1 1
AA⊤ δ ∗ − δ̂λ2 × AA⊤ δ ∗ − δ̂λ2
p ≤p . (6.76)
p 1 − n/p n ∞
1 − n/p p ∞ 1
Since A is a Rademacher matrix, the elements of (1/p)AA⊤ lie between −1 and 1. Therefore,
|(1/p)AA⊤|∞ = 1. By using part (5) of Lemma 11 and result (2) of Theorem 4, we have
√ !
n 1 r log n
AA⊤ δ ∗ − δ̂λ2
p = OP p √ . (6.77)
p 1 − n/p n ∞
1 − n/p n
Lemma 8 Let A be a Rademacher matrix and ΣA be as defined in (3.18). If n log n is o(p), we have,
as n, p → ∞, for any i ∈ [n],

(n²/(p²(1 − n/p))) ΣA,ii →P 1.   (6.79)
⊤
Proof of Lemma 8: Recall that from (3.18), ΣA = In − n1 AA⊤ In − n1 AA⊤ . Note that for
i ∈ [n], we have
n2 n2
2 ⊤ 1 ⊤ ⊤
ΣA = 1 − ai. ai. + 2 ai. A Aai.
p2 (1 − n/p) ii p2 (1 − np ) n n
2 2
⊤ n ⊤
n a a
i. i.
X n a a
i. k.
= q 1− + q
p (1 − ) n n n
p (1 − ) n
p k=1 p
k̸=i
2
n
ai. a⊤
n X n k.
= 1− + q . (6.80)
p p (1 − n
) n
k=1 p
k̸=i
n2 n
2
ΣAii ≥ 1 − . (6.81)
p (1 − n/p) p
2(n − 1)
≥ 1− . (6.83)
n2
Pn
The last inequality comes using Bonferroni’s inequality which states that P (∩ni=1 Ui ) ≥ 1 − i=1 P (Ui )
for any events U1 , U2 , ..., Un . Therefore by using (6.80) and (6.83), we have
n2
n 4(n − 1) 2 log(n) 2
P 2
ΣAii ≤ 1 − + ≥1− (6.84)
p (1 − n/p) p 1 − n/p p n
n2
n n 4(n − 1) 2 log(n) 2
P 1− ≤ 2 ΣAii ≤ 1 − + ≥1− (6.85)
p p (1 − n/p) p 1 − n/p p n
n 4(n−1) 2 log(n)
Since n log n is o(p), as n, p → ∞, 1 − p → 1 and 1−n/p p → 0. This completes the proof. ■
Now we proceed to the results involving debiasing using the optimal weights matrix W obtained
from Alg. 3. The proofs of these results largely follow the same approach as that for W = A (i.e.
Theorem 5). However there is one major point of departure—due to differences in properties of the
weights matrix W designed from Alg. 3 (given in Lemma 9), as compared to the case where W = A
(given in Lemma 10).
Proof of Theorem 6
Proof of (3.23): Using Result (4) of Lemma 11, we have
√ √
1 ⊤
n Ip − W A (β ∗ − β̂λ1 ) ≤ n∥W ⊤ A/n − Ip |∞ ∥β ∗ − β̂λ1 ∥1 . (6.86)
n ∞
Using Result (2) of Lemma 9, Result (1) of Theorem 4 and Result (5) of Lemma 11, we have
√
1 ⊤ ∗ log p
n Ip − W A (β − β̂λ1 ) = OP (s + r) √ . (6.87)
n ∞ n
1 1
√ W ⊤ (δ ∗ − δ̂λ2 ) ≤ √ W⊤ ∥δ ∗ − δ̂λ2 ∥1 .
n ∞ n ∞
Using Result (3) of Lemma 9, Result (2) of Theorem 4 and Result (5) of Lemma 11, we have
√
1 r log n
√ W ⊤ (δ ∗ − δ̂λ2 ) = OP = oP (1). (6.89)
n ∞ n3/2
Using Result (4) of Lemma 9, Result (1) of Theorem 4 and Result (5) of Lemma 11, we have
s !r !
n 1 (s + r)n log(pn) 1 log p
p In − W A⊤ A(β ∗ − β̂λ1 ) ≤ OP p +
p 1 − n/p n 1 − n/p pn n n
∞
s r !
(s + r) log(np) log(p) (s + r) log p
= OP p +p (6.91).
1 − n/p p 1 − n/p n
59
Since n is o(p) and n is ω[((s + r) log p)2 ], we have
n 1 ⊤
p In − W A A(β ∗ − β̂λ1 ) = oP (1). (6.92)
p 1 − n/p n
∞
n 1 1 1
W A⊤ δ ∗ − δ̂λ2 × W A⊤ δ ∗ − δ̂λ2
p ≤p (6.93)
p 1 − n/p n ∞
1 − n/p p ∞ 1
Using Result (5) of Lemma 9, Result (2) of Theorem 4 and Result (5) of Lemma 11, we have
s !√ !
n 1 r n log(np) log n
W A⊤ δ ∗ − δ̂λ2
p = OP p +1
p 1 − n/p n 1 − n/p p n3/2
∞
s r √ !
r log(np) log n r log n
= OP p +p (6.94)
.
1 − n/p p n 1 − n/p n3/2
n 1
W A⊤ δ ∗ − δ̂λ2
p = oP (1) (6.95)
p 1 − n/p n ∞
Proof of Theorem 7
2 ⊤
Result(1): Recall that W is the output of Alg. 3, Σβ = σn W ⊤ W , Σδ = σ 2 In − n1 W A⊤ In − n1 W A⊤
q
and µ1 = 2 2 log p
n . We will derive the lower bound of Σβjj for all j ∈ [p] following the same idea as
σ2 ⊤
in the proof of Lemma 12 of [31]. Note that, Σβjj = For all j ∈ [p], from (6.120) of result
n w.j w.j .
2 2 3
(2) of Lemma 9, for any feasible W with probability at least 1 − + + , we have
p2 n2 2np
1 ⊤
1− a w.j ≤ µ1 =⇒ 1 − µ1 ≤ a⊤
.j w.j .
n .j
For any feasible W of Alg.3, we have for any c > 0,
1 ⊤ 1 ⊤
⊤
1 ⊤
⊤
w w.j ≥ w w.j + c(1 − µ1 ) − c a.j w.j ≥ min n w w.j + c(1 − µ1 ) − c a.j w.j
n .j n .j w.j ∈R n .j
⊤
c2 a.j a.j c2 a.j ⊤ a.j
1 ⊤
= min n (w.j − ca.j /2) (w.j − ca.j /2) + c(1 − µ1 ) − ≥ c(1 − µ1 ) −
w.j ∈R n 4 n 4 n
c2
= c(1 − µ1 ) − .
4
We obtain the last inequality by putting w.j = ca.j /2 which makes the square term 0. The rightmost
equality is because a.j ⊤ a.j = n. The lower bound on the RHS is maximized for c = 2(1 − µ1 ).
2 2 1
Plugging in this value of c, we obtain the following with probability atleast 1 − p2
+ n2
+ 2np :
1 ⊤
w w.j ≥ (1 − µ1 )2 .
n .j
Hence, from the above equation and (6.115), we obtain the lower bound on Σβjj for any j ∈ [p] as
follows:
2 2
2 2 1
P Σβjj ≥ σ (1 − µ1 ) ≥ 1 − + + . (6.96)
p2 n2 2np
60
Furthermore from Result (1) of Lemma. 9, we have
⊤
P w.j w.j /n ≤ 1 ∀j ∈ [p] = 1. (6.97)
⊤w
w.j
⊤ w /n ≤ 1) = 1. As Σ 2 .j
We use (6.97) to get, for any j ∈ [p], P (w.j .j βjj = σ n , we have for any
j ∈ [p]:
P Σβjj ≤ σ 2 = 1.
(6.98)
Using (6.98) with (6.96), we obtain for any j ∈ [p],
2 2 1
P σ 2 (1 − µ1 )2 ≤ Σβjj ≤ σ 2 ≥ 1 −
+ + .
p2 n2 2np
P
Now under the assumption n is ω[((s + r) log p)2 ], µ1 → 0. Hence, we have, Σβjj → σ 2 . This completes
the proof of Result (1). q
2 log(n)
Result (2): Recall that µ3 = √ 2 p . Now in order to obtain the upper and lower bounds
1−n/p
for Σδii for any i ∈ [n], we use Result (6) of Lemma. 9. We have,
⊤
n WA p 2 2 1
P q − In ≤ µ3 ≥ 1 − + + . (6.99)
p 1 − np n n ∞ p2 n2 2np
61
2 2 1
Using (6.103) in (6.100) yields the following inequality with probability at least 1 − p2
+ n2
+ 2np ,
2
!2
n
n2 wi. a⊤ wi. a⊤
n X n
Σδii = σ2 q i.
−1 +σ 2 k.
p2 (1 − np )
p
p (1 − np ) n p 1 − n/p n
k=1,k̸=i
2
2
wi. a⊤
r
n n
≥ σ2 q i.
− 1 ≥ σ2 1 − − µ3 .
p (1 − np ) n p
We need to now derive an upper bound on Σδii . By the same argument as before, we have from (6.99)
!
n 2 2 1
P vii ≤ µ3 ≥ 1− + + ,
p2 n2 2np
p
p 1 − n/p
!
wi. ai. ⊤
r
n n 2 2 1
=⇒ P − 1 − 1 − ≤ µ3 ≥ 1− + + ,
p2 n2 2np
p
p 1 − n/p n p
!
wi. ai. ⊤
r
n n 2 2 1
=⇒ P − 1 ≤ 1 − + µ3 ≥ 1− + + .
p2 n2 2np
p
p 1 − n/p n p
wi. a⊤
Again for i ∈ [n], k ∈ [n], k ̸= i, vij = n .
We have from (6.99),
k.
!
wi. a⊤
n k. 2 2 1
P ≤ µ3 ≥ 1 − + + .
p2 n2 2np
p
p 1 − n/p n
2
!2
n
n2 wi. a⊤ wi. a⊤
2 n X n
Σδii = σ i.
− 1 + σ2 k.
p2 (1 − np )
q p
p (1 − np ) n p 1 − n/p n
k=1,k̸=i
r 2
2 n
≤ σ 1 − + µ3 + σ 2 (n − 1)µ23 .
p
The last inequality holds with probability at least 1 − 2 p22 + 2
n2
+ 1
2np . Hence, we have
2 !
n2
r
n 2 2 1
P Σδ ≤ σ 2 2 2
1 − + µ3 + σ (n − 1)µ3 ≥ 1 − 2 + + . (6.105)
p2 (1 − np ) ii p p2 n2 2np
Using (6.105) with (6.104), we obtain the following using the union bound, for all i ∈ [n],
2 2 !
2
r r
2 n n 2 n 2 2 2 2 1
P σ 1 − − µ3 ≤ 2 Σδ ≤ σ 1 − + µ3 + σ (n − 1)µ3 ≥ 1−3 + + .
p p (1 − np ) ii p p2 n2 2np
q 2
Therefore, under the assumption n log n is o(p), we have, (n−1)µ23 = (n−1) logp n → 0 and 1− n
p + µ3 →
n2 P
1. Hence, we have p2 (1− n
Σ →
) δii
σ 2 . This completes the proof. ■
p
62
Proof of Theorem 8
Let W be the output of Alg. 3. Using the definition of β̂W from (3.14) and the measurement model
from (3.5), we have
∗ 1 ⊤ 1 ⊤ 1
β̂W − β = W η + Ip − W A (β̂λ1 − β ∗ ) + W ⊤ δ ∗ − δ̂λ2 . (6.106)
n n n
√
Using Results (1) and (2) of Theorem 6, the second and third term on the RHS of (6.106) are oP (1/ n).
2
Recall that Σβ = σn W ⊤ W . Therefore, we have
√ √1 w ⊤ η
n(β̂W j − βj∗ ) n .j
q
p = p + oP 1/ Σβjj , (6.107)
Σβjj Σβjj
where w.j denotes the j th column of matrix W . As η is Gaussian, the first term on the RHS of
(6.107) is a Gaussian random variable with mean 0 and variance 1. Using Result (1) of Theorem 7,
Σβjj converges to σ 2 in probability. This completes the proof of Result (1).
Using (3.15) and the measurement model (3.5), we have
∗ 1 ⊤ 1 ⊤ 1
A(β ∗ − β̂λ1 ) − W A⊤ δ ∗ − δ̂λ2 . (6.108)
δ̂W − δ = In − W A η + In − W A
n n n
Using
√Results(3) and (4) of Theorem 6, the second and third term on the RHS of (6.62) are both
p 1−n/p 2 I − 1 W A⊤ 1 ⊤ ⊤
oP n . Recall from (3.38), that Σδ = σ n n I n − n W A . Therefore, we
have
δ̂W i − δi∗
⊤ !
In − n1 W A⊤
p
i.
η 1 p 1 − n/p
p = p + oP p . (6.109)
Σδii Σδii Σδii n
As η is Gaussian, the first term on the RHS of (6.109) is a Gaussian random variable with mean 0
n2
and variance 1. Using Result (2) of Theorem 7, p2 (1−n/p) Σδii converges to σ 2 in probability so that
√
√ 1 p 1−n/p
n = 1. This completes the proof of result (2). ■
Σ δii
(3) √1 W ⊤ = O (1).
n ∞
q
1 1 ⊤
log(pn) 1
(4) p nW A − In A = OP pn + n .
∞
q
1 ⊤ n log(np)
(5) pW A ∞ = OP p +1 .
q
W A⊤ log(n)
(6) qn
n − np In = OP √ 1
p .
p 1− n
p ∞ 1−n/p
63
Proof of Lemma. 9:
In order to prove these results, we will first show that the intersection
event of the 4 constraints of
2 2 1
Alg. 3 is non-null with probability at least 1 − p2 + n2 + 2np as the solution W = A is in the
feasible set. This will show that the feasible region of the optimisation problem given in Alg. 3 is
non-empty. Let us first define the following sets: n o
G1 (n, p) = A ∈ Rn×p : |A⊤ A/n − Ip |∞ ≤ µ1 , G2 (n, p) = A ∈ Rn×p : p1 (In − AA⊤ /n)A ≤ µ2 ,
( )
⊤
p n×p : a⊤ a /n ≤
G3 (n, p) = A ∈ Rn×p : qn n AA n − n In ∞ ≤ µ3 , G4 (n, p) = {A ∈ R .j .j
p 1− p
q q q
1 ∀ j ∈ [p]} , where, µ1 = 2 2 log(p)
n , µ 2 = 2 log(2np)
np + 1
n and µ 3 = √ 2 2 log(n)
p . Note that,
1−n/p
here A is a n × p Rademacher matrix. We will now state the probabilities of the aforementioned sets.
From (6.135) of Lemma 10, we have
2
P (A ∈ G1 (n, p)) ≥ 1 − (6.110)
p2
Therefore, A satisfies the constraints of Alg.3 with high probability. This implies that there exists
W ∗ that satisfies the constraints. Let
(
1
E(n, p) = A : ∃W ∗ s.t. |W ∗ ⊤ A/n − Ip |∞ ≤ µ1 , (In − W ∗ A⊤ /n)A ≤ µ2 ,
p
)
W ∗ A⊤
n p ∗⊤ ∗
q − In ≤ µ3 , w .j w .j /n ≤ 1 ∀ j ∈ [p] .
p 1− n n n ∞
p
Hence, we have
2 2 1
P (A ∈ E(n, p)) ≥ P (A ∈ ∩4k=1 Gk (n, p)) ≥1− 2
+ 2+ (6.115)
p n 2np
Given that the set of feasible solutions is non-null, we can say that the optimal solution of Alg. 3
denoted by W satisfies the constraints of Alg. 3 with probability 1.
64
Result (1): Recall that the event that there exists a point satisfying constraints C0–C3 is E(n, p).
We have
⊤ ⊤
P w.j w.j /n ≤ 1 ∀j ∈ [p] = P w.j w.j /n ≤ 1 ∀j ∈ [p] A ∈ E(n, p) P (A ∈ E(n, p))
⊤ c c
+ P w.j w.j /n ≤ 1 ∀j ∈ [p] A ∈ E(n, p) P (A ∈ E(n, p)(6.116) )
Now, we have from Alg. 3, if the constraints of the optimisation problem are not satisfied, then we
choose W = A as the output. This event is given by A ∈ E(n, p)c . Now, we know that for Rademacher
matrix A, a⊤
.j a.j /n = 1 with probability 1. Therefore, we have
⊤ c
P w.j w.j /n ≤ 1 ∀j ∈ [p] A ∈ E(n, p) = 1. (6.118)
q
Result (2): Recall that µ1 = 2 2 log(p) n . Note that we have for any two events F1 , F2 , P (F1 ) =
P (F1 ∩ F2 ) + P (F1 ∩ F2c ) ≤ P (F1 ∩ F2 ) + P (F2c ). Therefore, we have,
P |W ⊤ A/n − Ip |∞ ≥ µ1 ≤ P {|W ⊤ A/n − Ip |∞ ≥ µ1 } ∩ E(n, p) + P ({E(n, p)}c )
2 2 1
⊤
≤ P {|W A/n − Ip |∞ ≥ µ1 } ∩ E(n, p) + + +
p2 n2 2np
The last inequality comes from (6.115). Since W is a feasible solution, given A ∈ E(n, p), it will
satisfy the second constraint of Alg. 3 with probability 1. This means that
P {|W ⊤ A/n − Ip |∞ ≥ µ1 } ∩ E(n, p) = 0.
Therefore we have,
⊤
2 2 1
P |W A/n − Ip |∞ ≤ µ1 ≥ 1 − 2
+ 2+ . (6.120)
p n 2np
q
Since µ1 = 2 2 log(p) , we have, |W ⊤ A/n − Ip |∞ = OP ( log(p)/n).
p
n √
Result (3): From (6.119), we have that, for each j ∈ [p], ∥w.j ∥2 ≤ n with probability 1. Note that
√
for any vector x, ∥x∥∞ ≤ ∥x∥2 , we have for every j ∈ [p], ∥w.j ∥∞ ≤ n with probability 1. Since
√
|W ⊤ |∞ ≤ max∥w.j ∥∞ ≤ n with probability 1. Therefore, we have
j∈[p]
1
√ W⊤ = O(1). (6.121)
n ∞
q
1 log(2np)
Result (4): Recall that µ2 = n +2 np . Therefore, we have
1 1 ⊤ 1 1 ⊤
P In − W A A ≥ µ2 ≤ P In − WA A ≥ µ2 ∩ E(n, p) + P ({E(n, p)}c )
p n ∞ p n ∞
1 1 2 2 1
≤ P In − W A⊤ A ≥ µ2 ∩ E(n, p) + + +
p n ∞ p2 n2 2np
65
The last inequality comes from (6.115). Note that, since W is a feasible solution, given A ∈ E(n, p),
it will satisfy the third constraint of Alg. 3 with probability 1. This implies
1 1 ⊤
P In − W A A ≥ µ2 ∩ E(n, p) = 0.
p n ∞
Therefore, we have,
1 1 ⊤ 2 2 1
P In − W A A ≤ µ2 ≥ 1 − + + (6.122)
p n ∞ p2 n2 2np
q
1 1 ⊤
log(pn)
Hence, we have, p n W A − In A = OP
∞ pn + 1/n .
2 2 1
Result (5): Recall that from Eqn.(6.115), we have with probability at least 1 − p2
+ n2
+ 2np ,
1
(W A⊤ /n − In )A ≤ µ2 .
p ∞
2 2 1
Applying triangle inequality, we have with probability atleast 1 − p2
+ n2
+ 2np ,
1 1 1
W A⊤ A ≤ (W A⊤ /n − In )A + |A|∞ ≤ µ2 + 1/p. (6.123)
np ∞ p ∞ p
We now present some useful results about the norms being used in this proof. Let X be a p × p matrix
and Y be a p × n matrix . Recall the following definitions from [50],
∥Y x∥∞ ∥Y x∥2
∥Y ∥∞→∞ ≜ max and ∥Y ∥2→2 ≜ max = σmax (Y ).
x∈Rn \{0} ∥x∥∞ x∈Rn \{0} ∥x∥2
Note that |XY |∞ ≤ ∥X∥∞→∞ |Y |∞ 2 . Since √1p ∥x∥2 ≤ ∥x∥∞ and ∥Y ⊤ x∥∞ ≤ ∥Y ⊤ x∥2 for all x ∈ Rp
, we have
√
∥Y ⊤ ∥∞→∞ ≤ p∥Y ⊤ ∥2→2 . (6.124)
Then by using (6.124), we have
√ √
|XY |∞ = |Y ⊤ X ⊤ |∞ ≤ ∥Y ⊤ ∥∞→∞ |X ⊤ |∞ ≤ p∥Y ⊤ ∥2→2 |X ⊤ |∞ = pσmax (Y )|X|∞ . (6.125)
Substituting X = np 1
W A⊤ A and Y = A† ≜ A⊤ (AA⊤ )−1 , the Moore-Penrose pseudo-inverse of A,
in (6.125), we obtain:
√
1 ⊤ 1 ⊤ † p
WA = W A AA ≤ |W A⊤ A|∞ ∥A† ∥2→2 . (6.126)
np ∞ np ∞ np
We now derive the upper bound for the second factor of (6.126). We have,
1 1
∥A† ∥2→2 = σmax (A† ) = = . (6.127)
σmin (A) σmin (A⊤ )
Note that, for an arbitrary ϵ1 > 0, using Theorem 1.1. from [41] for the mean-zero sub-Gaussian
random matrix A, we have the following
√ √
P (σmin (A⊤ ) ≤ ϵ1 ( p − n − 1)) ≤ (c6 ϵ1 )p−n+1 + (c5 )p . (6.128)
where c6 > 0 and c5 ∈ (0, 1) are constants dependent on the sub-Gaussian norm of the entries of A⊤ .
Let for some small constant ψ ∈ (0, 1) ϵ1 c6 ≜ ψ, we have
!
† c6
P σmax (A ) ≤ √ √ ≥ 1 − ((ψ)p−n+1 + (c5 )p ). (6.129)
ψ( p − n − 1)
2
This is because |XY |∞ = maxi ∥XY.i ∥∞ ≤ maxi ∥X∥∞→∞ ∥Y.i ∥∞ (by the definition of the induced norm) =
∥X∥∞→∞ maxi ∥Y.i ∥∞ = ∥X∥∞→∞ |Y |∞ .
66
we have
c6
P σmax (A† ) ≤ √ √ ≥ 1 − ((ψ)p−n+1 + (c5 )p ). (6.130)
n−1
ψ p 1− √
p
The last inequality comes from (6.115). Note that, since W is a feasible solution, given A ∈ E3 (n, p),
it will satisfy the fourth constraint of Alg. 3 with probability 1. This implies that:
⊤
n WA p
P q − In ≥ µ3 ∩ E(n, p) = 0,
p 1 − n n n ∞
p
which yields:
W A⊤
n p 2 2 1
P q − In ≤ µ3 ≥ 1 − 2
+ 2+ . (6.132)
p 1− n n n ∞ p n 2np
p
2 2 1
Since, 1 − p2
+ n2
+ 2np goes to 0 as n, p → ∞, the proof is completed. ■
67
q
1 ⊤ log(p)
1. nA A − Ip ∞
= OP n .
q
1 1 ⊤
log(pn) 1
2. p n AA − In A = OP pn + .
n
∞
q
qn AA⊤ p √ 1 log(n)
3. If n < p, then n − n In = OP p .
p 1− n
p ∞ 1−n/p
Proof of Lemma 10, [Result (1)]: Let V ≜ A⊤ A/n − Ip . Note that elements of V matrix satisfies
the following:
(akl )2
(P
n
k=1 − 1 = 0 if j = l, l ∈ [p]
vlj = Pn aklnakj (6.133)
k=1 n if j ̸= l, j, l ∈ [p]
Therefore, we now consider off-diagonal elements of V (i.e., l ̸= j) for the bound. Each summand
a a of
a a
vlj is uniformly bounded − n1 ≤ kln kj ≤ n1 since the elements of A are ±1. Note that E kln kj = 0
∀k ∈ [n], l ̸= j ∈ [p]. Furthermore, for l ̸= j ∈ [p], each of the summands of vlj are independent since
the elements of A are independent. Therefore, using Hoeffding’s Inequality (see Lemma 1 of [38]) for
t > 0,
n
!
X akl akj nt2
P (|vlj | ≥ t) = P ≥ t ≤ 2e− 2 , l ̸= j ∈ [p].
n
k=1
Therefore we have
X nt2 nt2
P (|vlj | ≥ t) ≤ 2p(p − 1)e− 2 < 2p2 e−(6.134)
P max |vlj | ≥ t = P ∪l̸=j∈[n] |vlj | ≥ t ≤ 2 .
l̸=j∈[n]
l̸=j∈[n]
q
Putting t = 2 2 log
n
p
in (6.134), we obtain:
r !
2 log p
P max |vlj | ≥ 2 ≤ 2p2 e−4 log(p) = 2p−2 . (6.135)
l̸=j∈[n] n
Thus, we have: r !
log p
|V |∞ = OP . (6.136)
n
This completes the proof of Result (1).
Result (2): Note that,
1 1 1 1
AA⊤ − In A = AA⊤ A − A ≜ V . (6.137)
p n p n
By splitting the sum over l into the terms where l ̸= j and the term where l = j, and simplifying by
68
using the fact that a2kj = 1 for all k, j, we obtain
p X
n n
!
aij 1 X 1 X
aij a2kj
vij = − + ail akl akj
+ np
p np
l=1 k=1 k=1
l̸=j
p X
n n
!
aij 1 X aij 1 X
=− + ail akl akj
+ p n 1
p np
l=1 k=1 k=1
l̸=j
p X
n
1 X
= ail akl akj
.
np
l=1 k=1
l̸=j
Next we split the sum over k into the terms where k ̸= i and the term where k = i to obtain
p X
n p X
n p p X
n
1 X 1 X 1 X 2 1 X p−1
vij = a il akl a kj
= a il akl a kj
+ a il a ij = a il akl a kj
+
np aij .
np np np np
l=1 k=1 l=1 k=1 l=1 l=1 k=1
l̸=j l̸=j k̸=i l̸=j l̸=j k̸=i
(6.138)
1
If we condition on ai. and a.j , the (n − 1)(p − 1) random variables np ail akl akj for k ∈ [n] \ {i} and
1 1
l ∈ [p] \ {j} are independent, have mean zero, and are bounded between − np and np . Therefore, by
Hoeffding’s inequality, for t > 0, we have
p X
n
1 X n2 p2 t2
− 2(n−1)(p−1) 2
≤ 2e−npt /2 .
P
np ail akl akj ≥ t ai. , a.j
≤ 2e (6.139)
l=1 k=1
l̸=j k̸=i
Since the RHS of (6.139) is independent of ai, and a.j the bound also holds on the unconditional
probability, i.e., we have
p X
n
1 X n2 p2 t2
− 2(n−1)(p−1) 2
≤ 2e−npt /2 .
P
np a a a
il kl kj ≥ t
≤ 2e (6.140)
l=1 k=1
l̸=j k̸=i
q
log(2np)
Since aij is Rademacher, p−1np |a ij | < 1
n with probability 1. Choosing t = 2 np and using the
triangle inequality, we have from (6.140),
s s
p Xn
!
1 log(2np) 1 X p−1 log(2np) 1
P |vij | ≥ + 2 ≤P ail akl akj + |aij | ≥ 2 +
n np np np np n
l=1 k=1
l̸=j k̸=i
s
p X
n
1 X log(2np)
≤P ail akl akj ≥ 2 ≤ 1 .
np np 2n2 p2
l=1 k=1
l̸=j k̸=i
69
This completes the proof of Result (2).
Result (3): Reversing the roles of n and p in result (1), (6.133) and (6.135) of this lemma, we have
s !
AA⊤ 2 log(n) 2
P − In ≤ 2 ≥ 1 − 2. (6.142)
p ∞ p n
1
Now, multiplying by √ , we get
1−n/p
s
AA⊤
n p 2 2 log(n) 2
P q − In ≤p ≥ 1 − 2. (6.143)
p 1− n n n ∞ 1 − n/p p n
p
⊤
( ) ( )
1 ⊤
∗ −1 1 ⊤
∗
+ √ WK δ − δ̂λ2 ΣβK √ WK δ − δ̂λ2
n n
( )⊤ ( )
−1 √
1 ⊤ 1 ⊤ ∗
+ 2 √ WK η ΣβK n Ip − W A (β − β̂λ1 )
n n K
( )⊤ ( )
1 1
+ 2 √ WK ⊤ η ΣβK −1 √ WK ⊤ δ ∗ − δ̂λ2
n n
( )⊤ ( )
√
1 ⊤ 1
+ 2 n Ip − W A (β ∗ − β̂λ1 ) ΣβK −1 √ WK ⊤ δ ∗ − δ̂λ2 (6.144)
n K n
In Lemma 4, we will show that ΣβK is positive definite, hence ΣβK −1 exists. Next in Lemma 2, we
will show that the last 5 terms (all except the first term on the RHS) of (6.144) goes to 0 in probability.
( )⊤ (
Lastly, we see that the first term on the RHS of (6.144) can be written as ΣβK −1/2 √1n WK ⊤ η ΣβK −1/2 √1n WK
Since, η is Gaussian with mean 0 and variance covariance matrix σ 2 In , by linearity of Gaussian, we
( ) ( )⊤ ( )
have, ΣβK −1/2 √1n WK ⊤ η ∼ Nk (0, Ik ). Hence, ΣβK −1/2 √1n WK ⊤ η ΣβK −1/2 √1n WK ⊤ η ∼
χ2k .
Therefore, by applying Slutsky’s Theorem on (6.144), we have,
√ √ D
{ n(β̂W − β ∗ )K }⊤ ΣβK −1 { n(β̂W − β ∗ )K } → χ2k . (6.145)
This completes the proof of (3.39). Now, we prove the joint distribution given in (3.40) with the same
70
approach. Let us first expand (3.40) using the structure given in (3.36). We have,
( )⊤ ( )
∗ ⊤ −1 ∗ 1 ⊤ −1 1 ⊤
{(δ̂W − δ )L } ΣδL {(δ̂W − δ )L } = In − W A η ΣδL In − W A η
n L n L
( )⊤ ( )
1 ⊤ ∗ −1 1 ⊤ ∗
+ In − W A A(β − β̂λ1 ) ΣδL In − W A A(β − β̂λ1 )
n L n L
⊤
( ) ( )
1 1
+ WL A⊤ δ ∗ − δ̂λ2 ΣδL −1 WL A⊤ δ ∗ − δ̂λ2
n n
( )⊤ ( )
1 1
+ 2 In − W A⊤ η ΣδL −1 In − W A⊤ A(β ∗ − β̂λ1 )
n L n L
( )⊤ ( )
1 1
− 2 In − W A⊤ η ΣδL −1 WL A⊤ δ ∗ − δ̂λ2
n L n
( )⊤ ( )
1 ⊤ ∗ −1 1 ⊤
∗
− 2 In − W A A(β − β̂λ1 ) ΣδL WL A δ − δ̂λ2 (6.146)
n L n
In Lemma 4, we will show that ΣδL is positive definite, hence ΣδL −1 exists. Next in Lemma 3, we will
show that the last 5 terms (all except the first term on the RHS) of (6.146) goes to 0 in probability.
Lastly, we see that the first term on the RHS of (6.146) can be written as
( )⊤ ( )
−1/2 1 ⊤ −1/2 1 ⊤
ΣδL In − W A η ΣδL In − W A η .
n L n L
( )⊤ ( )
−1/2 1 ⊤ −1/2 1 ⊤
ΣδL In − W A η ΣδL In − W A η ∼ χ2l .
n L n L
Proof of Lemma2:
Result 1.: We will expand the quadratic form and utilize individual probabilistic rates derived in
Chapter 3 to prove the claim. We have
( )⊤ ( )
√ √
1 ⊤ 1
n Ip − W A (β ∗ − β̂λ1 ) ΣβK −1 n Ip − W ⊤ A (β ∗ − β̂λ1 )
n K n K
( )( )
√ √
XX 1 1
= [ΣβK −1 ]lj n Ip − W ⊤ A (β ∗ − β̂λ1 ) n Ip − W ⊤ A (β ∗ − β̂λ1 )
n j n l
j∈K l∈K
( ) ( )
√ √
XX 1 1
≤ [ΣβK −1 ]lj n Ip − W ⊤ A (β ∗ − β̂λ1 ) n Ip − W ⊤ A (β ∗ − β̂λ1 )
n j n l
j∈K l∈K
2 XX
√
1
≤ n Ip − W ⊤ A (β ∗ − β̂λ1 ) [ΣβK −1 ]lj (6.148)
n ∞ j∈K l∈K
71
[ΣβK −1 ]lj = OP (1). Now, we have from (23) of Theorem 3 of
P P
We have from Lemma 4 j∈K l∈K
√
n Ip − n1 W ⊤ A (β ∗ − β̂λ1 )
Chapter 3, under the given conditions = oP (1). Since k is fixed
∞
as n, p → ∞, the proof is complete.
Result 2.: We follow the same process as in the proof of Result 1. We have
⊤
( ) ( )
1 ⊤
∗ −1 1 ⊤
∗
√ WK δ − δ̂λ2 ΣβK √ WK δ − δ̂λ2
n n
( )( )
XX
−11 ⊤
∗
1 ⊤
∗
= [ΣβK ]lj √ W j δ − δ̂λ2 √ W l δ − δ̂λ2
n n
j∈K l∈K
( ) ( )
XX
−1 1 ⊤
∗
1 ⊤
∗
≤ [ΣβK ]lj √ W j δ − δ̂λ2 √ W l δ − δ̂λ2
n n
j∈K l∈K
1 XX
≤ √ W ⊤ δ ∗ − δ̂λ2 [ΣβK −1 ]lj . (6.149)
n ∞ j∈K l∈K
( )⊤ ( )
−1 √
1 ⊤ 1 ⊤ ∗
2 √ WK η ΣβK n Ip − W A (β − β̂λ1 )
n n K
( )
−1/2 √
⊤ 1 ⊤ ∗
= Z ΣβK n Ip − W A (β − β̂λ1 )
n K
√
XX 1 ⊤
≤ [ΣβK ]lj |Zj | n Ip − W A (β ∗ − β̂λ1 )
−1
n l
j∈K l∈K
√
1 XX
≤ n Ip − W ⊤ A (β ∗ − β̂λ1 ) [ΣβK −1 ]lj |Zj | . (6.150)
n ∞ j∈K l∈K
√
n Ip − n1 W ⊤ A (β ∗ − β̂λ1 )
We have from (23) of Theorem 3 of Chapter 3, under the given conditions =
∞
oP (1). Since k is fixed and we have from Lemma 4 j∈K l∈K [ΣβK −1 ]lj = OP (1) as n, p → ∞,
P P
the proof is complete.
Result 4.: By similar arguments as in (6.150), we have,
( )⊤ ( )
1 1
2 √ WK ⊤ η ΣβK −1 √ WK ⊤ δ ∗ − δ̂λ2
n n
( )
1
= Z ⊤ ΣβK −1/2 √ WK ⊤ δ ∗ − δ̂λ2
n
XX 1
≤ [ΣβK −1 ]lj |Zj | √ W ⊤ δ ∗
− δ̂ λ2
n l
j∈K l∈K
1 XX
≤ √ W ⊤ δ ∗ − δ̂λ2 [ΣβK −1 ]lj |Zj | . (6.151)
n ∞ j∈K l∈K
We have from (24) of Theorem 3 of Chapter 3, under the given conditions √1n W ⊤ δ ∗ − δ̂λ2 =
∞
oP (1). Since k is fixed and we have from Lemma 4 j∈K l∈K [ΣβK −1 ]lj = OP (1) as n, p → ∞,
P P
72
the proof is complete.
Result 5.: Expanding the quadratic form, we have
( )⊤ ( )
√
1 ⊤ 1
2 n Ip − W A (β ∗ − β̂λ1 ) ΣβK −1 √ WK ⊤ δ ∗ − δ̂λ2
n K n
XX 1 √ 1 ⊤
−1 ⊤ ∗
≤ [ΣβK ]lj √ W l δ − δ̂λ2 n Ip − W A (β ∗ − β̂λ1 )
n n j
j∈K l∈K
√
1 1 X X
≤ √ W ⊤ δ ∗ − δ̂λ2 n Ip − W ⊤ A (β ∗ − β̂λ1 ) [ΣβK −1 ]lj .
n ∞ n ∞ j∈K l∈K
We have from (23),(24) of Theorem 3 of Chapter 3, under the given conditions √1n W ⊤ δ ∗ − δ̂λ2 =
∞
√
n Ip − n1 W ⊤ A (β ∗ − β̂λ1 )
oP (1) and = oP (1). Since k is fixed and we have from Lemma 4
∞
P P −1
j∈K l∈K [ΣβK ]lj = OP (1) as n, p → ∞, the proof is complete.
Proof of Lemma3:
Result 1.: We will expand the quadratic form and utilize individual probabilistic rates derived in
Chapter 3 to prove the claim. We have
( )⊤ ( )
1 1
In − W A⊤ A(β ∗ − β̂λ1 ) ΣδL −1 In − W A⊤ A(β ∗ − β̂λ1 )
n L n L
( )( )
XX 1 1
= [ΣδL −1 ]ik In − W A⊤ A(β ∗ − β̂λ1 ) In − W A⊤ A(β ∗ − β̂λ1 )
n i n k
i∈L k∈L
( ) ( )
XX
−1 1 ⊤ ∗ 1 ⊤ ∗
≤ [ΣδL ]ik In − W A A(β − β̂λ1 ) In − W A A(β − β̂λ1 )
n i n k
i∈L k∈L
2
n 1 1 XX
≤ p In − W A⊤ A(β ∗ − β̂λ1 ) n2
[ΣδL −1 ]ik . (6.152)
p 1 − n/p n
∞ p2 (1−n/p) i∈L k∈L
1
[ΣδL −1 ]ik = OP (1). Now, we have from (25) of Theo-
P P
We have from Lemma4, n2 i∈L k∈L
p2 (1−n/p)
⊤
( ) ( )
1 1
WL A⊤ δ ∗ − δ̂λ2 ΣδL −1 WL A⊤ δ ∗ − δ̂λ2
n n
( ) ( )
XX 1 1
≤ [ΣδL −1 ]ik Wi A⊤ δ ∗ − δ̂λ2 Wk A⊤ δ ∗ − δ̂λ2
n n
i∈L k∈L
2
n 1 1 XX
≤ p W A⊤ δ ∗ − δ̂λ2 n2
[ΣδL −1 ]ik (6.153)
p n/p n ∞ p2 (1−n/p) i∈L k∈L
⊤ ∗ − δ̂
We have from (26) of Theorem 3 of Chapter 3, under the given conditions √ n 1
W A δ λ2 =
p 1−n/p n
∞
1 P P −1
oP (1). Since k is fixed and we have from Lemma4, n2 i∈L k∈L [ΣδL ]ik = OP (1) as
p2 (1−n/p)
73
n, p → ∞, the proof is complete.
( )
ΣδL −1/2 In − n1 W A⊤ L η
Result 3.: Recall that Z = ∼ Nk (0, Il ). We have
( )⊤ ( )
1 1
2 In − W A⊤ η ΣδL −1
In − W A⊤ ∗
A(β − β̂λ1 )
n L n L
( )
⊤ −1 1 ⊤
= Z ΣδL In − W A A(β ∗ − β̂λ1 )
n L
XX
−1 1 ⊤
≤ [ΣδL ]ik |Zi | In − W A A(β ∗ − β̂λ1 )
n k
i∈L k∈L
1 X X
≤ In − W A⊤ A(β ∗ − β̂λ1 ) [ΣδL −1 ]ik |Zj |
n ∞ i∈L k∈L
n 1 ⊤ 1 XX
≤ p In − W A A(β ∗ − β̂λ1 ) 3 n 2 [ΣδL −1 ]ik . (6.154)
p 1 − n/p n 2
∞ p (1−n/p) i∈L k∈L
We have from (25) of Theorem 3 of Chapter 3, under the given conditions √ n In − n1 W A⊤ A(β ∗ − β̂λ1 )
p 1−n/p
∞
1 P P −1
oP (1). Since k is fixed and we have from Lemma4, n2 i∈L k∈L [ΣδL ]ik = OP (1) as
p2 (1−n/p)
n, p → ∞, the proof is complete.
Result 4.: By similar arguments as in (6.154), we have,
( )⊤ ( )
1 ⊤ −1 1 ⊤
∗
2 In − W A η ΣδL WL A δ − δ̂λ2
n L n
( )
−1/2 1
⊤ ⊤ ∗
= Z ΣδL WL A δ − δ̂λ2
n
XX 1
≤ [ΣδL −1/2 ]ik |Zi | Wk A⊤ δ ∗ − δ̂λ2
n
i∈L k∈L
np 1 ⊤
∗
XX
≤ 1 − n/p W A δ − δ̂λ2 [ΣδL −1/2 ]ik |Zj |
p n ∞ i∈L k∈L
n 1 1 XX
≤ p W A⊤ δ ∗ − δ̂λ2 3 n2
[ΣδL −1 ]ik . (6.155)
p 1 − n/p n ∞ p2 (1−n/p) i∈L k∈L
We have from (26) of Theorem 3 of Chapter 3, under the given conditions np 1 − n/p n1 W A⊤ δ ∗ − δ̂λ2
p
=
∞
1 P P −1
oP (1). Since k is fixed and we have from Lemma4, n2 i∈L k∈L [ΣδL ]ik = OP (1) as
p2 (1−n/p)
n, p → ∞, the proof is complete.
Result 5.: Expanding the quadratic form, we have
( )⊤ ( )
1 ⊤ ∗ −1 1 ⊤
∗
2 In − W A A(β − β̂λ1 ) ΣδL WL A δ − δ̂λ2
n L n
XX
−1 1 ⊤ 1
≤ [ΣδL ]ik In − W A A(β ∗ − β̂λ1 ) Wk A⊤ δ ∗ − δ̂λ2
n i n
i∈L k∈L
n 1 n 1
≤ p In − W A⊤ A(β ∗ − β̂λ1 ) p W A⊤ δ ∗ − δ̂λ2
p 1 − n/p n p 1 − n/p n
∞ ∞
1 XX
n2 [ΣδL −1 ]ik .
p2 (1−n/p) i∈L k∈L
74
We have from (25),(26) of Theorem 3 of Chapter 3, under the given conditions
√n In − n1 W A⊤ A(β ∗ − β̂λ1 ) √n 1 ⊤ ∗ − δ̂
= oP (1) and W A δ λ2 = oP (1).
p 1−n/p p 1−n/p n
∞ ∞
1 P P −1
Since k is fixed and we have from Lemma4, n2 i∈L k∈L [ΣδL ]ik = OP (1) as n, p → ∞,
p2 (1−n/p)
the proof is complete.
Proof of Lemma4:
Result 1: To prove this, we will exploit the singular value bounds of the Rademacher matrix. First
we bound the sum by the maximal element as follows:
XX
[ΣβK −1 ]lj ≤ k 2 ΣβK −1 ∞ (6.156)
j∈K l∈K
ΣβK −1 ∞
≤ ∥ΣβK −1 ∥2 (6.157)
Note that, for an arbitrary ϵ1 > 0, using Theorem 1.1. from [41] for the mean-zero sub-Gaussian
random matrix A, we have the following
copt p
P √ σmin (AK ) ≤ copt ϵ1 (1 − (k − 1)/n) ≤ (c6 ϵ1 )n−k+1 + (c5 )p . (6.159)
n
where c6 > 0 and c5 ∈ (0, 1) are constants dependent on the sub-Gaussian norm of the entries of ΣβK .
Let for some small constant ψ ∈ (0, 1) ϵ1 c6 ≜ ψ. We have
!
−1 c26
P σmax (ΣβK ) ≤ 2 p ≥ 1 − ((ψ)n−k+1 + (c5 )p ). (6.160)
2
copt ψ (1 − (k − 1)/n) 2
For an arbitrary ϵ1 > 0, using Theorem 1.1. from [41] for the mean-zero sub-Gaussian random matrix
A, we have the following
copt ⊤
p
P √ σmin (A ) ≤ copt ϵ1 (1 − n/p) ≤ (c6 ϵ1 )n−k+1 + (c5 )p . (6.163)
n
where c6 > 0 and c5 ∈ (0, 1) are constants dependent on the sub-Gaussian norm of the entries of ΣδL .
Since W A⊤ /n = copt AA⊤ /n, we have,
P σmin W A⊤ /n ≤ copt ϵ21 ( p/n − 1)2 ≤ (c6 ϵ1 )n−k+1 + (c5 )p .
p
(6.164)
75
Further since σmin W A⊤ /n − In = σmin W A⊤ /n −1 and σmin W A⊤ /n − In L ≥ σmin W A⊤ /n − In ,
Let for some small constant ψ ∈ (0, 1) ϵ1 c6 ≜ ψ. Then, we have from (6.162) and (6.165)
!
1 XX
−1 l2
P n2
[ΣδL ]ik ≤ n2
p ≥ 1 − {(c6 ϵ1 )n−k+1 + (c5 )p }.
2 2
(c ϵ ( p/n − 1) − 1) 2
p2 (1−n/p) i∈L k∈L p2 (1−n/p) opt 1
(6.166)
This completes the proof. ■
1. |ϑU |∞ = |ϑ||U |∞ .
2. |U + V |∞ ≤ |U |∞ + |V |∞ .
3. If |U |∞ = OP (h1 (n, p)) and |V |∞ = OP (h2 (n, p)), then |U +V |∞ ≤ OP (max{h1 (n, p), h2 (n, p)}).
4. ∥w⊤ V ∥∞ ≤ |V |∞ ∥w∥1 .
5. If |V |∞ = OP (h1 (n, p)) and ∥w∥1 = OP (hw (n, p)), then ∥w⊤ V ∥∞ ≤ OP (h1 (n, p)hw (n, p)). ■
Proof:
Result (1): We have, |ϑU |∞ = max |ϑuij | = max |ϑ||uij | = |ϑ||U |∞ .
i∈[n],j∈[p] i∈[n],j∈[p]
Result (2): We have, |U + V |∞ = max |uij + vij | ≤ max {|uij | + |vij |} ≤ max |uij | +
i∈[n],j∈[p] i∈[n],j∈[p] i∈[n],j∈[p]
max |vij | = |U |∞ + |V |∞ .
i∈[n],j∈[p]
Result (3): Given |U |∞ = OP (h1 (n, p)) and |V |∞ = OP (h2 (n, p)). From Part (2), we have,
|U +V |∞ ≤ |U |∞ +|V |∞ ≤ OP (h1 (n, p))+OP (h1 (n, p)) = OP (h1 (n, p)+h2 (n, p)) ≤ OP (max{h1 (n, p), h2 (n, p)}).
Result (5): We have from Part (4), ∥w⊤ V ∥∞ ≤ |V |∞ ∥w∥1 = OP (h1 (n, p))OP (hw (n, p)) = OP (h1 (n, p)hw (n, p)).
■
Lemma 12 Let Xi , for i = 1, 2, . . . , k, be Gaussian random variables with mean 0 and variance σ 2 .
Then, we have
p
P max |Xi | ≥ 2σ log(k) ≤ 1/k. (6.167)
i∈[k]
Note that Lemma 12 does not require independence. For a proof see, e.g., [48].
76
6.3 Proofs of Theorems and Lemmas of Chapter 4
6.3.1 Proof of Theorem 10:
The optimisation problem given in Alg.3 is as follows:
p
X
min w.j ⊤ w.j /n (6.168)
W
j=1
p
In order to prove that the optimal solution (6.168) is W = (1 − µ3 1 − n/p)A, we first consider am
equivalent relaxed version of this problem with only the constraint C3. The relaxed problem is given
as follows:
n
1X
min wi. ⊤ wi. (6.169)
W n
i=1
W A⊤
p
subject to − In ≤ µ3 1 − n/p.
p ∞
The objective function in (6.168) is equivalent to that of (6.169) as pj=1 w.j ⊤ w.j = trace(W ⊤ W ) =
P
Pn
trace(W W ⊤ ) = ⊤
i=1 wi. wi. . We write the relaxed problem in this form as this optimisation
problem in now separable with respect to the rows of W . The equivalent separable optimisation
problem of (6.169) is given as follows for all i ∈ [n].
77
where f and g are convex, the Fenchel dual is
1
∗
max −f − Ay − g ∗ (−y)
y p
where f ∗ and g ∗ are the convex conjugates of f and g, respectively.
Therefore, we take,
( p
1 2 0 if ∥w − ei ∥∞ ≤ µ3 1 − n/p
f (w) = ∥w∥ and g(w) = .
p ∞ otherwise
Then
p
f ∗ (y) = sup⟨y, w⟩ − f (w) = ∥y∥2
w 4
and
g ∗ (y) = sup⟨y, w⟩ − g(w) =
p
sup √ ⟨y, w⟩ = yj + µ3 1 − n/p∥y∥1 .
w ∥w−ei ∥∞ ≤µ3 1−n/p
78
6.3.2 Some useful Lemmas
Lemma 13 Let A, B, C be sets. The event (C \ B) ∩ A = ∅ is equivalent to the conditional statement
A ⊆ B | B ⊆ C.
Proof of Lemma 13
If (C \ B) ∩ A = ∅, then A ⊆ B | B ⊆ C.
Assume (C \ B) ∩ A = ∅. This implies that no element of A belongs to C \ B, i.e., for all x ∈ A:
x∈
/ C \ B =⇒ x ∈ B ∪ (U \ C),
where U is the universal set. Now, assume B ⊆ C. Since x ∈ B implies x ∈ C, the complement U \ C
does not contain elements of A. Thus, every x ∈ A must be in B, implying:
A ⊆ B.
Hence, A ⊆ B | B ⊆ C holds.
If A ⊆ B | B ⊆ C, then (C \ B) ∩ A = ∅. Assume B ⊆ C and A ⊆ B. If B ⊆ C, then every element
of B is in C, so A ⊆ B implies A ⊆ C. Additionally, since A ⊆ B, no element of A can be outside B.
Therefore, no element of A can belong to C \ B, and we have:
(C \ B) ∩ A = ∅.
79
Bibliography
[1] List of countries implementing pool testing strategy against COVID-19. https://en.wikipedia.
org/wiki/List_of_countries_implementing_pool_testing_strategy_against_COVID-19.
Last retrieved, Oct 2021.
[2] M. Aldridge, O. Johnson, and J. Scarlett. Group testing: An information theory perspective.
Found. Trends Commun. Inf. Theory, 15:196–392, 2019.
[3] Matthew Aldridge and David Ellis. Pooled Testing and Its Applications in the COVID-19 Pan-
demic, pages 217–249. Springer International Publishing, 2022.
[4] A. Aldroubi, X. Chen, and A.M. Powell. Perturbations of measurement matrices and dictionaries
in compressed sensing. Appl. Comput. Harmon. Anal., 33(2), 2012.
[5] G. Atia and V. Saligrama. Boolean compressed sensing and noisy group testing. IEEE Trans.
Inf. Theory, 58(3), 2012.
[6] S. Banerjee, R. Srivastava, J. Saunderson, and A. Rajwade. Robust non-adaptive group testing
under errors in group membership specifications. 2024.
[7] S. H. Bharadwaja and C. R. Murthy. Recovery algorithms for pooled RT-qPCR based COVID-19
screening. IEEE Trans. Signal Process., 70:4353–4368, 2022.
[8] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory
and Examples. CMS Books in Mathematics. Springer, 2nd edition, 2006.
[9] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press,
2004.
[10] Emmanuel J. Candès, Justin K. Romberg, and Terence Tao. Stable signal recovery from incom-
plete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.
[12] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama. Non-adaptive probabilistic group testing with
noisy measurements: Near-optimal bounds with efficient algorithms. In ACCC, pages 1832–1839,
2011.
[13] M. Cheraghchi, A. Hormati, A. Karbasi, and M. Vetterli. Group testing with probabilistic tests:
Theory, design and application. IEEE Transactions on Information Theory, 57(10), 2011.
[14] A. Christoff et al. Swab pooling: A new method for large-scale RT-qPCR screening of SARS-
CoV-2 avoiding sample dilution. PLOS ONE, 16(2):1–12, 02 2021.
[15] S. Comess, H. Wang, S. Holmes, and C. Donnat. Statistical Modeling for Practical Pooled Testing
During the COVID-19 Pandemic. Statistical Science, 37(2):229 – 250, 2022.
[16] R. Dorfman. The detection of defective members of large populations. Ann. Math. Stat.,
14(4):436–440, 1943.
80
[17] Marco F Duarte, Mark A Davenport, Dharmpal Takhar, Jason N Laska, Ting Sun, Kevin F Kelly,
and Richard G Baraniuk. Single-pixel imaging via compressive sampling. IEEE signal processing
magazine, 25(2):83–91, 2008.
[18] E. Fenichel, R. Koch, A. Gilbert, G. Gonsalves, and A. Wyllie. Understanding the barriers to
pooled SARS-CoV-2 testing in the United States. Microbiology Spectrum, 2021.
[19] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with ap-
plications to image analysis and automated cartography. Communications of the ACM, 24(6),
1981.
[20] D. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Pearson, 2012.
[21] S. Fosson, V. Cerone, and D. Regruto. Sparse linear regression from perturbed data. Automatica,
122, 2020.
[22] Sabyasachi Ghosh, Rishi Agarwal, Mohammad Ali Rehan, Shreya Pathak, Pratyush Agarwal,
Yash Gupta, Sarthak Consul, Nimay Gupta, Ritesh Goenka, Ajit Rajwade, and Manoj Gopalkr-
ishnan. A compressed sensing approach to pooled RT-PCR testing for COVID-19 detection.
IEEE Open Journal of Signal Processing, 2:248–264, 2021.
[23] A. C. Gilbert, M. A. Iwen, and M. J. Strauss. Group testing and sparse signal recovery. In 42nd
Asilomar Conf. Signals, Syst. and Comput., pages 1059–1063, 2008.
[24] N. Grobe, A. Cherif, X. Wang, Z. Dong, and P. Kotanko. Sample pooling: burden or solution?
Clin. Microbiol. Infect., 27(9):1212–1220, 2021.
[25] T. Hastie, R. Tibshirani, and M. Wainwright. Statistical Learning with Sparsity: The LASSO
and Generalizations. CRC Press, 2015.
[26] A. Heidarzadeh and K. Narayanan. Two-stage adaptive pooling with RT-qPCR for COVID-19
screening. In ICASSP, 2021.
[27] M.A. Herman and T. Strohmer. General deviants: an analysis of perturbations in compressed
sensing. IEEE Journal on Sel. Topics Signal Process., 4(2), 2010.
[28] F. Hwang. A method for detecting all defective members in a population by group testing. J Am
Stat Assoc, 67(339):605–608, 1972.
[29] J.D. Ianni and W.A. Grissom. Trajectory auto-corrected image reconstruction. Magnetic Reso-
nance in Medicine, 76(3), 2016.
[30] T. Ince and A. Nacaroglu. On the perturbation of measurement matrix in non-convex compressed
sensing. Signal Process., 98:143–149, 2014.
[31] A. Javanmard and A. Montanari. Confidence intervals and hypothesis testing for high-dimensional
regression. J Mach Learn Res, 2014.
[32] A. Kahng and S. Reda. New and improved BIST diagnosis methods from combinatorial group
testing theory. IEEE Trans. Comp. Aided Design of Inetg. Circ. and Sys., 25(3), 2006.
[34] Y. Li and G. Raskutti. Minimax optimal convex methods for Poisson inverse problems under ℓq
-ball sparsity. IEEE Trans. Inf. Theory, 64(8):5498–5512, 2018.
[35] Yuan Li and Garvesh Raskutti. Minimax optimal convex methods for poisson inverse problems
under ℓq -ball sparsity. IEEE Transactions on Information Theory, 64(8):5498–5512, 2018.
81
[36] Hubert W Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and variance
unknown. Journal of the American statistical Association, 62(318):399–402, 1967.
[37] Dengyu Liu, Jinwei Gu, Yasunobu Hitomi, Mohit Gupta, Tomoo Mitsunaga, and Shree K Na-
yar. Efficient space-time sampling with pixel-wise coded exposure for high-speed imaging. IEEE
transactions on pattern analysis and machine intelligence, 36(2):248–260, 2013.
[39] A. Mazumdar and S. Mohajer. Group testing with unreliable elements. In ACCC, 2014.
[40] D. C. Montgomery, E. Peck, and G. Vining. Introduction to Linear Regression Analysis. Wiley,
2021.
[41] M.Rudelson and R.Vershynin. Smallest singular value of a random rectangular matrix. Comm.
Pure Appl. Math., 2009.
[42] Nam H. Nguyen and Trac D. Tran. Robust LASSO with missing and grossly corrupted observa-
tions. IEEE Trans. Inf. Theory, 59(4):2036–2058, 2013.
[43] H. Pandotra, E. Malhotra, A. Rajwade, and K. S. Gurumoorthy. Dealing with frequency pertur-
bations in compressive reconstructions with Fourier sensing matrices. Signal Process., 165:57–71,
2019.
[44] J. Parker, V. Cevher, and P. Schniter. Compressive sensing under matrix uncertainties: An
approximate message passing approach. In Asilomar Conference on Signals, Systems and Com-
puters, pages 804–808, 2011.
[45] M. Raginsky, R. Willett, Z. Harmany, and R. Marcia. Compressed sensing performance bounds
under Poisson noise. IEEE Trans. Signal Process., 58(8):3990–4002, 2010.
[47] Mark Rudelson and Roman Vershynin. The Littlewood–Offord problem and invertibility of ran-
dom matrices. Advances in Mathematics, 218(2):600–633, 2008.
[49] N. Shental et al. Efficient high throughput SARS-CoV-2 testing to detect asymptomatic carriers.
Sci. Adv., 6(37), September 2020.
[50] J. Todd. Induced Norms, pages 19–28. Birkhäuser Basel, Basel, 1977.
[51] Sara Van de Geer, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure. On asymptotically
optimal confidence regions and tests for high-dimensional models. The Annals of Statistics,
42(3):1166–1202, 2014.
[54] Katherine J. Wu. Why pooled testing for the coronavirus isn’t working in America. https://www.
nytimes.com/2020/08/18/health/coronavirus-pool-testing.html. Last retrieved October
2021.
82
[55] H. Zabeti et al. Group testing large populations for SARS-CoV-2. medRxiv, pages 2021–06, 2021.
[56] Cun-Hui Zhang and Stephanie S. Zhang. Confidence intervals for low-dimensional parameters
in high-dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical
Methodology), 76(1):217–242, 2014.
[57] J. Zhang, L. Chen, P. Boufounos, and Y. Gu. On the theoretical analysis of cross validation in
compressive sensing. In ICASSP, 2014.
[58] H. Zhu, G. Leus, and G. Giannakis. Sparsity-cognizant total least-squares for perturbed com-
pressive sampling. IEEE Trans. Signal Process., 59(11), 2011.
83