0% found this document useful (0 votes)

8 views84 pages

APS 4 Report

The document discusses robust non-adaptive group testing under erroneous group membership specifications, focusing on the debiasing of the Lasso estimator and its application in high-dimensional sparse regression. It outlines methods for correcting biases in Lasso estimates and introduces a Debiased Robust Lasso Test (Drlt) method to address group membership errors. The document includes empirical results and correction algorithms to enhance the accuracy of group testing in the presence of model mismatches and errors.

Uploaded by

Shuvayan Banerjee

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views84 pages

APS 4 Report

Uploaded by

Shuvayan Banerjee

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 84

Robust Non-adaptive Group Testing

under Erroneous Group Membership

Specifications
Fourth Annual Progress Seminar

Shuvayan Banerjee

under the guidance of Prof. Ajit Rajwade, Prof.

James Saunderson and Prof. Radhendushka
Srivastava

Mathematics Department
IITB-Monash Research Academy
Contents

1 Introduction 3

2 Fast Debiasing of the Lasso Estimator 4

2.1 An Overview of the Debiased LASSO . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Debiasing the LASSO Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Re-parameterization of the Debiased LASSO . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1 A Closed-Form Solution for the Debiasing Matrix W . . . . . . . . . . . . . . . 6
2.3 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.1 Validity of the exact solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.2 Difference between W e and W o for varying choices of µ . . . . . . . . . . . . . 9

3 A Debiased LASSO Approach for Non-adaptive Group Testing with Errors in

Group Membership Specifications 11
3.1 Problem formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 Basic Noise Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.2 Model Mismatch Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Debiased Robust Lasso Test (Drlt) Method . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.1 Bounds on the Robust Lasso Estimate . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.2 A note on the Debiased Lasso . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.3 Debiasing in the Presence of MMEs . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 Optimal Debiased Lasso Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Simultaneous Confidence Intervals for Optimal Drlt’s . . . . . . . . . . . . . . . . . . 21
3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.1 Results with Baseline Debiasing Techniques in the Presence of Effective MMEs 24
3.5.2 Empirical verification of asymptotic results of Theorem 8 . . . . . . . . . . . . 25
3.5.3 Sensitivity and Specificity of Odrlt and Drlt for δ ∗ . . . . . . . . . . . . . . 26
3.5.4 Identification of Defective Samples in β ∗ . . . . . . . . . . . . . . . . . . . . . . 27
3.5.5 RRMSE Comparison of Debiased Robust Lasso Techniques to Baseline Algorithms 28

4 Correction of Group Membership Specification Errors in Group Testing 30

4.1 Permutation Noise Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.1 Correction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Correction of Bitflip MME’s in Group Testing . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.1 Bit-flip models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.2 Correction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.3 Correction Algorithm for different bit-flip model . . . . . . . . . . . . . . . . . 35
4.2.4 Optimal solution for W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.5 Reconstruction error of β post-correction (ongoing) . . . . . . . . . . . . . . . 37
4.2.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5 Conclusion 46

1
6 Appendix 47
6.1 Proofs of Lemmas and Theorems of Chapter 2 . . . . . . . . . . . . . . . . . . . . . . 47
6.1.1 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.1.2 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.1.3 Proof of Theorem 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.1.4 Proof of Lemma 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1.5 Convex conjugates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.2 Proofs of Theorems and Lemmas of Chapter 3 . . . . . . . . . . . . . . . . . . . . . . 50
6.2.1 Proofs of Theorems and Lemmas on Robust Lasso . . . . . . . . . . . . . . . . 50
6.2.2 Proofs of Theorems and Lemmas on Debiased Lasso . . . . . . . . . . . . . . . 56
6.2.3 Lemma on properties of A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2.4 Proofs of Simultaneous Confidence Intervals . . . . . . . . . . . . . . . . . . . . 70
6.2.5 Some useful lemmas for Drlt . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3 Proofs of Theorems and Lemmas of Chapter 4 . . . . . . . . . . . . . . . . . . . . . . 77
6.3.1 Proof of Theorem 10: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.2 Some useful Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

2
Chapter 1

Introduction

In high-dimensional sparse regression, where the number of predictors significantly exceeds the number
of observations, the Lasso (Least Absolute Shrinkage and Selection Operator) is a widely used method
for variable selection and estimation. By incorporating an ℓ1 regularization term, Lasso promotes
sparsity in the estimated coefficients, enabling effective performance for sparse signal vectors even if
the number of predictors far exceeds the number of samples. The Lasso estimator has well-established
theoretical guarantees for signal and support recovery [25]. Despite its strengths, a well-recognized
limitation of Lasso is its tendency to produce biased estimates. This bias arises from the shrinkage
imposed by the ℓ1 penalty. Consequently, the bias compromises estimation accuracy and impedes
statistical inference tasks such as construction of confidence intervals or hypothesis tests. These
challenges are especially pronounced in high-dimensional regimes, where traditional inference tools
fail due to high dimensionality.
[31] introduced a simple yet powerful approach that constructs debiased Lasso estimates using an
“approximate inverse” of the sample covariance matrix. Their method avoids direct precision matrix
estimation and instead employs an optimization framework to compute a debiasing matrix M that
corrects for bias while ensuring asymptotic normality of the debiased estimates.
In Chapter 2, we build upon the technique of [31], addressing one of its primary computational
bottlenecks: the optimization step required to compute the approximate inverse M . In Chapters 3
and 4, we attempt to identify and correct Group membership specification errors in high dimensional
sparse regression specifically in the field of Group Testing.
Notations: Throughout this paper, In denotes the identity matrix of size n × n. We use the nota-
tion [n] ≜ {1, 2, · · · , n} for n ∈ Z+ . Given a matrix A, its ith row is denoted as ai. , its j th column
is denoted as a.j and the (i, j)th element is denoted by aij . The ith column of the identity matrix
will be denoted as ei . For any vector z ∈ Rn and index set S ⊆ [n], we define z S ∈ Rn such that
∀i ∈ S, (zS )i = zi and ∀i ∈ / S, (zS )i = 0. S c denotes the complement of set S. We define the entrywise
l∞ norm of a matrix A as |A|∞ ≜ max|aij |. Consider two real-valued random sequences xn and rn .
i,j
Then, we say that xn is oP (rn ) if xn /rn → 0 in probability, i.e., limn→∞ P (|xn /rn | ≥ ϵ) = 0 for any
ϵ > 0. Also, we say that xn is OP (rn ) if xn /rn is bounded in probability, i.e., for any ϵ > 0 there
exist m, n0 > 0, such that P (|xn /rn | < m) ≥ 1 − ϵ for all n > n0 . For a positive integer p, we use the
shorthand [p] = {1, 2, . . . , p}. For a vector w ∈ Rm , we denote the ℓq -norm by ∥w∥q := ( m q 1/q
P
i=1 |wi | )
if 1 ≤ q < ∞ and the ℓ∞ -norm by ∥w∥∞ := maxi∈[m] |wi |.

3
Chapter 2

Fast Debiasing of the Lasso Estimator

To address the limitations of inference in high dimensional sparse regression, several methods have
been developed to “debias” the Lasso estimator, allowing for valid statistical inference even in
high-dimensional settings. Notably, [56] introduced a decorrelated score-based approach, leveraging
the Karush–Kuhn–Tucker (KKT) conditions of the Lasso optimization problem to construct bias-
corrected estimators. Their framework relies on precise estimation of the precision matrix (inverse
covariance matrix), which can be computationally challenging and sensitive to regularization choices.
Similarly, [51] proposed a methodology rooted in node-wise regression, where each variable is regressed
on the remaining variables to estimate the precision matrix. While effective, this method is computa-
tionally intensive. This may limit its applicability, particularly in scenarios where the design matrix
lacks favorable properties like sparsity of the rows of the precision matrix.
[31] introduced a simple yet powerful approach that constructs debiased Lasso estimates using an
“approximate inverse” of the sample covariance matrix. Their method avoids direct precision matrix
estimation and instead employs an optimization framework to compute a debiasing matrix M that
corrects for bias while ensuring asymptotic normality of the debiased estimates. A key advantage of
this method is its applicability for random sub-Gaussian sensing matrices, enabling valid inference
across a broad range of high-dimensional applications.
In this chapter, we build upon the technique of [31], addressing one of its primary computational
bottlenecks: the optimization step required to compute the approximate inverse M . By reformulating
the problem to work directly with the “weight matrix” W := AM ⊤ , we entirely eliminate the need
to solve this optimization problem in many practical cases. Our proposed reformulation leverages the
insight that the theoretical guarantees of the debiased Lasso estimator depend on the product AM ⊤
rather than the individual debiasing matrix M . By shifting the focus to the “weight matrix” W :=
AM ⊤ , we simplify the optimization problem while retaining all theoretical properties of the original
framework. Under certain deterministic assumptions on the coherence of A, we provide a simple,
exact, closed form optimal solution for the optimization problem to obtain W . We show that these
assumptions are satisfied with high probability for the ensembles of sensing matrices considered in [31],
under the additional condition that the elements of the rows of A are uncorrelated. In practice, sensing
matrices with uncorrelated entries are commonly used in many applications [17,37] and are also widely
used in many theoretical results in sparse regression [25]. This closed form solution eliminates the
computationally intensive optimization step required to compute M , significantly improving runtime
efficiency. It is applicable in many natural situations, including sensing matrices with i.i.d. isotropic
sub-Gaussian rows (such as i.i.d. Gaussian, or i.i.d. Rademacher entries).

2.1 An Overview of the Debiased LASSO

We consider the high-dimensional linear model

y = Aβ ∗ + η, (2.1)

where β ∗ ∈ Rp is a s-sparse signal (i.e., s := ∥β ∗ ∥0 where s ≪ p), A is a n × p design/sensing matrix

(where n ≪ p), and y ∈ Rn is the measurement vector. Also, η ∈ Rn is an additive noise vector

4
that consists of independent and identically distributed elements drawn from N (0, σ 2 ), where σ 2 is
the noise variance.
The Lasso estimate β̂λ of the sparse signal β ∗ is defined as the solution to the following opti-
mization problem:
1
β̂λ := arg min ∥y − Aβ∥22 + λ∥β∥1 , (2.2)
β 2n

where λ > 0 is a regularization parameter chosen appropriately. The Lasso estimator is known to be
a consistent estimator of the sparse signal β ∗ under the condition that the sensing matrix A satisfies
the Restricted Eigenvalue Condition (REC) [25, Chapter 11].

2.1.1 Debiasing the LASSO Estimator

The Lasso estimator is well-known to produce biased estimates, i.e., E(β̂λ ) ̸= β ∗ where the expecta-
tion is computed over noise instances. This bias arises from the ℓ1 regularization term, which induces
shrinkage in the estimate β̂λ . Moreover, there is no known method to compute a confidence interval
of β ∗ directly from β̂λ .
To reduce this bias and also construct confidence intervals of β ∗ , [31] introduced a debiased Lasso
estimator β̂d , defined as follows:
1
β̂d := β̂λ + M A⊤ (y − Aβ̂λ ). (2.3)
n

Here M is an approximate inverse of the rank deficient matrix Σ̂ := A⊤ A/n, computed by solving
the convex optimization problem given in Algorithm 2. The theoretical properties of β̂d from [31] are
applicable to a sensing matrix A with the following properties:

D1: The rows a1. , a2. , . . . , an,. of matrix A are independent and identically distributed zero-mean
sub-Gaussian random vectors with covariance Σ := E[ai. a⊤ i. ]. Furthermore, the sub-Gaussian
norm κ := ∥Σ−1/2 ai. ∥ψ2 1 is a finite positive constant.

D2: There exist positive constants 0 < Cmin ≤ Cmax , such that the minimum and maximum eigen-
values σmin (Σ), σmax (Σ) of Σ satisfy 0 < Cmin ≤ σmin (Σ) ≤ σmax (Σ) ≤ Cmax < ∞.

Theorem 7(b) of [31] shows that the optimization problem in (2.4) to obtain M is feasible with q high
√ 2
q
Cmax log p
probability, for sensing matrices satisfying properties D1 and D2, as long as µ > 4 3eκ Cmin n .
√
q
log p
If µ is O n and n is ω((s log p)2 ), then Theorem 8 of [31] shows that ∀j ∈ [p], n(β̂dj − βj∗ ) is
asymptotically zero-mean Gaussian with variance σ 2 m⊤
.j Σ̂m.j .

Algorithm 1 Construction of M (from [31])

Input: Design matrix A, µ ∈ (0, 1)
Output: Debiasing matrix M
1: Compute: Σ̂ := A⊤ A/n.
2: For each j ∈ [p], solve the following optimization problem to compute column vector m.j ∈ Rp :

minimize m⊤
.j Σ̂m.j
subject to ∥Σ̂m.j − ej ∥∞ ≤ µ, (2.4)

where ej is the j th column of the identity matrix Ip , and µ ∈ (0, 1).

3: Assemble M as M := (m.1 | · · · |m.p )⊤ .
4: If the optimization problem is infeasible for any j, set M := Ip . =0

1
The sub-Gaussian norm of a random variable x, denoted by ∥x∥ψ2 , is defined as ∥x∥ψ2 := supq≥1 q −1/2 (E|x|q )1/q .
For a random vector x ∈ Rn , its sub-Gaussian norm is defined as ∥x∥ψ2 := supy∈S n−1 ∥y ⊤ x∥ψ2 , where S n−1 denotes
the unit sphere in Rn .

5
2.2 Re-parameterization of the Debiased LASSO
The debiased Lasso estimator in (3.10) can be rewritten in terms of the variable W := AM ⊤ as:
1 ⊤
W (y − Aβ̂λ ).
β̂d = β̂λ + (2.5)
n
The re-parameterization does not affect the debiasing procedure introduced in [31]. Thus, any theo-
retical guarantees established using M extend to those using W .
We now produce a reformulated problem in (2.6) using W , and show that it is equivalent to the
original optimization problem in Algorithm 2. Using the relationship W = AM ⊤ , we can rewrite
m.j as w.j := Am.j . Making this substitution, the objective in (2.4) becomes m⊤ 1 ⊤
.j Σ̂m.j = n w .j w .j
and the constraint ∥Σ̂m.j − ej ∥∞ ≤ µ (where ej is the jth column of the identity matrix) becomes
1 ⊤
n A w .j − ej ∞ ≤ µ. This change of variables suggests the following reformulated optimization
problem (2.6) for the j th column of W :
Pj := minimize 1 ⊤
n w .j w .j
1 ⊤
subject to n A w .j − ej ∞
≤ µ. (2.6)
In fact, the j th reformulated problem (2.6) and the j th original problem (2.4) are equivalent in the
following sense: If m.j is feasible for (2.4) then w.j := Am.j is feasible for (2.6) and n1 w⊤ .j w .j =
⊤ †
m.j Σ̂m.j . Conversely, suppose that w.j is feasible for (2.6). If A is a pseudo-inverse of A, then
m.j := A† w.j is feasible for (2.4) since Σ̂m.j = n1 A⊤ Am.j = n1 A⊤ w.j . Moreover, n1 w⊤ .j w .j =
⊤
m.j Σ̂m.j , so both have the same objective values, establishing that (2.4) and (2.6) are equivalent.
This reformulation provides an equivalent separable problem for each column of W , maintaining all
theoretical guarantees while simplifying the representation of the debiasing procedure.
The reformulated problem (2.6) has a unique optimal solution because the objective function is
strongly convex with convex constraints. In contrast, the original problem (2.4) does not have a unique
solution. Indeed if m.j is any solution to (2.4), then we can add to it any element of the nullspace of
A to obtain another solution to (2.4).

2.2.1 A Closed-Form Solution for the Debiasing Matrix W

In this section, we demonstrate that, for a suitable choice of µ, the optimal solution to the problem
(2.6) can be computed in closed form for a sensing matrix whose minimum column norm is strictly
positive (which is true with probability 1 for random matrices). To derive this result we write down
the Fenchel dual of (2.6), and appeal to weak duality. In particular, we explicitly find primal and dual
feasible points with the same objective value, certifying that both are, in fact, optimal.
∥a.j ∥22
n o
Theorem 1 Let A be a n × p matrix with n < p. Define L(A) := min n . Define the coherence
j∈[p]
ν(A)
of A as ν(A) := maxi̸=j n1 |a⊤
.i a.j |. If µ satisfies L(A)+ν(A) ≤ µ < 1 then the optimal solution of (2.6)
is given by
n(1 − µ)
w.j := a.j (2.7)
∥a.j ∥22
for every j ∈ [p].
For notational simplicity, we will denote ν(A) and L(A) respectively by ν and L in the rest of the
paper.

Remarks:
1. This theorem eliminates the requirement to execute an iterative optimization algorithm to obtain
W (or an iterative optimization algorithm to obtain M ). This is because given A, one can
directly implement the optimal solution of Alg. 2 in the form (2.7) for all j ∈ [p]. This speeds up
the implementation of the debiasing of Lasso for the ensemble of sensing matrices that satisfy
the conditions of Theorem 1.

6
ν(A)
2. The condition L(A)+ν(A) ≤ µ < 1 can be satisfied whenever L is strictly positive, i.e., the column
norms of A are strictly positive. Given the sensing matrix A, the quantity ν/(ν + L) can be
computed exactly.

3. For sensing matrices whose column norms are equal to n, such as random Rademacher, row
sub-sampled DFT or row sub-sampled Hadamard, we clearly have L = 1. Hence, the condition
ν
for Theorem 1 becomes 1+ν ≤ µ < 1.

4. For sensing matrices with (almost surely) unequal column norms such as random sub-Gaussian
matrices, we need a condition that L is strictly positive, i.e., P (L ≥ c Cmin ) ≥ 1 − 2/p for
ν
some constant c > 0. This will be shown in Theorem 3. This implies L+ν ≤ c Cνmin with high
probability. Hence, in this case, it is sufficient to choose µ ≥ c Cνmin .
ν(A)
5. For sensing matrices with equal column norms, the condition on µ given by L(A)+ν(A) ≤µ<1
is a necessary and sufficient condition for the closed form expression in (2.7) to be optimal.
ν(A)
However, for sensing matrices with unequal column norms such as Gaussian, L(A)+ν(A) ≤µ<1
is only a sufficient condition. This is empirically illustrated in Subsection 2.3.2.
q
log p
Recall that as per Theorem 8 of [31], if µ is O n and n is ω((s log p)2 ), then ∀j ∈
√
[p], n(β̂dj −βj∗ ) is asymptotically zero-mean Gaussian when the elements of η are drawn from N (0, σ 2 ).
q
For specific classes of random matrices, we now show, in Theorem 2, that ν+L ≤ c0 logn p with high
ν

probability
q for some
constant c0 . This implies that for these random sensing matrices, the choice
log p
µ := O n ensures both the following: (i ) asymptotic debiasing for β̂d from (2.5) when n is
ν
ω((s log p)2 ) (see Theorem 8 of [31]), and (ii ) fulfillment of the sufficient condition L+ν ≤ µ for the
q
debiasing matrix W to be computed in closed-form. If the relation ν+L ν
≤ c0 logn p is to be satisfied
with high probability, we need an additional (mild) assumption on A as defined below:

D3: Σ, as defined in D1, is a diagonal matrix i.e., the elements of the rows of A are uncorrelated.

Theorem 2 Let A be a n × p dimensional matrix with independent and identically distributed zero-
mean sub-Gaussian rows with uncorrelated entries and sub-Gaussian norm κ := ∥Σ−1/2 ai. ∥ψ2 , where
n < p and Σ := E[ai. ai. ⊤ ]. Let L and ν be as defined in Theorem 1. For any constant c ∈
√ √ 2 4
( 2/(1 + 2), 1), if A obeys properties D1, D2 and D3 and n ≥ C4C max κ
2 (1−c)2 log p, then
min

r !
√ κ2 Cmax

ν log p 2 1
P ≤2 2 ≥1− + 2 . (2.8)
ν+L c Cmin n p p

√ 2 q
log p
Furthermore, the choice µ := 2 2 κc CCmax
min n ensures that the optimal debiasing matrix W is given
by (2.7) with high probability.

Remarks:
√ √ 2
4Cmax κ4
1. The condition that c ∈ ( 2/(1 + 2), 1) ensures that when n ≥ 2
Cmin (1−c)2
log p and µ :=
√ 2 q
log p
2 2 κc CCmax
min n , then we have µ < 1.

√ κ2 C q log p
2. For the choice of µ := 2 2 c Cmax
min n , the optimization problem in (2.6) is feasible with high
probability under the assumptions D1 and D2 (as per Theorem 7b of [31]). Additionally, if
A satisfies assumption D3, then W has the closed form solution as given in (2.7) with high
probability.

7
The proof of Theorem 2 is given in Appendix 6.1.2. The proof utilizes the following results: Theorem 3
and Lemma 1. In Theorem 3, we show that for an ensemble of sensing matrices satisfying assumptions
D1, D2 and D3, the parameter L is greater than c Cmin with high probability for some constant c.

Theorem 3 Let A be a n×p matrix with independently and identically distributed sub-Gaussian rows,
where n < p. Consider L as defined in Theorem 1. For any constant c ∈ (0, 1) and κ := ∥Σ−1/2 ai. ∥ψ2 ,
2 4
if A satisfies properties D1 and D2 and n ≥ C4C max κ
2 (1−c)2 log p, then
min

2
P (L ≥ c Cmin ) ≥ 1 − . (2.9)
p

The proof of Theorem 3 is given in Appendix 6.1.3. In the upcoming Lemma we provide a high
probability upper bound on ν for sensing matrices with independent and identically distributed zero-
mean sub-Gaussian rows with uncorrelated entries.

Lemma 1 Let A be a n × p dimensional matrix satisfying assumptions D1, D2 and D3 and sub-
Gaussian norm κ := ∥Σ−1/2 ai. ∥ψ2 . Define ν as in Theorem 1. Then
r !
√ 2 log p 1
P ν(A) ≤ 2 2Cmax κ ≥ 1 − 2. (2.10)
n p

The proof of Lemma 1 is provided in Appendix Sec.6.1.4.

The reason that property D3 arises in Lemma 1 is as follows. Recall that the coherence is the
maximum (for i ̸= j) of the absolute value of the random variables n1 a⊤ .i a.j . These random variables
have expectation Σi,j , and so it is not possible for ν to go to zero, asymptotically, unless Σ is diagonal.

2.3 Empirical Results

2.3.1 Validity of the exact solution
Aim: The debiased Lasso can be used to determine the support of the unknown vector β ∗ using
hypothesis tests, as per Theorem 8 of [31]. We aim to estimate the support using p hypothesis tests
(one per element of β ∗ ) based on the debiased Lasso estimates using the weights matrix W obtained
from the optimization problem in (2.6) (denoted by Wo ), and that obtained from the closed-form
expression (2.7) (denoted by We ), for varying number of measurements n. The aim is to also compare
these support set estimates with the ground truth support set, and report sensitivity and specificity
values (defined below). We will further show the difference in the run-time for both methods.

Signal Generation: For our simulations, we chose our design matrix A to have elements drawn
independently from the standard Gaussian distribution. We set the size of the signal to be p = 500. We
synthetically generated signals (i.e., β ∗ ) with p = 500 elements in each. The non-zero values of β ∗ were
drawn i.i.d. from U (50, 1000) and placed
Pn at randomly chosen indices. We set s := ∥β ∗ ∥0 = 10 and the
noise standard deviation σ := 0.05 i=1 |ai. β ∗ |/n. We varied n ∈ {200, 250, 300, 350, 400, 450, 500}.
We choose µ = ν/(ν + L) where ν and L are computed exactly given the sensing matrix A.

Sensitivity and Specificity Computation: Let us denote the debiased Lasso estimates obtained
using a matrix W by β̂d,W . We know that asymptotically β̂d,W (j) ∼ N(βj∗ , σ 2 W⊤ .j W.j ) for all j ∈ [p].
Using this result, β̂d,W was binarized to create a vector b̂W in the following way: For all j ∈ [p], we
set b̂W (j) := 1 if the value of β̂Wj was such that the the hypothesis H0,j : βj∗ = 0 was rejected against
the alternate H1,j : βj∗ ̸= 0. b̂W (j) was set to 0 otherwise. Note that for the purpose of our simulation,
we either have W = Wo or W = We . The binary vectors corresponding to these choices of W are
respectively denoted by b̂Wo and b̂We .
A ground truth binary vector b∗ was created such that b∗j := 1 at all locations j where βj∗ ̸= 0
and b∗j := 0 otherwise. Sensitivity and specificity values were computed by comparing corresponding

8
sensitivity specificity time (in s)
∥W o −W e ∥F
n Wo We Wo We Wo We ∥W e ∥F

200 0.6411 0.6411 0.8455 0.8455 3.62 × 102 1.14 × 10−3 5.37 × 10−10
250 0.7047 0.7047 0.8942 0.8942 4.33 × 102 1.87 × 10−3 5.57 × 10−8
300 0.7988 0.7988 0.9452 0.9452 5.52 × 102 2.75 × 10−3 4.40 × 10−7
350 0.8602 0.8602 0.9773 0.9773 5.95 × 102 5.06 × 10−3 2.43 × 10−7
400 0.9342 0.9342 0.9892 0.9892 6.28 × 102 9.44 × 10−3 3.23 × 10−7
450 0.9874 0.9874 0.9924 0.9924 6.74 × 102 2.31 × 10−2 3.01 × 10−7
500 0.9991 0.9991 1 1 7.31 × 102 8.55 × 10−2 4.09 × 10−7

Table 2.1: Sensitivity and Specificity of hypothesis test using debiased estimates obtain from Wo
(optimization method) and We (closed-form expression from (2.7)) with its corresponding runtime
in seconds for varying number of measurements. The fixed parameters are p = 500, s = 10, σ :=
0.05 ni=1 |ai. β ∗ |/n. We set µ = ν/(ν + L) where ν and L are computed exactly given the sensing
P
matrix A.

entries of b∗ to those in b̂Wo and b̂We . Considering the matrix W, we declared an element to be a
true defective if b∗j = 1 and b̂W,j = 1, and a false defective if b∗j = 0 but b̂W,j ̸= 0. We declare it to
be a false non-defective if b∗j = 0 but b̂W,j ̸= 0, and a true non-defective if βj∗ = 0 and b̂W,j = 0. The
sensitivity for β ∗ is defined as (# true defectives)/(# true defectives + # false non-defectives) and
specificity for β ∗ is defined as (# true non-defectives)/(# true non-defectives + # false defectives).

Results: For obtaining Wo , the optimization routine was executed using the lsqlin package in
MATLAB. The sensivitiy and specificity are averaged over 25 runs with independent noise instances.
In Table 2.1, we can see that the sensitivity as well as the specificity of the hypothesis tests for Wo
and We are equal. We further report the relative difference between Wo and We in the Frobenius
norm. We can clearly see that the difference is negligible, consistent with Theorem 1. Furthermore,
we see that using the closed-form expression in (2.7) saves significantly on time (by a factor of nearly
104 ). While the computational efficiency of the iterative approach can be improved by developing
a specialized solver for problems of the form (2.6), no iterative method is expected to outperform
directly computing the simple closed-form expression (2.7).

2.3.2 Difference between W e and W o for varying choices of µ

ν
Aim: In Theorem 1, we show that if L+ν ≤ µ < 1, then the closed form solution of (2.7) represented
by W e is the same as the solution of the optimization problem given in (2.6) represented by W o . In
ν
this subsection, we investigate the difference between W o and We for µ < L+ν as well as in the range
ν
L+ν ≤ µ < 1. We report the difference between W e and W o in terms of relative Error given by

log ∥W∥Wo −W e ∥F
e ∥F
for µ = 0.2, 0.21, 0.22, . . . , 0.60.

Sensing matrix properties: For this experiment, we fix n = 80, p = 100. We run this for two differ-
ent n × p sensing matrices A with elements drawn
from: (1) i.i.d. Gaussian and, (2) i.i.d. Rademacher.
∥W o −W e ∥F ν
In Figure 2.1 we plot µ vs log ∥We ∥F for both of these matrices. The exact value of L+ν is
given by a black vertical line in each case.

Observation: We see that for both the plots in Figure 2.1, the relative error decreases with increas-
ing in µ for µ < ν/(L + ν). For µ ≥ ν/(L + ν), the relative error is very small with fluctuations
primarily due to the solver tolerances in lsqlin when computing W o .
ν
Furthermore, for the Rademacher A, the decrease is sharp after the value of µ crosses L+ν .
(Note that, for Rademacher
A we have that L = 1). However, for the Gaussian sensing matrix
∥W o −W e ∥F ν
A, log ∥We ∥F decreases sharply before µ > L+ν .

9

Figure 2.1: Line plot of µ vs relative error log10 ∥W∥W o −W e ∥F
e ∥F
for two 80 × 100 dimensional sensing
matrices, (left) i.i.d. Gaussian and (right) i.i.d. Rademacher. The exact value of ν/(L + ν) is given
by the black vertical line. The value of ν/(L + ν) is 0.45 for Gaussian sensing matrix(left) and 0.298
for Rademacher sensing matrix (right). Here, Wo is the solution of the optimization problem in (2.6)
and W e is computed as in (2.7).

10
Chapter 3

A Debiased LASSO Approach for

Non-adaptive Group Testing with
Errors in Group Membership
Specifications

Group testing is a well-studied area of data science, information theory and signal processing, dating
back to the classical work of Dorfman in [16]. Consider p samples, one per subject, where each
sample is either defective or non-defective. In the case of defective samples, additional quantitative
information regarding the extent or severity of the defect in the sample may be available. Group
testing typically replaces individual testing of these p samples by testing of n < p ‘groups’ of samples,
thereby saving on the number of tests. Each group (also called a ‘pool’) consists of a mixture of small,
equal portions taken from a subset of the p samples. Let the (perhaps noisy) test results on the n
groups be arranged in an n-dimensional vector z. Let the true status of each of the p samples be
arranged in a p-dimensional vector β ∗ . The aim of group testing is to infer β ∗ from z given accurate
knowledge of the group memberships. We encode group memberships in an n × p-dimensional binary
matrix B (called the ‘pooling matrix’) where Bij = 1 if the j th sample is a member of the ith group,
and Bij = 0 otherwise. If the overall status of a group is the sum of the status values of each of the
samples that participated in the group, we have:
z = Bβ ∗ + η̃, (3.1)
where η̃ ∈ Rn is a noise vector. In a large body of the literature on group testing (e.g., [5,12,16]), z and
β ∗ are modeled as binary vectors, leading to the forward model z = N(Bβ ∗ ), where the matrix-vector
‘multiplication’ Bβ ∗ involves binary OR, AND operations instead of sums and products, and N is a
noise operator that could at random flip some of the bits in z. In this work, however, we consider
z and β ∗ to be vectors in Rn and Rp respectively, as also done in [22, 26, 49], and adopt the linear
model (3.1). This enables encoding of quantitative information in z, β ∗ , and Bβ ∗ now involves the
usual matrix-vector multiplication.
In commonly considered situations in group testing, the number of non-zero samples, i.e., defective
samples, s ≜ ∥β ∗ ∥0 is much less than p, and βj∗ = 0 indicates that the j th sample is non-defective
where 1 ≤ j ≤ p. In such cases, group testing algorithms have shown excellent results for the recovery
of β ∗ from z, B. These algorithms are surveyed in detail in [2] and can be classified into two broad
categories: adaptive and non-adaptive. Adaptive algorithms [16, 26, 28] process the measurements
(i.e., the results of pooled tests available in z) in two or more stages of testing, where the output
of each stage determines the choice of pools in the subsequent testing stage. Non-adaptive algo-
rithms [7, 22, 23, 49], on the other hand, process the measurements with only a single stage of testing.
Non-adaptive algorithms are known to be more efficient in terms of time as well as the number of tests
required, at the cost of somewhat higher recovery errors, as compared to adaptive algorithms [22, 33].
In this work, we focus on non-adaptive algorithms.

11
Problem Motivation: In the recent COVID-19 pandemic, RT-PCR (reverse transcription poly-
merase chain reaction) has been the primary method of testing a person for this disease. Due to
widespread shortage of various resources for testing, group testing algorithms were widely employed
in many countries [1]. Many of these approaches used Dorfman testing [16] (an adaptive algorithm),
but non-adaptive algorithms have also been recommended or used for this task [7,22,49]. In this appli-
cation, the vectors β ∗ and z refer to the real-valued viral loads in the individual samples and the pools
respectively, and B is again a binary pooling matrix. In a pandemic situation, there is heavy demand
on testing labs. This leads to practical challenges for the technicians to implement pooling due to
factors such as (i) a heavy workload, (ii) differences in pooling protocols across different labs, and (iii)
the fact that pooling is inherently more complicated than individual sample testing [54], [18, ‘Results’].
Due to this, there is the possibility of a small number of inadvertent errors in creating the pools. This
causes a difference between a few entries of the pre-specified matrix B and the actual matrix B̂ used
for pooling. Note that B is known whereas B̂ is unknown in practice. The sparsity of the difference
between B and B̂ is a reasonable assumption, if the technicians are competent. Hence only a small
number of group membership specifications contain errors. This issue of errors during pool creation is
well documented in several independent sources such as [54], [18, ‘Results’], [15, Page 2], [46], [55, Sec.
3.1], [14, ‘Discussion’], [24, ‘Specific consideration related to SARS-CoV-2’] and [3, ‘Laboratory infras-
tructure’]. However the vast majority of group testing algorithms — adaptive as well as non-adaptive
— do not account for these errors. To the best of our knowledge, this is the first piece of work on
the problem of a mismatched pooling matrix (i.e., a pooling matrix that contains errors in group
membership specifications) for non-adaptive group testing with real-valued β ∗ and (possibly) noisy z.
We emphasize that besides pooled RT-PCR testing, faulty specification of pooling matrices may also
naturally occur in group testing in many other scenarios, for example when applied to verification of
electronic circuits [32]. Another scenario is in epidemiology [13], for identifying infected individuals
who come in contact with agents who are sent to mix with individuals in the population. The health
status of various individuals is inferred from the health status of the agents. However, sometimes an
agent may remain uninfected even upon coming in contact with an infected individual, which can be
interpreted as an error in the pooling matrix.

Related Work: We now comment on two related pieces of work which deal with group testing
with errors in pooling matrices via non-adaptive techniques. The work in [13] considers probabilistic
and structured errors in the pooling matrix, where an entry bij with a value of 1 could flip to 0 with
a probability 0.5, but not vice versa, i.e., a genuinely zero-valued bij never flips to 1. The work in [39]
considers a small number of ‘pretenders’ in the unknown binary vector β ∗ , i.e., there exist elements
in β ∗ which flip from 1 to 0 with probability 0.5, but not vice versa. Both these techniques consider
binary valued vectors z and β ∗ , unlike real-valued vectors as considered in this work. They also
do not consider noise in z in addition to the errors in B. Furrthemore, we also present a method
to identify the errors in B, unlike the techniques in [13, 39]. Due to these differences between our
work and [13,39], a direct numerical comparison between our results and theirs will not be meaningful.

Sensing Matrix Perturbation in Compressed Sensing: There exists a nice relationship be-
tween the group testing problem and the problem of signal reconstruction in compressed sensing (CS),
as explored in [22, 23]. Likewise, there is literature in the CS community which deals with perturba-
tions in sensing matrices [4, 21, 27, 30, 43, 44, 58]. However, these works either consider dense random
perturbations (i.e., perturbations in every entry) [4, 21, 27, 30, 44, 58] or perturbations in specifications
of Fourier frequencies [29,43]. These perturbation models are vastly different from the sparse set of er-
rors in binary matrices as considered in this work. Furthermore, apart from [29, 43], these techniques
just perform robust signal estimation, without any attempt to identify rows of the sensing matrix
which contained those errors.

Overview of contributions: In this chapter, we present a robust approach for recovering β ∗ ∈ Rp

from noisy z ∈ Rn when n < p, given a (known) pre-specified pooling matrix B, but where the mea-
surements in z have been generated with another unknown pooling matrix B̂, i.e., z = B̂β ∗ + η̃. The

12
approach, which we call the Debiased Robust Lasso Test Method or Drlt, extends existing work
on ‘debiasing’ the well-known Lasso estimator in statistics [31], to also handle errors in B. In this
approach, we present a principled method to identify which measurements in z correspond to rows
with errors in B, using hypothesis testing. We also present an algorithm for direct estimation of β ∗
and a hypothesis test for identification of the defective samples in β ∗ , given errors in B. We estab-
lish the desirable properties of these statistical tests such as consistency. Though our approach was
initially motivated by pooling errors during preparation of pools of COVID-19 samples, it is broadly
applicable to any group-testing problem where the pool membership specifications contain errors.

3.1 Problem formulation

3.1.1 Basic Noise Model
Let us recall the model setup used in this report. Suppose that β ∗ ∈ Rp and that the elements of the
n × p pooling matrix B are independently drawn from Bernoulli(0.5) in (3.1). Additionally, let β ∗ be
sparse (as commonly assumed in group testing) with at most s ≪ p non-zero distinct elements. Assume
that the elements of the noise vector η̃ ∈ Rn in (3.1) are independent and identically distributed (i.i.d.)
Gaussian random variables with mean 0 and variance σ̃ 2 . Note that, throughout this work, we assume
σ̃ 2 to be known. The Lasso estimator β̂, used to estimate β ∗ , is defined as
1
β̂ = arg min ∥z − Bβ∥22 + λ∥β∥1 . (3.2)
β 2n

Given a sufficient number of measurements, the Lasso is known to be consistent for sparse β ∗ [25,
Chapter 11] if the penalty parameter λ > 0 is chosen appropriately and if B satisfies the Restricted
Eigenvalue Condition (REC)1 . Certain deterministic binary pooling matrices can also be used as
in [22, 49] for a consistent estimator of β ∗ . However, we focus on the chosen random pooling matrix
in this paper.
It is more convenient for analysis via the REC, and more closely related to the theory in [31], if
the elements of the pooling matrix have mean 0. Since the elements of B are drawn independently
from Bernoulli(0.5), it does not obey the mean-zero property. Hence, we transform the random binary
matrix B to a random Rademacher matrix A ≜ 2B − 1n×p , which is a simple one-one transformation
similar to that adopted in [45] for Poisson compressive measurements. (Note that 1n×p refers to a
matrix of size n × p containing all ones.) We also transform the measurements in z to equivalent
measurements y associated with Rademacher matrix A.
The expression for each measurement in y is now given by:

∀i ∈ [n], yi = ai. β ∗ + ηi =⇒ y = Aβ ∗ + η, (3.3)

where ηi ∼ N(0, σ 2 ), σ 2 ≜ 4σ̃ 2 . We will henceforth consider y, A for the Lasso estimates in the
following manner: The Lasso estimator β̂, used to estimate β ∗ , is now defined as
1
β̂ = arg min ∥y − Aβ∥22 + λ∥β∥1 . (3.4)
β 2n

3.1.2 Model Mismatch Errors

Consider the measurement model defined in (3.3). We now examine the effect of mis-specification
of samples in a pool. That is, we consider the case where due to errors in mixing of the samples,
the pools are generated using an unknown matrix Â (say) instead of the pre-specified matrix A.
Note that A and Â are respectively obtained from B and B̂. The elements of matrix Â and A are
equal everywhere except for the misspecified samples in each pool. We refer to these errors in group
1
Restricted Eigenvalue Condition: For some constant ψ ≥ 1 and S ⊆ [p], let Cψ (S) ≜ {∆ ∈ Rn : ∥∆S c ∥1 ≤ ψ∥∆S ∥1 }.
We say that a n × p matrix B satisfies the REC with respect to Cψ (S) if there exists a constant γ > 0 such that
1
n
∥B∆∥22 ≥ γ∥∆∥22 for all ∆ ∈ Cψ (S). Here γ is the restricted eigenvalue (RE) constant.

13
membership specifications as ‘bit-flips’. For example, suppose that the ith pool is specified to consist
of samples j1 , j2 , j3 ∈ [p]. But due to errors during pool creation, the ith pool is generated using
samples j1 , j2 , j5 . In this specific instance, ai,j3 ̸= âi,j3 and ai,j5 ̸= âi,j5 .
Note that A is known whereas Â is unknown. Moreover, the locations of the bit-flips are unknown.
Hence they induce signal-dependent and possibly large ‘model mismatch errors’ δi∗ ≜ (âi. − ai. )β ∗
in the ith measurement. In the presence of bit-flips, the model in (3.3) can be expressed as:
∗
∗ ∗ ∗ ∗ β
yi = ai. β + δi + ηi , for i ∈ [n], =⇒ y = Aβ + δ + η = (A|In ) + η. (3.5)
δ∗

We assume δ ∗ , which we call the ‘model mismatch error’ (MME) vector in Rn , to be sparse, and
r ≜ ∥δ ∗ ∥0 ≪ n. The sparsity assumption on δ ∗ is reasonable in many applications (e.g., given a
competent technician performing pooling).
Suppose for a fixed i ∈ [n], âi. contains a bit-flip at index j. If βj∗ is 0 then δi∗ would remain 0
despite the presence of a bit-flip in âi. . Furthermore, such a bit-flip has no effect on the measurements
and is not identifiable from the measurements. However, if βj∗ is non-zero then δi∗ is also non-zero.
Such a bit-flip adversely affects the measurement and we henceforth refer to it as an effective bit-flip.
Effective bit-flips lead to non-zero elements in the MME vector δ ∗ . We refer to the non-zero elements
of δ ∗ as effective MMEs. Without loss of generality, we consider the identification of effective MMEs
in this paper.

3.2 Debiased Robust Lasso Test (Drlt) Method

We now present our proposed approach, named the ‘Debiased Robust Lasso Test Method’ (Drlt),
for recovering the signal β ∗ given measurements y obtained from the erroneous, unknown matrix Â
which is different from the pre-specified, known sensing matrix A. The main objectives of this work
are:

Aim (i): Estimation of β ∗ under model mismatch and development of a statistical test to determine
whether or not the j th sample, j ∈ [p], is defective/diseased.

Aim (ii): Development of a statistical test to determine whether or not the ith measurement (i ∈ [n])
contains an effective MME i.e., δi∗ is non-zero.

A measurement containing an effective MME will appear like an outlier in comparison to other mea-
surements due to the non-zero values in δ ∗ . Therefore identification of measurements containing
effective MMEs is equivalent to determining the non-zero entries of δ ∗ . This idea is inspired by the
concept of ‘Studentised residuals’ which is widely used in the statistics literature to identify outliers
in full-rank regression models [40]. Since our model operates in a compressive regime where n < p,
the distributional property of studentized residuals may not hold. Therefore, we develop our Drlt
method which is tailored for the compressive regime.
Our basic estimator for β ∗ and δ ∗ from y and A is given as

β̂λ1 1
= arg min ∥y − Aβ − δ∥22 + λ1 ∥β∥1 + λ2 ∥δ∥1 , (3.6)
δ̂λ2 β,δ 2n

where λ1 , λ2 are appropriately chosen regularization parameters. This estimator is a robust version
of the Lasso regression [42]. The robust Lasso, just like the Lasso, will incur a bias due to the ℓ1
penalty terms.
The work in [31] provides a method to mitigate the bias in the Lasso estimate and produces a
‘debiased’ signal estimate whose distribution turns out to be approximately Gaussian with specific
observable parameters in the compressive regime (for details, see [31] and Sec. 3.2.2 below). However,
the work in [31] does not take into account errors in sensing matrix specification. We non-trivially
adapt the techniques of [31] to our specific application which considers bit-flips in the pooling matrix,
and we also develop novel procedures to realize Aims (i ) and (ii ) mentioned above.

14
We now first review important concepts which are used to develop our method for the specified
aims. We subsequently develop our method in the rest of this section. However, before that, we
present error bounds on the estimates β̂λ1 and δ̂λ2 from (3.6), which are non-trivial extensions of
results in [42]. These bounds will be essential in developing hypothesis tests to achieve Aims (i) and
(ii).

3.2.1 Bounds on the Robust Lasso Estimate

When the sensing matrix A is iid Gaussian, upper bounds on ∥β ∗ − β̂λ1 ∥2 and ∥δ ∗ − δ̂λ2 ∥2 have been
presented in [42]. In our case, A is i.i.d. Rademacher, and hence some modifications to the results
from [42] are required. We now state a theorem for the upper bound on the reconstruction error of
both β̂λ1 and δ̂λ2 for a random Rademacher pooling matrix A. We further use the so called ‘cone
constraint’ to derive separate bounds on the estimates of both β ∗ and δ ∗ . These bounds will be very
useful in deriving theoretical results for debiasing.
√ √
4σ √log p 4σ log n
Theorem 4 Let β̂ λ1 , δ̂λ2 be as in (3.6) and set λ1 ≜ n
, λ2 ≜ n . Let n < p, S ≜ {j :
βj∗ ̸= 0}, R ≜ {i : δi∗ ̸= 0}, s ≜ |S| and r ≜ |R|. If |A|∞ ≤ 1 and A satisfies the Extended
Restricted Eigenvalue Condition (EREC) from Definition 6.2.1 with κ > 0 and with respect to the
√
cone C(S, R, ( nλ2 )/λ1 ), then we have the following:

(1) Error bound on β̂λ1 :

r !
∗ −2 log(p) 1 1
P β̂λ1 − β ≤ 48κ (s + r)σ ≥1− + . (3.7)
1 n p n

(2) Error bound on δ̂λ2 : Additionally if n log n ≥ (48κ−2 )2 (s + r)2 log p,

√ !
∗ 24σr log n 1 2
P δ̂λ2 − δ ≤ ≥1− + . (3.8)
1 n p n

In Lemma. 6 of Sec.6.2.1, we show that the chosen random Rademacher sensing matrix A satisfies the
EREC with κ = 1/16 if λ1 and λ2 are chosen as in Theorem 4. Furthermore, |A|∞ = 1. Therefore,
the sufficient conditions for Theorem 4 are satisfied with high probability for a random Rademacher
sensing matrix.

Remarks on Theorem 4:
q
1. From Result (1), we see that β̂λ1 − β ∗ = OP (s + r) logn p .
1
√
r log n
2. From Result (2), we see that δ̂λ2 − δ ∗ = OP n .
1

3. The upper bounds of errors given in Theorem 4 increase with σ, as well as s and r, which is
quite intuitive. They also decrease with n.

3.2.2 A note on the Debiased Lasso

Let us consider the measurement vector from (3.5), momentarily setting δ ∗ ≜ 0, i.e., we have y =
Aβ ∗ + η. Let β̂λ be the minimizer of the following Lasso problem
1
min ∥y − Aβ∥22 + λ∥β∥1 , (3.9)
β 2n

15
for a given value of λ. Though Lasso provides excellent theoretical guarantees [25, Chapter 11], it is
well known that it produces biased estimates, i.e., E(β̂λ ) ̸= β ∗ , where the expectation is taken over
different instances of η. The work in [31] replaces β̂λ by a ‘debiased’ estimate β̂d given by:
1
β̂d = β̂λ + M A⊤ (y − Aβ̂λ ), (3.10)
n
where M is an approximate inverse (defined as in Alg. 2) of Σ̂ ≜ A⊤ A/n. Substituting y = Aβ ∗ + η
into (3.10) and treating n1 M A⊤ A as approximately equal to the identity matrix, yields:

1 1
β̂d = β̂λ + M A⊤ (Aβ ∗ + η − Aβ̂λ ) ≈ β ∗ + M A⊤ η, (3.11)
n n

which is referred to as a debiased estimate, as E(β̂d ) ≈ β ∗ . Note that Σ̂ is not an invertible matrix as
n < p. Hence, the approximate inverse is obtained by solving a convex optimization problem as given
by Alg. 2, where the minimization of the diagonal elements of M Σ̂M is motivated by minimizing the
variance of β̂d , as proved in [31, Sec. 2.1]. Furthermore, as proved in [31, Theorem 7], the convex
problem in Alg. 2 is feasible with high probability if Σ ≜ E[ai. (ai. )⊤ ] (where the expectation is taken
over the rows of A) obeys some specific statistical properties (see later in this section).

Algorithm 2 Construction of M (Alg 1 of [31])

Input: Measurement vector y, design matrix A, µ
Output: M
1: Set Σ̂ = A⊤ A/n.
2: Let mi ∈ Rp for i = 1, 2, . . . , p be a solution of:

minimize mi ⊤ Σ̂1 mi
subject to ∥Σ̂mi − ei ∥∞ ≤ µ, (3.12)
q
log p
where ei ∈ Rp is the ith column of the identity matrix I and µ = O n .
3: Set M = (m1 | . . . |mp )⊤ . If any of the above problems is not feasible, then set M = Ip . =0

The
p
debiased
estimate β̂d in (3.10) obtained via an approximate inverse M of Σ̂ using µ =
O (log p)/n satisfies the following statistical properties [31, Theorem 8]:
√ √ √
m(β̂d − β ∗ ) = M A⊤ η/ n + n(M Σ̂ − In )(β ∗ − β̂d ). (3.13)

Here the second term on the RHS is referred to as the bias vector. Moreover it is proved in [31,
Theorem 8] that for sufficiently large n, an appropriate choice of λ in (3.9) and under appropriate
statistical assumptions on Σ, the maximum absolute value of the bias vector is OP ( σs√log n
p
). Thus, if
2
n > O((s log p) ), the largest absolute value of the bias vector will be negligible, and thus the debiasing
effect is achieved since E(β̂d ) ≈ β ∗ .
Our debiasing approach is motivated along similar lines as in Alg. 2, but with MMEs in the
sensing matrix which the earlier method cannot handle. Moreover, we demonstrate via simulations
that ignoring MMEs may lead to larger estimation errors—see Table 3.1 of Sec. 3.5.1.

3.2.3 Debiasing in the Presence of MMEs

In the presence of MMEs, the design matrix (A|In ) from (3.5) plays the role of A in Alg. 2. How-
ever (A|In ) is partly random and partly deterministic, whereas the theory in [31] applies to either
purely random (Theorem 8 of [31]) or purely deterministic (Theorem 6 of [31]) matrices, but not a
combination of both. Hence, the theoretical results of [31] do not apply for the approximate inverse
of n1 (A|In )⊤ (A|In ) obtained using Alg. 2. Numerical results for the poor performance of ‘debiasing’
(as in (3.10)) with such an approximate inverse are demonstrated in Sec. 3.5.1.

16
To produce a debiased estimate of β ∗ in the presence of MMEs in the pooling matrix, we adopt a
different approach than the one in [31]. We define a linear combination of the residual error vectors
obtained by running the robust Lasso estimator from (3.6) via a carefully chosen set of weights,
in order to debias the robust Lasso estimates β̂ λ1 , δ̂ λ2 . The weights of the linear combination are
represented in the form of an appropriately designed matrix W ∈ Rn×p for debiasing β̂λ1 and a
derived weights matrix In − n1 W A⊤ for debiasing δ̂ λ2 . We later provide a procedure to design an

optimal W , as given in Alg. 2.

Given weight matrix W , we define a set of modified debiased Lasso estimates for β ∗ and δ ∗ as
follows:
1 ⊤
β̂W ≜ β̂λ1 + W (y − Aβ̂λ1 − δ̂λ2 ), (3.14)
n
1 ⊤
δ̂W ≜ y − Aβ̂W = δ̂λ2 + In − W A (y − Aβ̂λ1 − δ̂λ2 ). (3.15)
n

In our work, the matrix W does not play the role of M from Alg. 2, but instead plays the role of
AM ⊤ (comparing (3.14) and (3.10)). In Theorem 5 below, we show that these estimates are debiased
in nature for the choice W ≜ A. Thereafter, in Sec. 3.3 and Theorem 8, using a different choice for
W via Alg. 3, we show that the resultant tests are superior in comparison to W = A.

Theorem
√
5 Let√ β̂ λ1 , δ̂λ2 be as in (3.6), β̂W , δ̂W be as in (3.14), (3.15) respectively and set λ1 ≜
4σ √log p
n
, λ2 ≜ 4σ nlog n . Suppose that n is ω[((s + r) log p)2 ] 2 , A is a Rademacher matrix and W ≜ A.

(1) If n < p, then for any j ∈ [p],

√ L
n(β̂W j5pt − βj∗ ) −
→ N 0, σ 2

as p, n → ∞. (3.16)

(2) If n log n is o(p), then for any i ∈ [n],

δ̂W i5pt − δi∗ L

→ N 0, σ 2 as p, n → ∞,

p − (3.17)
ΣAii

where ΣA is defined as follows:

⊤
1 1
ΣA ≜ In − AA⊤ In − AA ⊤
. (3.18)
n n

L
Here −
→ denotes the convergence in law/distribution. ■

Remarks on Theorem 5

1. The asymptotic distributions of the LHS terms in (3.16) and (3.17) do not depend on A. These
distributions are asymptotically Gaussian because the noise vector η is normally distributed.

2. Theorem 5 provides the key result to develop a testing procedure corresponding to Aims (i) and
(ii).

3. If n is ω[((s + r) log p)2 ] then Lemma 6 implies that the Rademacher matrix A satisfies EREC.

4. The condition n < p in Result (1) emerges from (6.163) and (6.160), which are based on proba-
bilistic bounds on the singular values of random Rademacher matrices [41]. For the special case
where n = p (which is no longer a compressive regime), these bounds are no longer applicable,
and instead results such as [47, Thm. 1.2] can be used.
2 f (n)
Given functions f (n) and g(n) of n ∈ R, we say that f (n) is ω(g(n)) if limn→∞ = ∞, i.e. f (n) asymptotically
g(n)
‘dominates’ g(n).

17
Drlt for β ∗ : In Aim (i), we intended to develop a statistical test to determine whether a sample is
defective or not. Given the significance level α ∈ [0, 1], for each j ∈ [p], we reject the null hypothesis
G0,j : βj∗ = 0 in favor of G1,j : βj∗ ̸= 0 when
√
n|β̂W j |/σ > zα/2 , (3.19)

where zα/2 is the upper (α/2)th quantile of a standard normal random variable.
Drlt for δ ∗ : In Aim (ii), we intended to develop a statistical test to determine whether or not a
pooled measurement is affected by MMEs. Given the significance level α ∈ [0, 1], for each i ∈ [n], we
reject the null hypothesis H0,i : δi∗ = 0 in favor of H1,i : δi∗ ̸= 0 when
p
|δ̂W i |/ σ ΣAii > zα/2 . (3.20)

A desirable property of a statistical test is that the probability of rejecting the null hypothesis when
the alternate is true converges to 1 as n → ∞ (referred to as a consistent test). Theorem 5 ensures
that the proposed Drlts are consistent. Additionally, Theorem 5 shows that probability of rejecting
the null hypothesis when the null is true converges to α (referred to as an asymptotically unbiased
test). Further, the sensitivity and specificity (as defined in Sec. 3.5) of both these tests approach 1 as
n, p → ∞.

3.3 Optimal Debiased Lasso Test

It is well known that a statistical test based on a statistic with smaller variance is generally more
powerful than that based on a statistic with higher variance [11]. Therefore, we wish to design a
weight matrix W that reduces the variance of the debiased robust Lasso estimate to construct an
‘optimal’ debiased robust Lasso test. Hence we choose W so as to minimize the sum of the asymptotic
variances of the debiased estimates {β̂W j }pj=1 . The procedure for the design of W is presented in Alg. 3.
Rearranging (3.14) and (3.15) followed by some algebra, we obtain the following expressions:

∗ 1 ⊤ 1 ⊤ 1
β̂W − β = W η + Ip − W A (β ∗ − β̂λ1 ) + W ⊤ δ ∗ − δ̂λ2 , (3.21)
n n n
| {z }
bias terms

∗ 1 ⊤ 1 ⊤ 1
δ̂W − δ = In − W A η + In − W A A(β ∗ − β̂λ1 ) − W A⊤ δ ∗ − δ̂λ2 (3.22)
.
n n n
| {z }
bias terms

Note that the first term on the RHS of both (3.35) and (3.36) is zero-mean Gaussian. The remaining
two terms in both equations are bias terms. In order to develop an optimal hypothesis test for the
debiased robust Lasso, we show that (i ) the variances of the first term on the RHS of (3.35) and
(3.36) are bounded with appropriate scaling as n, p → ∞; and (ii ) the two bias terms in (3.35) and
(3.36) go to 0 in probability as n, p → ∞. In such a situation, the sum of the asymptotic variance of
2 P
the elements of β̂W will be nσ2 pj=1 w.j ⊤ w.j .

18
Algorithm 3 Design of W
Input: A, µ1 , µ2 and µ3
Output: W
1: We solve the following optimisation problem :

p
X
minimize w.j ⊤ w.j
W
j=1

subject to C0 : w.j ⊤ w.j /n ≤ 1 ∀ j ∈ [p]

1 ⊤
C1 : Ip − W A ≤ µ1 ,
n ∞

1 1 ⊤
C2 : In − W A A ≤ µ2 ,
p n ∞
⊤

n WA p
C3 : q − In ≤ µ3 ,
p 1 − np n n ∞

q q q
where µ1 ≜ 2 2 log(p)
n , µ 2 ≜ 2 log(2np)
np + 1
n and µ3 ≜ √ 2 2 log(n)
p .
1−n/p
2: If the above problem is not feasible, then set W = A. =0

Theorem 6 (given below) establishes that the second and third term on P the RHS of both (3.35)
and (3.36) go to 0 in probability. We design W to minimize the expression pj=1 w.j ⊤ w subject to
.j
constraints C0, C1, C2, C3 on W , as summarized in Alg. 3. The values of µ1 , µ2 , µ3 are selected in
such a way that each of the constraints C1, C2, C3 in Alg. 3 holds with high probability for the choice
W ≜ A, as will be formally established in Lemma 10. These constraints are derived from Theorem 6
and ensure that the bias terms go to 0. In particular, the constraint C1 (via µ1 ) controls the rate
of convergence of bias terms on the RHS of (3.35), whereas the constraint C2 (via µ2 ) controls the
rate of convergence of bias terms on the RHS of (3.36). Furthermore, the constraint C3 allows us to
control the asymptotic variance of the first term on RHS of (3.36) (as will be shown via Theorem 7).
Essentially, the choice W ≜ A helps us establish that the set of all possible W matrices which satisfy
the constraints in Alg. 3 is non-empty with high probability. Finally, Theorem 7 establishes that the
variances of the first term on the RHS of (3.35) and (3.36) converge. These theorems play a vital role
in deriving Theorem 8 that leads to developing the optimal debiased robust Lasso tests.

Theorem
√
6 Let√β̂ λ1 , δ̂λ2 be as in (3.6), β̂W , δ̂W be as in (3.14), (3.15) respectively and set λ1 ≜
4σ √log p
n
, λ2 ≜ 4σ nlog n . Let A be a random Rademacher matrix and let W be obtained from Alg. 3.
Then if n is o(p) and n is ω[((s + r) log p)2 ], as p, n → ∞, we have:

1.
√

1 ⊤
n Ip − W A (β ∗ − β̂λ1 ) = oP (1). (3.23)
n ∞

2.
1
√ W ⊤ δ ∗ − δ̂λ2 = oP (1). (3.24)
n ∞

3.
n 1 ⊤
p In − W A A(β ∗ − β̂λ1 ) = oP (1). (3.25)
p 1 − n/p n
∞

4.
n 1
W A⊤ δ ∗ − δ̂λ2

p = oP (1). (3.26)
p 1 − n/p n ∞

19
■

Define the following matrices:

1 1
⊤
Σβ ≜ Var √ W η = σ 2 W ⊤ W , (3.27)
n n
⊤
1 ⊤ 2 1 ⊤ 1 ⊤
Σδ ≜ Var In − W A η = σ In − W A In − W A . (3.28)
n n n

Note that Σβ /n and Σδ are the variance-covariance matrix of the first terms of the RHS of (3.35)
and (3.36), respectively.
Theorem 7 shows that when W is chosen as per Alg. 3, the element-wise variances of the first
term of the RHS of (3.35) (diagonal elements of Σβ ) approach 1 in probability. The constraints C0
and C1 of Alg. 3 are mainly used to establish this theorem. Further, for the optimal choice of W as
in Alg. 3, we show that the element-wise variances of the first term of the RHS of (3.36) (diagonal
elements of Σδ ) goes to 1 in probability. To establish this, we use the constraint C3 of Alg. 3.

Theorem 7 Let A be a Rademacher matrix. Suppose W is obtained from Alg. 3 and Σβ and Σδ are
defined as in (3.37) and (3.38), respectively. If n log n is o(p) and n is ω[((s + r) log p)2 ], as n, p → ∞,
we have the following:

(1) For j ∈ [p],

P
Σβjj → σ 2 . (3.29)

(2) For i ∈ [n],

n2 P
Σδ → σ 2 . (3.30)
p2 (1 − n/p) ii
■

When we choose an optimal W as per the Alg. 3, the equations (3.35) and (3.36) along with Theorem 6
and Theorem 7 can be used to derive the asymptotic distribution of β̂W and δ̂W . This is accomplished
in Theorem 8, which can be viewed as a non-trivial extension of Theorem 5 for such an optimal choice
of W .

Theorem
√
8 Let√ β̂ λ1 , δ̂λ2 be as in (3.6), β̂W , δ̂W be as in (3.14), (3.15) respectively and set λ1 ≜
4σ √log p
n
, λ2 ≜ 4σ nlog n . Let A be a random Rademacher matrix and W be the debiasing matrix obtained
from Alg. 3. If n is ω[((s + r) log p)2 ] and n log n is o(p), then we have:

(1) For fixed j ∈ [p],

√
n(β̂W j5pt − βj∗ ) L
p → N(0, 1) as p, n → ∞.
− (3.31)
Σβjj

(2) For fixed i ∈ [n]

δ̂W i5pt − δi∗ L

p → N(0, 1) as p, n → ∞,
− (3.32)
Σδii

where Σβjj and Σδii are the j th and ith diagonal elements of matrices Σβ (as in (3.37)) and Σδ
(as in (3.38)), respectively.

20
Theorem 8 paves the way to develop an optimal Drlt for Aim (i) and (ii) of this work along a
similar line of development as the Drlt.
Optimal Drlt for β ∗ : As in Drlt for β ∗ , we now present a hypothesis testing procedure for an
optimally designed W to determine defective samples based on Theorem 8. As before, given α > 0
we reject the null hypothesis G0,j : βj∗ = 0 in favor of G1,j : βj∗ ̸= 0, for each j ∈ [p] when
√ q
nβ̂W j Σβjj > zα/2 , (3.33)

where zα/2 is as in (3.19).

Optimal Drlt for δ ∗ : As in Drlt for δ ∗ , we develop a hypothesis testing procedure based on
Theorem 8 corresponding to optimal W to determine whether or not a measurement in y is affected
by an effective MME. As before, given α > 0, for i ∈ [n], we reject the null hypothesis H0,i : δi∗ = 0 in
favor of H1,i : δi∗ ̸= 0 when
p
δ̂W i Σδii > zα/2 . (3.34)
Similar to the Drlts in (3.19) and (3.20), Theorem 8 ensures that the proposed Optimal Drlts
are consistent tests as well as asymptotically unbiased tests. This implies that the sensitivity and
specificity of both these tests approach 1 as n, p → ∞. In Section 3.5, we show superior numerical
results with the Optimal Drlt as compared to Drlt in some finite sample scenarios.

3.4 Simultaneous Confidence Intervals for Optimal Drlt’s

We will now to derive the simultaneous hypothesis testing of multiple parameters of β ∗ and δ ∗
respectively. Here, we want to test G0 : β ∗ K = 0 vs. G1 : β ∗ K ̸= 0, where K ⊂ [p] such that
∥K∥0 = k which is fixed as n, p → ∞. We further want to test H0 : δ ∗ L = 0 vs. G1 : δ ∗ L ̸= 0, where
L ⊂ [n] such that ∥L∥0 = l which is fixed as n, p → ∞. In order to perform such tests, let us first
recall the debiased lasso estimates of β ∗ and δ ∗ respectively.
√ √

∗ 1 ⊤ 1 ⊤ 1
n(β̂W − β )K = √ WK η + n Ip − W A (β ∗ − β̂λ1 ) + √ WK ⊤ δ ∗ − δ̂λ2 , (3.35)
n n K n

1 1 1
(δ̂W − δ ∗ )L = In − W A⊤ η + In − W A⊤ A(β ∗ − β̂λ1 ) − WL A⊤ δ ∗ − δ̂
(3.36)
λ2 .
n L n L n

Furthermore, the variance covariance matrices are given as follows:

1 ⊤ 1 ⊤
ΣβK ≜ Var √ WK η = σ 2 WK WK , (3.37)
n n
⊤
1 ⊤ 2 1 ⊤ 1 ⊤
ΣδL ≜ Var In − W A η = σ In − W A In − W A . (3.38)
n L n L n L

Note that, we assume σ 2 to be known. Now we will provide a theorem that provides the joint
√
distributions of n(β̂W − β ∗ )K of (3.35) and (δ̂W − δ ∗ )L of (3.36) which will aid us in creating the
joint tests and its corresponding confidence intervals.

Theorem 9 Given A is n × p dimensional p Rademacher matrix and W = copt A be the optimal

solution of Alg.3 with copt = (1 − µ3 1 − n/p)2 . Let K ⊂ [p] such that ∥K∥0 = k and L ⊂ [n] such
that ∥L∥0 = l with both k, l being fixed as n, p → ∞. Furthermore, let (β̂W − β ∗ )K and (δ̂W − δ ∗ )L
be as defined in (3.35),(3.36) and ΣβK and ΣδL be as defined in (3.37) and (3.38) respectively. If
n log n is o(p) and n is ω(((s + r) log(p))2 ) then, we have,
√ √ D
{ n(β̂W − β ∗ )K }⊤ ΣβK −1 { n(β̂W − β ∗ )K } → χ2k , (3.39)
D
{(δ̂W − δ ∗ )L }⊤ ΣδL −1 {(δ̂W − δ ∗ )L } → χ2l , (3.40)

21
This theorem provides us with the following simultaneous hypothesis test.
Simultaneous test for β: Let K ⊂ [p] such that ∥K∥0 = k. We reject the null hypothesis G0 :
∗ = 0 vs the alternate G : β ∗ ̸= 0 at α% level of significance if:
βK 1 K
√ √
{ n(β̂W )K }⊤ ΣβK −1 { n(β̂W )K } > χ2k,1−α (3.41)

where χ2k,1−α is the upper α% point of a χ2k distribution.

∗ =0
Simultaneous test for δ: Let L ⊂ [n] such that ∥L∥0 = l. We reject the null hypothesis H0 : δL
∗ ̸= 0 at α% level of significance if:
vs the alternate H1 : δL

{(δ̂W − δ ∗ )L }⊤ ΣδL −1 {(δ̂W − δ ∗ )L } > χ2l,1−α (3.42)

where χ2l,1−α is the upper α% point of a χ2l distribution.

We will now state some supporting Lemmas for Theorem 9. In these Lemmas we show that all the
cross-product terms of the quadratic form in (3.39) and (3.40) goes to 0 in probability. All the proofs
are in the Appendix.

Lemma 2 Given A is n × p dimensional Rademacher matrix and W be the optimal solution of Alg.3.
Let K ⊂ [p] such that ∥K∥0 = k and L ⊂ [n] such that ∥L∥0 = l with both k, l being fixed as n, p → ∞.
Furthermore, let (β̂W − β ∗ )K be as defined in (3.35) and ΣβK be as defined in (3.37). If n log n is
o(p) and n is ω(((s + r) log(p))2 ) then, we have,
( )⊤ ( )
√ ⊤ −1 √ ⊤ P
n Ip − n W A K (β ∗ − β̂λ1 ) ΣβK
1
n Ip − n W A K (β ∗ − β̂λ1 ) → 0.
1

1.

( )⊤ ( )

P
2. √1 WK ⊤ δ ∗ − δ̂λ2 ΣβK −1 √1 WK ⊤ δ ∗ − δ̂λ2 →0
n n

( )⊤ ( )
√1 WK ⊤ η −1 √ P
n Ip − n1 W ⊤ A (β ∗ − β̂λ1 )

3. 2 n
ΣβK K
→ 0.

( )⊤ ( )

P
4. 2 √1 WK ⊤ η ΣβK −1 √1 WK ⊤ δ ∗ − δ̂λ2 → 0.
n n

( )⊤ ( )
√
P
n Ip − n1 W ⊤ A (β ∗ − β̂λ1 ) ΣβK −1 √1 WK ⊤ δ ∗ − δ̂λ2

5. 2 K n
→ 0.

Lemma 3 Given A is n × p dimensional Rademacher matrix and W be the optimal solution of Alg.3.
Let K ⊂ [p] such that ∥K∥0 = k and L ⊂ [n] such that ∥L∥0 = l with l being fixed as n, p → ∞.
Furthermore, let (δ̂W − δ ∗ )L be as defined in (3.36) and ΣδL be as defined in (3.38). If n log n is o(p)
and n is ω(((s + r) log(p))2 ) then, we have,
( )⊤ ( )
P
In − n1 W A⊤ L A(β ∗ − β̂λ1 ) ΣδL −1 In − n1 W A⊤ L A(β ∗ − β̂λ1 ) → 0.

1.

( )⊤ ( )

⊤ P
2. 1
n WL A δ ∗ − δ̂λ2 ΣδL −1 1
n WL A
⊤
δ ∗ − δ̂λ2 → 0.

( )⊤ ( )
P
In − n1 W A⊤ L η ΣδL −1 In − n1 W A⊤ L A(β ∗ − β̂λ1 )

3. 2 → 0.

22
( )⊤ ( )

⊤ −1 ⊤ P
1 1
δ∗

4. 2 In − nW A L
η ΣδL n WL A − δ̂λ2 → 0.

( )⊤ ( )

⊤ −1 ⊤ P
1
A(β ∗ 1
δ ∗ − δ̂λ2

5. 2 In − nW A L
− β̂λ1 ) ΣδL n WL A → 0.

In the upcoming Lemma, we show that inverse of the covariance matrices of βˆ∗ W and δ̂ W are
strictly positive and converges to a constant asymptotically.

Lemma 4 Given A is n × p dimensional Rademacher matrix and W be the optimal solution of Alg.3.
Let K ⊂ [p] such that ∥K∥0 = k and L ⊂ [n] such that ∥L∥0 = l with both k, l being fixed as n, p → ∞.
Furthermore, let ΣβK and ΣδL be as defined in (3.37) and (3.38) respectively. Then, we have,

1.  
XX c26 k 2
P [ΣβK −1 ]lj ≤ p  ≥ 1 − ((ψ)n−k+1 + (c5 )p ). (3.43)
j∈K l∈K
ψ 2 c2opt (1 − (k − 1)/n)2

2.
!
1 XX
−1 l2 (1 − n/p)
P n2
[ΣδL ]ik ≤ p
p2 (1−n/p) i∈L k∈L
(copt ϵ21 (1 − n/p)2 − n/p)2
≥ 1− {(c6 ϵ1 )n−k+1 + (c5 )p }. (3.44)

[ΣβK −1 ]lj = OP (1) and 1

[ΣδL −1 ]ik =
P P P P
Furthermore if n is o(p), then j∈K l∈K n2 i∈L k∈L
p2 (1−n/p)
OP (1). ■

3.5 Experimental Results

Data Generation: We now describe the method of data generation for our simulation study. We
synthetically generated signals (i.e., β ∗ ) with p = 500 elements in each. For the non-zero values of
β ∗ , 40% were drawn i.i.d. from U (50, 100) and the remaining 60% were drawn i.i.d. from U (500, 103 ),
and were placed at randomly chosen indices. The elements of the pooling matrix B were drawn i.i.d.
from Bernoulli(0.5), thereby producing A ≜ 2B − 1n 1p ⊤ where A is Rademacher distributed. In
order to generate effective MMEs, sign changes were induced in an adversarial manner in randomly
chosen rows of A and at column indices corresponding to the non-zero locations of β ∗ . This yielded
the perturbed matrix Â, produced via an adversarial form of the model mismatch error (MMEs) for
bit-flips which will be described in the following paragraph. Define the fractions fsp ≜ s/p, fadv ≜ r/n
to express signal sparsity and a fraction of the number of measurements with effective MMEs respec-
tively. We chose the noise standard deviation P σ to be a fraction of the mean absolute value of the
noiseless measurements, i.e., we set σ ≜ fσ ni=1 |ai. β ∗ |/n where 0 < fσ < 1. For different simula-
tion scenarios, different values of s = ∥β ∗ ∥0 (via fsp ), r = ∥δ ∗ ∥0 (via fadv ), noise standard deviation
σ (via fσ ) and number of measurements n were chosen, as will be described in the following paragraphs.

Choice of Model Mismatch Error: In our work, all effective MMEs were generated in the following
manner: In our convention, a bit-flipped pool (measurement as described in (3.5)) contains exactly
one bit-flip at a randomly chosen index. Suppose that the ith pool (measurement) contains a bit-flip.
Then exactly one of the following two can happen: (1) some j th sample that was intended to be in
the pool (as defined in A) is excluded, or (2) some j th sample that was not intended to be part of the
pool (as defined in A) is included. These two cases lead to the following changes in the ith row of Â
(as compared to the ith row of A), and in both cases the choice of j ∈ [n] is random uniform: Case

23
1: âij = −1 but aij = 1, Case 2: âij = 1 but aij = −1. Note that under this scheme, the generated
MMEs may not be effective. Hence MMEs were applied in an adversarial setting by inducing bit-flips
only at those entries in any row of Â corresponding to indices with non-zero values of β ∗ .

Choice of Regularization Parameters: The regularization parameters λ1 and λ2 were chosen

such that both log(λ1 ) ∈ [1 : 0.25 : 7], log(λ2 ) ∈ [1 : 0.25 : 7], in the following manner: We first
identified values of λ1 and λ2 fromp this range such pthat the
Lilliefors test [36] confirmed the Gaussian
√
distribution for both nβ̂W j / Σβjj and δ̂W i / Σδii (see Odrlt as in (3.33) and (3.34)) at the
1% significance level, for at least 70% of j ∈ [p] (coordinates of β ∗ ) and i ∈ [n] (coordinates of δ ∗ ). Out
of these chosen values, we determined the value of λ1 , λ2 that minimized the average cross-validation
error over 10 folds. In each fold, 90% of the n measurements (denoted by a sub-vector yr correspond-
ing to sub-matrix Ar ) were used to obtain (β̂λ1 , δ̂λ2 ) via the robust Lasso, and the remaining 10%
of the measurements (denoted by a sub-vector ycv corresponding to measurements generated by the
sub-matrix Acv ) were used to estimate the cross-validation error ∥ycv − Acv β̂λ1 − Icv δ̂λ2 ∥22 . Note
that Icv is a sub-matrix of the identity matrix which samples only some elements of y and hence of
δ̂cv . The cross-validation error is known to be an observable, data-driven proxy for the mean-squared
error [57], which justifies its choice as a method for parameter selection.

Evaluation Measures of Hypothesis Tests: Many different variants of the Lasso estimator
were compared empirically against each other as will be described in subsequent subsections. Each of
them were implemented using the CVX package in MATLAB. Results for the hypothesis tests (given
in (3.19),(3.20),(3.33) and (3.34)) are reported in terms of sensitivity and specificity (defined below).
The significance level of these tests is chosen at 1%. Consider a binary signal b̂β with p elements. In
our simulations, a sample at index j in β̂W is declared to be defective if the hypothesis test G0,j is
rejected, in which case we set b̂β,j = 1. In all other cases, we set b̂β,j = 0. We declare an element to
be a true defective if βj∗ ̸= 0 and b̂β,j ̸= 0, and a false defective if βj∗ = 0 but b̂β,j ̸= 0. We declare it to
be a false non-defective if βj∗ = 0 but b̂β,j ̸= 0, and a true non-defective if βj∗ = 0 and b̂β,j = 0. The
sensitivity for β ∗ is defined as (# true defectives)/(# true defectives + # false non-defectives) and
specificity for β ∗ is defined as (# true non-defectives)/(# true non-defectives + # false defectives).
We report the results of testing for the debiased tests using: (i ) W ≜ A corresponding to Drlt
(see (3.19) and (3.20)), and (ii ) the optimal W using Alg. 3 corresponding to Odrlt (see (3.33) and
(3.34)).

3.5.1 Results with Baseline Debiasing Techniques in the Presence of Effective

MMEs
We now describe the results of an experiment to show the impact of Odrlt (3.33) in the presence
of effective MMEs induced in A. We compare Odrlt with the baseline hypothesis test for β ∗ as
defined by [31], which is equivalent to ignoring MMEs (i.e., setting δ ∗ = 0 in (3.5)). Considering the
presence of effective MMEs, we further compare Odrlt with the baseline test defined in [31] which
would use theq approximate inverse of the augmented sensing matrix (A|In ) as obtained from Alg. 2
log(n+p)
with µ = 2 n . We now describe these two chosen baseline hypothesis tests for β ∗ in more
detail.

1. Baseline ignoring MMEs: (Baseline-1) This approach computes the following ‘debiased’
estimate of β ∗ as given in Equation (5) of [31]:
1
β̂b ≜ β̂λ,b + M A⊤ (y − Aβ̂λ,b ), (3.45)
n

where β̂λ,b ≜ argminβ ∥y − Aβ∥22 + λ∥β∥1 , and M is the approximate inverse of A obtained
from Alg. 2. In this baseline approach, we reject the null hypothesis G0,j : βj∗ = 0 in favor of
√ q
G1,j : βj∗ ̸= 0, for each j ∈ [p] when nβ̂bj / σ 2 [M A⊤ AM ⊤ ]jj /n > zα/2 .

24
n Sens - B-1 Sensi-B-2 Sens-Odrlt Spec-B-1 Spec-B-2 Spec-Odrlt
100 0.522 0.602 0.647 0.678 0.702 0.771
200 0.597 0.682 0.704 0.832 0.895 0.931
300 0.698 0.802 0.878 0.884 0.915 0.963
400 0.791 0.834 0.951 0.902 0.927 0.999
500 0.858 0.894 0.984 0.923 0.956 1

Table 3.1: Comparison of average Sensitivity (Sens) and Specificity (Spec) (based on 100 independent
noise runs) for the tests Baseline-1 (B-1), Baseline-2(B-2) and Odrlt for determining defectives
in β ∗ from their respective debiased estimates in the presence of MMEs induced in A (See Sec. 3.5.1
for detailed definitions).

2. Baseline considering MMEs: (Baseline-2) In this approach, we consider the MMEs which
is equivalent to the sensing matrix as (A|In ) and signal vector x∗ = (β ∗ ⊤ , δ ∗ ⊤ )⊤ . The ‘debiased’
estimate of x∗ in this approach is given as:
1
x̃b ≜ x̃λ + M̃ (A|In )⊤ (y − (A|In )x̃λ ), (3.46)
n

where x̃λ ≜ argminβ ∥y − (A|In )x∥22 + λ∥x∥1 and M̃ is the approximate inverse of (A|In )
obtained from Alg. 2. Then β̃b is obtained by extracting the first p elements of x̃b . In this
approach, we reject the null hypothesis G0,j : βj∗ = 0 in favor of G1,j : βj∗ ̸= 0, for each j ∈ [p]
√
q
when nβ̃bj / σ 2 [M̃ (A|In )⊤ (A|In )M̃ ⊤ ]jj /n > zα/2 .

Note that the theoretical results established in [31] hold for completely random or purely deterministic
sensing matrices, whereas the sensing matrix corresponding to the MME model, i.e., (A|In ), is partly
random and partly deterministic. Nonetheless, the second baseline test, i.e. Baseline-2 with the
augmented matrix, is useful as a numerical benchmark. For both baseline approaches, the regular-
ization parameter λ was chosen using cross validation. We chose the λ value which minimized the
validation error with 90% of the measurements used for reconstruction and the remaining 10% used
for cross-validation. In Table 3.1, we compare the average values (over 100 instances of measurement
noise) of Sensitivity and Specificity of Baseline-1, Baseline-2 and Odrlt for different values of n
varying in {100, 200, 300, 400, 500} and p = 500. It is clear from Table 3.1, that for all the values of
n, the Sensitivity and Specificity value of Odrlt is higher as compared to that of Baseline-1 and
Baseline-2. The performance of Baseline-2 dominates Baseline-1 which indicates that ignoring
MMEs may lead to misleading inferences in small sample scenarios. Furthermore, the Sensitivity and
Specificity of Odrlt approaches 1 as n increases. This highlights the superiority of our proposed
technique and its associated hypothesis tests over two carefully chosen baselines. Note that there is
no prior literature on debiasing in the presence of MMEs, and hence these two baselines are the only
possible competitors for our technique.

3.5.2 Empirical verification of asymptotic results of Theorem 8

√
In this subsection, we compare the empirical distribution of TG,j ≜ n(β̂W j − βj∗ )/ [Σβ ]jj and
p

TH,i ≜ (δ̂W i − δi∗ )/ [Σδ ]ii , for the optimal weight matrix W , with its asymptotic distribution N(0, 1)
p

as derived in Theorem 8. We chose p = 500, n = 400, fadv = 0.01, fsp = 0.01 and fσ = 0.01. The
measurement vector y was generated with a perturbed matrix Â containing effective MMEs using the
procedure described earlier. Here, TG,j and TH,i were computed for 100 runs across different noise
instances in η.
The left sub-figure of Fig. 3.1 shows plots of the quantiles of a standard normal random variable
versus the quantiles of TG,j computed over 100 runs for each j ∈ [p]. For the quantiles, each plot is
presented in a different color. A 45◦ straight line passing through the origin is also plotted (black solid
line) as a reference. These p different quantile-quantile (QQ) plots corresponding to j ∈ [p], all super-
imposed on one another, indicate that the quantiles of the {TG,j }pj=1 are close to that of a standard

25
Figure 3.1: Left: Quantile-Quantile plots of N(0, 1) vs. TG,j (defined at the beginning of Sec. 3.5.2)
using 100 independent noise runs for all j ∈ [p] (one plot per index j with different colors). Right:
Quantile-Quantile plots of N(0, 1) vs. TH,i (defined at the beginning of Sec. 3.5.2) using 100 indepen-
dent noise runs for all i ∈ [n] (one plot per index i with different colors). For both plots, the pooling
matrix contained MMEs.

normal distribution in the range of [−2, 2] (thus covering 95% range of the area under the standard
bell curve) for defective as well as non-defective samples. This confirms that the distribution of the
TG,j values is each approximately N(0, 1), even in this chosen finite sample scenario. Similarly, the
right sub-figure of Fig. 3.1 shows the QQ-plot corresponding to TH,i for each i ∈ [n] in different colors.
As before, these n different QQ-plots, one for each i ∈ [n], all super-imposed on one another, indicate
that the {TH,i }ni=1 values are also each approximately standard normal, with or without MMEs.

3.5.3 Sensitivity and Specificity of Odrlt and Drlt for δ ∗

We performed experiments to study sensitivity and specificity of Drlt and Odrlt for δ ∗ . In exper-
imental setup E1, we varied fadv ∈ {0.01, ..., 0.1} with fixed values n = 400, fsp = 0.01, fσ = 0.1. In
E2, we varied n from 200 to 500 in steps of 50 with fadv = 0.01, fsp = 0.01, fσ = 0.1. In E3, we varied
fσ ∈ {0, 0.05, ..., 0.5} with n = 400, fadv = 0.01, fsp = 0.01. In E4, we varied fsp ∈ {0.01, ..., 0.1} with
n = 400, fadv = 0.01, fσ = 0.1. The experiments were run 100 times across different noise instances
in η, for the same signal β ∗ (in E1, E2 and E3) and sensing matrix A (in E1, E3 and E4). In E4, the
sparsity of the signal varies, therefore, the signal vector β ∗ also varies. Similarly, in E2 as n varies,
the sensing matrix A also varies.
The empirical sensitivity and specificity of a test was computed as follows. The estimate δ̂W was
binarized to create a vector b̂W,δ such that for all i ∈ [n], the value of b̂W,δ (i) was set to 1 if Drlt
or Odrlt rejected the hypothesis H0,i , and b̂W,δ (i) was set to 0 otherwise. Likewise, a ground truth
binary vector b∗δ was created which satisfied b∗δ (i) = 1 at all locations i where δi∗ ̸= 0 and b∗δ (i) = 0
otherwise. Sensitivity and specificity values were computed by comparing corresponding entries of
b∗δ and b̂W,δ . The sensitivity of Drlt and Odrlt test for δ ∗ averaged over 100 runs of different η
instances is reported in Fig. 3.2 for the different experimental settings E1, E2, E3, E4. Under setup E2,
the sensitivity plot indicates that the sensitivity of Drlt and Odrlt increases as n increases. Under
setups E1, E3, and E4, the sensitivity of both Drlt and Odrlt is reasonable even with larger values
of fadv , fσ , and fsp (which are difficult regimes). In Fig. 3.2, we compare the sensitivity of Drlt and
Odrlt to that of Robust Lasso from (3.6) without any debiasing step, which is abbreviated as Rl.
To determine defectives and non-defectives for the Rl method, we adopted a thresholding strategy
where an estimated element was considered defective (resp. non-defective) if its value was greater than
or equal to (resp. less than) a threshold τss . The optimal value of τss was chosen clairvoyantly (i.e.,
assuming knowledge of the ground truth signal vector β ∗ ) on a training set so as to maximize Youden’s
index defined as Sensitivity + Specificity − 1. Furthermore, Fig. 3.2 indicates that the sensitivity of
Odrlt is superior to that of Rl and Drlt with Drlt also slightly better than Rl. Note that, in
practice, a choice of the threshold τss for Rl would be challenging and require a representative training
set, whereas Drlt and Odrlt do not require any training set for the choice of such a threshold.

26
Figure 3.2: Average Sensitivity and Specificity plots (over 100 independent noise runs) for detecting
measurements containing MMEs (i.e. detecting non-zero values of δ ∗ ) using Drlt, Odrlt and Robust
Lasso (Rl). The experimental parameters are p = 500, fσ = 0.1, fadv = 0.01, fsp = 0.1, n = 400. Left
to right, top to bottom: results for experiments E1, E2, E3, E4 (see Sec. 3.5.3 for details).

3.5.4 Identification of Defective Samples in β ∗

In the next set of experimental results, we first examined the effectiveness of Drlt and Odrlt to
detect defective samples in β ∗ in the presence of bit-flips in A induced as per adversarial MMEs.
We compared the performance of Drlt and Odrlt to two other closely related algorithms to enable
performance calibration: (1) Robust Lasso (Rl) from (3.6) without debiasing; (2) A hypothesis
testing mechanism on a pooling matrix without model mismatch, which we refer to as Baseline-3.
In Baseline-3, we generated measurements with the correct pooling matrix A (i.e., δ ∗ = 0) and
obtained a debiased Lasso estimate as given by (3.11). (Note that Baseline-3 is very different
from Baseline-1 and Baseline-2 from Sec. 3.5.1 as in this approach δ ∗ = 0.) Using this debiased
estimate, we obtained a hypothesis test similar to Equation (5) of [31]. In the case of Rl, the
decision regarding whether a sample is defective or not was taken based on a threshold τss that was
chosen to maximize the Youden’s index on a training set of signals from the same distribution. The
regularization parameters λ1 , λ2 were chosen separately for every choice of parameters fadv , fσ , fsp and
n.
We examined the variation in sensitivity and specificity with regard to change in the following
parameters, keeping all other parameters fixed: (EA) the number of bit-flips in the matrix A expressed
as a fraction fadv ∈ [0, 1] of n; (EB) number of pools n; (EC) noise standard deviation σ expressed as
a fraction fσ ∈ [0, 1] of the quantity ȳ defined in Sec. 3.5.2; (ED) sparsity (ℓ0 norm) s of vector β ∗
expressed as fraction fsp ∈ [0, 1] of signal dimension p. For the bit-flips experiment i.e., (EA), fadv was
varied in {0.01, 0.02, . . . , 0.1} with n = 400, fsp = 0.01, fσ = 0.1. For the measurements experiment
(EB), n was varied over {200, 150, . . . , 500} with fsp = 0.01, fadv = 0.01, fσ = 0.1. For the noise
experiment (i.e., (EC), we varied fσ in {0, 0.05, . . . , 0.5} with n = 400, fsp = 0.01, fadv = 0.01. For the
sparsity experiment (i.e., (ED), fsp was varied in {0.01, 0.02, . . . , 0.1} with n = 400, fadv = 0.01, fσ =
0.1.
The Sensitivity and Specificity values, averaged over 100 noise instances, for all four experiments
are plotted in Fig. 4.1. The plots demonstrate the superior performance of Odrlt over Rl and Drlt.
Furthermore, the performance of Drlt is also superior to Rl. In all regimes, Baseline performs
best as it uses an error-free sensing matrix. We also see that for higher n, lower fσ and lower fsp , the
sensitivity and specificity for Odrlt comes very close to that of Baseline.

27
Figure 3.3: Average Sensitivity and Specificity plots (over 100 independent noise runs) plots for
detecting defective samples (i.e., non-zero values of β ∗ ) using Drlt, Odrlt, Robust Lasso and
Baseline 3. Left to right, top to bottom: results for experiments (EA), (EB), (EC), (ED). The
experimental parameters are p = 500, fσ = 0.1, fadv = 0.01, fsp = 0.1, n = 400. See Sec. 3.5.4 for more
details.

3.5.5 RRMSE Comparison of Debiased Robust Lasso Techniques to Baseline Al-

gorithms
We computed estimates of β ∗ using the debiased robust Lasso technique in two ways: (i ) with the
weights matrix W ≜ A, and (ii ) the optimal W as obtained using Alg. 3. We henceforth refer to these
estimators as Debiased Robust Lasso (Drl) and Optimal Debiased Robust Lasso (Odrl) respectively.
We computed the relative root mean squared error (RRMSE) for Drl and Odrl as follows: First,
the pooled measurements with MMEs were identified as described in Sec. 3.5.3 and then discarded.
From the remaining measurements, an estimate of β ∗ was obtained using robust Lasso with the
optimal λ1 , λ2 chosen by cross-validation. Given the resultant estimate β̂, the RRMSE was computed
as ∥β ∗ − β̂∥2 /∥β ∗ ∥2 .
We compared the RRMSE of Drl and Odrl to that of the following algorithms:

1. Robust Lasso or Rl from (3.6).

2. Lasso (referred to as L2) based on minimizing ∥y − Aβ∥22 + λ∥β∥1 with respect to β. Note
that this ignores MMEs.

3. An inherently outlier-resistant version of Lasso which uses the ℓ1 data fidelity (referred to as
L1), based on minimizing ∥y − Aβ∥1 + λ∥β∥1 with respect to β.

4. Variants of L1 and L2 combined with the well-known Ransac (Random Sample Consensus)
framework [19] (described below in more detail). The combined estimators are referred to as
Rl1 and Rl2 respectively.

Ransac is a popular randomized robust regression algorithm, widely used in computer vision [20,
Chap. 10]. We apply it here to the signal reconstruction problem considered in this paper. In Ransac,
multiple small subsets of measurements from y are randomly chosen. Let the total number of subsets
be NS . Let the set of the chosen subsets be denoted by {Zi }N i=1 . From each subset Zi , the vector
S

(i)
β̂ is estimated, using either L2 or L1. Every measurement is made to ‘cast a vote’ for one of the
(i) (j)
models from the set {β̂ }N i=1 . We say that measurement yl (where l ∈ [n]) casts a vote for model β̂
S

(j) (k)
(where j ∈ [NS ]) if |yl − al. β̂ | ≤ |yl − al. β̂ | for k ∈ [NS ], k ̸= j. Let the model which garners the

28
Figure 3.4: Average RRMSE comparison (over 100 independent noise runs) using Odrlt, Drlt, L1
(L1 Lasso), L2 (L2 Lasso), RL1 (L1 Lasso with Ransac), the RL2 (L2 Lasso with Ransac),
and robust Lasso (Rl) w.r.t. variation in the following parameters keeping others fixed: bit-flip
proportions fadv as in setup (EA) (topleft), measurements n (top right) as in setup (EB), noise level fσ
as in setup (EC) (bottom left) and sparsity fsp as in setup (ED) (bottom right). The fixed parameters
are dimension of p = 500, fσ = 0.1, fadv = 0.01, fsp = 0.01, n = 400. See Sec. 3.5.5 for more details.

js
largest number of votes be denoted by β̂ , where js ∈ [NS ]. The set of measurements which voted for
this model is called the consensus set. Ransac when combined with L2 and L1 is respectively called
Rl2 and Rl1. In Rl2, the estimator L2 is used to determine β ∗ using measurements only from the
consensus set. Likewise, in Rl1, the estimator L1 is used to determine β ∗ using measurements only
from the consensus set.
Our experiments in this section were performed for signal and sensing matrix settings identical
to those described in Sec. 3.5.4. The performance in all experiments was measured using RRMSE,
averaged over reconstructions from 100 independent noise runs. For all techniques, the regularization
parameters were chosen using cross-validation following the procedure in [57]. The maximum number
of subsets for finding the consensus set in Ransac was set to NS = 500 with 0.9n measurements in
each subset. RRMSE plots for all competing algorithms are presented in Fig. 3.4, where we see that
Odrl and Drl outperformed all other algorithms for all parameter ranges considered here. We also
observe that Odrl produces lower RRMSE than Drl, particularly in the regime involving higher
fadv .
Algorithm Implementation: A MATLAB implementation of the algorithms in this paper can be
found at https://github.com/Shuvayan21/DRLT-for-MMEs.

29
Chapter 4

Correction of Group Membership

Specification Errors in Group Testing

All algorithms for Group testing or Compressed Sensing assume that A is known accurately. However
a technician may make errors while implementing the pooling procedure [3, 18, 24, 55]. That is, we
consider the case where due to errors in mixing of the samples, the pools are generated using an
unknown matrix Â (say) instead of the pre-specified matrix A. The elements of matrix Â and A
are equal everywhere except for the misspecified samples in each pool. We refer to these errors in
group membership specifications as ‘bit-flips’. For example, suppose that the ith pool is specified to
consist of samples j1 , j2 , j3 ∈ [p]. But due to errors during pool creation, the ith pool is generated
using samples j1 , j2 , j5 . In this specific instance, ai,j3 ̸= âi,j3 and ai,j5 ̸= âi,j5 .
Previously, we proposed a method for determining health status values that is resilient to a limited
number of bit-flips Chapter 3. This method uses a ‘debiased’ version of the robust Lasso estimator,
through which we designed hypothesis tests to achieve two objectives: (i ) identifying the unhealthy
subjects by detecting non-zero entries in the health status vector, and (ii ) identifying rows in the
design matrix affected by Model Mismatch Errors (MMEs). In this chapter, we present algorithms
aimed at correcting MMEs within pooled tests and subsequently reconstructing the signal vector based
on the corrected sensing matrix A.
We first address the problem of correction of Permutation Noise in Compressed Sensing.

4.1 Permutation Noise Correction

All algorithms for GT/CS assume that A is known accurately. However a technician may make
errors while implementing the pooling procedure [3, 18, 24, 55]. A specific common form of error is
permutation noise, i.e., when the pooled results are mistakenly mislabeled by the technician [55], for
example when the results of pools i1 ∈ [n], i2 ∈ [n], i1 ̸= i2 are swapped. In such a case, the technician
unknowingly records yi1 = ai2 ,. β ∗ + ηi2 (where ai2 ,. is the i2 th row of A) and yi2 = ai1 ,. β ∗ + ηi1 ,
although the original intention was to record yi1 = ai1 ,. β ∗ + ηi1 and yi2 = ai2 ,. β ∗ + ηi2 .
In this paper, we attempt to solve the problem of pooled inference in the presence of a small
number of permutation noise instances in y. We present a hypothesis testing approach based on our
work in Chapter 3 to identify the measurements that have been permuted. Furthermore, we provide
a greedy algorithm to correct the permutations in the measurements and reconstruct the signal based
on the corrected measurements. At the core of our theory, lies the concept of debiasing of Lasso
estimates [31]. Though the key theorem in this paper was developed by us in Chapter 3, it is applied
here in a different context, i.e. permutation noise in A, as opposed to bit-flip noise Chapter 3 in A
which involves errors by the technician in the choice of subsets for creation of a pool. Also, our work
here provides a correction algorithm unlike Chapter 3.
The pooling matrix is generally binary but measurements with a binary matrix can be easily
converted to those with a Rademacher matrix, as is commonly done in Poisson CS (e.g., [35, Equation
4]). In this paper, we henceforth assume that A is random Rademacher and every element of η is drawn
iid from N(0, σ 2 ) where σ > 0. As mentioned in Sec. ??, consider the case that the results yi1 , yi2 were

30
swapped. This means that the pooled results in y were obtained via an ‘actual’ pooling matrix Â
(unknown) which is different from the known (pre-specified) pooling matrix A with ∆A ≜ Â − A. If
yi1 , yi2 were swapped, we have âi1 ,. = ai2 ,. ̸= ai1 ,. and âi2 ,. = ai1 ,. ̸= ai2 ,. . Accounting for permutation
noise in y, we have the forward model:

y = Âβ + η = (A + ∆A)β ∗ + η = Aβ + δ ∗ + η, (4.1)

where δ ∗ ≜ ∆Aβ ∗ is the signal-dependent noise owing to permutation errors (PEs). In practice, one
expects to have very few PEs given a competent technician. Therefore, we assume that δ ∗ is sparse,
i.e., r ≜ ∥δ ∗ ∥0 , r ≪ n.

4.1.1 Correction Algorithm

After detecting the permuted measurements (denoted by the index set P) using Drlt, we can simply
choose to discard these measurements. We then estimate β ∗ using the remaining measurements. We
refer to this method as Drlt-D (‘D’ for discarding). We now provide a procedure, called Drlt-C
(‘C’ for correction), to correct the permutations after detecting them, as presented in Alg. 4. In

Algorithm 4 Correcting for Permutation noise

Input: Measurement vector y, Sensing matrix A, weight matrix W , the set P of detected permuted
measurements
Output: Corrected measurement vector yc
0: yc ← y
0: for index i ∈ P do
0: correct index ← i
0: max-pvalue = 0.05
0: for index j ∈ P, j ̸= i do
0: yc [i] ← y[j]
0: (β̂ λ1 , δ̂ λ2 ) ← arg min ∥yc − Aβ − δ∥2 + λ1 ∥β∥1 + λ2 ∥δ∥1
β,δ
δ̂W ← δ̂ λ2 + In− n1 W A⊤ (y − Aβ̂ λ1 − δ̂ λ2 )

0:
p
0: pvaluei ← 1 − Φ |δ̂W i |/ Σδii
0: if pvaluei > max-pvalue then
0: correct index ← j, max-pvalue ← pvaluei
0: end if
0: end for
0: yc [i] ← y[correct index] (correcting the value of yc [i] to correct for the PE )
0: end for=0

this procedure, we go through each pair of indices (i1 , i2 ), i1 ̸= i2 in the set of estimated permuted
measurements P and swap the values of yi1 , yi2 . For a fixed i1 , we retain the swap which yields the
highest p-value as per the hypothesis test TH . After correction of the measurements, we re-estimate
β ∗ using (3.6) (cf. Sec. 3.2). Compared to Drlt-D, we note that Drlt-C makes better use of the
available measurements. We also note that the correction procedure does not alter A, but alters only
y.

4.1.2 Experimental Results

Data Generation: We synthetically generated signals (i.e. β ∗ ) with p = 500 elements in each. The
non-zero values of β ∗ were drawn i.i.d. from U (50, 1000) and placed at randomly chosen indices.
We have s = ∥β ∗ ∥0 = fsp p where fsp ∈ [0, 1]. A was chosen to be a random Rademacher matrix.
Permutations were induced in randomly chosen pairs of pool indices (i1 , i2 ), i1 ̸= i2 . The number of
permutations is given by fperm n where fperm ∈ [0, 1]. We chose the noise standard P deviation σ to be
a fraction fσ of the mean absolute value of the noiseless measurements, i.e. σ ≜ fσ ni=1 |ai. β ∗ |/n.

31
Choice of Regularization Parameters: The parameters λ1 , λ2 were chosen such that log(λ1 ) ∈
Rλ , log(λ2 ) ∈ Rλ where Rλ ≜ [1, 1.25, 1.5, . . . , 7], in the following manner: We first identified values
of λ1 , λ2 ∈ Rλ such that the Lilliefors test [36] confirmed the Gaussian distribution for both TG,j ≜
√ p p
nβ̂W j / σ Σβjj , j ∈ [p] and TH,i ≜ δ̂W i / σ Σδii , i ∈ [n] (cf. TG , TH in Sec. ??) at the 1%
significance level, for at least 70% of the coordinates of β ∗ and δ ∗ . Out of these chosen values, we
determined the values λ1 , λ2 that minimized the average cross-validation error (CVE) over 10 folds. In
each fold, 90% of the n measurements (denoted by a sub-vector yr corresponding to sub-matrix Ar )
were used to obtain (β̂λ1 , δ̂λ2 ) via the robust Lasso, and the remaining 10% of the measurements
(denoted by a sub-vector ycv corresponding to sub-matrix Acv ) were used to estimate the CVE
∥ycv − Acv β̂λ1 − Icv δ̂λ2 ∥22 . Note that Icv is a sub-matrix of the identity matrix which samples only
some elements of y, δ̂. The CVE is chosen for parameter selection because it is a data-driven proxy
for the non-computable mean-squared error [57].
Evaluation Measures: In our simulations, a sample at index j in β̂W was declared to be defective if
the hypothesis test G0,j was rejected at 5% significance level, and was declared non-defective otherwise.
Results are reported in terms of sensitivity and specificity. The sensitivity (SE) is defined as (# true
defectives)/(# true defectives + # false non-defectives) and specificity (SP) is defined as (# true
non-defectives)/(# true non-defectives + # false defectives).
Sensitivity and Specificity of Drlt-D and Drlt-C for β ∗ : Here, we first examined the effec-
tiveness of Drlt and Drlt-C to detect the non-zero elements in β ∗ in the presence of permutations
in y. We compared the performance of Drlt, Drlt-D and Drlt-C to two other related algorithms
to enable performance calibration: (1) Robust Lasso (Rl) from (??) without debiasing; (2) A hy-
pothesis testing mechanism on a pooling matrix without model mismatch, which we refer to as Ub
(upper baseline). In Ub, we generated measurements with the correct matrix A and obtained a
debiased Lasso estimate as given by Eqn. 7 of [31]. In the case of Rl and Drlt-C, the decision
regarding whether a sample is defective or not was taken based on a threshold τss that was cho-
sen to maximize the SE+SP on a training set of signals from the same distribution. The tuning
was done separately for every choice of parameters fperm , fσ , fsp and n. We examined the varia-
tion in SE and SP with regard to change in the following parameters, keeping all other parameters
fixed: (EA) number of pools n; (EB) fsp for sparsity of β ∗ ; (EC) fσ for noise standard deviation;
and (ED) fraction fperm for number of permutations in A to generate Â. Note that the fractions
fperm , fσ , fsp are defined earlier in this section. For the measurements experiment (EA), n was varied
over {200, 150, . . . , 500} with fsp = 0.01, fperm = 0.01, fσ = 0.1. For the sparsity experiment (EB),
fsp was varied in {0.01, 0.02, . . . , 0.1} with n = 400, fperm = 0.01, fσ = 0.1. For the noise experiment
(EC), we varied fσ in {0, 0.05, . . . , 0.5} with n = 400, fsp = 0.01, fperm = 0.01. For the permutation
experiment (ED), fperm was varied in {0.01, 0.02, . . . , 0.1} with n = 400, fsp = 0.01, fσ = 0.1. SE and
SP values, averaged over 100 noise instances, for all four experiments are plotted in Fig. 4.1. The plots
demonstrate the superior performance of Drlt-C over Rl and Drlt, with Drlt coming second. In
all regimes, Ub performs the best as it assumes and uses an error-free sensing matrix. But we observe
that for large n, small fσ and small fsp , the SE and SP for Drlt-C is on par with that of Ub.
RRMSE Comparison of Drlt-D and Drlt-C to Baselines: We computed the relative root
mean squared error (RRMSE) for an estimate β̂ by the formula ∥β ∗ − β̂∥2 /∥β ∗ ∥2 . We compared
the RRMSE of Drlt, Rl and Drlt-C to that of the following algorithms for signal and sensing
matrix settings identical to those described earlier: (1) Lasso (referred to as L2) based on minimizing
∥y − Aβ∥22 + λ∥β∥1 which ignores the presence of MMEs, and (2) an outlier-resistant version of
Lasso (referred to as L1), based on minimizing ∥y − Aβ∥1 + λ∥β∥1 . Besides this, we also compared
our algorithms with L1 and L2 combined with the well-known Ransac (Random Sample Consensus)
framework [19], producing estimators Rl1 and Rl2. The performance in all experiments was measured
using average values RRMSE over reconstructions from 50 independent noise runs. For all algorithms,
the threshold to binarize the estimate of β ∗ was chosen to minimize the sum of SE and specificity over
a training set of signals from the same distribution. For all techniques, the regularization parameters
were chosen using cross-validation following the procedure in [57]. The maximum number of subsets
for finding the consensus set in Ransac was set to NS = 500 with 0.9n measurements in each subset.
RRMSE plots for various algorithms for SSM are presented in Fig. 4.2, where we see that Drlt-C and

32
Figure 4.1: Sensitivity and Specificity comparison for Drlt, Drlt-C, Robust Lasso and Ub for
experiments (EA) (top left), (EB) (bottom left), (EC) (bottom right), (ED) (top right). Note that
for (ED), Ub uses fperm = 0 always. See Sec. 4.1.2 for more details.

Figure 4.2: RRMSE comparison for Perm using Drlt-C, Drlt-D, L1 (L1 Lasso), L2 (L2 Lasso),
RL1 (L1 Lasso with Ransac), the RL2 (L2 Lasso with Ransac), and robust Lasso (Rl) w.r.t.
variation in the following parameters keeping others fixed: proportion of permutations fperm (topleft),
measurements n (top right), noise level fσ (bottom left) and sparsity fsp (bottom right). The fixed
parameters are dimension of p = 500, fσ = 0.1, fperm = 0.01, fsp = 0.01, n = 400.

33
Drlt-D significantly outperform all the other algorithms for all parameter ranges and that Drlt-C
produces lower RRMSE than Drlt-D, particularly in the regime involving higher fσ .
Multiple stages of Correction of Permutation Errors: We have noticed that after the first stage
of correction in Drlt-C, a small fraction of PEs remain uncorrected, and a small number of new PEs
are falsely created. Both these are caused due to small but inevitable Type-I and Type-II errors in the
hypothesis test TH . For this set of experiments, we take p = 500, n = 450, s = 5, fσ = 0.01 and r = 8
PEs. After the first stage of correction in Drlt-C, we execute the following three steps iteratively:
(i ) We re-estimate β ∗ and the permutation noise vector based on the corrected measurements using
Robust Lasso Rl. (ii ) We then perform debiasing of these estimates as shown in (3.14) and (3.15).
(iii ) Based on the new set of detected permuted measurements, we perform correction given by Alg. 4.
After each stage of correction, we report the average number of measurements correctly detected to
have PEs, the average number incorrectly detected to have PEs and the average actual number of PEs
(over 20 noise runs). These results are presented in Table ?? which shows that after the fifth stage, the
test only falsely detects a small number of permutations in the model and there are no permutations
left in the sixth stage.

4.2 Correction of Bitflip MME’s in Group Testing

All algorithms for Group testing or Compressed Sensing assume that A is known accurately. However
a technician may make errors while implementing the pooling procedure [3, 18, 24, 55]. That is, we
consider the case where due to errors in mixing of the samples, the pools are generated using an
unknown matrix Â (say) instead of the pre-specified matrix A. The elements of matrix Â and A
are equal everywhere except for the misspecified samples in each pool. We refer to these errors in
group membership specifications as ‘bit-flips’. For example, suppose that the ith pool is specified to
consist of samples j1 , j2 , j3 ∈ [p]. But due to errors during pool creation, the ith pool is generated
using samples j1 , j2 , j5 . In this specific instance, ai,j3 ̸= âi,j3 and ai,j5 ̸= âi,j5 .
Previously, we proposed a method for determining health status values that is resilient to a limited
number of bit-flips [6]. This method uses a ‘debiased’ version of the robust Lasso estimator, through
which we designed hypothesis tests to achieve two objectives: (i ) identifying the unhealthy subjects
by detecting non-zero entries in the health status vector, and (ii ) identifying rows in the design matrix
affected by Model Mismatch Errors (MMEs). In this paper, we present algorithms aimed at correcting
MMEs within pooled tests and subsequently reconstructing the signal vector based on the corrected
sensing matrix A.
To ensure comprehensive correction of MMEs, we run this algorithm iteratively, allowing us to
correct nearly all effective MMEs. Furthermore, we enhance the algorithm’s runtime by deriving an
exact solution for the optimization step to obtain the debiasing weight matrix. We provide theoretical
assurances showing that our method reduces reconstruction errors following MME correction, and we
also demonstrate a guaranteed reduction in bit-flips post-correction. Additionally, empirical results
show improved reconstruction outcomes after MME correction across three different bit-flip models.
Finally, we verify that applying multiple correction stages enables effective elimination of bit-flips
under specific conditions.

4.2.1 Bit-flip models

The hypothesis test for δ ∗ can even be used to correct for bit-flip errors. Before presenting algorithms
for correction, we define three intuitive models for bit-flips in the pooling matrix A, followed by a
model for permutation errors. Precise knowledge of these models is not necessary for detection of
bit-flips, but is needed for correction.

1. Single Switch Model (SSM): In this model, a bit-flipped pool (measurement as described in
Eqn.(5) of [6]) contains exactly one bit-flip at a randomly chosen index. Suppose that the ith
pool (measurement) contain a bit-flip. Under the SSM scheme, exactly one of the following
two can happen: (1) some jth sample that was intended to be in the pool (as defined in A) is
excluded, or (2) some jth sample that was not intended to be part of the pool (as defined in

34
A) is included. These two cases lead to the following changes in the ith row of Â, and in both
cases the choice of j ∈ [p] is random uniform: Case 1: Âij = −1 but Aij = 1, Case 2: Âij = 1
but Aij = −1.

2. Adjacent Switch Model (ASM): In ASM, a bit-flipped pool contains bit-flips at two adjacent
indices. Suppose the ith pool contains bit-flips. Then under the ASM scheme, either (1) the
jth sample that was not intended to be in the pool is included and the j ′ th sample where
j ′ ≜ mod(j + 1, p) that was intended to be in the pool is excluded, or (2) the jth sample
that is intended to be in the pool is excluded and the j ′ th sample where j ′ ≜ mod(j + 1, p)
that is not intended to be in the pool, is included at random. This leads to the following
changes in the ith row of Â, and in both cases the choice of j is uniform random: Case 1:
Âij ′ = −1, Âij = 1 and Aij ′ = 1, Aij = −1, Case 2: Âij ′ = 1, Âij = −1 and Aij ′ = −1, Aij = 1.

3. Random Switch Model (RSM): In RSM, a pool that contains bit-flips will necessarily contain two
bit-flips at random locations. Suppose the ith pool has bit-flips. Then under the RSM scheme,
for two distinct samples k ∈ [p] and l ∈ [p], either (1) the kth sample that is not intended to
be in the pool is mistakenly included and the lth sample that is intended to be in the pool
is mistakenly excluded, or (2) the kth sample that is intended to be in the pool is mistakenly
excluded and the lth sample that is not intended to be in the pool is mistakenly included. This
leads to the following changes in the ith row of Â, for l ̸= k ∈ [p], and in both cases the
choice of k, l is uniform random: Case 1: Âik = −1, Âil = 1 and Aik = 1, Ail = −1, Case 2:
Âik = 1, Âil = −1 and Aik = −1, Ail = 1. Note that ASM is a special case of RSM, with the
second index fixed to mod(j + 1, p) when the first index is j.

4.2.2 Correction Algorithm

We will now present the different correction algorithm for the three bit-flip models.

4.2.3 Correction Algorithm for different bit-flip model

Let J be the index set of all such measurements in y for which the hypothesis test of ODrlt
is rejected, i.e. J contains the indices of all measurements with MMEs. Upon obtaining J, we
can re-estimate β ∗ using the Lasso estimator β̂λ ≜ minβ 2n 1 2
′ ∥yJ c − AJ c β∥2 + λ∥β∥1 using only

n = |J | = n − |J| measurements, i.e. by discarding the measurements in J. However instead of

′ c

discarding the measurements in J, we provide algorithms to correct for errors in the corresponding
rows of matrix Â, again making use of the key principles of the ODrlt technique, as well as exploiting
the particular statistical model for mismatch.
We first provide an algorithm for the RSM for bit-flips – see Alg. 5. For RSM, we do the following:
In any row of A, we check for all pairs of unequal entries Ai,j1 and Ai,j2 with j1 ̸= j2 and swap their
values. Then we recompute δ̂W using Alg.3. We check whether the new estimate δ̂W i satisfies H0,i as
per the test described in [6]. If H0,i is rejected, we have been successful in identifying the bit-flip in
the ith row at locations j1 and j2 with probability 1 − α, where α is the level of significance of the
test. Otherwise, the sign of other entries of the ith row need to be swapped until the bit-flip is found.
Note that as per RSM, a given row can contain bit-flips in at exactly two entries. The procedure
for correction of bit-flips in ASM is quite similar to the one in Alg. 5. Here, instead of toggling Ai,j1
and Ai,j2 with j1 ̸= j2 in Alg. 5, we instead do as follows: If Ai,j ̸= Ai,j ′ , then swap the values of
Ai,j and Ai,j ′ where j ′ = mod(j + 1, p). The remaining steps are exactly as in Alg. 5. In SSM, for
every measurement i in J, we flip the sign of element Aij of A where j ∈ [p]. The rest of the steps
are exactly as in Alg. 5. The matrix thus obtained from the correction algorithm can then be used to
re-estimate β ∗ using Lasso.

4.2.4 Optimal solution for W

The issue with the algorithm in Alg.5 is the runtime as for every combination of j1 and j2 being
swapped, we have to recompute W from Alg.3 which is a costly optimisation method. In order to

35
Algorithm 5 Correction for bit-flips following the Random Switch Model (RSM) using W = copt A
Input: Measurement vector y, pooling matrix A, Lasso estimate β̂λ1 , λ and the set J of corrupted
measurements estimated by ODrlt method
Output: Bit-flip corrected matrix Ã
1: for every i ∈ J do
2: Set bf 1 := −1, bf 2 = −1 (bit-flip flag), max-p value= 0.01.
3: for every j ∈ [p − 1] do
4: for l ∈ [p] do
5: if {{Aij == −1}and{Ail == 1}}or{{Aij == 1}and{Ail == −1}} then
6: if Aij == 1 then
7: Aij = −1, Ail = 1.
8: else if Aij == −1 then
9: Aij = 1, Ail = −1.
10: end if
11: Find the solution β̂λ1 , δ̂λ2 to the convex program given in Eqn.(6) of [6].
12: Calculate the debiased Lasso estimate δ̂W given by Eqn.(15) using W = copt A of [6].
13: Set pval = 1 − Φ(TH,i = √[δ̂W ]i ), where Σδ is defined in Eqn.(28) of [6].
[Σδ ]ii
14: if pval ≥ max-p value then
15: set bf 1 := j, bf 2 := l, max-p value = pval .
16: end if
17: if bf 1 ! = −1 and bf 2 ! = −1 then {bit-flip not detected at Aij }
18: Aij = −Aij , Ail = −Ail {reverse induced bit-flip in Aij }
19: end if
20: end for
21: end for
22: end for
23: return Ã = A. =0

36
rectify this, we show in Thm.10 that an optimal solution of the optimisation problem p in Alg.3 is of the
form W = copt A for random Rademacher sensing matrix A, where copt = 1 − µ3 1 − n/p. This cuts
down on the runtime of the correction algorithm significantly and further allows us to run multi-stage
correction to correct all effective bit-flips.
q q
Theorem 10 Let A be a n × p Rademacher matrix, µ1 = 2 2 log(p) n , µ 2 = 2 log(2np)
np + n1 and
q
2 log(n)
µ3 = √ 2 p . Given the optimisation problem in Alg.3, if n < p, then one of the solution is
1−n/p
p
W = (1 − µ3 1 − n/p)A. ■.
The proof is given in the Appendix.

4.2.5 Reconstruction error of β post-correction (ongoing)

Recall that for the true bitflipped matrix Â, we have, Â = A + ∆A, where ∆A represents the error
matrix due to bit-flips. Note that, after correction we obtain the corrected matrix A.
e Let the corrected
matrix be such that
A
e = A + ∆1 A

, where ∆1 A is the error matrix representing the MME’s remaining post-correction. Hence, from
(3.3), we have,

y = Âβ ∗ + η = Aβ
e ∗ + (∆A − ∆1 A)β ∗ + η = Aβ
e ∗ + δe∗ + η, (4.2)

where δe∗ ≜ (∆A − ∆1 A)β ∗ is the post-correction bit-flip vector. Here, δe∗ is also a sparse vector
however, the sparsity level is a random quantity given by r̃ = ∥δe∗ ∥0 . Here also, we estimate β ∗ and
δe∗ using the Robust Lasso estimator given as follows:
!
βeλ1 1 2
= arg min y − Aβe − δ + λ1 ∥β∥1 + λ2 ∥δ∥ ,
1 (4.3)
δλ2
e β,δ 2n 2

where λ1 , λ2 are appropriately chosen regularization parameters.

In this subsection, we will attempt to derive an upper bound on the reconstruction error of βeλ1
and δeλ2 . However, in order to do that, we will first show that r̃ < r with high probability. Here r
is the sparsity level of the original MME vector δ ∗ . Next, we will need to show that the corrected
matrix A e satisfies the Extended Restricted Eigenvalue Condition(EREC) as defined in [6].
Let us define the following sets: R = {i ∈ [n] : δi∗ ̸= 0} and Re = {i ∈ [n] : δe∗ ̸= 0}. Recall that
i
J is the set of all indices detected to have bit-flips by the hypothesis test. Let k be the cardinality of
J. In order to show r̃ < r with high probability, we will show that R e ⊂ R with high probability as
Re ⊂ R implies r̃ < r. Let us first define the power of the test. We denote the power of the test as
γn,p . Hence, we have,

γn,p = P (rejecting a false null hypothesis) = P (|TH,i | > zα/2 | δi∗ ̸= 0)

= 1 − P −zα/2 ≤ TH,i ≤ zα/2 | δi∗ ̸= 0

!
∗
δi∗ δdW,i − δi δi∗ ∗
= 1 − P −zα/2 − p ≤ p ≤ zα/2 − p δi ̸= 0
σ Σδii σ Σδii σ Σδii
( ! !)
δi∗ δi∗
= 1 − Φ zα/2 − p − Φ −zα/2 − p
σ Σδii σ Σδii
√ n ∗ √n δi∗
    
 δi 
p 1−n/p p 1−n/p
= 1 − Φ zα/2 −  − Φ −zα/2 −  (4.4)
√n √n
p p
 σ Σδii σ Σδii 
p 1−n/p p 1−n/p

Note that, from result 2 of Theorem 2 of [6], √ n

p
σ Σδii → σ as n, p → ∞. Furthermore, zα/2
p 1−n/p
is a constant for a given α. In order to ensure that the power γn,p goes to 1 as n, p → ∞, we need

37
n n
√ |δ ∗ | √ |δ ∗ |
p 1−n/p i p 1−n/p i
n
√
→ ∞ as n, p → ∞. This implies we need, σ → ∞ as n, p → ∞. Hence, under
√ σ Σδii
p 1−n/p
√
∗ p 1−n/p
the condition min|δi | = ω n σ , we have that,
i∈R

√n δ∗ √n δ∗
     
p 1−n/p i p 1−n/p i
 
lim γn,p = lim 1 − Φ zα/2 −  − Φ −zα/2 −  
√n √n
p p
n,p→∞ n,p→∞  σ Σδii σ Σδii 
p 1−n/p p 1−n/p
= 1 − {Φ(−∞) − Φ(−∞)} = 1. (4.5)

Hence, we have that power of the test for δ ∗ goes to 1 as n, p → ∞ under the condition min|δi∗ | =
√ i∈R
p 1−n/p
ω n σ .
Now, recall that, by construction, δ ∗ = ∆Aβ ∗ . Here, ∆A being the error matrix is sparse. In fact,
base on our assumption that there can be at-most one pair of bit-flips per row, min|δi∗ | = O(min|βj∗ |).
i∈R j∈S
Hence, the assumption required for the power to go to 1 is
p !
∗ p 1 − n/p
min|βj | = ω σ .
j∈S n

We will now evaluate the probability that the set of indices with effective bit-flips post-correction
is a subset of the set of indices with effective bit-flips initially. This event is represented by R̃ ⊆ R.
To find the probability of R̃ ⊆ R, we condition it on the event that all the effective bit-flips were
detected in the detection stage by the marginal tests i.e., we condition R̃ ⊆ R on R ⊆ J. Hence,
using theorem of total probability, we have,

P ( R̃ ⊆ R) ≥ P ( R̃ ⊆ R|R ⊆ J)P (R ⊆ J) (4.6)

We will now evaluate both the probabilities on the R.H.S. of (4.6) separately. Note that, {R ⊆ J}
implies that for all the indices that belong to R, the test statistic |TH,i | is rejected. Hence, the event
{R ⊆ J} is equivalent to the event {R∩Jc = ∅} which implies that none of the elements of R belongsc
to J . This implies that the event {R ∩ J = ∅} is equivalent to
c c ∗

∪ |TH,i | ≤ zα/2 |δi ̸= 0 =
i∈R
∩ |TH,i | > zα/2 |δi∗ ̸= 0 . Hence, we have,

i∈R

∗ ∗
P (R ⊆ J) = P ∩ |TH,i | > zα/2 |δi =

̸ 0 = 1 − P ∪ |TH,i | ≤ zα/2 |δi =
̸ 0
i∈R i∈R
X
P |TH,i | ≤ zα/2 |δi∗ ̸= 0

≥ 1−
i∈R
= 1 − r(1 − γn,p ). (4.7)

Here, γn,p is the power of the test for a given n, p. Now, we evaluate P ( R̃ ⊆ R|R ⊆ J). Now, using
Lemma 13, we have,
P ( R̃ ⊆ R|R ⊆ J) = P ( R̃ ∩ Rc ∩J = ∅) (4.8)
We will now define a few notations. Let us recall the in the correction algorithm Alg.5, for each i ∈ J,
for all j1 ∈ [p−1], j2 = j1 +1, . . . , p , we swap the elements the elements aij1 with aij2 . Let us denote the
new bit-flip error at this location as δi∗ (j1 , j2 ). Then we perform Lasso followed by debiasing to obtain
the debiased lasso estimate and its corresponding test statistic denoted by TH,i (j1 , j2 ). Note that, here
we assume that a wrong swap does create a non-zero bit-flip error δi∗ (j1 , j2 ). Lastly let us define the
set of tuples K = {(j1 , j2 ), j1 ∈ [p − 1], j2 = {j1 + 1, . . . , p} : {aij1 ̸= aij2 } ∩ {{βj∗1 ̸= 0} ∪ {βj∗2 ̸= 0}}}.
Note that, the event { R̃ ∩ Rc ∩J = ∅} implies that none of the elements of J\ R are in R̃. Hence,
the event { R̃ ∩ Rc ∩J = ∅} implies that for all i ∈ J \ R, the hypothesis test w.r.t. TH,i (j1 , j2 ) is

38
rejected for all j1 , j2 . Hence, we have,

∗
P ( R̃ ∩ R ∩J = ∅) = P
c

∩ ∩ |TH,i (j1 , j2 )| > zα/2 |δi (j1 , j2 ) ̸= 0
i∈J\R (j1 ,j2 )∈K

∗

= 1−P ∪ ∪ |TH,i (j1 , j2 )| ≤ zα/2 |δi (j1 , j2 ) ̸= 0
i∈J\R (j1 ,j2 )∈K
X X
P ( |TH,i (j1 , j2 )| ≤ zα/2 |δi∗ (j1 , j2 ) ̸= 0 )

≥ 1−
i∈J\R (j1 ,j2 )∈K
X X
= 1− (1 − γn,p )
i∈J\R (j1 ,j2 )∈K
(p − 1)(p − 2)
≥ 1− (k − r)(1 − γn,p ) (4.9)
2
Joining (4.6), (4.7) and (4.9), we get,

(p − 1)(p − 2)
P ( R̃ ⊆ R) ≥ {1 − r(1 − γn,p )} 1 − (k − r)(1 − γn,p ) . (4.10)
2

Under the condition ∥β ∗ ∥∞ = ω np σ , we have γn,p → 1 as n, p → ∞. Since, γn,p → 1 at an

exponential rate, 1 − r(1 − γn,p ) → 1 and 1 − (p−1)(p−2)

2 (k − r)(1 − γn,p ) → 1 as n, p → ∞. Hence,
p
from (4.10), under the condition ∥β ∥∞ = ω n σ , we have P ( R̃ ⊆ R|R ⊆ J) → 1 as n, p → ∞.
∗

4.2.6 Experimental Results

Multiple stages of Correction of RSM: We have noticed that after the first stage of correction
in Drlt-C, a small fraction of RSM errors remain uncorrected, and a small number of new RSM
errors are falsely created. Both these are caused due to small but inevitable Type-I and Type-II errors
in the hypothesis test TH . For this set of experiments, we take p = 500, n = 450, s = 5, fσ = 0.01
and r = 8 RSM errors. After the first stage of correction in Drlt-C, we execute the following three
steps iteratively: (i ) We re-estimate β ∗ and the permutation noise vector based on the corrected
measurements using Robust Lasso Rl. (ii ) We then perform debiasing of these estimates as shown
in Eqn. (14) and (15) of [6]. (iii ) Based on the new set of detected permuted measurements, we
perform correction given by Alg. 5. After each stage of correction, we report the average number
of measurements correctly detected to have RSM errors, the average number incorrectly detected to
have RSM errors and the average actual number of RSM errors (over 20 noise runs). These results
are presented in Table ?? which shows that after the fifth stage, the test only falsely detects a small
number of permutations in the model and there are no permutations left in the sixth stage.

RSM correction for different β and η (Table 4.1-4.16)

In this subsection of experimental results, we run the correction algorithm for RSM for three different β
and for each β we run the correction for three separate runs i.e., different η. For each β, η combination,
we provide two separate tables of results. In the first table, we provide the set of true bit-flips (B), set
of detected bitf-flips (J) and set of altered bit-flips ( C), after each stage of correction in Drlt-C for
RSM errors. In the second table, we provide pre correction and post correction RRMSE, Sensitivity
and Specificity, after each stage of correction in Drlt-C for RSM errors. The fixed set of parameters
are p = 500, n = 400, fσ = 0.01, s = 10. Note that, the true number of bit-flips varies as the RSM
errors are induced in an non-adversial manner with fer = 0.2.

RSM correction for varying fσ and s (Table 4.17-4.28)

In this subsection of experimental results, we run the correction algorithm for RSM for three different
values of fσ = 0.01, 0.03, 0.05. We also run the correction algorithm for three different values of
s = 5, 10, 20. For each fσ and s, we provide two separate tables of results. In the first table, we

39
# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)
1 (2,67,72,219,361,392) (2,13,44,67,72,145,219,276,361) (2,67,145,219,361)
2 (72,145,392) (13,72,145,276,392) (72,392)
3 (145) (13,145) (145)
4 ϕ (13) ϕ

Table 4.1: First β, first run: Set of true bit-flips (B), set of detected bitf-flips (J) and set of altered
bit-flips ( C), after each stage of correction in Drlt-C for RSM errors. The parameters are p =
500, n = 400, fσ = 0.01, s = 10, r = 6

# RRMSE Sens Spec p-value simul

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0517 0.0422 0.83 1 0.987 0.992 1e-8
2 0.0422 0.0389 1 1 0.992 0.995 0.000044
3 0.0389 0.0378 1 1 0.995 0.997 0.0041
4 0.0389 0.0378 1 1 0.995 0.997 0.0271

Table 4.2: fσ = 0.01: Pre correction and post correction RRMSE, Sensitivity and Specificity and
p-value for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM
errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6

# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)

1 (2,67,72,219,361,392) (2,67,89,192,219,222,307,392) (2,67,219,392)
2 (72,361) (72,89,192,222) (72,192)
3 (192,361) (89,192,222,361) (192,361)
4 ϕ (89,222) ϕ

Table 4.3: First β, second run: Set of true bit-flips (B), set of detected bitf-flips (J) and set of
altered bit-flips ( C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 6

# RRMSE Sens Sens p-value simul

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0567 0.0485 0.67 0.83 0.987 0.995 1e-6
2 0.0485 0.0452 0.83 1 0.995 0.995 0.00057
3 0.0452 0.0429 1 1 0.995 0.995 0.0022
4 0.0452 0.0429 1 1 0.995 0.995 0.0451

Table 4.4: First β, second run: Pre correction and post correction RRMSE, Sensitivity and Specificity,
after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ =
0.01, s = 10, r = 6

# Effective bitflips(B) effective bitflips detected J bitflips altered ( C)

1 (2,67,72,219,361,392) (2,59,67,72,108,119,219,255,292,316,361,377,392) (2,67,72,119,219,361,377,392)
2 (119,392) (59,108,119,292,316,392) (119,292,392)
3 (292) (59,108,292,316) (292)
4 ϕ (59,108) ϕ

Table 4.5: First β, third run: Set of true bit-flips (B), set of detected bitf-flips (J) and set of
altered bit-flips ( C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 6

40
# RRMSE Sens Sens p-value simul
Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0615 0.0479 1 1 0.982 0.989 1e-14
2 0.0479 0.0477 1 1 0.989 0.995 1e-9
3 0.0477 0.0416 1 1 0.995 0.995 1e-7
4 0.0477 0.0416 1 1 0.995 0.995 0.00542

Table 4.6: First β, third run: Pre correction and post correction RRMSE, Sensitivity and Specificity,
after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ =
0.01, s = 10, r = 6
# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)
1 (96,117,272,329,346) (23,96,117,156,213,272,329,336,346) (96,117,213,329,346)
2 (213,272) (23,96,213,276) (213,272)
3 ϕ (23) ϕ

Table 4.7: Second β, first run: Set of true bit-flips (B), set of detected bitf-flips (J) and set of
altered bit-flips ( C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 5
# RRMSE Sens Sens p-value simul
Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0447 0.0372 0.8 1 0.992 0.995 1e-7
2 0.0372 0.0346 1 1 0.995 0.998 0.00098
3 0.0372 0.0346 1 1 0.995 0.998 0.00928

Table 4.8: Second β, first run: Pre correction and post correction RRMSE, Sensitivity and Specificity,
after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ =
0.01, s = 10, r = 5
# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)
1 (96,117,272,329,346) (55 ,72,96,142,202,291,329,346) (96,202,329,346)
2 (117,202,272) (55,117,202,272,291) (117,202)
3 (272) (55,272) (272)
4 ϕ (55) ϕ

Table 4.9: Second β, second run: Set of true bit-flips (B), set of detected bitf-flips (J) and set of
altered bit-flips ( C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 5
# RRMSE Sens Sens p-value simul
Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0552 0.0513 0.6 0.8 0.992 0.995 1e-6
2 0.0513 0.0474 0.8 1 0.995 0.997 0.0021
3 0.0474 0.0436 1 1 0.997 0.998 0.0362

Table 4.10: Second β, second run: Pre correction and post correction RRMSE, Sensitivity and Speci-
ficity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, s = 10, r = 5
# Effective bitflips(B) effective bitflips detected J bitflips altered ( C)
1 (96,117,272,329,346) (82,96,117,172,209,252,329,346,397) (96,117,329,346)
2 (272) (82,172,209,272) (272)
3 ϕ (82,209) ϕ

Table 4.11: Second β, third run: Set of true bit-flips (B), set of detected bitf-flips (J) and set of
altered bit-flips ( C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 5

41
# RRMSE Sens Sens p-value simul
Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0602 0.0452 0.8 1 0.992 0.995 1e-13
2 0.0452 0.0427 1 1 0.995 0.997 1e-8
3 0.0452 0.0427 1 1 0.995 0.997 0.00044

Table 4.12: Second β, third run: Pre correction and post correction RRMSE, Sensitivity and Speci-
ficity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, s = 10, r = 5

# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)

1 (23,44,166,192,245,289,356) (23,44,98,166,177,192,245,289,332,396) (23,44,166,192,245,332)
2 (289,332,356) (98,177,289,332,356) (98,289,356)
3 (98,332) (98,177,332) (98,332)
4 ϕ ϕ ϕ

Table 4.13: Third β, first run: Set of true bit-flips (B), set of detected bitf-flips (J) and set of
altered bit-flips ( C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 7

# RRMSE Sens Sens p-value sim

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0692 0.0577 0.713 0.8571 0.989 0.995 1e-12
2 0.0577 0.0512 0.8571 1 0.995 0.998 1e-7
3 0.0512 0.0478 1 1 0.997 1 0.00034

Table 4.14: Third β, first run: Pre correction and post correction RRMSE, Sensitivity and Specificity,
after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n = 400, fσ =
0.01, s = 10, r = 7

# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)

1 (23,44,166,192,245,289,356) (44,67,98,166,192,203,245,289,317,334,396) (44,166,192,245,289,334)
2 (23,334,356) (23,67,98,334,356) (23,334,356)
3 ϕ (67,98) ϕ

Table 4.15: Third β, second run: Set of true bit-flips (B), set of detected bitf-flips (J) and set of
altered bit-flips ( C), after each stage of correction in Drlt-C for RSM errors. The parameters are
p = 500, n = 400, fσ = 0.01, s = 10, r = 7

# RRMSE Sens Sens p-value simul

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0688 0.0492 0.857 1 0.992 0.995 1e-11
2 0.0492 0.0441 1 1 0.995 0.997 1e-7
3 0.0492 0.0441 1 1 0.995 0.997 0.00093

Table 4.16: Third β, second run: Pre correction and post correction RRMSE, Sensitivity and Speci-
ficity, after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, s = 10, r = 7

42
# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)
1 (2,67,72,219,361,392) (2,13,44,67,72,145,219,276,361) (2,67,145,219,361)
2 (72,145,392) (13,72,145,276,392) (72,392)
3 (145) (13,145) (145)
4 ϕ (13) ϕ

Table 4.17: fσ = 0.01: Set of true bit-flips (B), set of detected bitf-flips (J) and set of altered bit-flips
( C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, s = 10, r = 6

# RRMSE Sens Spec p-value simul

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0517 0.0422 0.83 1 0.987 0.992 1e-8
2 0.0422 0.0389 1 1 0.992 0.995 0.00055
3 0.0389 0.0378 1 1 0.995 0.997 0.0072
4 0.0389 0.0378 1 1 0.995 0.997 0.0277

Table 4.18: fσ = 0.01: Pre correction and post correction RRMSE, Sensitivity and Specificity and
p-value for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM
errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6

provide the set of true bit-flips (B), set of detected bitf-flips (J) and set of altered bit-flips ( C), after
each stage of correction in Drlt-C for RSM errors. In the second table, we provide pre correction and
post correction RRMSE, Sensitivity and Specificity and the p-value for the simultaneous-test given
in (3.42), after each stage of correction in Drlt-C for RSM errors. The fixed set of parameters are
p = 500, n = 400, fσ = 0.01, s = 10. Note that, the true number of bit-flips varies as the RSM errors
are induced in an non-adversial manner with fer = 0.2.

Proportion of bit-flips correctly altered by RSM correction algorithm (Table 4.29)

In this set of experiments, we present the proportion of times each bit-flip has been correctly altered
in the first stage of the RSM correction algorithm. The fixed parameters of the experiments are
p = 500, n = 400, s = 10, r = 6. We vary fσ = 0.01, 0.03, 0.05. The proportions are taken over 10
different instances of η for each fσ . In Table 4.29, the first column represent the indices that have
true effective bit-flips. Columns 2,3,4 represent the proportion of times that particular bitflip has been
correctly altered by the RSM correction algorithm for fσ = 0.01, fσ = 0.03 and fσ = 0.05 respectively.
The objective of this experiment is to observe how the correction of bit-flips worsen with increase in
noise variance.

# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)

1 (2,67,72,219,361,392) (2,22,52,67,89,194,198,219.232,271,302,311,361,387) (2,67,89,219,311)
2 (72,89,311,361,392) (22,52,72,89,198,271,302,311,387,392) (72,89,198,311)
3 (198,361,392) (22,198,271,311,361,392) (198,271,392)
4 (271,361) (22,271,311,361) (271)
5 (361) (22,361) (361)
6 ϕ (22) ϕ

Table 4.19: fσ = 0.03: Set of true bit-flips (B), set of detected bitf-flips (J) and set of altered bit-flips
( C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, s = 10, r = 6

43
# RRMSE Sens Spec p-value simul
Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0822 0.0757 0.67 0.8 0.974 0.989 1e-16
2 0.0757 0.0661 0.8 1 0.989 0.992 1e-11
3 0.0661 0.0632 1 1 0.992 0.995 1e-7
4 0.0632 0.0617 1 1 0.995 0.997 0.00049
5 0.0617 0.0611 1 1 0.997 0.998 0.00225
6 0.0617 0.0611 1 1 0.997 0.998 0.0319

Table 4.20: fσ = 0.03: Pre correction and post correction RRMSE, Sensitivity and Specificity and
p-value for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM
errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6

# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)

1 (2,67,72,219,361,392) (2,67,219,361,{19 other indices}) (2,67,142,219,293)
2 (72,142,293,361,392) (42,67,72,141,142,193,225,259,283,317,366,392,397) (72,142,193,317,392)
3 (193,293,317,361) (42,67,225,259,293,317,397) (293,317,397)
4 (193,361,397) (42,225,361,397) (361,397)
5 (193,361) (42,193,225,361) (193)
6 (361) (42,225,361) ϕ

Table 4.21: fσ = 0.05: Set of true bit-flips (B), set of detected bitf-flips (J) and set of altered bit-flips
( C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, s = 10, r = 6

# RRMSE Sens Spec p-value simul

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.1033 0.0898 0.67 0.6 0.951 0.974 1e-16
2 0.0898 0.0852 0.6 0.8 0.974 0.989 1e-12
3 0.0852 0.0831 0.8 0.75 0.989 0.992 1e-9
4 0.0801 0.0774 0.75 1 0.992 0.995 1e-6
5 0.0774 0.0752 1 1 0.995 0.997 0.000252
6 0.0752 0.0737 1 1 0.997 0.997 0.00121

Table 4.22: fσ = 0.05: Pre correction and post correction RRMSE, Sensitivity and Specificity and
p-value for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM
errors. The parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 6

# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)

1 (4,255) (4,117,255,289,315) (4,255,315)
2 (315) (117,315) (315)
3 ϕ ϕ ϕ

Table 4.23: s = 5: Set of true bit-flips (B), set of detected bitf-flips (J) and set of altered bit-flips
( C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, r = 2

# RRMSE Sens Spec p-value simul

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0402 0.0377 1 1 0.997 0.998 0.00062
2 0.0377 0.0356 1 1 0.998 1 0.0071
3 0.0377 0.0356 1 1 1 1 0.0515

Table 4.24: s = 5: Pre correction and post correction RRMSE, Sensitivity and Specificity and p-value
for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM errors. The
parameters are p = 500, n = 400, fσ = 0.01, r = 2

44
# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)
1 (143,184,239,278,282) (69,127,143,177,239,275,278,282,299,375) (69,143,239,278,282)
2 (69,184) (69,177,184,299,375) (69,177,184)
3 (177) (177) (177)
4 ϕ ϕ ϕ

Table 4.25: s = 10: Set of true bit-flips (B), set of detected bitf-flips (J) and set of altered bit-flips
( C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, r = 5

# RRMSE Sens Spec p-value simul

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.0598 0.0407 0.8 1 0.987 0.992 1e-6
2 0.0407 0.0392 1 1 0.992 1 0.00091
3 0.0392 0.0379 1 1 1 1 0.0253

Table 4.26: s = 10: Pre correction and post correction RRMSE, Sensitivity and Specificity and p-value
for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM errors. The
parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 5

# True effective bit-flips(B) effective bitflips detected J bitflips altered ( C)

1 (160,186,255,261,285,334,368) (63,109,124,178,186,220,222,261,325,334,346,368) (124,186,261,325,334,36
2 (124,160,255,325) (63,82,109,124,220,222,255,325) (82,124,222,255,325)
3 (82,160,222) (63,82,109,222) (82,222)
4 (160) (63,160) (160)
5 ϕ (63) ϕ

Table 4.27: s = 15: Set of true bit-flips (B), set of detected bitf-flips (J) and set of altered bit-flips
( C), after each stage of correction in Drlt-C for RSM errors. The parameters are p = 500, n =
400, fσ = 0.01, r = 7

# RRMSE Sens Spec p-value simul

Pre-corr Post-corr Pre-corr Post-corr Pre-corr Post-corr
1 0.119 0.0952 0.714 0.75 0.979 0.987 1e-16
2 0.0952 0.0923 0.75 0.67 0.987 0.992 1e-12
3 0.0923 0.0861 0.67 1 0.992 0.995 1e-7
4 0.0861 0.0848 1 1 0.995 0.998 0.00078
5 0.0861 0.0848 1 1 0.995 0.998 0.0045

Table 4.28: s = 15: Pre correction and post correction RRMSE, Sensitivity and Specificity and p-value
for simultaneous tests as given in (3.42) after each stage of correction in Drlt-C for RSM errors. The
parameters are p = 500, n = 400, fσ = 0.01, s = 10, r = 7

Indices of true effective bit-flips fσ = 0.01 fσ = 0.03 fσ = 0.05

2 1(1) 0.9 (1) 0.7 (0.9)
67 1(1) 1 (1) 0.9 (1)
72 0.8 (0.8) 0.6 (0.8) 0.2 (0.5)
219 0.6 (0.9) 0.5 (0.7) 0.5 (0.6)
361 0.9 (1) 0.8 (0.8) 0.5 (0.7)
392 0.4 (0.5) 0.1 (0.3) 0 (0.1)

Table 4.29: Effective Bit-flips correctly altered: first column represent the indices that have true
effective bit-flips. Columns 2,3,4 represent the proportion of times that particular bitflip has been
correctly altered by the RSM correction algorithm for fσ = 0.01, fσ = 0.03 and fσ = 0.05 respectively.
The fixed parameters of the experiments are p = 500, n = 400, s = 10, r = 6. In brackets are the
proportion of the number of times that bit-flip was detected in the first stage of RSM correction.

45
Chapter 5

Conclusion

In this report, we target three objectives in three different chapters connected to the identification
and correction of Group Membership Specification Errors (MME’s) in Group Testing.
In Chapter 2, we reformulate the optimization problem to obtain M (the approximate inverse of
the covariance matrix of the rows of the sensing matrix A) in [31] and further provide an exact, closed-
form optimal solution to the reformulated problem under assumptions on the coherence of A. For
sensing matrices with i.i.d. zero-mean sub-Gaussian rows that have diagonal covariance, the debiased
Lasso estimator, based on this closed-form solution, has entries that are asymptotically zero-mean
and Gaussian. The exact solution significantly improves the time efficiency for debiasing the Lasso
estimator, as shown in the numerical results. Our method is particularly useful for debiasing in
streaming settings where new measurements arrive on the fly.
In Chapter 3, we have presented a technique for determining the sparse vector β ∗ of health status
values from noisy pooled measurements in y, with the additional feature that our technique is designed
to handle bit-flip errors in the pooling matrix. These bit-flip errors can occur at a small number of
unknown locations, due to which the pre-specified matrix A (known) and the actual pooling matrix
Â (unknown) via which pooled measurements are acquired, differ from each other. We use the theory
of Lasso debiasing as our basic scaffolding to identify the defective samples in β ∗ , but with extensive
and non-trivial theoretical and algorithmic innovations to (i ) make the debiasing robust to model
mismatch errors (MMEs), and also to (ii ) enable identification of the pooled measurements that were
affected by the MMEs. Our approach is also validated by an extensive set of simulation results, where
the proposed method outperforms intuitive baseline techniques. To our best knowledge, there is no
prior literature on using Lasso debiasing to identify measurements with MMEs.
In Chapter 4, we present algorithms to correct Permutation noise as well as three different types
of Bit-flip error in Group Testing. We provide rigorous empirical results supporting the algorithms
and its capability of correcting the MME’s. We also provide a closed form solution to the optimisation
mechanism to obtain W in Drlt. The theoretical guarantees for the given correction algorithm much
stringent conditions to hold. Therefore, we want to tweak the correction algorithms in such a way so as
to not only hold the capability of the existing algorithm but also to ensure certain sensible theoretical
guarantees. This is the work that remains to be done before the Pre-synopsis Seminar.

46
Chapter 6

Appendix

6.1 Proofs of Lemmas and Theorems of Chapter 2

6.1.1 Proof of Theorem 1
ν
Primal feasibility: From the assumption L+ν ≤ µ < 1, we have that, Lµ + µν ≥ ν which implies
(1 − µ)ν/L ≤ µ. The choice of w.j given by (2.7) is primal feasible since
1 ⊤ (1 − µ)
A ∥a ∥2 a.j − ej ≤ max{µ, |(1 − µ)ν/L|} = µ. (6.1)
n .j 2
n ∞
This inequality uses the definition of L. For the index j, the LHS is upper bounded by µ, otherwise
it is upper bounded by ν(1 − µ)/L.

1 2
Primal objective function value: The primal objective function value is given by n ∥w .j ∥2 =
(1 − µ)2 2 /n = (1 − µ) .
2
∥a ∥
.j 2
(∥a.j ∥22 /n)2 ∥a.j ∥22 /n

The Fenchel dual problem: Consider an optimization problem of the form for a fixed j ∈ [p]:

1 ⊤
inf f (w) + gj A w (6.2)
w n
where f and gj are extended real-valued convex functions. The Fenchel dual (see Chapter 3 of [8]) is

∗ 1
sup −f Au − gj∗ (−u) (6.3)
u n
where f ∗ and gj∗ are the convex conjugates of f and gj respectively. The Fenchel dual satisfies weak
duality (see Chapter 3 of [8]), i.e., for any w and u,

1 ⊤ ∗ 1
f (w) + gj A w ≥ −f Au − gj∗ (−u).
n n
In our setting, for a fixed j, we consider
(
1 0 if ∥w − ej ∥∞ ≤ µ
f (w) := ∥w∥2 and gj (w) := . (6.4)
n ∞ otherwise
Then, for the same j, we have their convex conjugates from Lemma 5:
n
f ∗ (u) = sup u⊤ w − f (w) = ∥u∥2 , (6.5)
w 4
∗ ⊤
gj (u) = sup u w − gj (w) = sup u⊤ w = uj + µ∥u∥1 . (6.6)
w ∥w−ej ∥∞ ≤µ
1 ⊤ ⊤
This gives a dual problem in the form supu − 4n u A Au + uj − µ∥u∥1 .
2(1 − µ)ej
The point u = is feasible for the dual (trivially).
∥a.j ∥22 /n

47
2(1 − µ)ej
Dual objective function value: Plugging in u = , the corresponding dual objective
∥a.j ∥22 /n
function value is
1 ⊤ ⊤ 1 4(1 − µ)2 2(1 − µ) 2(1 − µ)
− u A Au + uj − µ∥u∥1 = − ∥a.j ∥2 + −µ
4n 4n (∥a.j ∥22 /n)2 ∥a.j ∥22 /n ∥a.j ∥22 /n
(1 − µ)2 (1 − µ)2 (1 − µ)2
=− 2 +2 2 = .
∥a.j ∥2 /n ∥a.j ∥2 /n ∥a.j ∥22 /n
Since the primal solution and the dual objective function values are equal, it follows that an optimal
(1 − µ) 2(1 − µ)
solution for the primal is 2 a.j , and that an optimal solution to the dual is ej . This
∥a.j ∥2 /n ∥a.j ∥22 /n
completes the proof.

6.1.2 Proof of Theorem 2

2
4Cmax κ4 ν
In Theorem 3 we show that if n ≥ log p then L+ν
2 (1−c)2
Cmin
≤ Lν ≤ cCνmin with probability at least
√ q
1 − p2 . Now, Lemma 1 shows that ν ≤ 2 2Cmax κ2 logn p with probability at least 1 − p12 . Hence, by
union bound, we have,
r !
√ κ2 Cmax log p

ν ν 2 1
P ≤ ≤2 2 ≥1− + (6.7)
ν+L cCmin c Cmin n p p2
This completes the proof.

6.1.3 Proof of Theorem 3

∥a ∥2
We have for all j ∈ [p], n.j 2 = n1 ni=1 a2ij . Since the ai. (for i ∈ [n]) are sub-Gaussian, the aij are
P
sub-Gaussian for each j ∈ [p] and ∥aij ∥ψ2 ≤ ∥ai. ∥ψ2 . By the definition of the sub-Gaussian norm (see
footnote in Sec. 2.1.1 with q = 2), we know that
1
E[a2ij ] ≤ ∥aij ∥2ψ2 = ∥e⊤ 2 2
j ai. ∥ψ2 ≤ ∥ai. ∥ψ2 (6.8)
2
.
Recall that κ := ∥Σ−1/2 ai. ∥ψ2 in property D1 of sensing matrix A. We have

∥ai. ∥ψ2 = sup (Σ1/2 v)⊤ Σ−1/2 ai.

v∈S p−1 ψ2

1
= sup ∥Σ1/2 v∥2 1/2
(Σ1/2 v)⊤ Σ−1/2 ai.
v∈S p−1 ∥Σ v∥2 ψ2

1
≤ sup ∥Σ1/2 v∥2 sup 1/2
(Σ1/2 z)⊤ Σ−1/2 ai.
v∈S p−1 z∈S p−1 ∥Σ z∥2 ψ2
1/2 −1/2
≤ σmax (Σ )∥Σ ai. ∥ψ2
p
≤ Cmax κ, (6.9)
where Cmax is defined in property D2. Therefore, we obtain E[a2ij ] ≤ 2∥ai. ∥2ψ2 ≤ 2Cmax κ2 . From
the definition of eigenvalues, for any x ∈ Rp , x⊤ Σx ≥ σmin (Σ)∥x∥22 ≥ Cmin ∥x∥22 . Putting x = ej ,
where ej is the j th column of Ip , we have, Σjj ≥ Cmin . Since E[a2ij ] = Σjj ≥ Cmin , we have,
h P i
E n1 ni=1 a2ij ≥ Cmin .
For a given j ∈ [p], the variables a2ij are independent for all i ∈ [n]. Hence, using the concentration
inequality of Theorem 3.1.1 and Equation (3.3) of [52], we have for t > 01 ,
2
− 2nt 4
∥a.j ∥22 /n E[∥a.j ∥22 /n]

P − ≥ t ≤ 2e 2C max κ . (6.10)
1
√
We have set c = 1/2, δ := t and K := 2 Cmax κ in Equation (3.3) and the equation immediately preceding it in [52]

48
Using the left-sided inequality of (6.10), we have,
2
− 2nt 4
P ∥a.j ∥22 /n ≤ E[∥a.j ∥22 /n] − t ≤ 2e 2Cmax

κ . (6.11)

Using E[∥a.j ∥22 /n] ≥ Cmin , (6.11) can be rewritten as follows for t > 0:
nt2
−
∥a.j ∥22 /n
2
P ≤ Cmin − t ≤ 2e 2Cmax κ4 . (6.12)

Using the union bound on (6.12) over j ∈ [p], we obtain the following lower tail bound on L:
nt2
− 2
P (L ≤ Cmin − t) ≤ 2pe 2Cmax κ4 . (6.13)
q
log p
Putting t := 2Cmax κ2 n in (6.13), we obtain:
r !!
Cmax 2 log p 2
P L ≤ Cmin 1−2 κ ≤ . (6.14)
Cmin n p
2
4Cmax κ4
For some constant c ∈ (0, 1), if n ≥ 2 (1−c)2
Cmin
log p, then (6.14) becomes:

2 2
P (L ≤ c Cmin ) ≤ =⇒ P (L ≥ c Cmin ) ≥ 1 − . (6.15)
p p
This completes the proof.

6.1.4 Proof of Lemma 1

1 Pn
We have n1 |a⊤ .l a.j | = n i=1 aij ail . Here, for given j ̸= l, we know that aij and ail are independent
zero-mean sub-Gaussian
√ random variables. From from (6.8) and (6.9) we know that their sub-Gaussian
norm is at most Cmax κ for all i ∈ [n]. Using Lemma 2.7.7 of [53], we have that for all i ∈ [n], aij ail are
independent sub-Exponential random variables with sub-exponential norm at most Cmax κ2 . Moreover,
E[aij ail ] = Σjl = 0 by property D3. Hence, using Bernstien’s inequality for averages of independent
zero-mean, sub-exponential random variables, given in Corollary 2.8.3 of [53], we have for any t > 0,
n
!
2
1X − 2nt 4
P aij ail ≥ t ≤ 2e 2Cmax κ (6.16)
n
i=1

Hence, using the symmetry of the inner product and a union bound, we have
2
1 ⊤ 1 ⊤ p − 2C 2nt κ4
P max |a.l a.j | ≥ t = P max |a.l a.j | ≥ t ≤ 2 e max (6.17)
l̸=j n l<j n 2
√ q
Taking t = 2 2Cmax κ2 logn p , we have,
r !
√ 2 log p 1
P ν(A) ≥ 2 2Cmax κ ≤ . (6.18)
n p2

This completes the proof.

6.1.5 Convex conjugates

The convex conjugate of a function f (w) is defined as:

f ∗ (u) = sup u⊤ w − f (w) .

(6.19)
w

The following result gives the convex conjugates of the functions needed in the proof of Theorem 1.

49
Lemma 5 1. If f (w) = n1 ∥w∥22 , then its convex conjugate is f ∗ (u) = n4 ∥u∥22 .

2. If gj is the indicator function of the convex set {w ∈ Rp | ∥w − ej ∥∞ ≤ µ}, i.e.,

(
0 if ∥w − ej ∥∞ ≤ µ
gj (w) =
∞ otherwise,

then its convex conjugate is gj∗ (u) = uj + µ∥u∥1 .

Proof 1 1. We can write f (w) = 21 w⊤ Qw where Q := n2 Ip is positive definite (and has size p×p).
From Example 3.2.2 of [9], the convex conjugate of a positive definite quadratic form is
−1
1 1 2 n
∗
f (u) = u⊤ Q−1 u = u⊤ Ip u= ∥u∥22 .
2 2 n 4

2. If gj is the indicator function of the set C, the convex conjugate is given by

gj∗ (u) = sup u⊤ w, (6.20)

w∈C

where C = {w ∈ Rp | ∥w − ej ∥∞ ≤ µ}. This implies that P wi ∈ [eji − µ, eji + µ], ∀i. (Note that
eij = 1 if i = j and 0 otherwise.) To maximize u⊤ w = pi=1 ui wi , the optimal wi can be chosen
as (
eji + µ if ui ≥ 0,
wi = (6.21)
eji − µ if ui < 0.

Substituting into u⊤ w, we obtain u⊤ w = P pi=1 ui eji + µ sign(ui ) , where sign(ui ) is the sign
P

of ui . Simplifying, we have u⊤ w = uj + µ pi=1 |ui |. Thus, we have

gj∗ (u) = uj + µ∥u∥1 . (6.22)

6.2 Proofs of Theorems and Lemmas of Chapter 3

6.2.1 Proofs of Theorems and Lemmas on Robust Lasso
We extend Theorem 1 and Lemma 1 of [42] to prove our Theorem 4. We re-parameterize model (3.5)
and the robust Lasso optimisation problem (3.6) to match those in [42], i.e.,

√ β∗ √

+ η = Aβ ∗ + ne∗ + η,

y = A | nIn ∗ √ (6.23)
δ / n
√
where e∗ ≜ δ ∗ / n. Note that the optimization problem (3.6) is
! 2
√

β̂λ1 1 β δ
= arg min y − (A| nIn ) √ + λ1 ∥β∥1 + λ̃2 √ ,
δ̂λ̃2 β,δ 2n δ/ n 2 n 1

√
where λ̃2 = nλ2 . The equivalent robust Lasso optimization problem for the model (6.23) is given
by:
2
√

β̂λ1 1 β
= arg min y − (A| nIn ) + λ1 ∥β∥1 + λ̃2 ∥e∥1 , (6.24)
êλ̃2 β,e 2n e 2
√
where êλ2 = δ̂λ2 / n. In order to prove Theorem 4, we first recall the Extended Restricted Eigenvalue
Condition (EREC) for a sensing matrix from [42]. Given β ∗ and δ ∗ , let us define sets

S ≜ {j : βj∗ ̸= 0} , R ≜ {i : δi∗ ̸= 0}. (6.25)

50
Note that s ≜ |S|, r ≜ |R|.
Extended Restricted Eigenvalue Condition (EREC) [42]: Given S, R as defined in (6.25),
and λ1 , λ̃2 > 0, an n × p matrix A is said to satisfy the EREC if there exists a κ > 0 such that
1 √
√ ∥Ahβ + nhδ ∥2 ≥ κ(∥hβ ∥2 + ∥hδ ∥2 ), (6.26)
n

for all (hβ , hδ ) ∈ C(S, R, λ) with λ := λ̃2 /λ1 and where C is defined as follows:

C(S, R, λ) ≜ {(hβ , hδ ) ∈ Rp × Rn : ∥(hβ )Sc ∥1 + λ∥(hδ )Rc ∥1 ≤ 3(∥(hβ )S ∥1 + λ∥(hδ )R ∥1 )}.

(6.27)

Here, (hβ )S and (hδ )R are s and r dimensional vectors extracted from hβ and hδ respectively,
restricted to the set S and R as defined in (6.25). ■ In Lemma. 6, we extend Lemma 1 from [42] to
random Rademacher matrices. In this lemma we show that a random Rademacher matrix A satisfies
EREC with high probability for κ = 1/16.

Lemma 6 Let A be an n × p matrix with i.i.d. Rademacher entries. There exist positive constants
C1 , C2 , c3 , c4 such that if n ≥ C1 s log p and r ≤ min{C2 logn n , slog
log p
n } then

√

1 1
P ∀ (hβ , hδ ) ∈ C (S, R, λ) , √ ∥Ahβ + nhδ ∥2 ≥ (∥hβ ∥2 + ∥hδ ∥2 ) ≥ 1 − c3 exp {−c4 n},
n 16
q
where λ = log n
log p and C as in (6.27). ■

Proof of Lemma 6: Using a similar line of argument as in the proof of Lemma 1 of [42], it is enough
to show the following two properties of the sensing matrix A to complete the proof.
1 2
1. Lower bound on + ∥hδ ∥22 . For some κ1 > 0 with high probability,
n ∥Ahβ ∥2
s !
1 2 log n
2 2
∥Ahβ ∥2 + ∥hδ ∥2 ≥ κ1 (∥hβ ∥2 + ∥hδ ∥2 ) ∀ (hβ , hδ ) ∈ C S, R, . (6.28)
n log p

2. Mutual Incoherence: The column space of the matrix A is incoherent with the column space
of the identity matrix. For some κ2 > 0 with high probability,
s !
1 log n
√ |⟨Ahβ , hδ ⟩| ≤ κ2 (∥hβ ∥2 + ∥hδ ∥2 ) 2
∀ (hβ , hδ ) ∈ C S, R, . (6.29)
n log p

By using (6.28) and (6.29), we have, with high probability,

1 √ 1 2
∥Ahβ + nhδ ∥22 = ∥Ahβ ∥22 + ∥hδ ∥22 + √ ⟨Ahβ , hδ ⟩ ≥ κ1 (∥hβ ∥2 + ∥hδ ∥2 )2 − 2κ2 (∥hβ ∥2 + ∥hδ ∥2 )2
n n n
= (κ1 − 2κ2 )(∥hβ ∥2 + ∥hδ ∥2 )2

The proof is completed if κ1 > 2κ2 . We now show that (6.28) and (6.29) hold together with κ1 > 2κ2
for a Rademacher sensing matrix A.
We now state two important facts on the Rademacher matrix A which will be used in proving (6.28)
and (6.29) respectively.

(1) We use a result following Lemma 1 [34] (see the equation immediately following Lemma 1 in [34],
and set D̄ in that equation to the identity matrix, since we are concerned with signals that are
sparse in the canonical basis). Using this result, there exist positive constants c2 , c′3 , c′4 , such that
with probability at least 1 − c′3 exp {−c′4 n}:
r
1 ∥hβ ∥2 log p
√ ∥Ahβ ∥2 ≥ − c2 ∥hβ ∥1 ∀ hβ ∈ Rp . (6.30)
n 4 n

51
(2) From Theorem 4.4.5 of [52], for a s × r′ dimensional Rademacher matrix ARi Sj , there exists a
constant c1 > 0 such that, for any τ ′ > 0, with probability at least 1 − 2 exp {−nτ ′2 } we have
r !
r′
r
1 1 s ′
√ ∥ARi Sj ∥2 = √ σmax (ARi Sj ) ≤ c1 + +τ . (6.31)
n n n n

7 2
2 2 2
Throughout this proof, we take the constants C1 ≜ (24c 2 and C2 ≜ max{32 c2 , 4(51200c1 ) }, where
2)
c1 , c2 are as defined in (6.30) and (6.31) respectively.
Proof of (6.28): We first obtain a lower bound on n1 ∥Ahβ ∥22 using (6.30). For every (hβ , hδ ) ∈
q
C S, R, log n
log p , we have:

s s
log n √ log n √
∥hβ ∥1 ≤ 4∥(hβ )S ∥1 + 3 ∥(hδ )S ∥1 ≤ 4 s∥hβ ∥2 + 3 r∥hδ ∥2 . (6.32)
log p log p
′ ′
Substituting (6.32)
in
q(6.30), we obtain that, with probability at least 1 − c3 exp {−c4 n}, for every
(hβ , hδ ) ∈ C S, R, log n
log p :

r ! sr
1 1 s log p log n r log p
√ ∥Ahβ ∥2 ≥ − 4c2 ∥hβ ∥2 − 3c2 ∥hδ ∥2 ,
n 4 n log p n
r ! s r !
1 1 s log p log n r log p
∴ √ ∥Ahβ ∥2 + ∥hδ ∥2 ≥ − 4c2 ∥hβ ∥2 + 1 − 3c2 ∥hδ ∥2(6.33)
.
n 4 n log p n

Under the assumption n ≥ C1 s log p, the first term in the brackets of (6.33) is greater than 18 . Again,
under the assumption r ≤ C2 logn n , the second term is greater than 18 . Thus we have, √1n ∥Ahβ ∥2 +
∥hδ ∥2 ≥ 18 (∥hβ ∥2 + ∥hδ ∥2 ). Squaring both sides, we have, 1 2
n ∥Ahβ ∥2 + ∥hδ ∥22 + √2 ∥Ahβ ∥2 ∥hδ ∥2
n
≥
1 2
64(∥hβ ∥2 + ∥hδ ∥2 ) . Using the fact that ∥a∥22 + ∥b∥22 ≥ 2∥a∥2 ∥b∥2 for any vectors a, b, we have,
2 n1 ∥Ahβ ∥22 + ∥hδ ∥22 ≥ n1 ∥Ahβ ∥22 + ∥hδ ∥22 + √2n ∥Ahβ ∥2 ∥hδ ∥2 ≥ 64 1
(∥hβ ∥2 + ∥hδ ∥2 )2 . Hence we

q
have with probability at least 1 − c′3 exp {−c′4 n}, for every (hβ , hδ ) ∈ C S, R, log n
log p

1 1
∥Ahβ ∥22 + ∥hδ ∥22 ≥ (∥hβ ∥2 + ∥hδ ∥2 )2 . (6.34)
n 128
Therefore, we have κ1 = 1/128 completing the proof of (6.28).
Proof of (6.29): This part of the proof directly follows the proof of Lemma 2 in [42], with a few minor
differences in constant factors. Nevertheless, we are including it here to make the paper self-contained.
Divide the set {1, 2, . . . , p} into subsets S1 , S2 , . . . , Sq of size s each, such that the first set S1
contains s largest absolute value entries of hβ indexed by S, the set S2 contains s largest absolute
value entries of the vector (hβ )Sc , S2 contains the second largest s absolute value entries of (hβ )Sc ,
and so on. By the same strategy, we also divide the set {1, 2, . . . , n} into subsets R1 , R2 , . . . , Rk such
that the first set R1 contains r entriesq of hδ indexed by R and sets R2 , R3 , . . . are of size r ≥ r. We
′
log n
have for every (hβ , hδ ) ∈ C S, R, log p ,

1 X 1 1 X
√ |⟨Ahβ , hδ ⟩| ≤ √ |⟨ARi Sj (hβ )Sj , (hδ )Ri ⟩| ≤ max √ ∥ARi Sj ∥2 ∥(hβ )Sj ′ ∥2 ∥(hδ )Ri′ ∥2
n n i,j n ′ ′
i,j i ,j
(6.35)
  !
1 X X
= max √ ∥ARi Sj ∥2  ∥(hβ )Sj ′ ∥2  ∥(hδ )Ri′ ∥2 .
i,j n ′ ′
j i
(6.36)

52
Note that ARi Sj (a submatrix of A containing rows belonging to Ri and columns belonging to Sj ) is
values of Sj
itself a Rademacher matrix with i.i.d. entries. Taking the union bound over all possible
and Ri , we have that the inequality in (6.31) holds with probability at least 1−2 rn′ ps exp (−nτ ′2 ). If
n ≥ 4τ ′ −2 s log(p) we obtain, ps ≤ ps ≤ exp(τ ′ 2 n/4). Furthermore, if we assume, n ≥ 4τ ′ −2 r′ log(n),

′
we have rn′ ≤ nr ≤ exp(τ ′ 2 n/4). Later we will give a choice of τ ′ which ensures that these conditions

are satisfied. Therefore, we obtain with probability at least 1 − 2 exp {−nτ ′2 /2},
r !
r′
r
1 s ′
maxi,j √ ∥ARi Sj ∥2 ≤ c1 + +τ . (6.37)
n n n
Pq
Using the first inequality in the last equation of Sectionq 2.1 of [10] we obtain i=3 ∥(hβ )Si ∥2 ≤

log n
√
√ ∥(hβ )Sc ∥1 . Furthermore, for every (hβ , hδ ) ∈ C S, R,
1
log p , we have ∥(hβ )S ∥1 ≤ 3 s∥hβ ∥2 +
c
s
n√
q
3 log
log p r∥hδ ∥2 . Hence,

q q q
s r
X X X log n r
∥(hβ )Si ∥2 = ∥(hβ )S1 ∥2 +∥(hβ )S2 ∥2 + ∥(hβ )Si ∥2 ≤ 2∥hβ ∥2 + ∥(hβ )Si ∥2 ≤ 5∥hβ ∥2 +3 ∥hδ ∥2 .
log p s
i=1 i=3 i=3
Pk
√1 ∥(hδ ) Rc ∥1 . Furthermore, for every
Following a similar process we obtain i=3 ∥(hδ ) Ri ∥2 ≤ r′
q
log n
(hβ , hδ ) ∈ C S, R, log p , we have √ ′ ∥(hδ )Rc ∥1 ≤ 3 r′ q log n ∥hβ ∥2 + 3 rr′ ∥hδ ∥2 . Since r′ ≥ r,
1
ps 1 p
r
log p

k k k r
X X X 3 s
∥(hδ )Ri ∥2 = ∥(hδ )R1 ∥2 +∥(hδ )R2 ∥2 + ∥(hδ )Ri ∥2 ≤ 2∥hδ ∥2 + ∥(hδ )Ri ∥2 ≤ 5∥hδ ∥2 + q ∥hβ ∥2 .
log n r′
i=1 i=3 i=3 log p

′2
Hence, joining (6.37) q into(6.35), we obtain with probability at least 1 − 2 exp {−nτ /2}, for
, (6.31)
every (hβ , hδ ) ∈ C S, R, log n
log p

r ! s !  
r′
r r r
1 s log n r 3 s
√ |⟨Ahβ , hδ ⟩| ≤ c1 + + τ′ × 5∥hβ ∥2 + 3 ∥hδ ∥2 × 5∥hδ ∥2 + q ∥hβ ∥2 (6.38
.
n n n log p s log n r′
log p
q q q
r′
Recall that r ≤ slog log p
n , by assumption. Taking r ′ = s log p leads to
log n
log n p r
log p s ≤ log n
log p s = 1 and
1 s ′2
p
r′ = 1. Thus, we obtain with probability at least 1 − 2 exp(−nτ /2) for every (hβ , hδ ) ∈
q
log n
log p
q
C S, R, log n
log p ,

r !
r′
r
1 s
√ |⟨Ahβ , hδ ⟩| ≤ 25c1 + + τ′ × (∥hβ ∥2 + ∥hδ ∥2 )2 (6.39)
n n n

Let τ ′ ≜ 1/(51200c1 ). Recall that, C1 ≜ max{322 c22 , 4(51200c1 )2 }. Then n ≥ C1 s log p implies
n ≥ 4τ ′ −2 s log p = 4τ ′ −2 r′ log n. Furthermore,
r s
r′
r
s s log p
≤ = ≤ τ ′ /2. (6.40)
n n n log n
q
Therefore, we have with probability at least 1−2 exp(−nτ ′2 /2), for every (hβ , hδ ) ∈ C S, R, log n
log p

1 1
√ |⟨Ahβ , hδ ⟩| ≤ 25c1 × 2τ ′ (∥hβ ∥2 + ∥hδ ∥2 )2 ≤ (∥hβ ∥2 + ∥hδ ∥2 )2 (6.41)
n 512

This completes the proof of (6.29).

53
Now, from (6.34) and (6.41), using a union bound, we obtain with probability at least 1 −
(c′3 exp(−c′4 n)
+ 2 exp(−nτ ′2 /2)),
1 √
∥Ahβ + nhδ ∥22 ≥ (κ1 − 2κ2 )(∥hβ ∥2 + ∥hδ ∥2 )2 = κ2 (∥hβ ∥2 + ∥hδ ∥2 )2 (6.42)
n
Taking c3 = c′3 + 2 and c4 = min{c′4 , τ ′2 /2}, we have 1 − (c′3 exp(−c′4 n) + 2 exp(−τ ′2 n/2) ≥ 1 −
c3 exp(−c4 n). √
Note that, we have, κ = κ1 − 2κ2 = 1/16. Taking the root over (6.42), we obtain with probability
at least 1 − c3 exp(−c4 n),
s !
1 √ 1 log n
√ ∥Ahβ + nhδ ∥2 ≥ (∥hβ ∥2 + ∥hδ ∥2 ) ∀ (hβ , hδ ) ∈ C S, R, .
n 16 log p
This completes the proof of the lemma. ■

Proof of Theorem 4
Proof of (3.7): We now derive the bound for the l1 norm of the robust lasso √
estimate of the√error
4σ √log p
∗
β̂λ1 −β given by the optimisation problem (3.6). Recall that we have λ1 = n
and λ2 = 4σ nlog n .
√ √
We choose λ̃2 ≜ nλ2 = 4σ √logn
n
. We use λ̃2 to define the cone constraint in (6.27). Note that, in
the proof of Theorem 1 of [42], it is shown that hβ ≜ β̂λ1 − β ∗ and hδ ≜ √1 (δ̂λ
n 2 − δ ∗ ) satisfies the
cone constraint given in (6.27). Therefore, we have
λ̃2 λ̃2
∥(hβ )Sc ∥1 + ∥(hδ )Rc ∥1 ≤ 3(∥(hβ )S ∥1 + ∥(hδ )R ∥1 ). (6.43)
λ1 λ1
Now by using Eqn. (6.43), we have
λ̃2 √ √ λ̃2
∥hβ ∥1 = ∥(hβ )S ∥1 + ∥(hβ )Sc ∥1 ≤ 4∥(hβ )S ∥1 + 3 ∥(hδ )R ∥1 ≤ 4 s∥hβ ∥2 + 3 r ∥hδ (6.44)
∥2 .
λ1 λ1
√ √
Here, the last inequality of Eqn.(6.44) holds since ∥(hβ )S ∥1 ≤ s∥hβ ∥2 and ∥(hδ )R ∥1 ≤ r∥hδ ∥2 .
√ √ √
Note that, max{ s, r} ≤ s + r. Based on the values of λ1 , λ̃2 , we have λ̃2 < λ1 since n < p.
Hence, by using Eqn.(6.44), we have
√ √ √
∥hβ ∥1 ≤ 4 s∥hβ ∥2 + 3 r∥hδ ∥2 ≤ 4 s + r (∥hβ ∥2 + ∥hδ ∥2 ) . (6.45)
∗ δ̂
Recall that, e∗ = √δ
n
and ê = √λ̃n2 in Theorem 1 of [42]. Therefore, by the equivalence of the model
given in (6.23) and the optimisation problem in (3.6) with that of [42], we have
1 √ √
∥β̂λ1 − β ∗ ∥2 + √ (δ̂λ̃2 − δ ∗ ) ≤ 3κ−2 max{λ1 s, λ̃2 r}, (6.46)
n 2
as long as
2∥A⊤ η∥∞ 2∥η∥∞
≤ λ1 , and √ ≤ λ̃2 . (6.47)
n n
Therefore when (6.47) holds, then by using (6.45) (recall hβ = β̂λ1 − β ∗ ) and (6.46), we have
√ √ √
∥β̂λ1 − β ∗ ∥1 ≤ 4 s + r 3κ−2 max{λ1 s, λ̃2 r} ≤ 12κ−2 (s + r)max{λ1 , λ̃2 } ≤ 12κ−2 (s + r)λ(6.48)
1.

2∥A⊤ η∥∞ 2∥η∥∞

We will now bound the probability that ≤ λ1 and √ ≤ λ̃2 using the fact that η is
n n
Gaussian. By using Lemma 12 in Appendix 6.2.3 which describes the tail bounds of the Gaussian
vector, we have
√ p 1
P (2∥η∥∞ / n ≤ 4σ log(n)/n) ≥ 1 − , (6.49)
n
√

1 1
P 2 √ A⊤ η
p
/ n ≤ 4σ log(p)/n ≥ 1 − . (6.50)
n ∞ p

54
Using (6.49),(6.50) with Bonferroni’s inequality in (6.48), we have:
r !
log(p) 1 1
P ∥β̂λ1 − β ∗ ∥1 ≤ 48κ−2 (s + r)σ ≥1− − . (6.51)
n n p

This completes the proof of (3.7).

Proof of (3.8): We now derive an upper bound of ∥δ̂λ2 − δ ∗ ∥1 . We approach this by showing that
given the optimal estimate of β̂ λ1 , we can obtain a unique estimate of δ̂ λ2 using (6.53). We then
derive the upper bound on ∥δ̂λ2 − δ ∗ ∥1 using the Lasso bounds given in [25]. Expanding the terms
in (3.5), we obtain:
n
( )
1 2
X 1 2
minβ,δ ∥y − Aβ − δ∥2 + λ1 ∥β∥1 + λ2 ∥δ∥1 = minβ λ1 ∥β∥1 + minδi ((yi − ai. β) − δi ) + λ2 |δi | (6.52
.
2n 2n
i=1

Given the optimal solutions β̂λ1 and δ̂λ2 of (3.6), δ̂λ2 can also be viewed as
n
1 X
δ̂λ2 = argminδ {(yi − ai. β̂λ1 − δi )2 } + λ2 ∥δ∥1 (6.53)
2n
i=1

Thus (6.53) can also be viewed as Lasso estimator for z = In δ ∗ + ϱ, where z ≜ y − A⊤ β̂λ1
and ϱ ≜ A(β ∗ − β̂λ1 ) + η with δ ∗ being r-sparse. By using Theorem 11.1(b) of [25] , we have if
λ2 ≥ 2 ∥ϱ∥∞
n , √
∗ 3 rλ2
δ̂λ2 − δ ≤ , (6.54)
2 γr
where γr is the Restricted eigenvalue constant of order r which equals one for In . Now using the
result in Lemma 11.1 of [25], when λ2 ≥ 2 ∥ϱ∥ ∞
n , then

(δ̂λ2 − δ ∗ )Rc ≤ 3 (δ̂λ2 − δ ∗ )R .

1 1

Therefore by using (6.54) when λ2 ≥ 2 ∥ϱ∥∞

n , we have
√
∥δ̂λ2 − δ ∗ ∥1 = (δ̂λ2 − δ ∗ )Rc + (δ̂λ2 − δ ∗ )R ≤ 4 (δ̂λ2 − δ ∗ )R ≤ 4 r (δ̂λ2 − δ ∗ )R
1 1 1 2
√ ∗
≤ 4 r∥δ̂λ2 − δ ∥2 ≤ 12rλ2 . (6.55)
√ √
Therefore we now show that λ2 = 4σ log n
n
≥ 2 ∥ϱ∥n
∞
(i.e., ∥ϱ∥∞ ≤ 2σ log n) holds with high
probability. Now, by Lemma 11 and the triangle inequality, we have
∥ϱ∥∞ = ∥A(β ∗ − β̂λ1 ) + η∥∞ ≤ |A|∞ ∥β ∗ − β̂λ1 ∥1 + ∥η∥∞ .
By using Lemma 12 in Appendix 6.2.3, we have the following:
p 1
P (∥η∥∞ ≥ σ log(n)) ≤ . (6.56)
n
Since |A|∞ ≤ 1, by using (6.51), we have
r !
−2 log(p) p 2 1
P ∥ϱ∥∞ ≤ 48κ (s + r)σ + σ log(n) ≥ 1 − + . (6.57)
n n p
q
Since n log n ≥ (48κ−2 )2 (s + r)2 log p, we have 48κ−2 (s + r)σ log(p)
p
n < σ log(n). Thus

p 2 1
P (∥ϱ∥∞ ≤ 2σ log n) ≥ 1 − + . (6.58)
n p
We now put (6.58) in (6.55) to obtain:
√
24σr log n 2 1
P ∥δ̂λ2 − δ ∗ ∥1 ≤ ≥1− + . (6.59)
n n p
This completes the proof. ■

55
6.2.2 Proofs of Theorems and Lemmas on Debiased Lasso
Proof of Theorem 5
Note that we have chosen W = A. Now, recall the expression for β̂W from (3.14) and model as given
in (3.5), we have

∗ 1 ⊤ 1 ⊤ 1
β̂W − β = A η + Ip − A A (β̂λ1 − β ∗ ) + A⊤ δ ∗ − δ̂λ2 . (6.60)
n n n

In Lemma 7, (6.64) and (6.65) show that the second and third term on the RHS of (6.60) are negligible
as n, p increases in probability. Therefore, in view of Lemma 7, we have
√ 1
n(β̂W j − βj∗ ) = √ a⊤ η + oP (1), (6.61)
n .j

where a.j denotes the j th column of matrix A. Given a.j , by using the Gaussianity of η, the first
a⊤
.j a.j
term on the RHS of (6.61) is a Gaussian random variable with mean 0 and variance σ 2 n . Since
a⊤
.j a.j
n = 1, the first term on the RHS is N(0, σ 2 ). This completes the proof of result (1) of the theorem.
We now turn to result (2) of the theorem. By using a similar decomposition argument as in the
case of β̂W in (6.60) and using the expression of δ̂W in (3.15), we have

1 1 1
δ̂W − δ ∗ = In − AA⊤ η + In − AA⊤ A(β ∗ − β̂λ1 ) − AA⊤ δ ∗ − δ̂λ2 .

(6.62)
n n n
⊤
We have ΣA = In − n1 AA⊤ In − n1 AA⊤

.From (6.66)
and (6.67) of Lemma 7, the second and
√
p 1−n/p
third terms on the RHS of (6.62) are both oP n . Therefore, using Lemma 7, we have for
any i ∈ [n]

δ̂W i − δi∗
⊤ !
In − n1 AA⊤
p
i.
η 1 p 1 − n/p
p = p + oP p . (6.63)
ΣAii ΣAii ΣAii n

As η is Gaussian, the first term on the RHS of (6.63) is a Gaussian random variable with mean 0 and
n2
variance σ 2 . In Lemma 8, we show that p(p−n) ΣAii converges to 1 in probability if A is a Rademacher
matrix. This implies that the second term on the RHS of (6.63) is oP (1). This completes the proof of
result (2). ■

√ √
Lemma 7 Let β̂ λ1 , δ̂λ2 be as in (3.6) and set λ1 ≜ 4σ √logn
p
, λ2 ≜ 4σ log n
n .
Given A is a Rademacher
matrix, if n is o(p) and n is ω[((s + r) log p)2 ], then as n, p → ∞ we have following:

√

1 ⊤
n Ip − A A (β ∗ − β̂λ1 ) = oP (1) (6.64)
n ∞
1
√ A⊤ δ ∗ − δ̂λ2 = oP (1) (6.65)
n ∞

n 1
p In − AA⊤ A(β ∗ − β̂λ1 ) = oP (1) (6.66)
p 1 − n/p n
∞
n 1
AA⊤ δ ∗ − δ̂λ2

p = oP (1) (6.67)
p 1 − n/p n ∞

56
Proof of Lemma 7:
When n is ω[((s+r) log p)2 ], the assumptions of Lemma 6 are satisfied. Hence, the Rademacher matrix
A satisfies the assumptions of Theorem 4 with probability that goes to 1 as n, p → ∞. Therefore, to
prove the results, it suffices to condition on the event that the conclusion of Theorem 4 holds.
Proof of (6.64): Using result (4) of Lemma 11, we have:
√ √

1 ⊤
n Ip − A A (β ∗ − β̂λ1 ) ≤ n|A⊤ A/n − Ip |∞ ∥β ∗ − β̂λ1 ∥1 . (6.68)
n ∞

From result (1) of Lemma 10, result (1) of Theorem 4, and result (5) of Lemma 11 , we have,
√

1 ⊤ ∗ log p
n Ip − A A (β − β̂λ1 ) = OP (s + r) √ . (6.69)
n ∞ n

Under the assumption n is ω[((s + r) log p)2 ], we have:

√

1 ⊤
n Ip − A A (β ∗ − β̂λ1 ) = oP (1). (6.70)
n ∞

Proof of (6.65): Again by using result (4) of Lemma 11, we have

1 1
√ A⊤ (δ ∗ − δ̂λ2 ) ≤ √ A⊤ ∥δ ∗ − δ̂λ2 ∥1 .
n ∞ n ∞

Since A is a Rademacher matrix, we have, √1 A⊤ = √1 . From result (2) of Theorem 4 and result
n ∞ n
(5) of Lemma 11, we have
√
1 r log n
√ A⊤ (δ ∗ − δ̂λ2 ) = OP . (6.71)
n ∞ n3/2
As n, p → ∞, we have
1
√ A⊤ (δ ∗ − δ̂λ2 ) = oP (1). (6.72)
n ∞
Proof of (6.66): Again using result (4) of Lemma 11, we have,

n 1 ⊤ ∗ n 1 1 ⊤
p In − AA A(β − β̂λ1 ) ≤p × In − AA A ∥β ∗ − β̂λ1 ∥(6.73)
1.
p 1 − n/p n 1 − n/p p n ∞
∞

By using result (5) of Lemma 11, result (1) of Theorem 4 and result (2) of Lemma 10, we have
s !r !
n 1 ⊤ ∗ n log(pn) 1 log p
p In − AA A(β − β̂λ1 ) ≤ OP (s + r) p +
p 1 − n/p n 1 − n/p pn n n
∞
s r r !
(s + r)n log(pn) log p (s + r)n 1 log p
= OP p +p
1 − n/p pn n 1 − n/p n n
s r !
(s + r) log(np) log(p) (s + r) log p
= OP p +p (6.74)
.
1 − n/p p 1 − n/p n

Since n is o(p) and n is ω[((s + r) log p)2 ], (6.74) becomes as n, p → ∞,

n 1 ⊤
p In − AA A(β ∗ − β̂λ1 ) = oP (1). (6.75)
p 1 − n/p n
∞

Proof of (6.67): Using result (4) of Lemma 11, we have,

n 1 1 1
AA⊤ δ ∗ − δ̂λ2 × AA⊤ δ ∗ − δ̂λ2

p ≤p . (6.76)
p 1 − n/p n ∞
1 − n/p p ∞ 1

57
Since A is a Rademacher matrix, the elements of p1 AA⊤ lie between −1 and 1. Therefore, 1 ⊤
p AA ∞ =
1. By using part (5) of Lemma 11 and result (2) of Theorem 4, we have
√ !
n 1 r log n
AA⊤ δ ∗ − δ̂λ2

p = OP p √ . (6.77)
p 1 − n/p n ∞
1 − n/p n

Since n is o(p), we have

n 1
AA⊤ δ ∗ − δ̂λ2

p = oP (1). (6.78)
p 1 − n/p n ∞
This completes the proof. ■

Lemma 8 Let A be a Rademacher matrix and ΣA be as defined in (3.18). If n log n is o(p), we have
as n, p → ∞ for any i ∈ [n],
n2 P
2
ΣAii → 1. (6.79)
p (1 − n/p)
⊤
Proof of Lemma 8: Recall that from (3.18), ΣA = In − n1 AA⊤ In − n1 AA⊤ . Note that for

i ∈ [n], we have

n2 n2

2 ⊤ 1 ⊤ ⊤
ΣA = 1 − ai. ai. + 2 ai. A Aai.
p2 (1 − n/p) ii p2 (1 − np ) n n
 2  2
⊤ n ⊤
n a a
i. i. 
X n a a
i. k. 
=  q 1− +  q
p (1 − ) n n n
p (1 − ) n
p k=1 p
k̸=i
2
 
n
ai. a⊤

n X n k. 
= 1− + q . (6.80)
p p (1 − n
) n
k=1 p
k̸=i

In (6.80), since the second term is positive, we have

n2 n
2
ΣAii ≥ 1 − . (6.81)
p (1 − n/p) p

We have from Result (3) of Lemma 10, for k ∈ [n], k ̸= i,

s !
1 2 2 log(n) 2
P ai. a⊤
k. /p ≤ p ≥1− . (6.82)
n2
p
1 − n/p 1 − n/p p

By using (6.82), we have

 
 2
n
ai. a⊤

X
 q n k.  4(n − 1) 2 log(n) 
P ≤ 

p (1 − np ) n 1 − n/p p 
k=1
k̸=i
  2 

a a⊤
 n i. k.  4 2 log(n) 
≤ P ∩nk=1  q ≤
k̸=i  p (1 − n ) n 1 − n/p p 
p
  s 
⊤
ai. ak.
 n 2 2 log(n) 
= P ∩nk=1 q ≤p
k̸=i  p (1 − n ) n 1 − n/p p 
p

2(n − 1)
≥ 1− . (6.83)
n2

58
Pn
The last inequality comes using Bonferroni’s inequality which states that P (∩ni=1 Ui ) ≥ 1 − i=1 P (Ui )
for any events U1 , U2 , ..., Un . Therefore by using (6.80) and (6.83), we have

n2

n 4(n − 1) 2 log(n) 2
P 2
ΣAii ≤ 1 − + ≥1− (6.84)
p (1 − n/p) p 1 − n/p p n

By using (6.81) and (6.84), we have

n2

n n 4(n − 1) 2 log(n) 2
P 1− ≤ 2 ΣAii ≤ 1 − + ≥1− (6.85)
p p (1 − n/p) p 1 − n/p p n
n 4(n−1) 2 log(n)
Since n log n is o(p), as n, p → ∞, 1 − p → 1 and 1−n/p p → 0. This completes the proof. ■

Now we proceed to the results involving debiasing using the optimal weights matrix W obtained
from Alg. 3. The proofs of these results largely follow the same approach as that for W = A (i.e.
Theorem 5). However there is one major point of departure—due to differences in properties of the
weights matrix W designed from Alg. 3 (given in Lemma 9), as compared to the case where W = A
(given in Lemma 10).

Proof of Theorem 6
Proof of (3.23): Using Result (4) of Lemma 11, we have
√ √

1 ⊤
n Ip − W A (β ∗ − β̂λ1 ) ≤ n∥W ⊤ A/n − Ip |∞ ∥β ∗ − β̂λ1 ∥1 . (6.86)
n ∞

Using Result (2) of Lemma 9, Result (1) of Theorem 4 and Result (5) of Lemma 11, we have
√

1 ⊤ ∗ log p
n Ip − W A (β − β̂λ1 ) = OP (s + r) √ . (6.87)
n ∞ n

If n is ω[((s + r) log p)2 ], we have:

√

1 ⊤
n Ip − W A (β ∗ − β̂λ1 ) = oP (1). (6.88)
n ∞

Proof of (3.24): Using Result (4) of Lemma 11, we have

1 1
√ W ⊤ (δ ∗ − δ̂λ2 ) ≤ √ W⊤ ∥δ ∗ − δ̂λ2 ∥1 .
n ∞ n ∞

Using Result (3) of Lemma 9, Result (2) of Theorem 4 and Result (5) of Lemma 11, we have
√
1 r log n
√ W ⊤ (δ ∗ − δ̂λ2 ) = OP = oP (1). (6.89)
n ∞ n3/2

Proof of (3.25): Using Result (4) of Lemma 11, we have

n 1 ⊤ n 1 1
p In − W A A(β ∗ − β̂λ1 ) ≤p × In − W A⊤
A ∥β ∗ − β̂λ1 ∥(6.90)
1.
p 1 − n/p n 1 − n/p p n ∞
∞

Using Result (4) of Lemma 9, Result (1) of Theorem 4 and Result (5) of Lemma 11, we have
s !r !
n 1 (s + r)n log(pn) 1 log p
p In − W A⊤ A(β ∗ − β̂λ1 ) ≤ OP p +
p 1 − n/p n 1 − n/p pn n n
∞
s r !
(s + r) log(np) log(p) (s + r) log p
= OP p +p (6.91).
1 − n/p p 1 − n/p n

59
Since n is o(p) and n is ω[((s + r) log p)2 ], we have

n 1 ⊤
p In − W A A(β ∗ − β̂λ1 ) = oP (1). (6.92)
p 1 − n/p n
∞

Proof of (3.26): Using Result (4) of Lemma 11, we have

n 1 1 1
W A⊤ δ ∗ − δ̂λ2 × W A⊤ δ ∗ − δ̂λ2

p ≤p (6.93)
p 1 − n/p n ∞
1 − n/p p ∞ 1

Using Result (5) of Lemma 9, Result (2) of Theorem 4 and Result (5) of Lemma 11, we have
s !√ !
n 1 r n log(np) log n
W A⊤ δ ∗ − δ̂λ2

p = OP p +1
p 1 − n/p n 1 − n/p p n3/2
∞
s r √ !
r log(np) log n r log n
= OP p +p (6.94)
.
1 − n/p p n 1 − n/p n3/2

Since n is o(p), we have

n 1
W A⊤ δ ∗ − δ̂λ2

p = oP (1) (6.95)
p 1 − n/p n ∞

This completes the proof. ■

Proof of Theorem 7
2 ⊤
Result(1): Recall that W is the output of Alg. 3, Σβ = σn W ⊤ W , Σδ = σ 2 In − n1 W A⊤ In − n1 W A⊤

q
and µ1 = 2 2 log p
n . We will derive the lower bound of Σβjj for all j ∈ [p] following the same idea as
σ2 ⊤
in the proof of Lemma 12 of [31]. Note that, Σβjj = For all j ∈ [p], from (6.120) of result
n w.j w.j .

2 2 3
(2) of Lemma 9, for any feasible W with probability at least 1 − + + , we have
p2 n2 2np
1 ⊤
1− a w.j ≤ µ1 =⇒ 1 − µ1 ≤ a⊤
.j w.j .
n .j
For any feasible W of Alg.3, we have for any c > 0,

1 ⊤ 1 ⊤
⊤
1 ⊤
⊤
w w.j ≥ w w.j + c(1 − µ1 ) − c a.j w.j ≥ min n w w.j + c(1 − µ1 ) − c a.j w.j
n .j n .j w.j ∈R n .j
⊤
c2 a.j a.j c2 a.j ⊤ a.j

1 ⊤
= min n (w.j − ca.j /2) (w.j − ca.j /2) + c(1 − µ1 ) − ≥ c(1 − µ1 ) −
w.j ∈R n 4 n 4 n
c2
= c(1 − µ1 ) − .
4
We obtain the last inequality by putting w.j = ca.j /2 which makes the square term 0. The rightmost
equality is because a.j ⊤ a.j = n. The lower bound on the RHS is maximized for c = 2(1 − µ1 ).
2 2 1
Plugging in this value of c, we obtain the following with probability atleast 1 − p2
+ n2
+ 2np :

1 ⊤
w w.j ≥ (1 − µ1 )2 .
n .j
Hence, from the above equation and (6.115), we obtain the lower bound on Σβjj for any j ∈ [p] as
follows:
2 2
2 2 1
P Σβjj ≥ σ (1 − µ1 ) ≥ 1 − + + . (6.96)
p2 n2 2np

60
Furthermore from Result (1) of Lemma. 9, we have

⊤
P w.j w.j /n ≤ 1 ∀j ∈ [p] = 1. (6.97)
⊤w
w.j
⊤ w /n ≤ 1) = 1. As Σ 2 .j
We use (6.97) to get, for any j ∈ [p], P (w.j .j βjj = σ n , we have for any
j ∈ [p]:
P Σβjj ≤ σ 2 = 1.

(6.98)
Using (6.98) with (6.96), we obtain for any j ∈ [p],

2 2 1
P σ 2 (1 − µ1 )2 ≤ Σβjj ≤ σ 2 ≥ 1 −

+ + .
p2 n2 2np
P
Now under the assumption n is ω[((s + r) log p)2 ], µ1 → 0. Hence, we have, Σβjj → σ 2 . This completes
the proof of Result (1). q
2 log(n)
Result (2): Recall that µ3 = √ 2 p . Now in order to obtain the upper and lower bounds
1−n/p
for Σδii for any i ∈ [n], we use Result (6) of Lemma. 9. We have,
 
⊤

n WA p 2 2 1
P q − In ≤ µ3  ≥ 1 − + + . (6.99)
p 1 − np n n ∞ p2 n2 2np

We have for i ∈ [n],

n2 n2

2 ⊤ 1 ⊤ ⊤
Σδ = σ2 1 − w a
i. i. + w i. A Aw i.
p2 (1 − np ) ii p2 (1 − np ) n n2
2 2
   
⊤ n ⊤
n wi. ai.  X
 q n wi. ak. 
= σ2  q 1− + σ2 (6.100)
.
p (1 − p )n n n
p (1 − p ) n
k=1
k̸=i

W A⊤
Let V = n − np In . Note that, for i ∈ [n],

wi. ai. ⊤ wi. ai. ⊤

p p
vii = − = −1 + 1− . (6.101)
n n n n
Hence from (6.101), we have
wi. ai. ⊤

n n 1 n
p vii = p −1 + p −1
p 1 − n/p p 1 − n/p n 1 − n/p p
wi. ai. ⊤

n 1 n
= p −1 − p 1−
p 1 − n/p n 1 − n/p p
⊤
r
n wi. ai. n
= p −1 − 1− . (6.102)
p 1 − n/p n p

We have, from (6.99),

!
n 2 2 1
P p vii ≥ −µ3 ≥1− 2
+ 2+ .
p 1 − n/p p n 2np

Therefore using (6.102), we have

!
wi. ai. ⊤
r
n n 2 2 1
P − 1 − 1 − ≥ −µ3 ≥ 1− + +
p2 n2 2np
p
p 1 − n/p n p
!
wi. ai. ⊤
r
n n 2 2 1
=⇒ P − 1 ≥ 1 − − µ3 ≥ 1− + + . (6.103)
p2 n2 2np
p
p 1 − n/p n p

61

2 2 1
Using (6.103) in (6.100) yields the following inequality with probability at least 1 − p2
+ n2
+ 2np ,

2
  !2
n
n2 wi. a⊤ wi. a⊤

n X n
Σδii = σ2  q i.
−1  +σ 2 k.
p2 (1 − np )
p
p (1 − np ) n p 1 − n/p n
k=1,k̸=i
 2
2
wi. a⊤
r
n n
≥ σ2  q i.
− 1  ≥ σ2 1 − − µ3 .
p (1 − np ) n p

Therefore the lower bound on Σδii is as follows:

2 !
n2
r
n 2 2 1
P Σδ ≥ σ 2 1 − − µ3 ≥1− + + . (6.104)
p2 (1 − np ) ii p p2 n2 2np

We need to now derive an upper bound on Σδii . By the same argument as before, we have from (6.99)
!
n 2 2 1
P vii ≤ µ3 ≥ 1− + + ,
p2 n2 2np
p
p 1 − n/p
!
wi. ai. ⊤
r
n n 2 2 1
=⇒ P − 1 − 1 − ≤ µ3 ≥ 1− + + ,
p2 n2 2np
p
p 1 − n/p n p
!
wi. ai. ⊤
r
n n 2 2 1
=⇒ P − 1 ≤ 1 − + µ3 ≥ 1− + + .
p2 n2 2np
p
p 1 − n/p n p

wi. a⊤
Again for i ∈ [n], k ∈ [n], k ̸= i, vij = n .
We have from (6.99),
k.

!
wi. a⊤

n k. 2 2 1
P ≤ µ3 ≥ 1 − + + .
p2 n2 2np
p
p 1 − n/p n

Now from (6.100), we have

2
  !2
n
n2 wi. a⊤ wi. a⊤

2 n X n
Σδii = σ i.
− 1  + σ2 k.
p2 (1 − np )
q p
p (1 − np ) n p 1 − n/p n
k=1,k̸=i
r 2
2 n
≤ σ 1 − + µ3 + σ 2 (n − 1)µ23 .
p

The last inequality holds with probability at least 1 − 2 p22 + 2
n2
+ 1
2np . Hence, we have

2 !
n2
r
n 2 2 1
P Σδ ≤ σ 2 2 2
1 − + µ3 + σ (n − 1)µ3 ≥ 1 − 2 + + . (6.105)
p2 (1 − np ) ii p p2 n2 2np

Using (6.105) with (6.104), we obtain the following using the union bound, for all i ∈ [n],
2 2 !
2
r r
2 n n 2 n 2 2 2 2 1
P σ 1 − − µ3 ≤ 2 Σδ ≤ σ 1 − + µ3 + σ (n − 1)µ3 ≥ 1−3 + + .
p p (1 − np ) ii p p2 n2 2np
q 2
Therefore, under the assumption n log n is o(p), we have, (n−1)µ23 = (n−1) logp n → 0 and 1− n
p + µ3 →
n2 P
1. Hence, we have p2 (1− n
Σ →
) δii
σ 2 . This completes the proof. ■
p

62
Proof of Theorem 8
Let W be the output of Alg. 3. Using the definition of β̂W from (3.14) and the measurement model
from (3.5), we have

∗ 1 ⊤ 1 ⊤ 1
β̂W − β = W η + Ip − W A (β̂λ1 − β ∗ ) + W ⊤ δ ∗ − δ̂λ2 . (6.106)
n n n
√
Using Results (1) and (2) of Theorem 6, the second and third term on the RHS of (6.106) are oP (1/ n).
2
Recall that Σβ = σn W ⊤ W . Therefore, we have
√ √1 w ⊤ η
n(β̂W j − βj∗ ) n .j
q
p = p + oP 1/ Σβjj , (6.107)
Σβjj Σβjj

where w.j denotes the j th column of matrix W . As η is Gaussian, the first term on the RHS of
(6.107) is a Gaussian random variable with mean 0 and variance 1. Using Result (1) of Theorem 7,
Σβjj converges to σ 2 in probability. This completes the proof of Result (1).
Using (3.15) and the measurement model (3.5), we have

∗ 1 ⊤ 1 ⊤ 1
A(β ∗ − β̂λ1 ) − W A⊤ δ ∗ − δ̂λ2 . (6.108)

δ̂W − δ = In − W A η + In − W A
n n n

Using
√Results(3) and (4) of Theorem 6, the second and third term on the RHS of (6.62) are both
p 1−n/p 2 I − 1 W A⊤ 1 ⊤ ⊤

oP n . Recall from (3.38), that Σδ = σ n n I n − n W A . Therefore, we
have

δ̂W i − δi∗
⊤ !
In − n1 W A⊤
p
i.
η 1 p 1 − n/p
p = p + oP p . (6.109)
Σδii Σδii Σδii n

As η is Gaussian, the first term on the RHS of (6.109) is a Gaussian random variable with mean 0
n2
and variance 1. Using Result (2) of Theorem 7, p2 (1−n/p) Σδii converges to σ 2 in probability so that
√
√ 1 p 1−n/p
n = 1. This completes the proof of result (2). ■
Σ δii

Lemma 9 Let A be a n × p Rademacher matrix and W be the corresponding output of Alg. 3. If n

is o(p), we have the following results:

(1) P w.j ⊤ w /n ≤ 1 ∀j ∈ [p] = 1.
.j
q
1 ⊤ log(p)
(2) nW A − Ip ∞
= OP n .

(3) √1 W ⊤ = O (1).
n ∞
q
1 1 ⊤
log(pn) 1
(4) p nW A − In A = OP pn + n .
∞
q
1 ⊤ n log(np)
(5) pW A ∞ = OP p +1 .

q
W A⊤ log(n)
(6) qn
n − np In = OP √ 1
p .
p 1− n
p ∞ 1−n/p

63
Proof of Lemma. 9:
In order to prove these results, we will first show that the intersection
event of the 4 constraints of
2 2 1
Alg. 3 is non-null with probability at least 1 − p2 + n2 + 2np as the solution W = A is in the
feasible set. This will show that the feasible region of the optimisation problem given in Alg. 3 is
non-empty. Let us first define the following sets: n o
G1 (n, p) = A ∈ Rn×p : |A⊤ A/n − Ip |∞ ≤ µ1 , G2 (n, p) = A ∈ Rn×p : p1 (In − AA⊤ /n)A ≤ µ2 ,

( )
⊤
p n×p : a⊤ a /n ≤
G3 (n, p) = A ∈ Rn×p : qn n AA n − n In ∞ ≤ µ3 , G4 (n, p) = {A ∈ R .j .j
p 1− p
q q q
1 ∀ j ∈ [p]} , where, µ1 = 2 2 log(p)
n , µ 2 = 2 log(2np)
np + 1
n and µ 3 = √ 2 2 log(n)
p . Note that,
1−n/p
here A is a n × p Rademacher matrix. We will now state the probabilities of the aforementioned sets.
From (6.135) of Lemma 10, we have
2
P (A ∈ G1 (n, p)) ≥ 1 − (6.110)
p2

Again from (6.141) of Lemma 10, we have

1
P (A ∈ G2 (n, p)) ≥ 1 − (6.111)
2np

Similarly from (6.143) of Lemma 10, we have

2
P (A ∈ G3 (n, p)) ≥ 1 − (6.112)
n2
Q
p p
Lastly, since A is Rademacher, P (a⊤ ⊤
.j a.j /n ≤ 1 ∀ j ∈ [p]) = P ∩j=1 a.j a.j /n ≤ 1 =
⊤
j=1 P (a.j a.j /n ≤
1) = 1p = 1. Therefore, we have,
P (A ∈ G4 (n, p)) = 1 (6.113)
4
A ∈ {∩4k=1 Gk (n, p)}c = P A ∈ {∪4k=1 (Gk (n, p))c } ≤ c
P
Note
that, P k=1 P (A ∈ (Gk (n, p)) ) ≤
2
+ n22 + 2np
1
. Therefore, we obtain, P A ∈ {∩4k=1 Gk (n, p)}c = 1 − P A ∈ {∩4k=1 Gk (n, p)} ≤

p2

2 2 1
p2
+ n2 + 2np . Hence,

2 2 1
{∩4k=1 Gk (n, p)}

P A∈ ≥1− 2
+ 2+ . (6.114)
p n 2np

Therefore, A satisfies the constraints of Alg.3 with high probability. This implies that there exists
W ∗ that satisfies the constraints. Let

(
1
E(n, p) = A : ∃W ∗ s.t. |W ∗ ⊤ A/n − Ip |∞ ≤ µ1 , (In − W ∗ A⊤ /n)A ≤ µ2 ,
p
)
W ∗ A⊤

n p ∗⊤ ∗
q − In ≤ µ3 , w .j w .j /n ≤ 1 ∀ j ∈ [p] .
p 1− n n n ∞
p

Hence, we have

2 2 1
P (A ∈ E(n, p)) ≥ P (A ∈ ∩4k=1 Gk (n, p)) ≥1− 2
+ 2+ (6.115)
p n 2np

Given that the set of feasible solutions is non-null, we can say that the optimal solution of Alg. 3
denoted by W satisfies the constraints of Alg. 3 with probability 1.

64
Result (1): Recall that the event that there exists a point satisfying constraints C0–C3 is E(n, p).
We have

⊤ ⊤
P w.j w.j /n ≤ 1 ∀j ∈ [p] = P w.j w.j /n ≤ 1 ∀j ∈ [p] A ∈ E(n, p) P (A ∈ E(n, p))

⊤ c c
+ P w.j w.j /n ≤ 1 ∀j ∈ [p] A ∈ E(n, p) P (A ∈ E(n, p)(6.116) )

If there exists a feasible solution to C0–C3 then w⊤.j w .j /n ≤ 1. Therefore, we have

⊤
P w.j w.j /n ≤ 1 ∀j ∈ [p] A ∈ E(n, p) = 1. (6.117)

Now, we have from Alg. 3, if the constraints of the optimisation problem are not satisfied, then we
choose W = A as the output. This event is given by A ∈ E(n, p)c . Now, we know that for Rademacher
matrix A, a⊤
.j a.j /n = 1 with probability 1. Therefore, we have

⊤ c
P w.j w.j /n ≤ 1 ∀j ∈ [p] A ∈ E(n, p) = 1. (6.118)

Therefore, we have from (6.117),(6.118) and (6.116),

⊤
P w.j w.j /n ≤ 1 ∀j ∈ [p] = P (A ∈ E(n, p)) + P (A ∈ E(n, p)c ) = 1. (6.119)

q
Result (2): Recall that µ1 = 2 2 log(p) n . Note that we have for any two events F1 , F2 , P (F1 ) =
P (F1 ∩ F2 ) + P (F1 ∩ F2c ) ≤ P (F1 ∩ F2 ) + P (F2c ). Therefore, we have,

P |W ⊤ A/n − Ip |∞ ≥ µ1 ≤ P {|W ⊤ A/n − Ip |∞ ≥ µ1 } ∩ E(n, p) + P ({E(n, p)}c )
2 2 1

⊤
≤ P {|W A/n − Ip |∞ ≥ µ1 } ∩ E(n, p) + + +
p2 n2 2np
The last inequality comes from (6.115). Since W is a feasible solution, given A ∈ E(n, p), it will
satisfy the second constraint of Alg. 3 with probability 1. This means that

P {|W ⊤ A/n − Ip |∞ ≥ µ1 } ∩ E(n, p) = 0.

Therefore we have,

⊤
2 2 1
P |W A/n − Ip |∞ ≤ µ1 ≥ 1 − 2
+ 2+ . (6.120)
p n 2np
q
Since µ1 = 2 2 log(p) , we have, |W ⊤ A/n − Ip |∞ = OP ( log(p)/n).
p
n √
Result (3): From (6.119), we have that, for each j ∈ [p], ∥w.j ∥2 ≤ n with probability 1. Note that
√
for any vector x, ∥x∥∞ ≤ ∥x∥2 , we have for every j ∈ [p], ∥w.j ∥∞ ≤ n with probability 1. Since
√
|W ⊤ |∞ ≤ max∥w.j ∥∞ ≤ n with probability 1. Therefore, we have
j∈[p]

1
√ W⊤ = O(1). (6.121)
n ∞
q
1 log(2np)
Result (4): Recall that µ2 = n +2 np . Therefore, we have

1 1 ⊤ 1 1 ⊤
P In − W A A ≥ µ2 ≤ P In − WA A ≥ µ2 ∩ E(n, p) + P ({E(n, p)}c )
p n ∞ p n ∞

1 1 2 2 1
≤ P In − W A⊤ A ≥ µ2 ∩ E(n, p) + + +
p n ∞ p2 n2 2np

65
The last inequality comes from (6.115). Note that, since W is a feasible solution, given A ∈ E(n, p),
it will satisfy the third constraint of Alg. 3 with probability 1. This implies

1 1 ⊤
P In − W A A ≥ µ2 ∩ E(n, p) = 0.
p n ∞

Therefore, we have,

1 1 ⊤ 2 2 1
P In − W A A ≤ µ2 ≥ 1 − + + (6.122)
p n ∞ p2 n2 2np
q
1 1 ⊤
log(pn)
Hence, we have, p n W A − In A = OP
∞ pn + 1/n .

2 2 1
Result (5): Recall that from Eqn.(6.115), we have with probability at least 1 − p2
+ n2
+ 2np ,

1
(W A⊤ /n − In )A ≤ µ2 .
p ∞

2 2 1
Applying triangle inequality, we have with probability atleast 1 − p2
+ n2
+ 2np ,

1 1 1
W A⊤ A ≤ (W A⊤ /n − In )A + |A|∞ ≤ µ2 + 1/p. (6.123)
np ∞ p ∞ p
We now present some useful results about the norms being used in this proof. Let X be a p × p matrix
and Y be a p × n matrix . Recall the following definitions from [50],
∥Y x∥∞ ∥Y x∥2
∥Y ∥∞→∞ ≜ max and ∥Y ∥2→2 ≜ max = σmax (Y ).
x∈Rn \{0} ∥x∥∞ x∈Rn \{0} ∥x∥2

Note that |XY |∞ ≤ ∥X∥∞→∞ |Y |∞ 2 . Since √1p ∥x∥2 ≤ ∥x∥∞ and ∥Y ⊤ x∥∞ ≤ ∥Y ⊤ x∥2 for all x ∈ Rp
, we have
√
∥Y ⊤ ∥∞→∞ ≤ p∥Y ⊤ ∥2→2 . (6.124)
Then by using (6.124), we have
√ √
|XY |∞ = |Y ⊤ X ⊤ |∞ ≤ ∥Y ⊤ ∥∞→∞ |X ⊤ |∞ ≤ p∥Y ⊤ ∥2→2 |X ⊤ |∞ = pσmax (Y )|X|∞ . (6.125)

Substituting X = np 1
W A⊤ A and Y = A† ≜ A⊤ (AA⊤ )−1 , the Moore-Penrose pseudo-inverse of A,
in (6.125), we obtain:
√
1 ⊤ 1 ⊤ † p
WA = W A AA ≤ |W A⊤ A|∞ ∥A† ∥2→2 . (6.126)
np ∞ np ∞ np
We now derive the upper bound for the second factor of (6.126). We have,
1 1
∥A† ∥2→2 = σmax (A† ) = = . (6.127)
σmin (A) σmin (A⊤ )
Note that, for an arbitrary ϵ1 > 0, using Theorem 1.1. from [41] for the mean-zero sub-Gaussian
random matrix A, we have the following
√ √
P (σmin (A⊤ ) ≤ ϵ1 ( p − n − 1)) ≤ (c6 ϵ1 )p−n+1 + (c5 )p . (6.128)

where c6 > 0 and c5 ∈ (0, 1) are constants dependent on the sub-Gaussian norm of the entries of A⊤ .
Let for some small constant ψ ∈ (0, 1) ϵ1 c6 ≜ ψ, we have
!
† c6
P σmax (A ) ≤ √ √ ≥ 1 − ((ψ)p−n+1 + (c5 )p ). (6.129)
ψ( p − n − 1)
2
This is because |XY |∞ = maxi ∥XY.i ∥∞ ≤ maxi ∥X∥∞→∞ ∥Y.i ∥∞ (by the definition of the induced norm) =
∥X∥∞→∞ maxi ∥Y.i ∥∞ = ∥X∥∞→∞ |Y |∞ .

66
we have
 
c6
P σmax (A† ) ≤ √ √  ≥ 1 − ((ψ)p−n+1 + (c5 )p ). (6.130)
n−1
ψ p 1− √
p

Using Eqns. (6.130) and (6.123), we have

" #
√ 1 ⊤ † 1 c6 2 2 1 p−n+1 p
P p |W A A|∞ |A |2→2 ≤ µ2 + ≥1− + + + (1/2) + (c5 ) .
p2 n2 2np
p
np p ψ(1 − n/p)

Therefore we have by Bonferroni’s inequality,

!
1 1 c6 2 2 1
P W A⊤ ≤ µ2 + p ≥1− 2
+ 2+ + (ψ) p−n+1 p
+ (c5 ) .
np ∞ p ψ(1 − n/p) p n 2np

Under the condition n = o(p) as n, p → ∞, the probability 1 − p22 + n22 + 2np
1
+ (ψ)p−n+1 + (c5 )p →
1. Therefore, we have,
s !! s ! s !
1 log(np) 1 1 n log(np) n n log(np)
W A⊤ = OP n + + = OP +1+ = OP +1 .
p ∞ np n p p p p
(6.131)
q
2 log(n)
Result (6): Recall that µ3 = √ 2 p . We have,
1−n/p
 
W A⊤

n p
P q − In ≥ µ3 
p 1 − np n n ∞
  
W A⊤

 n p 
≤ P q − In ≥ µ3 ∩ E(n, p) + P ({E(n, p)}c )
p 1 − n n n ∞ 
p
  
W A⊤

 n p  2 2 1
≤ P q − In ≥ µ3 ∩ E(n, p) + 2
+ 2+
p 1 − n n n ∞  p n 2np
p

The last inequality comes from (6.115). Note that, since W is a feasible solution, given A ∈ E3 (n, p),
it will satisfy the fourth constraint of Alg. 3 with probability 1. This implies that:
  
⊤

 n WA p 
P q − In ≥ µ3 ∩ E(n, p) = 0,
p 1 − n n n ∞ 
p

which yields:
 
W A⊤

n p 2 2 1
P q − In ≤ µ3  ≥ 1 − 2
+ 2+ . (6.132)
p 1− n n n ∞ p n 2np
p

2 2 1
Since, 1 − p2
+ n2
+ 2np goes to 0 as n, p → ∞, the proof is completed. ■

6.2.3 Lemma on properties of A

Lemma 10 Let A be a n × p random Rademacher matrix. Then the following holds:

67
q
1 ⊤ log(p)
1. nA A − Ip ∞
= OP n .
q
1 1 ⊤
log(pn) 1
2. p n AA − In A = OP pn + .
n
∞

q
qn AA⊤ p √ 1 log(n)
3. If n < p, then n − n In = OP p .
p 1− n
p ∞ 1−n/p

Proof of Lemma 10, [Result (1)]: Let V ≜ A⊤ A/n − Ip . Note that elements of V matrix satisfies
the following:
(akl )2
(P
n
k=1 − 1 = 0 if j = l, l ∈ [p]
vlj = Pn aklnakj (6.133)
k=1 n if j ̸= l, j, l ∈ [p]
Therefore, we now consider off-diagonal elements of V (i.e., l ̸= j) for the bound. Each summand
a a of
a a
vlj is uniformly bounded − n1 ≤ kln kj ≤ n1 since the elements of A are ±1. Note that E kln kj = 0
∀k ∈ [n], l ̸= j ∈ [p]. Furthermore, for l ̸= j ∈ [p], each of the summands of vlj are independent since
the elements of A are independent. Therefore, using Hoeffding’s Inequality (see Lemma 1 of [38]) for
t > 0,
n
!
X akl akj nt2
P (|vlj | ≥ t) = P ≥ t ≤ 2e− 2 , l ̸= j ∈ [p].
n
k=1

Therefore we have
X nt2 nt2
P (|vlj | ≥ t) ≤ 2p(p − 1)e− 2 < 2p2 e−(6.134)

P max |vlj | ≥ t = P ∪l̸=j∈[n] |vlj | ≥ t ≤ 2 .
l̸=j∈[n]
l̸=j∈[n]
q
Putting t = 2 2 log
n
p
in (6.134), we obtain:
r !
2 log p
P max |vlj | ≥ 2 ≤ 2p2 e−4 log(p) = 2p−2 . (6.135)
l̸=j∈[n] n

Thus, we have: r !
log p
|V |∞ = OP . (6.136)
n
This completes the proof of Result (1).
Result (2): Note that,

1 1 1 1
AA⊤ − In A = AA⊤ A − A ≜ V . (6.137)
p n p n

Fix i ∈ [n] and j ∈ [p], and consider

p n
aij 1 XX
vij = − + ail akl akj
p np
l=1 k=1

By splitting the sum over l into the terms where l ̸= j and the term where l = j, and simplifying by

68
using the fact that a2kj = 1 for all k, j, we obtain
 
p X
n n
!
aij 1  X 1 X
aij a2kj

vij = − +  ail akl akj 
 + np
p np 
l=1 k=1 k=1
l̸=j
 
p X
n n
!
aij 1  X  aij 1 X
=− +  ail akl akj 
+ p n 1
p np 
l=1 k=1 k=1
l̸=j
 
p X
n
1  X 
=  ail akl akj 
.
np 
l=1 k=1
l̸=j

Next we split the sum over k into the terms where k ̸= i and the term where k = i to obtain
     
p X
n p X
n p p X
n
1  X  1  X  1 X 2 1  X  p−1
vij =  a il akl a kj
 =  a il akl a kj
+ a il a ij =  a il akl a kj
+
 np aij .
np   np   np np 
l=1 k=1 l=1 k=1 l=1 l=1 k=1
l̸=j l̸=j k̸=i l̸=j l̸=j k̸=i
(6.138)
1
If we condition on ai. and a.j , the (n − 1)(p − 1) random variables np ail akl akj for k ∈ [n] \ {i} and
1 1
l ∈ [p] \ {j} are independent, have mean zero, and are bounded between − np and np . Therefore, by
Hoeffding’s inequality, for t > 0, we have
 
p X
n
 1 X n2 p2 t2
− 2(n−1)(p−1) 2
≤ 2e−npt /2 .

P
 np ail akl akj ≥ t ai. , a.j 
 ≤ 2e (6.139)
l=1 k=1
l̸=j k̸=i

Since the RHS of (6.139) is independent of ai, and a.j the bound also holds on the unconditional
probability, i.e., we have
 
p X
n
 1 X n2 p2 t2
− 2(n−1)(p−1) 2
≤ 2e−npt /2 .

P
 np a a a
il kl kj ≥ t 
 ≤ 2e (6.140)
l=1 k=1
l̸=j k̸=i

q
log(2np)
Since aij is Rademacher, p−1np |a ij | < 1
n with probability 1. Choosing t = 2 np and using the
triangle inequality, we have from (6.140),
 
s s
p Xn
!
1 log(2np)  1 X p−1 log(2np) 1 
P |vij | ≥ + 2 ≤P ail akl akj + |aij | ≥ 2 + 
n np  np np np n
l=1 k=1
l̸=j k̸=i
 
s
p X
n
 1 X log(2np) 
≤P ail akl akj ≥ 2 ≤ 1 .
 np np  2n2 p2
l=1 k=1
l̸=j k̸=i

Then, by a union bound over np events,

s !
1 log(2np) 1 1
P max |vij | ≥ + 2 ≤ np 2 2
= . (6.141)
i,j n np 2n p 2np

69
This completes the proof of Result (2).
Result (3): Reversing the roles of n and p in result (1), (6.133) and (6.135) of this lemma, we have
s !
AA⊤ 2 log(n) 2
P − In ≤ 2 ≥ 1 − 2. (6.142)
p ∞ p n

1
Now, multiplying by √ , we get
1−n/p
 s 
AA⊤

n p 2 2 log(n)  2
P q − In ≤p ≥ 1 − 2. (6.143)
p 1− n n n ∞ 1 − n/p p n
p

This completes the proof. ■

6.2.4 Proofs of Simultaneous Confidence Intervals

Proof of Theorem 9:
We will first prove the joint distribution of β ∗ given in (3.39). Hence, we now expand (3.39) using the
structure given in (3.35) as follows:
( )⊤ ( )
√ √ 1 1
{ n(β̂W − β ∗ )K }⊤ ΣβK −1 { n(β̂W − β ∗ )K } = √ WK ⊤ η ΣβK −1 √ WK ⊤ η
n n
( ) ⊤ ( )
√ √

1 ⊤ 1
+ n Ip − W A (β ∗ − β̂λ1 ) ΣβK −1 n Ip − W ⊤ A (β ∗ − β̂λ1 )
n K n K

⊤
( ) ( )
1 ⊤

∗ −1 1 ⊤

∗

+ √ WK δ − δ̂λ2 ΣβK √ WK δ − δ̂λ2
n n
( )⊤ ( )
−1 √

1 ⊤ 1 ⊤ ∗
+ 2 √ WK η ΣβK n Ip − W A (β − β̂λ1 )
n n K
( )⊤ ( )
1 1
+ 2 √ WK ⊤ η ΣβK −1 √ WK ⊤ δ ∗ − δ̂λ2
n n
( )⊤ ( )
√

1 ⊤ 1
+ 2 n Ip − W A (β ∗ − β̂λ1 ) ΣβK −1 √ WK ⊤ δ ∗ − δ̂λ2 (6.144)
n K n

In Lemma 4, we will show that ΣβK is positive definite, hence ΣβK −1 exists. Next in Lemma 2, we
will show that the last 5 terms (all except the first term on the RHS) of (6.144) goes to 0 in probability.
( )⊤ (
Lastly, we see that the first term on the RHS of (6.144) can be written as ΣβK −1/2 √1n WK ⊤ η ΣβK −1/2 √1n WK

Since, η is Gaussian with mean 0 and variance covariance matrix σ 2 In , by linearity of Gaussian, we
( ) ( )⊤ ( )
have, ΣβK −1/2 √1n WK ⊤ η ∼ Nk (0, Ik ). Hence, ΣβK −1/2 √1n WK ⊤ η ΣβK −1/2 √1n WK ⊤ η ∼

χ2k .
Therefore, by applying Slutsky’s Theorem on (6.144), we have,
√ √ D
{ n(β̂W − β ∗ )K }⊤ ΣβK −1 { n(β̂W − β ∗ )K } → χ2k . (6.145)

This completes the proof of (3.39). Now, we prove the joint distribution given in (3.40) with the same

70
approach. Let us first expand (3.40) using the structure given in (3.36). We have,
( )⊤ ( )
∗ ⊤ −1 ∗ 1 ⊤ −1 1 ⊤
{(δ̂W − δ )L } ΣδL {(δ̂W − δ )L } = In − W A η ΣδL In − W A η
n L n L
( )⊤ ( )
1 ⊤ ∗ −1 1 ⊤ ∗
+ In − W A A(β − β̂λ1 ) ΣδL In − W A A(β − β̂λ1 )
n L n L

⊤
( ) ( )
1 1
+ WL A⊤ δ ∗ − δ̂λ2 ΣδL −1 WL A⊤ δ ∗ − δ̂λ2
n n
( )⊤ ( )
1 1
+ 2 In − W A⊤ η ΣδL −1 In − W A⊤ A(β ∗ − β̂λ1 )
n L n L
( )⊤ ( )
1 1
− 2 In − W A⊤ η ΣδL −1 WL A⊤ δ ∗ − δ̂λ2
n L n
( )⊤ ( )
1 ⊤ ∗ −1 1 ⊤

∗

− 2 In − W A A(β − β̂λ1 ) ΣδL WL A δ − δ̂λ2 (6.146)
n L n

In Lemma 4, we will show that ΣδL is positive definite, hence ΣδL −1 exists. Next in Lemma 3, we will
show that the last 5 terms (all except the first term on the RHS) of (6.146) goes to 0 in probability.
Lastly, we see that the first term on the RHS of (6.146) can be written as
( )⊤ ( )
−1/2 1 ⊤ −1/2 1 ⊤
ΣδL In − W A η ΣδL In − W A η .
n L n L

Since,(η is Gaussian with mean 0 and 2

) variance covariance matrix σ In , by linearity of Gaussian, we
have, ΣδL −1/2 In − n1 W A⊤ L η ∼ Nk (0, Il ). Hence,

( )⊤ ( )
−1/2 1 ⊤ −1/2 1 ⊤
ΣδL In − W A η ΣδL In − W A η ∼ χ2l .
n L n L

Therefore, by applying Slutsky’s Theorem on (6.146), we have,

D
{(δ̂W − δ ∗ )L }⊤ ΣδL −1 {(δ̂W − δ ∗ )L } → χ2l . (6.147)
This completes the proof of (3.40). ■

Proof of Lemma2:
Result 1.: We will expand the quadratic form and utilize individual probabilistic rates derived in
Chapter 3 to prove the claim. We have
( )⊤ ( )
√ √

1 ⊤ 1
n Ip − W A (β ∗ − β̂λ1 ) ΣβK −1 n Ip − W ⊤ A (β ∗ − β̂λ1 )
n K n K
( )( )
√ √

XX 1 1
= [ΣβK −1 ]lj n Ip − W ⊤ A (β ∗ − β̂λ1 ) n Ip − W ⊤ A (β ∗ − β̂λ1 )
n j n l
j∈K l∈K
( ) ( )
√ √

XX 1 1
≤ [ΣβK −1 ]lj n Ip − W ⊤ A (β ∗ − β̂λ1 ) n Ip − W ⊤ A (β ∗ − β̂λ1 )
n j n l
j∈K l∈K
2 XX
√

1
≤ n Ip − W ⊤ A (β ∗ − β̂λ1 ) [ΣβK −1 ]lj (6.148)
n ∞ j∈K l∈K

71
[ΣβK −1 ]lj = OP (1). Now, we have from (23) of Theorem 3 of
P P
We have from Lemma 4 j∈K l∈K
√
n Ip − n1 W ⊤ A (β ∗ − β̂λ1 )

Chapter 3, under the given conditions = oP (1). Since k is fixed
∞
as n, p → ∞, the proof is complete.
Result 2.: We follow the same process as in the proof of Result 1. We have
⊤
( ) ( )
1 ⊤

∗ −1 1 ⊤

∗

√ WK δ − δ̂λ2 ΣβK √ WK δ − δ̂λ2
n n
( )( )
XX
−11 ⊤

∗
1 ⊤

∗

= [ΣβK ]lj √ W j δ − δ̂λ2 √ W l δ − δ̂λ2
n n
j∈K l∈K
( ) ( )
XX
−1 1 ⊤

∗
1 ⊤

∗

≤ [ΣβK ]lj √ W j δ − δ̂λ2 √ W l δ − δ̂λ2
n n
j∈K l∈K
1 XX
≤ √ W ⊤ δ ∗ − δ̂λ2 [ΣβK −1 ]lj . (6.149)
n ∞ j∈K l∈K

[ΣβK −1 ]lj = OP (1). Now, we have from (24) of Theorem 3 of

P P
We have from Lemma 4 j∈K l∈K
Chapter 3, under the given conditions √1n W ⊤ δ ∗ − δ̂λ2 = oP (1). Since k is fixed as n, p → ∞,
∞
the proof is complete. ( )
Result 3.: Recall that Z ≜ ΣβK −1/2 √1n WK ⊤ η ∼ Nk (0, Ik ). We have

( )⊤ ( )
−1 √

1 ⊤ 1 ⊤ ∗
2 √ WK η ΣβK n Ip − W A (β − β̂λ1 )
n n K
( )
−1/2 √

⊤ 1 ⊤ ∗
= Z ΣβK n Ip − W A (β − β̂λ1 )
n K
√

XX 1 ⊤
≤ [ΣβK ]lj |Zj | n Ip − W A (β ∗ − β̂λ1 )
−1
n l
j∈K l∈K
√

1 XX
≤ n Ip − W ⊤ A (β ∗ − β̂λ1 ) [ΣβK −1 ]lj |Zj | . (6.150)
n ∞ j∈K l∈K
√
n Ip − n1 W ⊤ A (β ∗ − β̂λ1 )

We have from (23) of Theorem 3 of Chapter 3, under the given conditions =
∞
oP (1). Since k is fixed and we have from Lemma 4 j∈K l∈K [ΣβK −1 ]lj = OP (1) as n, p → ∞,
P P
the proof is complete.
Result 4.: By similar arguments as in (6.150), we have,
( )⊤ ( )
1 1
2 √ WK ⊤ η ΣβK −1 √ WK ⊤ δ ∗ − δ̂λ2
n n
( )
1
= Z ⊤ ΣβK −1/2 √ WK ⊤ δ ∗ − δ̂λ2
n
XX 1
≤ [ΣβK −1 ]lj |Zj | √ W ⊤ δ ∗
− δ̂ λ2
n l
j∈K l∈K
1 XX
≤ √ W ⊤ δ ∗ − δ̂λ2 [ΣβK −1 ]lj |Zj | . (6.151)
n ∞ j∈K l∈K

We have from (24) of Theorem 3 of Chapter 3, under the given conditions √1n W ⊤ δ ∗ − δ̂λ2 =
∞
oP (1). Since k is fixed and we have from Lemma 4 j∈K l∈K [ΣβK −1 ]lj = OP (1) as n, p → ∞,
P P

72
the proof is complete.
Result 5.: Expanding the quadratic form, we have
( )⊤ ( )
√

1 ⊤ 1
2 n Ip − W A (β ∗ − β̂λ1 ) ΣβK −1 √ WK ⊤ δ ∗ − δ̂λ2
n K n

XX 1 √ 1 ⊤

−1 ⊤ ∗
≤ [ΣβK ]lj √ W l δ − δ̂λ2 n Ip − W A (β ∗ − β̂λ1 )
n n j
j∈K l∈K
√

1 1 X X
≤ √ W ⊤ δ ∗ − δ̂λ2 n Ip − W ⊤ A (β ∗ − β̂λ1 ) [ΣβK −1 ]lj .
n ∞ n ∞ j∈K l∈K

We have from (23),(24) of Theorem 3 of Chapter 3, under the given conditions √1n W ⊤ δ ∗ − δ̂λ2 =
∞
√
n Ip − n1 W ⊤ A (β ∗ − β̂λ1 )

oP (1) and = oP (1). Since k is fixed and we have from Lemma 4
∞
P P −1
j∈K l∈K [ΣβK ]lj = OP (1) as n, p → ∞, the proof is complete.

Proof of Lemma3:
Result 1.: We will expand the quadratic form and utilize individual probabilistic rates derived in
Chapter 3 to prove the claim. We have
( )⊤ ( )
1 1
In − W A⊤ A(β ∗ − β̂λ1 ) ΣδL −1 In − W A⊤ A(β ∗ − β̂λ1 )
n L n L
( )( )
XX 1 1
= [ΣδL −1 ]ik In − W A⊤ A(β ∗ − β̂λ1 ) In − W A⊤ A(β ∗ − β̂λ1 )
n i n k
i∈L k∈L
( ) ( )
XX
−1 1 ⊤ ∗ 1 ⊤ ∗
≤ [ΣδL ]ik In − W A A(β − β̂λ1 ) In − W A A(β − β̂λ1 )
n i n k
i∈L k∈L
2
n 1 1 XX
≤ p In − W A⊤ A(β ∗ − β̂λ1 ) n2
[ΣδL −1 ]ik . (6.152)
p 1 − n/p n
∞ p2 (1−n/p) i∈L k∈L

1
[ΣδL −1 ]ik = OP (1). Now, we have from (25) of Theo-
P P
We have from Lemma4, n2 i∈L k∈L
p2 (1−n/p)

rem 3 of Chapter 3, under the given conditions In − n1 W A⊤ A(β ∗ − β̂λ1 )

= oP (1). Since k is
∞
fixed as n, p → ∞, the proof is complete.
Result 2.: We follow the same process as in the proof of Result 1. We have

⊤
( ) ( )
1 1
WL A⊤ δ ∗ − δ̂λ2 ΣδL −1 WL A⊤ δ ∗ − δ̂λ2
n n
( ) ( )
XX 1 1
≤ [ΣδL −1 ]ik Wi A⊤ δ ∗ − δ̂λ2 Wk A⊤ δ ∗ − δ̂λ2
n n
i∈L k∈L
2
n 1 1 XX
≤ p W A⊤ δ ∗ − δ̂λ2 n2
[ΣδL −1 ]ik (6.153)
p n/p n ∞ p2 (1−n/p) i∈L k∈L

⊤ ∗ − δ̂
We have from (26) of Theorem 3 of Chapter 3, under the given conditions √ n 1
W A δ λ2 =
p 1−n/p n
∞
1 P P −1
oP (1). Since k is fixed and we have from Lemma4, n2 i∈L k∈L [ΣδL ]ik = OP (1) as
p2 (1−n/p)

73
n, p → ∞, the proof is complete.
( )
ΣδL −1/2 In − n1 W A⊤ L η

Result 3.: Recall that Z = ∼ Nk (0, Il ). We have

( )⊤ ( )
1 1
2 In − W A⊤ η ΣδL −1
In − W A⊤ ∗
A(β − β̂λ1 )
n L n L
( )
⊤ −1 1 ⊤
= Z ΣδL In − W A A(β ∗ − β̂λ1 )
n L

XX
−1 1 ⊤
≤ [ΣδL ]ik |Zi | In − W A A(β ∗ − β̂λ1 )
n k
i∈L k∈L

1 X X
≤ In − W A⊤ A(β ∗ − β̂λ1 ) [ΣδL −1 ]ik |Zj |
n ∞ i∈L k∈L

n 1 ⊤ 1 XX
≤ p In − W A A(β ∗ − β̂λ1 ) 3 n 2 [ΣδL −1 ]ik . (6.154)
p 1 − n/p n 2
∞ p (1−n/p) i∈L k∈L

We have from (25) of Theorem 3 of Chapter 3, under the given conditions √ n In − n1 W A⊤ A(β ∗ − β̂λ1 )

p 1−n/p
∞
1 P P −1
oP (1). Since k is fixed and we have from Lemma4, n2 i∈L k∈L [ΣδL ]ik = OP (1) as
p2 (1−n/p)
n, p → ∞, the proof is complete.
Result 4.: By similar arguments as in (6.154), we have,
( )⊤ ( )
1 ⊤ −1 1 ⊤

∗

2 In − W A η ΣδL WL A δ − δ̂λ2
n L n
( )
−1/2 1

⊤ ⊤ ∗
= Z ΣδL WL A δ − δ̂λ2
n
XX 1
≤ [ΣδL −1/2 ]ik |Zi | Wk A⊤ δ ∗ − δ̂λ2
n
i∈L k∈L
np 1 ⊤

∗
XX
≤ 1 − n/p W A δ − δ̂λ2 [ΣδL −1/2 ]ik |Zj |
p n ∞ i∈L k∈L

n 1 1 XX
≤ p W A⊤ δ ∗ − δ̂λ2 3 n2
[ΣδL −1 ]ik . (6.155)
p 1 − n/p n ∞ p2 (1−n/p) i∈L k∈L

We have from (26) of Theorem 3 of Chapter 3, under the given conditions np 1 − n/p n1 W A⊤ δ ∗ − δ̂λ2
p
=
∞
1 P P −1
oP (1). Since k is fixed and we have from Lemma4, n2 i∈L k∈L [ΣδL ]ik = OP (1) as
p2 (1−n/p)
n, p → ∞, the proof is complete.
Result 5.: Expanding the quadratic form, we have
( )⊤ ( )
1 ⊤ ∗ −1 1 ⊤

∗

2 In − W A A(β − β̂λ1 ) ΣδL WL A δ − δ̂λ2
n L n

XX
−1 1 ⊤ 1
≤ [ΣδL ]ik In − W A A(β ∗ − β̂λ1 ) Wk A⊤ δ ∗ − δ̂λ2
n i n
i∈L k∈L

n 1 n 1
≤ p In − W A⊤ A(β ∗ − β̂λ1 ) p W A⊤ δ ∗ − δ̂λ2
p 1 − n/p n p 1 − n/p n
∞ ∞
1 XX
n2 [ΣδL −1 ]ik .
p2 (1−n/p) i∈L k∈L

74
We have from (25),(26) of Theorem 3 of Chapter 3, under the given conditions

√n In − n1 W A⊤ A(β ∗ − β̂λ1 ) √n 1 ⊤ ∗ − δ̂

= oP (1) and W A δ λ2 = oP (1).
p 1−n/p p 1−n/p n
∞ ∞
1 P P −1
Since k is fixed and we have from Lemma4, n2 i∈L k∈L [ΣδL ]ik = OP (1) as n, p → ∞,
p2 (1−n/p)
the proof is complete.

Proof of Lemma4:
Result 1: To prove this, we will exploit the singular value bounds of the Rademacher matrix. First
we bound the sum by the maximal element as follows:
XX
[ΣβK −1 ]lj ≤ k 2 ΣβK −1 ∞ (6.156)
j∈K l∈K

Now we bound ΣβK −1 ∞

≤ ΣβK −1 ∞
using the following inequality,

ΣβK −1 ∞
≤ ∥ΣβK −1 ∥2 (6.157)

Note that by definition, we have,

1 1
∥ΣβK −1 ∥2 = σmax (ΣβK −1 ) = = copt 2 (6.158)
σmin (ΣβK )
n σmin (A

Note that, for an arbitrary ϵ1 > 0, using Theorem 1.1. from [41] for the mean-zero sub-Gaussian
random matrix A, we have the following

copt p
P √ σmin (AK ) ≤ copt ϵ1 (1 − (k − 1)/n) ≤ (c6 ϵ1 )n−k+1 + (c5 )p . (6.159)
n
where c6 > 0 and c5 ∈ (0, 1) are constants dependent on the sub-Gaussian norm of the entries of ΣβK .
Let for some small constant ψ ∈ (0, 1) ϵ1 c6 ≜ ψ. We have
!
−1 c26
P σmax (ΣβK ) ≤ 2 p ≥ 1 − ((ψ)n−k+1 + (c5 )p ). (6.160)
2
copt ψ (1 − (k − 1)/n) 2

Therefore, using (6.160), (6.158), (6.157) and (6.156) we get,

 
XX c2 k2
P [ΣβK −1 ]lj ≤ 6
p  ≥ 1 − ((ψ)n−k+1 + (c5 )p ). (6.161)
ψ 2 c2 (1 − (k − 1)/n)2
j∈K l∈K opt

This completes the proof of Result 1.

Result 2: We approach this proof the same way we approach the proof of Result 1. Using the same
results as (6.156), (6.158) and (6.157), we have,
1 XX l2
n2
[ΣδL −1 ]ik ≤ 2 (6.162)
n2
p2 (1−n/p) i∈L k∈L σ
p2 (1−n/p) min
W A⊤ /n − In L

For an arbitrary ϵ1 > 0, using Theorem 1.1. from [41] for the mean-zero sub-Gaussian random matrix
A, we have the following

copt ⊤
p
P √ σmin (A ) ≤ copt ϵ1 (1 − n/p) ≤ (c6 ϵ1 )n−k+1 + (c5 )p . (6.163)
n
where c6 > 0 and c5 ∈ (0, 1) are constants dependent on the sub-Gaussian norm of the entries of ΣδL .
Since W A⊤ /n = copt AA⊤ /n, we have,

P σmin W A⊤ /n ≤ copt ϵ21 ( p/n − 1)2 ≤ (c6 ϵ1 )n−k+1 + (c5 )p .
p
(6.164)

75
Further since σmin W A⊤ /n − In = σmin W A⊤ /n −1 and σmin W A⊤ /n − In L ≥ σmin W A⊤ /n − In ,

we have from (6.164)

P σmin W A⊤ /n − In
p
≤ copt ϵ21 ( p/n − 1)2 − 1 ≤ (c6 ϵ1 )n−k+1 + (c5 )p . (6.165)
L

Let for some small constant ψ ∈ (0, 1) ϵ1 c6 ≜ ψ. Then, we have from (6.162) and (6.165)
!
1 XX
−1 l2
P n2
[ΣδL ]ik ≤ n2
p ≥ 1 − {(c6 ϵ1 )n−k+1 + (c5 )p }.
2 2
(c ϵ ( p/n − 1) − 1) 2
p2 (1−n/p) i∈L k∈L p2 (1−n/p) opt 1
(6.166)
This completes the proof. ■

6.2.5 Some useful lemmas for Drlt

Lemma 11 Let U and V be two n × p random matrices. Let ϑ ∈ R and w ∈ Rn . Then,

1. |ϑU |∞ = |ϑ||U |∞ .

2. |U + V |∞ ≤ |U |∞ + |V |∞ .

3. If |U |∞ = OP (h1 (n, p)) and |V |∞ = OP (h2 (n, p)), then |U +V |∞ ≤ OP (max{h1 (n, p), h2 (n, p)}).

4. ∥w⊤ V ∥∞ ≤ |V |∞ ∥w∥1 .

5. If |V |∞ = OP (h1 (n, p)) and ∥w∥1 = OP (hw (n, p)), then ∥w⊤ V ∥∞ ≤ OP (h1 (n, p)hw (n, p)). ■

Proof:
Result (1): We have, |ϑU |∞ = max |ϑuij | = max |ϑ||uij | = |ϑ||U |∞ .
i∈[n],j∈[p] i∈[n],j∈[p]
Result (2): We have, |U + V |∞ = max |uij + vij | ≤ max {|uij | + |vij |} ≤ max |uij | +
i∈[n],j∈[p] i∈[n],j∈[p] i∈[n],j∈[p]
max |vij | = |U |∞ + |V |∞ .
i∈[n],j∈[p]
Result (3): Given |U |∞ = OP (h1 (n, p)) and |V |∞ = OP (h2 (n, p)). From Part (2), we have,

|U +V |∞ ≤ |U |∞ +|V |∞ ≤ OP (h1 (n, p))+OP (h1 (n, p)) = OP (h1 (n, p)+h2 (n, p)) ≤ OP (max{h1 (n, p), h2 (n, p)}).

Result (4): For any j ∈ [p], we have

n
X n
X n
X n
X
⊤
|(w V )j | = | vij wi | ≤ |vij ||wi | ≤ max |vij ||wi | ≤ max |vij | |wi | = |V |∞ ∥w∥1 .
j∈[p] i∈[n],j∈[p]
i=1 i=1 i=1 i=1

Result (5): We have from Part (4), ∥w⊤ V ∥∞ ≤ |V |∞ ∥w∥1 = OP (h1 (n, p))OP (hw (n, p)) = OP (h1 (n, p)hw (n, p)).
■

Lemma 12 Let Xi , for i = 1, 2, . . . , k, be Gaussian random variables with mean 0 and variance σ 2 .
Then, we have
p
P max |Xi | ≥ 2σ log(k) ≤ 1/k. (6.167)
i∈[k]

Note that Lemma 12 does not require independence. For a proof see, e.g., [48].

76
6.3 Proofs of Theorems and Lemmas of Chapter 4
6.3.1 Proof of Theorem 10:
The optimisation problem given in Alg.3 is as follows:
p
X
min w.j ⊤ w.j /n (6.168)
W
j=1

subject to C0 : w.j ⊤ w.j /n ≤ 1 ∀ j ∈ [p]

1 ⊤
C1 : Ip − W A ≤ µ1 ,
n ∞

1 1 ⊤
C2 : In − W A A ≤ µ2 ,
p n ∞
⊤

n WA p
C3 : q − In ≤ µ3 ,
p 1 − np n n ∞

p
In order to prove that the optimal solution (6.168) is W = (1 − µ3 1 − n/p)A, we first consider am
equivalent relaxed version of this problem with only the constraint C3. The relaxed problem is given
as follows:
n
1X
min wi. ⊤ wi. (6.169)
W n
i=1
W A⊤

p
subject to − In ≤ µ3 1 − n/p.
p ∞

The objective function in (6.168) is equivalent to that of (6.169) as pj=1 w.j ⊤ w.j = trace(W ⊤ W ) =
P
Pn
trace(W W ⊤ ) = ⊤
i=1 wi. wi. . We write the relaxed problem in this form as this optimisation
problem in now separable with respect to the rows of W . The equivalent separable optimisation
problem of (6.169) is given as follows for all i ∈ [n].

min wi. ⊤ wi. /p (6.170)

wi.
⊤
p
subject to wi. A/p − ei ≤ µ3 1 − n/p
∞
p
where, ei is the ith row of In . Now, we show that wi. = (1 − µ3 1 − n/p)ai. is the optimal solution
of (6.170). Let us define ν(A) = maxi̸=j p1 |⟨ai. , ai. ⟩|. We have from Lemma 5, Result 3 of [6], for
random Rademacher matrix A, if n < p,
p 2
P ν(A) ≤ µ3 1 − n/p ≥ 1 − 2 . (6.171)
n
Primal feasibility: This choice of wi. is primal feasible since from (6.171), we have,
p
1 − µ3 1 − n/p ⊤ p p p p
ai. A − ei = max{|µ3 1 − n/p|, |(1−µ3 1 − n/p)µ3 1 − n/p|} = µ3 1 − n/p.
p
∞

with probability atleast 1 − n22 .

p
Primal objective function value: The primal objective function value is (1 − µ 1 − n/p)2 as
a⊤
i. ai. /p = 1.
The Fenchel dual problem We derive the dual of the problem in (6.170) using Fenchel duality.
This tells us that given an optimisation problem of the form

1 ⊺
min f (w) + g w A
w p

77
where f and g are convex, the Fenchel dual is

1
∗
max −f − Ay − g ∗ (−y)
y p
where f ∗ and g ∗ are the convex conjugates of f and g, respectively.
Therefore, we take,
( p
1 2 0 if ∥w − ei ∥∞ ≤ µ3 1 − n/p
f (w) = ∥w∥ and g(w) = .
p ∞ otherwise

Then
p
f ∗ (y) = sup⟨y, w⟩ − f (w) = ∥y∥2
w 4
and
g ∗ (y) = sup⟨y, w⟩ − g(w) =
p
sup √ ⟨y, w⟩ = yj + µ3 1 − n/p∥y∥1 .
w ∥w−ei ∥∞ ≤µ3 1−n/p

gives a dual problem in the form

1 ⊺
y AA⊤ y + yj − µ3 1 − n/p∥y∥1 .
p
sup −
y 4p
p
Dual feasibility: The point y = 2(1 − µ3 1 − n/p)ei is feasible for the dual (trivially).
Dual objective function value: The corresponding dual objective function value is
1 p p p p
− ∥ai. ∥2 4(1 − µ3 1 − n/p)2 + 2(1 − µ3 1 − n/p) − 2µ(1 − µ3 1 − n/p) = (1 − µ3 1 − n/p)2 .
4p
This establishes
p that an optimal solution for the primal of the relaxed p problem given in (6.169) is
(1 − µ3 1 − n/p)ai. and an optimal solution to the dual is 2(1 − µ3 1 − n/p)ei .
pNow, in order to complete the proof, all we need to show is that the solution wi. = (1 −
µ3 1 − n/p)ai. is feasible for constraints C0, C1 and C2. This holds as when the optimal solu-
tion of a relaxed problem belongs to the feasible space of a more constrained problem, then it is the
optimal solution of the constrained problem too.
Feasibility of C0: If we plug in the solution cA in the constraint C0, we get (1−µ3 1 − n/p)2 a⊤
p
.j a.j /n ≤
⊤
p 2
1 for all j ∈ [p]. Since a.j a.j /n = 1, we have (1 − µ3 1 − n/p) ≤ 1 which is true. Hence
p
(1 − µ3 1 − n/p)A is feasible wrt C0. p
Feasibility of C1: Now, ifwe plug in the solution (1 − µ3 1 − n/p)A in the constraint C1, we get
√ √
(1−µ3 1−n/p) ⊤ th element of the matrix I − 1−µ3 1−n/p A⊤ A
Ip − n A A ≤ µ 1 . For all j ∈ [p] the (j, j) p n
p ∞
is 1 − c. This implies, |µ3 1 − n/p| ≤ µ1 which is true for n < p. In case of the off-diagonal elements,
√
th (1−µ3 1−n/p) ⊤ p a⊤
.j a.l
for l, j ∈ [p], the (j, l) element of the matrix Ip − n A A is (1 − µ 3 1 − n/p) n .
a⊤ a.l
Since, −µ1 ≤ .jn ≤ µ1 with high probability from Lemma 5, Result 1 of [6], we have −µ1 ≤
p p
(1 − µ3p 1 − n/p)µ1 ≤ µ1 . Clearly, (1 − µ3 1 − n/p) satisfies this condition too. This implies
(1 − µ3 1 − n/p)A if feasible for constraint C1. p
Feasibility of C2: Next, if we plug in the solution (1 − µ 3 1 − n/p)A in the constraint C2, we get
√
(1−µ 1−n/p)
1
p In −
3
n AA⊤ A ≤ µ2 . For i ∈ [n] and j ∈ [p], the (i, j)th element of the matrix
√ ∞ √
1 (1−µ3 1−n/p) ⊤ aij (1−µ3 1−n/p) Pp Pn
p In − n AA A is given by p − np l=1 k=1 ail akl akj as shown in result
1 Pp Pn
2 of Lemma 5 of [6]. Since A is Rademacher, we have, −µ2 −1/n ≤ np l=1 k=1 ail akl akj ≤ µ2 +1/n
p
p Hence, we have, −µ2 + 1/n + 1/p ≤ (1 − µ3 1 − n/p)(µ2 + 1/n) ≤ µ2 + 1/n + 1.
with high probability.
Therefore, (1 − µ3 1 − n/p) is feasible the constraint C2.
This completes the proof. ■

78
6.3.2 Some useful Lemmas
Lemma 13 Let A, B, C be sets. The event (C \ B) ∩ A = ∅ is equivalent to the conditional statement
A ⊆ B | B ⊆ C.

Proof of Lemma 13
If (C \ B) ∩ A = ∅, then A ⊆ B | B ⊆ C.
Assume (C \ B) ∩ A = ∅. This implies that no element of A belongs to C \ B, i.e., for all x ∈ A:

x∈
/ C \ B =⇒ x ∈ B ∪ (U \ C),

where U is the universal set. Now, assume B ⊆ C. Since x ∈ B implies x ∈ C, the complement U \ C
does not contain elements of A. Thus, every x ∈ A must be in B, implying:

A ⊆ B.

Hence, A ⊆ B | B ⊆ C holds.
If A ⊆ B | B ⊆ C, then (C \ B) ∩ A = ∅. Assume B ⊆ C and A ⊆ B. If B ⊆ C, then every element
of B is in C, so A ⊆ B implies A ⊆ C. Additionally, since A ⊆ B, no element of A can be outside B.
Therefore, no element of A can belong to C \ B, and we have:

(C \ B) ∩ A = ∅.

This concludes the proof. ■

79
Bibliography

[1] List of countries implementing pool testing strategy against COVID-19. https://en.wikipedia.
org/wiki/List_of_countries_implementing_pool_testing_strategy_against_COVID-19.
Last retrieved, Oct 2021.

[2] M. Aldridge, O. Johnson, and J. Scarlett. Group testing: An information theory perspective.
Found. Trends Commun. Inf. Theory, 15:196–392, 2019.

[3] Matthew Aldridge and David Ellis. Pooled Testing and Its Applications in the COVID-19 Pan-
demic, pages 217–249. Springer International Publishing, 2022.

[4] A. Aldroubi, X. Chen, and A.M. Powell. Perturbations of measurement matrices and dictionaries
in compressed sensing. Appl. Comput. Harmon. Anal., 33(2), 2012.

[5] G. Atia and V. Saligrama. Boolean compressed sensing and noisy group testing. IEEE Trans.
Inf. Theory, 58(3), 2012.

[6] S. Banerjee, R. Srivastava, J. Saunderson, and A. Rajwade. Robust non-adaptive group testing
under errors in group membership specifications. 2024.

[7] S. H. Bharadwaja and C. R. Murthy. Recovery algorithms for pooled RT-qPCR based COVID-19
screening. IEEE Trans. Signal Process., 70:4353–4368, 2022.

[8] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory
and Examples. CMS Books in Mathematics. Springer, 2nd edition, 2006.

[9] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press,
2004.

[10] Emmanuel J. Candès, Justin K. Romberg, and Terence Tao. Stable signal recovery from incom-
plete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.

[11] G. Casella and R. Berger. Statistical Inference. Thomson Learning, 2002.

[12] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama. Non-adaptive probabilistic group testing with
noisy measurements: Near-optimal bounds with efficient algorithms. In ACCC, pages 1832–1839,
2011.

[13] M. Cheraghchi, A. Hormati, A. Karbasi, and M. Vetterli. Group testing with probabilistic tests:
Theory, design and application. IEEE Transactions on Information Theory, 57(10), 2011.

[14] A. Christoff et al. Swab pooling: A new method for large-scale RT-qPCR screening of SARS-
CoV-2 avoiding sample dilution. PLOS ONE, 16(2):1–12, 02 2021.

[15] S. Comess, H. Wang, S. Holmes, and C. Donnat. Statistical Modeling for Practical Pooled Testing
During the COVID-19 Pandemic. Statistical Science, 37(2):229 – 250, 2022.

[16] R. Dorfman. The detection of defective members of large populations. Ann. Math. Stat.,
14(4):436–440, 1943.

80
[17] Marco F Duarte, Mark A Davenport, Dharmpal Takhar, Jason N Laska, Ting Sun, Kevin F Kelly,
and Richard G Baraniuk. Single-pixel imaging via compressive sampling. IEEE signal processing
magazine, 25(2):83–91, 2008.

[18] E. Fenichel, R. Koch, A. Gilbert, G. Gonsalves, and A. Wyllie. Understanding the barriers to
pooled SARS-CoV-2 testing in the United States. Microbiology Spectrum, 2021.

[19] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with ap-
plications to image analysis and automated cartography. Communications of the ACM, 24(6),
1981.

[20] D. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Pearson, 2012.

[21] S. Fosson, V. Cerone, and D. Regruto. Sparse linear regression from perturbed data. Automatica,
122, 2020.

[22] Sabyasachi Ghosh, Rishi Agarwal, Mohammad Ali Rehan, Shreya Pathak, Pratyush Agarwal,
Yash Gupta, Sarthak Consul, Nimay Gupta, Ritesh Goenka, Ajit Rajwade, and Manoj Gopalkr-
ishnan. A compressed sensing approach to pooled RT-PCR testing for COVID-19 detection.
IEEE Open Journal of Signal Processing, 2:248–264, 2021.

[23] A. C. Gilbert, M. A. Iwen, and M. J. Strauss. Group testing and sparse signal recovery. In 42nd
Asilomar Conf. Signals, Syst. and Comput., pages 1059–1063, 2008.

[24] N. Grobe, A. Cherif, X. Wang, Z. Dong, and P. Kotanko. Sample pooling: burden or solution?
Clin. Microbiol. Infect., 27(9):1212–1220, 2021.

[25] T. Hastie, R. Tibshirani, and M. Wainwright. Statistical Learning with Sparsity: The LASSO
and Generalizations. CRC Press, 2015.

[26] A. Heidarzadeh and K. Narayanan. Two-stage adaptive pooling with RT-qPCR for COVID-19
screening. In ICASSP, 2021.

[27] M.A. Herman and T. Strohmer. General deviants: an analysis of perturbations in compressed
sensing. IEEE Journal on Sel. Topics Signal Process., 4(2), 2010.

[28] F. Hwang. A method for detecting all defective members in a population by group testing. J Am
Stat Assoc, 67(339):605–608, 1972.

[29] J.D. Ianni and W.A. Grissom. Trajectory auto-corrected image reconstruction. Magnetic Reso-
nance in Medicine, 76(3), 2016.

[30] T. Ince and A. Nacaroglu. On the perturbation of measurement matrix in non-convex compressed
sensing. Signal Process., 98:143–149, 2014.

[31] A. Javanmard and A. Montanari. Confidence intervals and hypothesis testing for high-dimensional
regression. J Mach Learn Res, 2014.

[32] A. Kahng and S. Reda. New and improved BIST diagnosis methods from combinatorial group
testing theory. IEEE Trans. Comp. Aided Design of Inetg. Circ. and Sys., 25(3), 2006.

[33] D. B. Larremore, B. Wilder, E. Lester, S. Shehata, J. M. Burke, J. A. Hay, M. Tambe, M. Mina,

and R. Parker. Test sensitivity is secondary to frequency and turnaround time for COVID-19
screening. Science Advances, 7(1), 2021.

[34] Y. Li and G. Raskutti. Minimax optimal convex methods for Poisson inverse problems under ℓq
-ball sparsity. IEEE Trans. Inf. Theory, 64(8):5498–5512, 2018.

[35] Yuan Li and Garvesh Raskutti. Minimax optimal convex methods for poisson inverse problems
under ℓq -ball sparsity. IEEE Transactions on Information Theory, 64(8):5498–5512, 2018.

81
[36] Hubert W Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and variance
unknown. Journal of the American statistical Association, 62(318):399–402, 1967.

[37] Dengyu Liu, Jinwei Gu, Yasunobu Hitomi, Mohit Gupta, Tomoo Mitsunaga, and Shree K Na-
yar. Efficient space-time sampling with pixel-wise coded exposure for high-speed imaging. IEEE
transactions on pattern analysis and machine intelligence, 36(2):248–260, 2013.

[38] G. Lugosi. Concentration-of-measure inequalities, 2009. https://www.econ.upf.edu/ lu-

gosi/anu.pdf.

[39] A. Mazumdar and S. Mohajer. Group testing with unreliable elements. In ACCC, 2014.

[40] D. C. Montgomery, E. Peck, and G. Vining. Introduction to Linear Regression Analysis. Wiley,
2021.

[41] M.Rudelson and R.Vershynin. Smallest singular value of a random rectangular matrix. Comm.
Pure Appl. Math., 2009.

[42] Nam H. Nguyen and Trac D. Tran. Robust LASSO with missing and grossly corrupted observa-
tions. IEEE Trans. Inf. Theory, 59(4):2036–2058, 2013.

[43] H. Pandotra, E. Malhotra, A. Rajwade, and K. S. Gurumoorthy. Dealing with frequency pertur-
bations in compressive reconstructions with Fourier sensing matrices. Signal Process., 165:57–71,
2019.

[44] J. Parker, V. Cevher, and P. Schniter. Compressive sensing under matrix uncertainties: An
approximate message passing approach. In Asilomar Conference on Signals, Systems and Com-
puters, pages 804–808, 2011.

[45] M. Raginsky, R. Willett, Z. Harmany, and R. Marcia. Compressed sensing performance bounds
under Poisson noise. IEEE Trans. Signal Process., 58(8):3990–4002, 2010.

[46] R. Rohde. COVID-19 pool testing: Is it time to jump in? https://asm.org/Articles/2020/

July/COVID-19-Pool-Testing-Is-It-Time-to-Jump-In, 2020.

[47] Mark Rudelson and Roman Vershynin. The Littlewood–Offord problem and invertibility of ran-
dom matrices. Advances in Mathematics, 218(2):600–633, 2008.

[48] P. Massart S. Boucheron, G. Lugosi. Concentration inequality: A nonasymptotic theory of inde-

pendence. Oxford Claredon Press, 2012.

[49] N. Shental et al. Efficient high throughput SARS-CoV-2 testing to detect asymptomatic carriers.
Sci. Adv., 6(37), September 2020.

[50] J. Todd. Induced Norms, pages 19–28. Birkhäuser Basel, Basel, 1977.

[51] Sara Van de Geer, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure. On asymptotically
optimal confidence regions and tests for high-dimensional models. The Annals of Statistics,
42(3):1166–1202, 2014.

[52] R. Vershynin. High-Dimensional Probability:An Introduction with Applications in Data Science.

Cambridge University Press, 2018.

[53] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data

Science, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge
University Press, 2018.

[54] Katherine J. Wu. Why pooled testing for the coronavirus isn’t working in America. https://www.
nytimes.com/2020/08/18/health/coronavirus-pool-testing.html. Last retrieved October
2021.

82
[55] H. Zabeti et al. Group testing large populations for SARS-CoV-2. medRxiv, pages 2021–06, 2021.

[56] Cun-Hui Zhang and Stephanie S. Zhang. Confidence intervals for low-dimensional parameters
in high-dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical
Methodology), 76(1):217–242, 2014.

[57] J. Zhang, L. Chen, P. Boufounos, and Y. Gu. On the theoretical analysis of cross validation in
compressive sensing. In ICASSP, 2014.

[58] H. Zhu, G. Leus, and G. Giannakis. Sparsity-cognizant total least-squares for perturbed com-
pressive sampling. IEEE Trans. Signal Process., 59(11), 2011.

Main
No ratings yet
Main
515 pages
Deep Learning MCQ
90% (73)
Deep Learning MCQ
34 pages
Machine Learning
No ratings yet
Machine Learning
674 pages
APS 4 Presentation
No ratings yet
APS 4 Presentation
74 pages
Stats 205 Notes
No ratings yet
Stats 205 Notes
99 pages
Mathematics of Quantum Computation and Quantum Technology 1st Edition Louis Kauffman - Read The Ebook Online or Download It To Own The Full Content
No ratings yet
Mathematics of Quantum Computation and Quantum Technology 1st Edition Louis Kauffman - Read The Ebook Online or Download It To Own The Full Content
81 pages
APS3 Shuvayan
No ratings yet
APS3 Shuvayan
71 pages
Mymodules - ICT1511-19-S1 - Online Assessment 2
No ratings yet
Mymodules - ICT1511-19-S1 - Online Assessment 2
18 pages
Parameter Estimation and Inverse Problems
67% (3)
Parameter Estimation and Inverse Problems
313 pages
AL-405 Machine Learning Lab Manual
No ratings yet
AL-405 Machine Learning Lab Manual
40 pages
OSU Adjustment Notes Part 1
No ratings yet
OSU Adjustment Notes Part 1
230 pages
Time Dependent
No ratings yet
Time Dependent
10 pages
Deep Learning
No ratings yet
Deep Learning
5 pages
A Review of The Applications of Artificial Intelligence - 2024 - Energy Conversi
No ratings yet
A Review of The Applications of Artificial Intelligence - 2024 - Energy Conversi
24 pages
Robust Matrix Completion With Heavy-Tailed Noise
No ratings yet
Robust Matrix Completion With Heavy-Tailed Noise
68 pages
Article:: Reuse
No ratings yet
Article:: Reuse
144 pages
Stanford Statistics311 InformationTheoryAndStatistics
No ratings yet
Stanford Statistics311 InformationTheoryAndStatistics
304 pages
Week 6
No ratings yet
Week 6
34 pages
Econometric S
No ratings yet
Econometric S
1,341 pages
14 Aos1221
No ratings yet
14 Aos1221
37 pages
3 Driessen
100% (1)
3 Driessen
34 pages
YChen Thesis Final11
No ratings yet
YChen Thesis Final11
181 pages
Machine Learning
No ratings yet
Machine Learning
662 pages
TAR 2020 Reading 05
No ratings yet
TAR 2020 Reading 05
20 pages
MachineLearningPatternRecognition 18 Finalversion
No ratings yet
MachineLearningPatternRecognition 18 Finalversion
265 pages
High-Dimensional Statistics: Lecture Notes
No ratings yet
High-Dimensional Statistics: Lecture Notes
168 pages
Econometrics Simpler Note
No ratings yet
Econometrics Simpler Note
692 pages
Forecasting MiM Exercises Part3
No ratings yet
Forecasting MiM Exercises Part3
2 pages
IC-GASMOTDS-2025 Brochure 250710 130719
No ratings yet
IC-GASMOTDS-2025 Brochure 250710 130719
8 pages
2021 - Creel - Econometrics (Githuib Book)
No ratings yet
2021 - Creel - Econometrics (Githuib Book)
1,060 pages
Lesson 1.1 - AP Precalculus - Calc Medic
No ratings yet
Lesson 1.1 - AP Precalculus - Calc Medic
2 pages
Rig Notes 17
No ratings yet
Rig Notes 17
168 pages
Description and The First Use of Numpy Library
No ratings yet
Description and The First Use of Numpy Library
7 pages
Optim ML
No ratings yet
Optim ML
41 pages
Main
No ratings yet
Main
166 pages
SSRN Id3588594
No ratings yet
SSRN Id3588594
27 pages
Lectures On Randomized Numerical Linear Algebra: Petros Drineas Michael W. Mahoney
No ratings yet
Lectures On Randomized Numerical Linear Algebra: Petros Drineas Michael W. Mahoney
45 pages
Eecs127 Reader
No ratings yet
Eecs127 Reader
199 pages
Mat133 Reveiw Notes
No ratings yet
Mat133 Reveiw Notes
27 pages
Figure 1 Original Image
No ratings yet
Figure 1 Original Image
12 pages
Homework 3: Answer
No ratings yet
Homework 3: Answer
14 pages
STA2005S Regression
No ratings yet
STA2005S Regression
92 pages
Color Image Compression-Encryption Algorithm Based
No ratings yet
Color Image Compression-Encryption Algorithm Based
14 pages
The Handwritten Solutions To The First Five Questions, and The Report of Last Question
No ratings yet
The Handwritten Solutions To The First Five Questions, and The Report of Last Question
2 pages
Applying Statistical Learning Theory To Deep Learning
No ratings yet
Applying Statistical Learning Theory To Deep Learning
51 pages
E2UC403B)
No ratings yet
E2UC403B)
5 pages
Selected Theoretical Aspects of ML and Deep Learning
No ratings yet
Selected Theoretical Aspects of ML and Deep Learning
46 pages
Lec4 Oct12 2022 PracticalNotes LinearRegression
No ratings yet
Lec4 Oct12 2022 PracticalNotes LinearRegression
34 pages
Active Learning in Multimedia Annotation and Retrieval - A Survey
No ratings yet
Active Learning in Multimedia Annotation and Retrieval - A Survey
21 pages
FactorisingExercises132 PDF
No ratings yet
FactorisingExercises132 PDF
4 pages
Lecture Notes For Machine Learning Theory
No ratings yet
Lecture Notes For Machine Learning Theory
167 pages
Data Fitting and Uncertainty (A Practical Introduction To Weighted Least Squares and Beyond)
No ratings yet
Data Fitting and Uncertainty (A Practical Introduction To Weighted Least Squares and Beyond)
6 pages
Signal Flow Graph - GATE Study Material in PDF
No ratings yet
Signal Flow Graph - GATE Study Material in PDF
5 pages
Sliding Mode Controller For PWM Based Buck-Boost DC/DC Converter As State Space Averaging Method in Continuous Conduction Mode
No ratings yet
Sliding Mode Controller For PWM Based Buck-Boost DC/DC Converter As State Space Averaging Method in Continuous Conduction Mode
5 pages
Econometría
No ratings yet
Econometría
43 pages
Linear Regression
No ratings yet
Linear Regression
19 pages
Master Sarvi Tuukka 2020
No ratings yet
Master Sarvi Tuukka 2020
68 pages
Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 07 Lecture - 07
No ratings yet
Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week - 01 Module - 07 Lecture - 07
12 pages
ROB501 Textbook2022 03 21
No ratings yet
ROB501 Textbook2022 03 21
142 pages
Notes MSM
No ratings yet
Notes MSM
66 pages
Lecture 01 On Joint Distribution For Discrete RV - 04-09-19
No ratings yet
Lecture 01 On Joint Distribution For Discrete RV - 04-09-19
3 pages
2IIG0 Cheat Sheet 1
No ratings yet
2IIG0 Cheat Sheet 1
2 pages
Fundamentals of Linear Algebra For Signal Processing 2022 09 22
No ratings yet
Fundamentals of Linear Algebra For Signal Processing 2022 09 22
321 pages
Linear Regression: 1 1 N N I I I D I I
No ratings yet
Linear Regression: 1 1 N N I I I D I I
20 pages
Statlearn PDF
No ratings yet
Statlearn PDF
123 pages
Reg Book Stat
No ratings yet
Reg Book Stat
79 pages
Graph Traversal: Text Depth-First Search Breadth-First Search
No ratings yet
Graph Traversal: Text Depth-First Search Breadth-First Search
41 pages
Notes 12j686o
No ratings yet
Notes 12j686o
272 pages
Stack Practice Programs
No ratings yet
Stack Practice Programs
19 pages
Eco No Metrics
No ratings yet
Eco No Metrics
1,045 pages
Graduate Econometrics Lecture Notes - Michael Creel (414 Pages)
100% (1)
Graduate Econometrics Lecture Notes - Michael Creel (414 Pages)
414 pages
Fundamentals of Mathematical Statistics 2020
No ratings yet
Fundamentals of Mathematical Statistics 2020
196 pages
Lecturenote - COL341 - 2010
No ratings yet
Lecturenote - COL341 - 2010
116 pages
Optimization For Machine Learning
No ratings yet
Optimization For Machine Learning
45 pages
Python Deep Learning Tutorial
0% (1)
Python Deep Learning Tutorial
17 pages
Creel M Econometrics
No ratings yet
Creel M Econometrics
479 pages
Applied Robust Statistics-David Olive
No ratings yet
Applied Robust Statistics-David Olive
588 pages
RigNotes15 PDF
No ratings yet
RigNotes15 PDF
130 pages
Ebook Econometrics
No ratings yet
Ebook Econometrics
1,006 pages
Econometrics UAB
No ratings yet
Econometrics UAB
353 pages
Advanced Statistical Computing PDF
No ratings yet
Advanced Statistical Computing PDF
329 pages
Gauss Markov Book
No ratings yet
Gauss Markov Book
150 pages
EDA-Discrete Probability Distribution
No ratings yet
EDA-Discrete Probability Distribution
35 pages
Stat Computing
No ratings yet
Stat Computing
329 pages
Estimation and Detection Theory by Don H. Johnson
No ratings yet
Estimation and Detection Theory by Don H. Johnson
214 pages
Practice 1130
No ratings yet
Practice 1130
20 pages
Advanced college algebra study guide
From Everand
Advanced college algebra study guide
Harrison Cook
No ratings yet
Risk Management and System Safety
From Everand
Risk Management and System Safety
Leonam dos Santos Guimarães
5/5 (1)
ADVANCED COLLEGE ALGEBRA STUDY GUIDE
From Everand
ADVANCED COLLEGE ALGEBRA STUDY GUIDE
Harrison K Cook
No ratings yet
Time-dependent Behaviour and Design of Composite Steel-concrete Structures
From Everand
Time-dependent Behaviour and Design of Composite Steel-concrete Structures
Massimiliano Bocciarelli
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.