
Gradient Descent with Sparsification: An iterative algorithm for sparse recovery with restricted isometry property

Rahul Garg (grahul@us.ibm.com)
Rohit Khandekar (rohitk@us.ibm.com)
IBM T. J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598

Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).

Abstract

We present an algorithm for finding an s-sparse vector x that minimizes the square error ‖y − Φx‖² where Φ satisfies the restricted isometry property (RIP) with isometry constant δ_2s < 1/3. Our algorithm, called GraDeS (Gradient Descent with Sparsification), iteratively updates x as:

    x ← H_s(x + (1/γ) · Φ^T(y − Φx))

where γ > 1 and H_s sets all but the s largest-magnitude coordinates to zero. GraDeS converges to the correct solution in a constant number of iterations. The condition δ_2s < 1/3 is the most general for which a near-linear time algorithm is known. In comparison, the best condition under which a polynomial-time algorithm is known is δ_2s < √2 − 1.

Our Matlab implementation of GraDeS outperforms previously proposed algorithms like Subspace Pursuit, StOMP, OMP, and Lasso by an order of magnitude. Curiously, our experiments also uncovered cases where L1-regularized regression (Lasso) fails but GraDeS finds the correct solution.

1. Introduction

Finding a sparse solution to a system of linear equations has been an important problem in multiple domains such as model selection in statistics and machine learning (Golub & Loan, 1996; Efron et al., 2004; Wainwright et al., 2006; Ranzato et al., 2007), sparse principal component analysis (Zou et al., 2006), image deconvolution and de-noising (Figueiredo & Nowak, 2005) and compressed sensing (Candès & Wakin, 2008). The recent results in the area of compressed sensing, especially those relating to the properties of random matrices (Candès & Tao, 2006; Candès et al., 2006), have exploded interest in this area, which is finding applications in diverse domains such as coding and information theory, signal processing, artificial intelligence, and imaging. Due to these developments, efficient algorithms to find sparse solutions are becoming increasingly important.

Consider a system of linear equations of the form

    y = Φx                                                        (1)

where y ∈ ℝ^m is an m-dimensional vector of "measurements", x ∈ ℝ^n is the unknown signal to be reconstructed and Φ ∈ ℝ^(m×n) is the measurement matrix. The signal x is represented in a suitable (possibly over-complete) basis and is assumed to be "s-sparse" (i.e., at most s out of the n components of x are non-zero). The sparse reconstruction problem is

    min_{x̃ ∈ ℝ^n} ‖x̃‖_0   subject to   y = Φx̃                  (2)

where ‖x̃‖_0 denotes the number of non-zero entries in x̃. This problem is not only NP-hard (Natarajan, 1995), but also hard to approximate within a factor O(2^(log^(1−ε) m)) of the optimal solution (Neylon, 2006).

RIP and its implications. The above problem becomes computationally tractable if the matrix Φ satisfies a restricted isometry property. Define the isometry constant of Φ as the smallest number δ_s such that the following holds for all s-sparse vectors x ∈ ℝ^n:

    (1 − δ_s)‖x‖_2² ≤ ‖Φx‖_2² ≤ (1 + δ_s)‖x‖_2²                  (3)
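Definition (3) quantifies over all s-sparse vectors, so δ_s can be computed exactly only for very small instances. As an added illustration (not part of the original paper), the following sketch computes δ_s for a tiny matrix by enumerating all size-s supports and taking the extreme eigenvalues of the corresponding Gram submatrices; the matrix size and random seed are arbitrary.

import itertools
import numpy as np

def isometry_constant(Phi, s):
    # Brute-force delta_s from definition (3); feasible only for tiny n and s.
    # For x supported on S, ||Phi x||^2 / ||x||^2 lies between the smallest and
    # largest eigenvalues of the s x s Gram submatrix Phi_S^T Phi_S.
    n = Phi.shape[1]
    delta = 0.0
    for S in itertools.combinations(range(n), s):
        eigs = np.linalg.eigvalsh(Phi[:, S].T @ Phi[:, S])
        delta = max(delta, 1.0 - eigs[0], eigs[-1] - 1.0)
    return delta

# Tiny example: a random 20 x 8 matrix with unit-norm columns.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((20, 8))
Phi /= np.linalg.norm(Phi, axis=0)
print(isometry_constant(Phi, 2))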
It has been shown (Candès & Tao, 2005; Candès, 2008) that if y = Φx* for some s-sparse vector x* and δ_2s < √2 − 1 or δ_s + δ_2s + δ_3s < 1, then the solution to program (2) is equivalent to the solution to the linear program (4) given below:

    min_{x̃ ∈ ℝ^n} ‖x̃‖_1   subject to   y = Φx̃                  (4)

The above program can be solved in polynomial time using standard linear programming techniques.

Approaches to solve the sparse regression problem. In order for this theory to be applied in practice, efficient algorithms to solve program (2) or (4) are needed. This is all the more so in applications such as medical imaging, where the image sizes (n) are in the range 10^7 to 10^9. In these applications, almost linear-time algorithms with small runtime constants are needed.

Algorithms to solve the sparse regression problem may be classified into "L0-minimization algorithms" that directly attempt to solve program (2), "L1-minimization algorithms" that find sparse solutions by solving program (4), and joint encoding/recovery techniques that design the measurement matrix Φ along with an efficient algorithm for recovery.

Examples of L0-minimization algorithms include the classical Matching Pursuit (MP) (Mallat & Zhang, 1993), Orthogonal Matching Pursuit (OMP) (Tropp & Gilbert, 2007), stagewise OMP (StOMP) (Donoho et al., 2006), regularized OMP (ROMP) (Needell & Vershynin, 2009), subspace pursuit (Dai & Milenkovic, 2008), CoSaMP (Needell & Tropp, 2008), SAMP (Do et al., 2008) and iterative hard thresholding (IHTs) (Blumensath & Davies, 2008).

The L1-minimization algorithms include the classical basis pursuit algorithm (Chen et al., 2001), the Lasso modification to LARS (Efron et al., 2004), random projections onto convex sets (Candès & Romberg, 2004), homotopy (Donoho & Tsaig, 2006), weighted least squares (Jung et al., 2007), and iterative algorithms based on gradient thresholding and other gradient-based approaches (Lustig, 2008; Ma et al., 2008).

Joint encoding/recovery techniques, in which the measurement matrix Φ is designed along with the recovery algorithm, include Sudocodes (Sarvotham et al., 2006) and unbalanced expander matrices (Berinde et al., 2008), among others.

While comparing these algorithms, the following key properties need to be examined:

Worst-case computational cost. Is there an upper bound on the number of steps needed by the algorithm? For large-scale problems where mn is in the range 10^10 to 10^13 and s is in the range 10^3 to 10^8, algorithms requiring O(mns) time are not useful. In such cases, algorithms with runtime bounded by a constant multiple of mn are needed.

Runtime in practice. A good asymptotic theoretical bound is not very useful if the runtime constants are very big. From a practical perspective, it is very important to quantify the exact number of steps needed by the algorithm (e.g., the number of iterations and the number of floating point operations per iteration).

Guarantees on recovery. It is important to understand the conditions under which the algorithm is guaranteed to find the correct solution. While some algorithms do not offer any such guarantee, others provably find the correct solution if certain conditions on the RIP constants δ_s, δ_2s, δ_3s, . . . are satisfied.

A comparison of a few key algorithms for sparse recovery is presented in Table 1. For a meaningful comparison, an attempt is made to report the exact constants in the runtime of these algorithms. However, when the constants are large or are not available, the order notation O(·) is used.

The encoding/recovery techniques that design the measurement matrix Φ are fast and guarantee exact recovery. However, the application of these techniques is restricted to compressed sensing domains where good control is available over the measurement process. The L1-minimization techniques offer tight guarantees on recovery (δ_2s < √2 − 1 follows from (Candès, 2008) and δ_s + δ_2s + δ_3s < 1 from (Candès & Tao, 2005)), provided the algorithm converges to an optimal solution of program (4). These techniques are generally applicable to a broader class of problems of optimizing a convex objective function with an L1-penalty. For these techniques, the exact bounds on the runtime are typically either very large or not available.

The L0-minimization techniques are the most promising because they are often based on greedy approaches that find the solution quickly. Traditional techniques such as MP (Mallat & Zhang, 1993) or OMP (Tropp & Gilbert, 2007) add one variable at a time to the solution, leading to a runtime of O(mns). However, newer techniques such as StOMP, ROMP, subspace pursuit and IHTs add multiple variables to the solution at a time and fall in the class of near linear-time algorithms, with a runtime of O(mn poly(log(mn))). Most of these techniques also have conditions on the RIP constants δ under which exact recovery is guaranteed.
Our results and techniques. We give the first near linear-time algorithm that is guaranteed to find a solution to program (2) under the condition δ_2s < 1/3. This is the most general condition under which the problem can be solved in near linear time (the best known so far was δ_3s < 1/√32, needed by the IHTs algorithm (Blumensath & Davies, 2008)), and is remarkably close to the condition δ_2s < √2 − 1 (Candès, 2008), which requires a computationally expensive linear programming solver. The algorithm is intuitive, easy to implement and has small runtime constants.

Table 1. A comparison of algorithms for sparse recovery. Here m and n denote the number of rows and columns of the matrix Φ, and K denotes the time taken to perform the two matrix operations Φx and Φ^T y. It is equal to 2mn for dense matrices; it may be much less for sparse matrices or special transforms that can be computed efficiently (e.g., Fourier or wavelet transforms). s denotes the sparsity of the solution to be constructed, L denotes the desired bit-precision of the solution and δ_t denotes the isometry constants of Φ.

    Algorithm                                    Cost/iter.           Max. # iters.                    Recovery condition
    Basis pursuit (Chen et al., 2001)            O((m + n)^3)         NA                               δ_s + δ_2s + δ_3s < 1 or δ_2s < √2 − 1
    LARS (Efron et al., 2004)                    K + O(s^2)           Unbounded                        same as above
    Homotopy (Donoho & Tsaig, 2006)              K                    s                                Φ_i^T Φ_j (i ≠ j) ≤ 1/(2s − 1)
    Sudocodes (Sarvotham et al., 2006)           O(s log(s) log(n))   NA                               Always
    Unbalanced expander (Berinde et al., 2008)   O(n log(n/s))        NA                               Always
    OMP (Tropp & Gilbert, 2007)                  K + O(msL)           s                                Φ_i^T Φ_j (i ≠ j) < 1/(2s)
    StOMP (Donoho et al., 2006)                  K + O(msL)           O(1)                             None
    ROMP (Needell & Vershynin, 2009)             K + O(msL)           s                                δ_2s < 0.03/√(log s)
    Subspace pursuit (Dai & Milenkovic, 2008)    K + O(msL)           L / log((1 − δ_3s)/(10δ_3s))     δ_3s < 0.06
    SAMP (Do et al., 2008)                       K + O(msL)           s                                δ_3s < 0.06
    CoSaMP (Needell & Tropp, 2008)               K + O(msL)           L                                δ_4s < 0.1
    IHTs (Blumensath & Davies, 2008)             K                    L                                δ_3s < 1/√32
    GraDeS (this paper)                          K                    2L / log((1 − δ_2s)/(2δ_2s))     δ_2s < 1/3

The algorithm (given in Algorithm 1) is called GraDeS, or "Gradient Descent with Sparsification". It starts from an arbitrary sparse x ∈ ℝ^n and iteratively moves along the gradient to reduce the error Ψ(x) = ‖y − Φx‖² with a step length 1/γ, and then performs hard-thresholding to restore the sparsity of the current solution. The gradient descent step reduces the error Ψ(x) by a constant factor, while the RIP of Φ implies that the sparsification step does not increase the error Ψ(x) by too much. An important contribution of the paper is to analyze how the hard-thresholding function H_s acts with respect to the potential Ψ(x) (see Lemma 2.4). We believe that this analysis may be of independent interest. Overall, a logarithmic number of iterations is needed to reduce the error below a given threshold.

A similar analysis also holds for recovery with noise, where the "best" sparse vector x* is only approximately related to the observation y, i.e., y = Φx* + e for some error vector e.

We implemented the algorithm in Matlab and found that it outperformed the publicly available implementations of several newly developed near-linear time algorithms by an order of magnitude. The trends suggest a higher speedup for larger matrices. We also found that while Lasso provides the best theoretical conditions for exact recovery, its LARS-based implementation is the first to fail as the sparsity is increased.

2. Algorithm GraDeS and its properties

We begin with some notation. For a positive integer n, let [n] = {1, 2, . . . , n}. For a vector x ∈ ℝ^n, let supp(x) denote the set of coordinates i ∈ [n] such that x_i ≠ 0; thus ‖x‖_0 = |supp(x)|. For a non-negative integer s, we say that x is s-sparse if ‖x‖_0 ≤ s. We also use ‖x‖_1 = Σ_{i=1}^n |x_i| and ‖x‖_2 = (Σ_{i=1}^n x_i²)^(1/2) to denote the ℓ_1 and ℓ_2 norms of x respectively. For brevity, we use ‖x‖ to denote ‖x‖_2.

Definition 2.1 Let H_s : ℝ^n → ℝ^n be the function that sets all but the s largest coordinates in absolute value to zero. More precisely, for x ∈ ℝ^n, let π be a permutation of [n] such that |x_π(1)| ≥ · · · ≥ |x_π(n)|. Then the vector H_s(x) is the vector x' where x'_π(i) = x_π(i) for i ≤ s and x'_π(i) = 0 for i ≥ s + 1.

The above operator is called hard-thresholding and gives the best s-sparse approximation of the vector x, i.e., H_s(x) = argmin_{x' ∈ ℝ^n : ‖x'‖_0 ≤ s} ‖x − x'‖_2, where the minimum is taken over all s-sparse vectors x' ∈ ℝ^n.
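As an added illustration (not from the paper), the hard-thresholding operator H_s of Definition 2.1 can be written in a few lines of NumPy; ties among equal-magnitude coordinates are broken arbitrarily, just as the permutation π in the definition allows.

import numpy as np

def hard_threshold(x, s):
    # H_s(x): keep the s largest-magnitude coordinates of x and zero out the rest.
    x = np.asarray(x, dtype=float)
    if s >= x.size:
        return x.copy()
    out = np.zeros_like(x)
    if s > 0:
        keep = np.argpartition(np.abs(x), -s)[-s:]   # indices of the s largest |x_i|
        out[keep] = x[keep]
    return out

# The best 2-sparse approximation of (3, -1, 0.5, -4) keeps the entries 3 and -4.
print(hard_threshold([3.0, -1.0, 0.5, -4.0], 2))     # -> [ 3.  0.  0. -4.]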
Algorithm 1  Algorithm GraDeS(γ) for solving (1).
    Initialize x ← 0.
    while Ψ(x) > ε do
        x ← H_s(x + (1/γ) · Φ^T(y − Φx)).
    end while

Noiseless recovery. Our main result for noiseless recovery is given below.

Theorem 2.1 Suppose x* is an s-sparse vector satisfying (1) and the isometry constants of the matrix Φ satisfy δ_2s < 1/3. Algorithm 1, GraDeS with γ = 1 + δ_2s, computes an s-sparse vector x ∈ ℝ^n such that ‖y − Φx‖² ≤ ε in

    (1 / log((1 − δ_2s)/(2δ_2s))) · log(‖y‖²/ε)

iterations. Each iteration computes one multiplication of Φ and one multiplication of Φ^T with vectors.

The following corollary follows immediately by setting ε = 2^(−2L)·(1 − δ_2s).

Corollary 2.2 Suppose there exists an s-sparse vector x* satisfying (1) and the isometry constants of the matrix Φ satisfy δ_2s < 1/3. The vector x* can be approximated up to L bits of accuracy in

    (2(L + log ‖y‖) + log(1/(1 − δ_2s))) / log((1 − δ_2s)/(2δ_2s))

iterations of Algorithm 1 with γ = 1 + δ_2s.
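As a concrete illustration of these bounds (the numbers are chosen here purely for illustration and do not appear in the paper), suppose δ_2s = 0.2 and γ = 1 + δ_2s = 1.2. The iteration bound of Theorem 2.1 then evaluates to

\[
\frac{1}{\log\bigl((1-\delta_{2s})/(2\delta_{2s})\bigr)}\,\log\frac{\|y\|^{2}}{\epsilon}
 \;=\; \frac{\log\bigl(\|y\|^{2}/\epsilon\bigr)}{\log(0.8/0.4)}
 \;=\; \frac{\log\bigl(\|y\|^{2}/\epsilon\bigr)}{\log 2},
\]

i.e., the potential ‖y − Φx‖² is at least halved in every iteration (this is the contraction established in Section 2.1 below), and with ε = 2^(−2L)·(1 − δ_2s) as in Corollary 2.2, roughly two iterations per bit of accuracy suffice.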
Recovery with noise. Our result for recovery with noise is as follows.

Theorem 2.3 Suppose x* is an s-sparse vector satisfying y = Φx* + e for an error vector e ∈ ℝ^m, and the isometry constant of the matrix Φ satisfies δ_2s < 1/3. There exists a constant D > 0 that depends only on δ_2s, such that Algorithm 1 with γ = 1 + δ_2s computes an s-sparse vector x ∈ ℝ^n satisfying ‖x* − x‖ ≤ D‖e‖ in at most

    (1 / log((1 − δ_2s)/(4δ_2s))) · log(‖y‖²/‖e‖²)

iterations.

2.1. Proof of Theorem 2.1

For x ∈ ℝ^n, let Ψ(x) = ‖y − Φx‖² be a potential function. Algorithm 1 starts with some initial value of x that is s-sparse, say x = 0, and iteratively reduces Ψ(x), while maintaining s-sparsity, until Ψ(x) ≤ ε. In each iteration, we compute the gradient of Ψ(x) = ‖y − Φx‖²,

    ∇Ψ(x) = −2Φ^T(y − Φx).

Then, we move in the direction opposite to the gradient by a step length given by a parameter γ and perform hard-thresholding in order to preserve sparsity. Note that each iteration needs just two multiplications by Φ or Φ^T and linear extra work.

We now prove Theorem 2.1. Fix an iteration and let x be the current solution. Let x' = H_s(x + (1/γ)·Φ^T(y − Φx)) be the solution computed at the end of this iteration. Let Δx = x' − x and let Δx* = x* − x. Note that both Δx and Δx* are 2s-sparse. The reduction in the potential Ψ in this iteration is given by

    Ψ(x') − Ψ(x) = −2Φ^T(y − Φx) · Δx + Δx^T Φ^T Φ Δx
                 ≤ −2Φ^T(y − Φx) · Δx + (1 + δ_2s)‖Δx‖²                             (5)
                 ≤ −2Φ^T(y − Φx) · Δx + γ‖Δx‖²                                      (6)

The inequality (5) follows from RIP. Let g = −2Φ^T(y − Φx). We now prove an important component of our analysis.

Lemma 2.4 The vector Δx achieves the minimum of g·v + γ‖v‖² over all vectors v such that x + v is s-sparse, i.e., Δx = argmin_{v ∈ ℝ^n : ‖x+v‖_0 ≤ s} (g·v + γ‖v‖²).

Proof. Let v = v* denote the vector such that x + v is s-sparse and F(v) = g·v + γ‖v‖² is minimized. Let x'' = x + v*. Let S_O = supp(x) \ supp(x + v*) be the old coordinates in x that are set to zero. Let S_N = supp(x + v*) \ supp(x) be the new coordinates in x + v*. Similarly, let S_I = supp(x) ∩ supp(x + v*) be the common coordinates. Note that F(v*) = Σ_{i ∈ [n]} (g_i·v*_i + γ(v*_i)²) and that the value of g_i·v_i + γ(v_i)² is minimized for v_i = −g_i/(2γ). Therefore we get that v*_i = −g_i/(2γ) for all i ∈ supp(x + v*) and v*_i = −x_i for all i ∈ S_O. Thus we have

    F(v*) = Σ_{i ∈ S_O} (−g_i·x_i + γx_i²) + Σ_{i ∈ S_I ∪ S_N} (g_i·(−g_i/(2γ)) + g_i²/(4γ))
          = Σ_{i ∈ S_O} (−g_i·x_i + γx_i²) + Σ_{i ∈ S_I ∪ S_N} (−g_i²/(4γ))
          = Σ_{i ∈ S_O} (−g_i·x_i + γx_i² + g_i²/(4γ)) + Σ_{i ∈ S_O ∪ S_I ∪ S_N} (−g_i²/(4γ))
          = Σ_{i ∈ S_O} γ(x_i − g_i/(2γ))² + Σ_{i ∈ S_O ∪ S_I ∪ S_N} (−g_i²/(4γ))
          = γ Σ_{i ∈ S_O} (x_i − g_i/(2γ))² + γ Σ_{i ∈ S_O ∪ S_I ∪ S_N} (−(g_i/(2γ))²)

Thus, in order to minimize F(v*), we have to pick the coordinates i ∈ S_O to be those with the least value of |x_i − g_i/(2γ)| and pick the coordinates j ∈ S_N to be those with the largest value of |g_j/(2γ)| = |x_j − g_j/(2γ)|. Note that −g_i/(2γ) = (1/γ)·(Φ^T(y − Φx))_i, which is the expression used in the while loop of Algorithm 1. Hence the proof is complete from the definition of H_s.

From inequality (6), Lemma 2.4, and the fact that x* = x + Δx* is s-sparse, we get

    Ψ(x') − Ψ(x) ≤ −2Φ^T(y − Φx) · Δx* + γ‖Δx*‖²
                 = −2Φ^T(y − Φx) · Δx* + (1 − δ_2s)‖Δx*‖² + (γ − 1 + δ_2s)‖Δx*‖²
                 ≤ −2Φ^T(y − Φx) · Δx* + (Δx*)^T Φ^T Φ Δx* + (γ − 1 + δ_2s)‖Δx*‖²   (7)
                 = Ψ(x*) − Ψ(x) + (γ − 1 + δ_2s)‖Δx*‖²
                 ≤ −Ψ(x) + ((γ − 1 + δ_2s)/(1 − δ_2s)) · ‖ΦΔx*‖²                    (8)
                 ≤ −Ψ(x) + ((γ − 1 + δ_2s)/(1 − δ_2s)) · Ψ(x)                       (9)

The inequalities (7) and (8) follow from RIP, while (9) follows from the fact that ‖ΦΔx*‖² = Ψ(x). Thus we get Ψ(x') ≤ Ψ(x) · (γ − 1 + δ_2s)/(1 − δ_2s). Now note that δ_2s < 1/3 implies (γ − 1 + δ_2s)/(1 − δ_2s) < 1, and hence the potential decreases by a multiplicative factor in each iteration. If we start with x = 0, the initial potential is ‖y‖². Thus after

    (1 / log((1 − δ_2s)/(γ − 1 + δ_2s))) · log(‖y‖²/ε)

iterations, the potential becomes less than ε. Setting γ = 1 + δ_2s, we get the desired bounds.

If the value of δ_2s is not known, then by setting γ = 4/3, the same algorithm computes the above solution in

    (1 / log((1 − δ_2s)/(1/3 + δ_2s))) · log(‖y‖²/ε)

iterations.

2.2. Proof of Theorem 2.3

Recovery under noise corresponds to the case where there exists an s-sparse vector x* ∈ ℝ^n such that

    y = Φx* + e                                                  (10)

for some error vector e ∈ ℝ^m. In order to prove Theorem 2.3, we prove the following stronger lemma.

Lemma 2.5 Let C > 0 be a constant satisfying δ_2s < C²/(3C² + 4C + 2). Algorithm 1 computes an s-sparse vector x ∈ ℝ^n such that ‖y − Φx‖² ≤ C²‖e‖² in

    (1 / log(((1 − δ_2s)/(2δ_2s)) · (C²/(C + 1)²))) · log(‖y‖²/(C²‖e‖²))

iterations, if we set γ = 1 + δ_2s.

Note that the output x in the above lemma satisfies

    ‖Φx* − Φx‖² ≤ ‖y − Φx‖² + 2(y − Φx)^T(y − Φx*) + ‖y − Φx*‖²
                ≤ C²‖e‖² + 2C‖e‖·‖e‖ + ‖e‖²
                = (C + 1)²‖e‖².

This combined with RIP implies that ‖x* − x‖² ≤ (C + 1)²‖e‖²/(1 − δ_2s), thereby proving Theorem 2.3.

Proof of Lemma 2.5: We work with the same notation as in Section 2.1. Let the current solution x satisfy Ψ(x) = ‖y − Φx‖² > C²‖e‖², let x' = H_s(x + (1/γ)·Φ^T(y − Φx)) be the solution computed at the end of an iteration, and let Δx = x' − x and Δx* = x* − x. The initial part of the analysis is the same as in Section 2.1. Using (10), we get

    ‖ΦΔx*‖² = ‖y − Φx − e‖²
            = Ψ(x) − 2e^T(y − Φx) + ‖e‖²
            ≤ Ψ(x) + (2/C)·‖y − Φx‖² + (1/C²)·Ψ(x)
            ≤ (1 + 1/C)² · Ψ(x).

Using the above inequality in inequality (8), we get

    Ψ(x') ≤ ((γ − 1 + δ_2s)/(1 − δ_2s)) · (1 + 1/C)² · Ψ(x).

Our assumption δ_2s < C²/(3C² + 4C + 2) implies that ((γ − 1 + δ_2s)/(1 − δ_2s))·(1 + 1/C)² = (2δ_2s/(1 − δ_2s))·(1 + 1/C)² < 1. This implies that as long as Ψ(x) > C²‖e‖², the potential Ψ(x) decreases by a multiplicative factor. Lemma 2.5 thus follows.
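Before turning to the experiments, here is a minimal NumPy sketch of Algorithm 1, added for illustration; the paper's own implementation is in Matlab, and the stopping threshold eps and the iteration cap max_iter below are illustrative parameters rather than part of the algorithm.

import numpy as np

def grades(Phi, y, s, gamma=4.0/3.0, eps=1e-10, max_iter=1000):
    # GraDeS / Algorithm 1: gradient step of length 1/gamma followed by
    # hard-thresholding H_s, repeated until Psi(x) = ||y - Phi x||^2 <= eps.
    # Assumes 1 <= s <= n.
    m, n = Phi.shape
    x = np.zeros(n)
    for _ in range(max_iter):
        r = y - Phi @ x                    # residual; Psi(x) = ||r||^2
        if r @ r <= eps:
            break
        x = x + (Phi.T @ r) / gamma        # gradient of Psi is -2 Phi^T r; step by (1/gamma) Phi^T r
        idx = np.argsort(np.abs(x))        # ascending by magnitude
        x[idx[: n - s]] = 0.0              # H_s: zero out all but the s largest entries
    return x

Running it with γ = 1 + δ_2s requires knowing δ_2s; γ = 4/3 is the fallback choice discussed at the end of Section 2.1.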
3. Implementation Results

To evaluate the usefulness of GraDeS in practice, it was compared with the LARS-based implementation of Lasso (Efron et al., 2004) (referred to as Lasso/LARS), Orthogonal Matching Pursuit (OMP), Stagewise OMP (StOMP) and Subspace Pursuit. The algorithms were selected to span the spectrum: at one extreme, Lasso/LARS, which provides the best recovery conditions but a poor bound on its runtime, and at the other extreme, near linear-time algorithms with weaker recovery conditions but very good run times.

[Figure 1. Dependence of the running times of different algorithms on the number of rows (m). Here the number of columns is n = 8000 and the sparsity is s = 500.]

[Figure 2. Dependence of the running times of different algorithms on the number of columns (n). Here the number of rows is m = 4000 and the sparsity is s = 500.]

Matlab implementations of all the algorithms were used for the evaluation. The SparseLab (Donoho & Others, 2009) package, which contains implementations of the Lasso/LARS, OMP and StOMP algorithms, was used. In addition, a publicly available optimized Matlab implementation of Subspace Pursuit, provided by one of its authors, was used. The algorithm GraDeS was also implemented in Matlab for a fair comparison. GraDeS(γ = 4/3), for which all our results hold, was evaluated along with GraDeS(γ = 3), which was a bit slower but had better recovery properties. All the experiments were run on a 3 GHz dual-core Pentium system with 3.5 GB memory running the Linux operating system. Care was taken to ensure that no other program was actively consuming CPU time while the experiments were in progress.

First, a random matrix Φ of size m × n with i.i.d. normally distributed entries and a random s-sparse vector x were generated. The columns of Φ were normalized to zero mean and unit norm. The vector y = Φx was computed. The matrix Φ, the vector y and the parameter s were then given to each of the algorithms as input, and the output of every algorithm was compared with the correct solution x.
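The following sketch, added here for illustration, reproduces this setup in NumPy on a much smaller instance than those in the experiments and runs the GraDeS update on it; the sizes and the random seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
m, n, s, gamma = 500, 1000, 10, 4.0 / 3.0    # illustrative sizes, far smaller than in the paper

# Random measurement matrix with zero-mean, unit-norm columns, and a random s-sparse signal.
Phi = rng.standard_normal((m, n))
Phi -= Phi.mean(axis=0)
Phi /= np.linalg.norm(Phi, axis=0)
x_true = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_true[support] = rng.standard_normal(s)
y = Phi @ x_true

# GraDeS iteration x <- H_s(x + (1/gamma) Phi^T (y - Phi x)).
x = np.zeros(n)
for _ in range(500):
    r = y - Phi @ x
    if r @ r <= 1e-12:
        break
    x = x + (Phi.T @ r) / gamma
    idx = np.argsort(np.abs(x))
    x[idx[: n - s]] = 0.0                    # hard-threshold back to s-sparse

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))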
Figure 1 shows the runtime of all the algorithms as a function of the number of rows m. The algorithms Lasso/LARS and OMP take the most time, which increases linearly with m. They are followed by StOMP and Subspace Pursuit, which also show a linear increase in runtime as a function of m. The algorithm GraDeS(4/3) has the smallest runtime, which is a factor of 20 better than the runtime of Lasso/LARS and a factor of 7 better than that of Subspace Pursuit for m = 8000, n = 8000 and s = 500. GraDeS(3) takes more time than GraDeS(4/3), as expected. It is surprising to observe that, contrary to expectations, the runtime of GraDeS did not increase substantially as m was increased. It was found that GraDeS needed fewer iterations, offsetting the additional time needed to compute products with matrices of larger m.

Figure 2 shows the runtime of these algorithms as the number of columns, n, is varied. Here, all the algorithms seem to scale linearly with n, as expected. In this case, Lasso/LARS did not give correct results for n > 8000 and hence its runtime was omitted from the graph. As expected, the runtime of GraDeS is an order of magnitude smaller than that of OMP and Lasso/LARS.

[Figure 3. Dependence of the running times of different algorithms on the sparsity (s). Here the number of rows is m = 5000 and the number of columns is n = 10000.]

Figure 3 shows the runtime of the algorithms as a function of the sparsity parameter s. The increase in the runtime of Lasso/LARS and OMP is super-linear (as opposed to a linear theoretical bound for OMP). Although the theoretical analysis of StOMP and Subspace Pursuit shows very little dependence on s, their actual run times do increase significantly as s is increased. In contrast, the run times of GraDeS increase only marginally (due to a small increase in the number of iterations needed for convergence) as s is increased. For s = 600, GraDeS(4/3) is a factor of 20 faster than Lasso/LARS and a factor of 7 faster than StOMP, the second-best algorithm after GraDeS.

Table 2 shows the recovery results for these algorithms. Quite surprisingly, although the theoretical recovery conditions for Lasso are the most general (see Table 1), it was found that the LARS-based implementation of Lasso was the first to fail in recovery as s is increased.

Table 2. A comparison of different algorithms for various parameter values. An entry "Y" indicates that the algorithm could recover a sparse solution, while an empty entry indicates otherwise. The algorithm columns are Lasso/LARS, OMP, StOMP, Subspace Pursuit, GraDeS (γ = 3) and GraDeS (γ = 4/3).

    m      n      s
    3000   10000  2100
    3000   10000  1050   Y Y
    3000   8000   500    Y Y Y
    3000   10000  600    Y Y Y Y
    6000   8000   500    Y Y Y Y
    3000   10000  300    Y Y Y Y Y
    4000   10000  500    Y Y Y Y Y
    8000   8000   500    Y Y Y Y Y Y

A careful examination revealed that as s was increased, the output of Lasso/LARS became sensitive to the error tolerance parameters used in the implementation. With careful tuning of these parameters, the recovery property of Lasso/LARS could be improved. The recovery could be improved further by replacing some incremental computations with slower non-incremental computations. One such computation was the incremental Cholesky factorization. These changes adversely impacted the running time of the algorithm, increasing its cost per iteration from K + O(s²) to 2K + O(s³).

Even after these changes, some instances were discovered for which Lasso/LARS did not produce the correct sparse solution (though it computed the optimal solution of program (4)), but GraDeS(3) found the correct solution. However, instances for which Lasso/LARS gave the correct sparse solution but GraDeS failed were also found.

4. Conclusions

In summary, we have presented an efficient algorithm for solving the sparse reconstruction problem provided the isometry constants of the constraint matrix satisfy δ_2s < 1/3. Although the recovery conditions for GraDeS are stronger than those for L1-regularized regression (Lasso), our results indicate that whenever Lasso/LARS finds the correct solution, GraDeS also finds it. Conversely, there are cases where GraDeS (and other algorithms) find the correct solution but Lasso/LARS fails due to numerical issues. In the absence of efficient and numerically stable algorithms, it is not clear whether L1-regularization offers any advantages to practitioners over simpler and faster algorithms such as OMP or GraDeS when the matrix Φ satisfies RIP. A systematic study is needed to explore this. Finally, finding more general conditions than RIP under which the sparse reconstruction problem can be solved efficiently is a very challenging open question.

References

Berinde, R., Indyk, P., & Ruzic, M. (2008). Practical near-optimal sparse recovery in the L1 norm. Allerton Conference on Communication, Control, and Computing. Monticello, IL.

Blumensath, T., & Davies, M. E. (2008). Iterative hard thresholding for compressed sensing. Preprint.

Candès, E., & Wakin, M. (2008). An introduction to compressive sampling. IEEE Signal Processing Magazine, 25, 21–30.

Candès, E. J. (2008). The restricted isometry property and its implications for compressed sensing. Compte Rendus de l'Academie des Sciences, Paris, 1, 589–592.

Candès, E. J., & Romberg, J. (2004). Practical signal recovery from random projections. SPIN Conference on Wavelet Applications in Signal and Image Processing.

Candès, E. J., Romberg, J. K., & Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52, 489–509.
Candès, E. J., & Tao, T. (2005). Decoding by linear programming. IEEE Transactions on Information Theory, 51, 4203–4215.

Candès, E. J., & Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52, 5406–5425.

Chen, S. S., Donoho, D. L., & Saunders, M. A. (2001). Atomic decomposition by basis pursuit. SIAM Review, 43, 129–159.

Dai, W., & Milenkovic, O. (2008). Subspace pursuit for compressive sensing: Closing the gap between performance and complexity. ArXiv e-prints.

Do, T. T., Gan, L., Nguyen, N., & Tran, T. D. (2008). Sparsity adaptive matching pursuit algorithm for practical compressed sensing. Asilomar Conference on Signals, Systems, and Computers. Pacific Grove, California.

Donoho, D., & Others (2009). SparseLab: Seeking sparse solutions to linear systems of equations. http://sparselab.stanford.edu/.

Donoho, D. L., & Tsaig, Y. (2006). Fast solution of L1 minimization problems when the solution may be sparse. Technical Report, Institute for Computational and Mathematical Engineering, Stanford University.

Donoho, D. L., Tsaig, Y., Drori, I., & Starck, J.-L. (2006). Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit. Preprint.

Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407–499.

Figueiredo, M. A. T., & Nowak, R. D. (2005). A bound optimization approach to wavelet-based image deconvolution. IEEE International Conference on Image Processing (ICIP) (pp. 782–785).

Golub, G., & Loan, C. V. (1996). Matrix computations, 3rd ed. Johns Hopkins University Press.

Jung, H., Ye, J. C., & Kim, E. Y. (2007). Improved k-t BLAST and k-t SENSE using FOCUSS. Phys. Med. Biol., 52, 3201–3226.

Lustig, M. (2008). Sparse MRI. Ph.D. thesis, Stanford University.

Ma, S., Yin, W., Zhang, Y., & Chakraborty, A. (2008). An efficient algorithm for compressed MR imaging using total variation and wavelets. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1–8).

Mallat, S. G., & Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 3397–3415.

Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM Journal of Computing, 24, 227–234.

Needell, D., & Tropp, J. A. (2008). CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26, 301–321.

Needell, D., & Vershynin, R. (2009). Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Foundations of Computational Mathematics, 9, 317–334.

Neylon, T. (2006). Sparse solutions for linear prediction problems. Doctoral dissertation, Courant Institute, New York University.

Ranzato, M., Boureau, Y.-L., & LeCun, Y. (2007). Sparse feature learning for deep belief networks. In Advances in Neural Information Processing Systems 20 (NIPS), 1185–1192. MIT Press.

Sarvotham, S., Baron, D., & Baraniuk, R. (2006). Sudocodes - fast measurement and reconstruction of sparse signals. IEEE International Symposium on Information Theory (ISIT) (pp. 2804–2808). Seattle, Washington.

Tropp, J. A., & Gilbert, A. C. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53, 4655–4666.

Wainwright, M. J., Ravikumar, P., & Lafferty, J. D. (2006). High-dimensional graphical model selection using ℓ1-regularized logistic regression. In Advances in Neural Information Processing Systems 19 (NIPS), 1465–1472. Cambridge, MA: MIT Press.

Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 262–286.