Abstract
This paper introduces a new dictionary learning strategy based on atoms obtained by translating the composition of \(K\) convolutions with \(S\)-sparse kernels of known support. The dictionary update step associated with this strategy is a non-convex optimization problem. We propose a practical formulation of this problem and introduce a Gauss–Seidel type algorithm, referred to as the alternate least squares algorithm, to solve it. The search space of the proposed algorithm is of dimension \(KS\), which is typically smaller than the size of the target atom and much smaller than the size of the image. Moreover, the complexity of this algorithm is linear with respect to the image size, allowing larger atoms to be learned (as opposed to small patches). Our experiments show that the proposed approach accurately approximates atoms such as wavelets, curvelets, sinc functions, and cosines for large values of \(K\). They also indicate that the algorithm generally converges to a global minimum for large values of \(K\) and \(S\).
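To fix ideas, here is a minimal NumPy sketch (not the authors' code; the function names and the random kernel supports are ours) of how such an atom is built: the composition of \(K\) circular convolutions with \(S\)-sparse kernels of known support, so that the atom is parameterized by only \(KS\) coefficients.

import numpy as np

def compose_atom(kernels, supports, size):
    # Build an atom on a 1-D periodic grid of length `size` as the composition
    # of K circular convolutions with S-sparse kernels (periodic boundary
    # handling, consistent with the periodization convention in the notes below).
    atom = np.zeros(size)
    atom[0] = 1.0                                   # start from a Dirac delta
    for coeffs, supp in zip(kernels, supports):
        kernel = np.zeros(size)
        kernel[np.asarray(supp) % size] = coeffs    # embed the S-sparse kernel
        atom = np.real(np.fft.ifft(np.fft.fft(atom) * np.fft.fft(kernel)))
    return atom

# Example: K = 4 kernels with S = 3 coefficients each, on a grid of 256 samples,
# i.e. only K*S = 12 free parameters for an atom of size 256.
rng = np.random.default_rng(0)
K, S, N = 4, 3, 256
supports = [rng.choice(N, size=S, replace=False) for _ in range(K)]
kernels = [rng.standard_normal(S) for _ in range(K)]
atom = compose_atom(kernels, supports, N)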
Notes
1. All the signals in \(\mathbb R^{\mathcal P}\) are extended by periodization to be defined at any point in \(\mathbb Z^d\).
2. \(\mathbb {R}^{\mathcal {P}}\) and \(\mathbb R^S\) are endowed with the usual scalar product, denoted \(\langle \cdot , \cdot \rangle \), and the usual Euclidean norm, denoted \(\Vert \cdot \Vert _{2}\). We use the same notation for both vector spaces; the context should prevent any ambiguity.
3. Usually, DL is applied to small images, such as patches extracted from large images.
4. In the practical situations we are interested in, \(\#{\mathcal P}\gg S\), so \(S^3\) can be neglected when compared with \((K+S)S\#{\mathcal P}\).
5. For simplicity, in the formula below, we do not mention the mapping of \(\mathbb R^S\) into \(\mathbb R^{\mathcal P}\) necessary to build \(h^k\).
6. In this case the comparison is relevant, because \(\alpha \) is a Dirac delta function.
7. A sum of cosines of the same frequency and different phases yields a cosine of the same frequency (see the identity written out after these notes).
8. We further assume that \(\Vert f^k \Vert _{2}\ne 0\) for all \(k\in \{1,\ldots ,K\}\), since the inequality is otherwise trivial.
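For completeness, the phasor identity behind note 7 can be written explicitly (a standard trigonometric fact, added here for the reader's convenience):
\[
\sum _i a_i \cos (\omega t+\phi _i)\;=\;A\cos (\omega t+\Phi ),
\qquad \text{where}\qquad
A\,e^{\mathrm i\Phi }=\sum _i a_i\, e^{\mathrm i\phi _i},
\]
so only the amplitude and the phase are affected; the frequency \(\omega \) is unchanged.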
Acknowledgments
The authors would like to thank Jose Bioucas-Dias, Jalal Fadili, Rémi Gribonval and Julien Mairal for their fruitful remarks on this work. Olivier Chabiron is supported by ANR-11-LABX-0040-CIMI within the program ANR-11-IDEX-0002-02.
Additional information
Communicated by Julien Mairal, Francis Bach, and Michael Elad.
This work was performed during the Thematic Trimester on image processing of the CIMI Excellence Laboratory, held in Toulouse, France, from May to July 2013.
Appendix
1.1 Proof of Proposition 1
First, notice that \({\mathcal D}\) is a compact set. Moreover, when (7) holds, the objective function of \((P_1)\) is coercive in \(\lambda \). Thus, for any threshold \(\mu \), it is possible to build a compact set such that the objective function evaluated at any \((\lambda ,\mathbf h)\) outside this compact set is larger than \(\mu \). As a consequence, any minimizing sequence eventually remains in such a compact set, and we can extract a convergent subsequence from it. Since the objective function of \((P_1)\) is continuous on a closed domain, any limit point of this subsequence is a minimizer of \((P_1)\).
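For the reader's convenience, the compactness argument can be spelled out as follows, writing \(F\) for the objective function of \((P_1)\) (a standard argument; only the notation is ours): the sublevel set
\[
\mathcal L_\mu \;=\; \bigl\{(\lambda ,\mathbf h)\in \mathbb R_{\ge 0}\times {\mathcal D} \;:\; F(\lambda ,\mathbf h)\le \mu \bigr\}
\]
is closed, since \(F\) is continuous and \({\mathcal D}\) is closed, and bounded, since \({\mathcal D}\) is bounded and \(F\) is coercive in \(\lambda \); it is therefore compact. Taking \(\mu \) slightly larger than the infimum of \(F\), any minimizing sequence eventually remains in \(\mathcal L_\mu \), hence admits a subsequence converging in \(\mathcal L_\mu \), and the continuity of \(F\) shows that its limit attains the infimum.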
1.2 Proof of Proposition 2
The proof of item 1 hinges on writing the conditions satisfied by a stationary point of \((P_1)\), then showing that the Lagrange multipliers associated with the norm-to-one constraints on the \((h^k)_{1 \le k \le K}\) are all equal to \(0\). First, considering the partial differential of the objective function of \((P_1)\) with respect to \(\lambda \) and a Lagrange multiplier \(\gamma _\lambda \ge 0\) for the constraint \(\lambda \ge 0\), we obtain
and
Then, considering Lagrange multipliers \(\gamma _k\in \mathbb R\) associated with each constraint \(\Vert h^k \Vert _{2}=1\), we have for all \(k \in \{1,\dots ,K\}\)
where \(H^k\) is defined by (5). Taking the scalar product of (21) with \(h^k\) and using both \(\Vert h^k\Vert _2=1\) and (19), we obtain
Hence, (21) takes the form, for all \(k \in \{1,\dots ,K\}\)
When \(\lambda >0\), this immediately implies that the kernels \(\mathbf g\) defined by (8) satisfy
i.e., the kernels \(\mathbf g \in (\mathbb {R}^{\mathcal {P}})^K\) form a stationary point of \((P_0)\).
The proof of item 2 is straightforward: for any \((f^k)_{1 \le k \le K} \in (\mathbb {R}^{\mathcal {P}})^K\) satisfying the constraints of \((P_0)\) (see footnote 8), we have
As a consequence, the kernels \((g^k)_{1\le k\le K}\) defined by (8) form a solution of \((P_0)\).
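The inequality invoked above is, in essence, a rescaling argument. Assuming that the constraints of \((P_0)\) only bear on the supports of the kernels and that (8) rescales \((\lambda ^*,\mathbf h^*)\) so that \(g^1*\cdots *g^K=\lambda ^*\, h^{*,1}*\cdots *h^{*,K}\) (this reconstruction is ours, not the authors' exact derivation), it can be sketched as follows: setting
\[
h^k=\frac{f^k}{\Vert f^k \Vert _{2}},
\qquad
\lambda =\prod _{k=1}^K \Vert f^k \Vert _{2}\;\ge \;0 ,
\]
the pair \((\lambda ,\mathbf h)\) satisfies the constraints of \((P_1)\) and \(f^1*\cdots *f^K=\lambda \, h^1*\cdots *h^K\), so the objective of \((P_0)\) at \((f^k)_{1\le k\le K}\) equals the objective of \((P_1)\) at \((\lambda ,\mathbf h)\), which is bounded below by its minimum, attained at \((\lambda ^*,\mathbf h^*)\), that is, at \(\mathbf g\).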
1.3 Proof of Proposition 3
The first item of Proposition 3 can be obtained directly since 1) the sequence of kernels generated by the algorithm belongs to \({\mathcal D}\), which is compact, 2) the objective function of \((P_1)\) is coercive with respect to \(\lambda \) when (13) holds, and 3) the objective function is continuous and decreases during the iterative process.
To prove the second item of Proposition 3, we consider a limit point \((\lambda ^*,\mathbf h^*) \in \mathbb R\times {\mathcal D}\). We denote by \(F\) the objective function of \((P_1)\) and by \((\lambda ^o,\mathbf h^o)_{o\in \mathbb N}\) a subsequence of \((\lambda ^n,\mathbf h^n)_{n\in \mathbb N}\) which converges to \((\lambda ^*,\mathbf h^*)\). The following statements hold trivially, since \(F\) is continuous and \(\left( F(\lambda ^n,\mathbf h^n)\right) _{n\in \mathbb N}\) decreases:
However, if, for every \(k\in \{1,\ldots , K\}\), we have \(C_k^Tu\ne 0\) and the matrix \(C_k\) generated using \(T_k(\mathbf h^*)\) has full column rank, then there exists an open neighborhood of \(T_k(\mathbf h^*)\) in which these conditions remain true for the matrices \(C_k\) generated from kernels \(\mathbf h\). As a consequence, the \(k\)th iteration of the for loop is a continuous mapping on this neighborhood. We deduce that there is a neighborhood of \(\mathbf h^*\) in which \(T\) is continuous.
Since \(T\) is continuous in the vicinity of \(\mathbf h^*\) and \((\mathbf h^o)_{o\in \mathbb N}\) converges to \(\mathbf h^*\), the sequence \((T(\mathbf h^o))_{o\in \mathbb N}\) converges to \(T(\mathbf h^*)\), and (23) guarantees that
As a consequence, denoting \(\mathbf h^*=(h^{*,k})_{1\le k\le K}\), for every \(k\in \{1,\ldots ,K\}\), \(F(\lambda ^*, h^{*,k})\) is equal to the minimal value of \((P_k)\). Since \(C_k\) has full column rank, the minimizer of \((P_k)\) is unique (see the end of Sect. 3.2), and therefore \((\lambda ^*, h^{*,k})\) is this unique minimizer. We then deduce that \((\lambda ^*,\mathbf h^*)=T(\mathbf h^*)\).
Finally, we also know that \((\lambda ^*, h^{*,k})\) is a stationary point of \((P_k)\). Combining the equations stating that, for every \(k\), \((\lambda ^*, h^{*,k})\) is a stationary point of \((P_k)\), we conclude that \((\lambda ^*,\mathbf h^*)\) is a stationary point of \((P_1)\).
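To make the mapping \(T\) discussed above concrete, here is a minimal NumPy sketch (our reconstruction, not the authors' code) of one Gauss–Seidel pass: for each \(k\), the kernels \((h^j)_{j\ne k}\) are frozen, the matrix \(C_k\) is formed column by column, an unconstrained least squares problem of size \(S\) is solved, and \((\lambda ,h^k)\) is recovered by normalization. The exact formulation of \((P_k)\) in the paper may differ in details.

import numpy as np

def circ_conv(a, b):
    # circular convolution, consistent with the periodization convention
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def embed(coeffs, support, size):
    # place the S coefficients on their known support in R^P
    out = np.zeros(size)
    out[np.asarray(support) % size] = coeffs
    return out

def als_pass(u, alpha, kernels, supports):
    # One pass over k = 1..K; `kernels` is a list of S-vectors, updated in place.
    N, K = u.size, len(kernels)
    lam = 1.0
    for k in range(K):
        # composition of all kernels except the k-th, applied to the code alpha
        rest = alpha.copy()
        for j in range(K):
            if j != k:
                rest = circ_conv(rest, embed(kernels[j], supports[j], N))
        # columns of C_k: response of each coefficient of h^k (brute force)
        Ck = np.stack([circ_conv(rest, embed(e, supports[k], N))
                       for e in np.eye(len(supports[k]))], axis=1)
        v = np.linalg.lstsq(Ck, u, rcond=None)[0]   # least squares in R^S
        lam = np.linalg.norm(v)
        if lam > 0:
            kernels[k] = v / lam                    # enforce ||h^k||_2 = 1
    return lam, kernels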