Toward Fast Transform Learning

Abstract

This paper introduces a new dictionary learning strategy based on atoms obtained by translating the composition of \(K\) convolutions with \(S\)-sparse kernels of known support. The dictionary update step associated with this strategy is a non-convex optimization problem. We propose a practical formulation of this problem and introduce a Gauss–Seidel type algorithm, referred to as the alternative least square algorithm, for its resolution. The search space of the proposed algorithm is of dimension \(KS\), which is typically smaller than the size of the target atom and much smaller than the size of the image. Moreover, the complexity of the algorithm is linear with respect to the image size, allowing larger atoms to be learned (as opposed to small patches). The conducted experiments show that the method accurately approximates atoms such as wavelets, curvelets, sinc functions, and cosines for large values of \(K\). They also indicate that the algorithm generally converges to a global minimum for large values of \(K\) and \(S\).
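
To make the atom structure concrete, the following NumPy sketch (illustrative only, not the authors' implementation; all variable and function names, and the choice of a \(3\times 3\) support, are assumptions) builds an atom as the composition \(h^1 * \dots * h^K\) of \(K\) convolutions with \(S\)-sparse kernels of fixed support, using the periodic boundary convention of the paper, and evaluates the data-fit term \(\Vert \alpha *h^1*\dots *h^K-u\Vert _{2}^2\) minimized during the dictionary update.

```python
# Minimal sketch (not the authors' code): compose K convolutions with
# S-sparse kernels of fixed support, under a periodic boundary convention.
import numpy as np

def circular_conv(a, b):
    # Circular (periodized) 2-D convolution computed with the FFT.
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

P = (64, 64)                 # image domain, #P = 64*64 pixels
K, S = 4, 9                  # number of kernels, sparsity of each kernel
rng = np.random.default_rng(0)

# Known support: a 3x3 neighbourhood of the origin (S = 9 offsets),
# wrapped modulo the image size because of the periodic convention.
support = [(i % P[0], j % P[1]) for i in (-1, 0, 1) for j in (-1, 0, 1)]

def kernel_from_coeffs(c):
    # Embed S coefficients into an image-sized kernel on the fixed support.
    h = np.zeros(P)
    for value, (i, j) in zip(c, support):
        h[i, j] = value
    return h

# K kernels, each described by only S coefficients: the search space of the
# dictionary update has dimension K*S, independently of the atom size.
coeffs = rng.standard_normal((K, S))
atom = np.zeros(P)
atom[0, 0] = 1.0                         # Dirac delta
for k in range(K):
    atom = circular_conv(atom, kernel_from_coeffs(coeffs[k]))   # h^1 * ... * h^K

# Data-fit term of the dictionary update for a given code alpha and target u.
alpha = rng.standard_normal(P)           # stand-in code (a sparse code in practice)
u = rng.standard_normal(P)               # target image
E = np.sum((circular_conv(alpha, atom) - u) ** 2)
print("objective E(h) =", E)
```

Although each kernel in this sketch carries only \(S=9\) free coefficients, the composed atom \(h^1*\dots *h^K\) can cover a support of size up to \((2K+1)\times (2K+1)\); this is the mechanism by which large atoms can be learned while the search space keeps dimension \(KS\).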

Notes

  1. All the signals in \(\mathbb R^{\mathcal P}\) are extended by periodization to be defined at any point in \(\mathbb Z^d\).

  2. \(\mathbb {R}^{\mathcal {P}}\) and \(\mathbb R^S\) are endowed with the usual scalar product, denoted \(\langle \cdot , \cdot \rangle \), and the usual Euclidean norm, denoted \(\Vert \cdot \Vert _{2}\). We use the same notation for both vector spaces; the context should make it unambiguous.

  3. Usually, DL is applied to small images such as patches extracted from large images.

  4. In the practical situations we are interested in, \(\#{\mathcal P}\gg S\) and \(S^3\) can be neglected when compared to \((K+S)S\#{\mathcal P}\).

  5. For simplicity, in the formula below, we do not mention the mapping of \(\mathbb R^S\) into \(\mathbb R^{\mathcal P}\) necessary to build \(h^k\).

  6. In this case the comparison is relevant, because \(\alpha \) is a Dirac delta function.

  7. A sum of cosines with the same frequency and different phases yields a cosine at that same frequency: \(\sum _i a_i\cos (\omega t+\phi _i)=A\cos (\omega t+\varphi )\) for some amplitude \(A\ge 0\) and phase \(\varphi \).

  8. We further assume that \(\Vert f^k \Vert _{2}\ne 0\), for all \(k\in \{1,\ldots ,K\}\), since the inequality is otherwise trivial.

Acknowledgments

The authors would like to thank Jose Bioucas-Dias, Jalal Fadili, Rémi Gribonval and Julien Mairal for their fruitful remarks on this work. Olivier Chabiron is supported by ANR-11-LABX-0040-CIMI within the program ANR-11-IDEX-0002-02.

Author information

Corresponding author

Correspondence to Olivier Chabiron.

Additional information

Communicated by Julien Mairal, Francis Bach, and Michael Elad.

This work was performed during the Thematic Trimester on image processing of the CIMI Excellence Laboratory, held in Toulouse, France, from May to July 2013.

Appendix

1.1 Proof of Proposition 1

First notice that \({\mathcal D}\) is a compact set. Moreover, when (7) holds, the objective function of \((P_1)\) is coercive in \(\lambda \). Thus, for any threshold \(\mu \), it is possible to build a compact set such that the objective function evaluated at any \((\lambda ,\mathbf h)\) outside this compact set is larger than \(\mu \). As a consequence, we can extract a converging subsequence from any minimizing sequence. Since the objective function of \((P_1)\) is continuous in a closed domain, any limit point of this subsequence is a minimizer of \((P_1)\).
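
To make the coercivity argument explicit, here is a minimal worked version of the bound, assuming that condition (7) ensures \(c:=\inf _{\mathbf h\in {\mathcal D}}\Vert \alpha *h^1*\dots *h^K \Vert _{2}>0\) (this reading of (7) is an assumption of the sketch): for every \(\mathbf h\in {\mathcal D}\) and every \(\lambda \ge \Vert u \Vert _{2}/c\), the reverse triangle inequality gives

$$\begin{aligned} \Vert \lambda \,\alpha *h^1*\dots *h^K - u \Vert _{2}^2 \ge \bigl ( \lambda \Vert \alpha *h^1*\dots *h^K \Vert _{2} - \Vert u \Vert _{2} \bigr )^2 \ge \bigl ( \lambda c - \Vert u \Vert _{2} \bigr )^2 \xrightarrow [\lambda \rightarrow +\infty ]{} +\infty , \end{aligned}$$

so every sublevel set of the objective function is contained in a set of the form \([0,\lambda _\mu ]\times {\mathcal D}\), which is compact.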

1.2 Proof of Proposition 2

The proof of item 1 hinges on writing the stationarity conditions of \((P_1)\) and then showing that the Lagrange multipliers associated with the unit-norm constraints on the \((h^k)_{1 \le k \le K}\) are all equal to \(0\). First, considering the partial derivative of the objective function of \((P_1)\) with respect to \(\lambda \), together with a Lagrange multiplier \(\gamma _\lambda \ge 0\) for the constraint \(\lambda \ge 0\), we obtain

$$\begin{aligned} \lambda \Vert \alpha *h^1 * \dots * h^K \Vert _{2}^2 - \left\langle \alpha * h^1 * \dots * h^K,u \right\rangle =\frac{\gamma _\lambda }{2}, \end{aligned}$$
(19)

and

$$\begin{aligned} \lambda \gamma _\lambda = 0. \end{aligned}$$
(20)

Then, considering Lagrange multipliers \(\gamma _k\in \mathbb R\) associated with each constraint \(\Vert h^k \Vert _{2}=1\), we have for all \(k \in \{1,\dots ,K\}\)

$$\begin{aligned} \lambda \tilde{H}^k * (\lambda \alpha *h^1 * \dots * h^K-u) = \gamma _k h^k, \end{aligned}$$
(21)

where \(H^k\) is defined by (5). Taking the scalar product of (21) with \(h^k\) and using both \(\Vert h^k\Vert _2=1\) and (19), we obtain

$$\begin{aligned} \gamma _k = \lambda \frac{\gamma _\lambda }{2} =0, \quad \forall k \in \{1,\dots ,K\}. \end{aligned}$$
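
For completeness, the scalar-product computation behind the last display can be spelled out as follows (assuming, as (21) suggests, that \(\tilde{H}^k\) is the adjoint of the convolution with \(H^k\) and that \(H^k * h^k = \alpha *h^1 * \dots * h^K\)):

$$\begin{aligned} \gamma _k&= \gamma _k \Vert h^k \Vert _{2}^2 = \left\langle \lambda \tilde{H}^k * \left( \lambda \alpha *h^1 * \dots * h^K-u\right) , h^k \right\rangle \\&= \lambda \left\langle \lambda \alpha *h^1 * \dots * h^K-u , H^k * h^k \right\rangle \\&= \lambda \left( \lambda \Vert \alpha *h^1 * \dots * h^K \Vert _{2}^2 - \left\langle \alpha *h^1 * \dots * h^K,u \right\rangle \right) = \lambda \frac{\gamma _\lambda }{2}, \end{aligned}$$

and (20) then forces \(\gamma _k=0\).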

Hence, (21) takes the form, for all \(k \in \{1,\dots ,K\}\)

$$\begin{aligned} \lambda \tilde{H}^k * (\lambda \alpha *h^1 * \dots * h^K-u) = 0. \end{aligned}$$
(22)

When \(\lambda >0\), this immediately implies that the kernels \(\mathbf g\) defined by (8) satisfy

$$\begin{aligned}\frac{\partial E}{\partial h^k} \left( \mathbf g \right) = 0 ,\quad \forall k\in \{1,\ldots ,K\}, \end{aligned}$$

i.e., the kernels \(\mathbf g \in (\mathbb {R}^{\mathcal {P}})^K\) form a stationary point of \((P_0)\).

The proof of item 2 is straightforward: for any \((f^k)_{1 \le k \le K} \in (\mathbb {R}^{\mathcal {P}})^K\) satisfying the constraints of \((P_0)\) (see Footnote 8), we have

$$\begin{aligned}&\Vert \alpha *g^1*\ldots *g^K - u \Vert _{2}^2 \\&= \Vert \lambda \alpha * h^1*\ldots *h^K - u \Vert _{2}^2 \\&\le \left\| \left( \prod _{k=1}^K \Vert f^k \Vert _{2}\right) ~~\alpha * \frac{f^1}{\Vert f^1 \Vert _{2}}*\ldots *\frac{f^K}{\Vert f^K \Vert _{2}}- u\right\| _2^2 \\&= \Vert \alpha *f^1*\ldots *f^K - u \Vert _{2}^2, \end{aligned}$$

where the inequality holds because \((\lambda ,\mathbf h)\) is a solution of \((P_1)\) while \(\left( \prod _{k=1}^K \Vert f^k \Vert _{2},\ \left( f^k/\Vert f^k \Vert _{2}\right) _{1\le k\le K}\right) \) satisfies the constraints of \((P_1)\), and the last equality follows from the multilinearity of the convolution product.

As a consequence, the kernels \((g^k)_{1\le k\le K}\) defined by (8) form a solution of \((P_0)\).

1.3 Proof of Proposition 3

The first item of Proposition 3 follows directly from three facts: (1) the sequence of kernels generated by the algorithm belongs to the compact set \({\mathcal D}\); (2) the objective function of \((P_1)\) is coercive with respect to \(\lambda \) when (13) holds; and (3) the objective function is continuous and decreases throughout the iterative process.

To prove the second item of Proposition 3, we consider a limit point \((\lambda ^*,\mathbf h^*) \in \mathbb R\times {\mathcal D}\). We denote by \(F\) the objective function of \((P_1)\) and by \((\lambda ^o,\mathbf h^o)_{o\in \mathbb N}\) a subsequence of \((\lambda ^n,\mathbf h^n)_{n\in \mathbb N}\) converging to \((\lambda ^*,\mathbf h^*)\). The following equalities hold because \(F\) is continuous and \(\left( F(\lambda ^n,\mathbf h^n)\right) _{n\in \mathbb N}\) is decreasing:

$$\begin{aligned} \lim _{o\rightarrow \infty } F\left( T(\mathbf h^o) \right) = \lim _{o\rightarrow \infty } F(\lambda ^o,\mathbf h^o) = F(\lambda ^*,\mathbf h^*) \end{aligned}$$
(23)

However, if, for every \(k\in \{1,\ldots , K\}\), we have \(C_k^Tu\ne 0\) and the matrix \(C_k\) generated from \(T_k(\mathbf h^*)\) has full column rank, then there exists an open neighborhood of \(T_k(\mathbf h^*)\) in which these conditions remain true for the matrices \(C_k\) generated from the kernels \(\mathbf h\). As a consequence, the \(k\)th iteration of the for loop is a continuous mapping on this neighborhood. Finally, we deduce that there is a neighborhood of \(\mathbf h^*\) in which \(T\) is continuous.

Since \(T\) is continuous in the vicinity of \(\mathbf h^*\) and \((\mathbf h^o)_{o\in \mathbb N}\) converges to \(\mathbf h^*\), the sequence \((T(\mathbf h^o))_{o\in \mathbb N}\) converges to \(T(\mathbf h^*)\) and (23) guarantees that

$$\begin{aligned} F\left( T(\mathbf h^*) \right) =F(\lambda ^*,\mathbf h^*) . \end{aligned}$$

As a consequence, denoting \(\mathbf h^*=(h^{*,k})_{1\le k\le K}\), for every \(k\in \{1,\ldots ,K\}\), \(F(\lambda ^*, h^{*,k})\) is equal to the minimal value of \((P_k)\). Since \(C_k\) has full column rank, the minimizer of \((P_k)\) is unique (see the end of Sect. 3.2); therefore \((\lambda ^*, h^{*,k})\) is this unique minimizer. We then deduce that \((\lambda ^*,\mathbf h^*)=T(\mathbf h^*)\).

Finally, we also know that, for every \(k\), \((\lambda ^*, h^{*,k})\) is a stationary point of \((P_k)\). Combining the corresponding stationarity equations over all \(k\), we conclude that \((\lambda ^*,\mathbf h^*)\) is a stationary point of \((P_1)\).
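
To illustrate the alternating structure analyzed above, here is a hedged NumPy sketch of one sweep of the mapping \(T\). It reuses circular_conv, kernel_from_coeffs, support, alpha, u and coeffs from the sketch given after the Abstract; the helper build_Ck and the column-by-column construction of \(C_k\) are illustrative assumptions rather than the authors' implementation (which attains the complexity linear in the image size discussed in the paper). In this sketch, each subproblem \((P_k)\) is treated as a least-squares problem in the \(S\) coefficients of \(c=\lambda h^k\), all other kernels being fixed; the solution is then split into a unit-norm kernel and the scale \(\lambda \).

```python
# Hedged sketch (not the authors' code) of one sweep of the alternating
# least-squares update T discussed in the proof of Proposition 3.
import numpy as np

def build_Ck(kernels, k):
    # Column j of C_k: the image alpha * h^1 * ... * e_j * ... * h^K obtained by
    # putting a unit coefficient at support offset j of kernel k (all other
    # kernels fixed).  By linearity, C_k @ c equals alpha convolved with the
    # kernels where kernel k carries the coefficients c on its support.
    cols = []
    for j in range(len(support)):
        e = np.zeros(len(support)); e[j] = 1.0
        img = alpha
        for i, h in enumerate(kernels):
            img = circular_conv(img, kernel_from_coeffs(e) if i == k else h)
        cols.append(img.ravel())
    return np.stack(cols, axis=1)                    # shape (#P, S)

def als_sweep(coeffs):
    # One Gauss-Seidel sweep: update (lambda, h^k) for k = 1..K in turn.
    kernels = [kernel_from_coeffs(c) for c in coeffs]
    lam = 0.0
    for k in range(len(coeffs)):
        Ck = build_Ck(kernels, k)
        c, *_ = np.linalg.lstsq(Ck, u.ravel(), rcond=None)   # c plays the role of lambda * h^k
        lam = np.linalg.norm(c)
        if lam > 0:                        # nonzero whenever C_k has full rank and C_k^T u != 0
            coeffs[k] = c / lam            # enforce ||h^k||_2 = 1
            kernels[k] = kernel_from_coeffs(coeffs[k])
    return lam, coeffs

# Iterating the sweep reproduces the iterative process whose limit points
# are studied in Proposition 3.
for _ in range(20):
    lam, coeffs = als_sweep(coeffs)
```

In this sketch, the condition \(C_k^Tu\ne 0\) used in the proof is exactly what guarantees that the least-squares solution \(c\) is nonzero, so that the normalization defining \(h^k\) and \(\lambda \) is well posed.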

About this article

Cite this article

Chabiron, O., Malgouyres, F., Tourneret, JY. et al. Toward Fast Transform Learning. Int J Comput Vis 114, 195–216 (2015). https://doi.org/10.1007/s11263-014-0771-z
