
Translation model based on discrete Fourier transform and Skipping Sub-Layer methods

  • Original Article
  • Published: 2024
International Journal of Machine Learning and Cybernetics

Abstract

Machine translation quality has improved tremendously since the advent of neural machine translation. However, translation models are memory intensive, demand expensive hardware, and train slowly. To reduce memory requirements and speed up translation, we propose the Transformer Discrete Fourier method with Skipping Sub-Layer (TF-SSL), which applies a discrete Fourier transform and a Skipping Sub-Layer algorithm after relative positional embedding of Chinese and English source sentences. The input sequence is fed to a Transformer model through the relative positional embedding layer, and the text is transformed into information-encoding word vectors via the embedding matrix, so that the word vectors effectively capture interdependencies within the text. After the 2D Fourier transform, the transform coefficient matrix is concentrated near the center of the encoder layer as a compact matrix of coefficients, which accelerates translation on a GPU. Accuracy and speed are further improved by the Skipping Sub-Layer method: sub-layers are randomly omitted during training, introducing a perturbation that imposes a stronger regularizing constraint on the sub-layers. We conduct an ablation study and comparative analyses. The results show that our approach improves both BLEU scores and GFLOPS compared with the baseline Transformer model and other deep learning models.
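
To make the two mechanisms named in the abstract concrete, the sketch below shows (1) an encoder sub-layer that mixes tokens with a 2D discrete Fourier transform in place of self-attention and (2) a wrapper that randomly skips a sub-layer during training. It is a minimal PyTorch-style illustration written from the abstract alone, under FNet-style assumptions (keeping the real part of the 2D FFT); the module names, the skip probability, and the layer sizes are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: Fourier token mixing + stochastic sub-layer skipping.
# Names (FourierMixing, SkippingSubLayer, EncoderLayer), p_skip, and d_ff are assumed.
import torch
import torch.nn as nn


class FourierMixing(nn.Module):
    """Token mixing via a 2D DFT over (sequence, hidden); only the real part is kept."""

    def forward(self, x):  # x: (batch, seq_len, d_model)
        # fft2 over the last two axes mixes information along both the sequence
        # and hidden dimensions without any learned attention weights.
        return torch.fft.fft2(x, dim=(-2, -1)).real


class SkippingSubLayer(nn.Module):
    """Residual sub-layer that is randomly omitted during training."""

    def __init__(self, sublayer, d_model, p_skip=0.2):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.p_skip = p_skip  # assumed skip probability

    def forward(self, x):
        if self.training and torch.rand(1).item() < self.p_skip:
            return x  # skip the sub-layer: acts as a training perturbation / regularizer
        return self.norm(x + self.sublayer(x))


class EncoderLayer(nn.Module):
    """Fourier mixing sub-layer followed by a position-wise feed-forward sub-layer."""

    def __init__(self, d_model=512, d_ff=2048, p_skip=0.2):
        super().__init__()
        self.mixing = SkippingSubLayer(FourierMixing(), d_model, p_skip)
        self.ffn = SkippingSubLayer(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)),
            d_model, p_skip,
        )

    def forward(self, x):
        return self.ffn(self.mixing(x))


if __name__ == "__main__":
    layer = EncoderLayer()
    tokens = torch.randn(2, 16, 512)   # (batch, seq_len, d_model)
    print(layer(tokens).shape)         # torch.Size([2, 16, 512])
```

Because the Fourier mixing step has no learned parameters, skipping it or the feed-forward block leaves tensor shapes unchanged, which is what makes the random-skip perturbation straightforward to apply per sub-layer.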

Data availability

The data that support the findings of this study are openly available: the WMT18 News Commentary v13 dataset at https://www.aclweb.org/anthology/volumes/W18-64/, the OpenSubtitles2016 dataset at https://aclanthology.org/L16-1147/, and the WMT2017 dataset at https://aclanthology.org/volumes/W17-47/.

Author information

Corresponding author

Correspondence to Zhaoqian Zhong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Y., Chen, S., Liu, Z. et al. Translation model based on discrete Fourier transform and Skipping Sub-Layer methods. Int. J. Mach. Learn. & Cyber. 15, 4435–4444 (2024). https://doi.org/10.1007/s13042-024-02156-w
