Abstract
Machine translation quality has improved dramatically with the advent of neural machine translation. However, translation models are memory intensive, require expensive hardware, and train slowly. To reduce memory requirements and speed up translation, we propose the Transformer Discrete Fourier method with Skipping Sub-Layer (TF-SSL), which applies a discrete Fourier transform and a Skipping Sub-Layer algorithm after relative positional embedding of Chinese and English source sentences. The input sequence is processed by a Transformer with a relative positional embedding layer, and the text is mapped through the embedding matrix into word vectors that encode positional information, so that the word vectors effectively capture interdependencies within the text. After the 2D Fourier transform, the coefficient matrix is concentrated near the center of the encoder layer, yielding a compact matrix of transform coefficients that accelerates translation on a GPU. Accuracy and speed are further improved by the Skipping Sub-Layer method: sub-layers are randomly omitted during training to introduce perturbation, which imposes a stronger regularizing constraint on the sub-layers. We conduct an ablation study and comparative analyses. The results show that our approach improves both BLEU scores and GFLOPS compared with the baseline Transformer and other deep learning models.
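The abstract names two mechanisms: a Fourier-transform mixing sub-layer in the encoder and stochastic skipping of sub-layers during training. The PyTorch sketch below is not the authors' implementation; it is a minimal illustration of one plausible reading of those two ideas (an FNet-style 2D DFT in place of self-attention, and LayerDrop-style sub-layer skipping). The module names, the skip probability, and the dimensions are illustrative assumptions, and the relative positional embedding step is omitted.

```python
# Minimal sketch (assumed names and hyperparameters), not the TF-SSL code itself.
import torch
import torch.nn as nn


class FourierMixing(nn.Module):
    """Mix tokens with a 2D DFT over the (sequence, hidden) dimensions."""
    def forward(self, x):                    # x: (batch, seq_len, d_model)
        # Keep only the real part of the transform, as in FNet.
        return torch.fft.fft2(x, dim=(-2, -1)).real


class SkippableSubLayer(nn.Module):
    """Wrap a sub-layer and randomly omit it during training."""
    def __init__(self, sublayer, d_model, p_skip=0.2):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.p_skip = p_skip                 # skip probability is an assumed value

    def forward(self, x):
        # With probability p_skip, skip the sub-layer; the identity path keeps the
        # representation intact and acts as a training-time perturbation/regularizer.
        if self.training and torch.rand(1).item() < self.p_skip:
            return x
        return self.norm(x + self.sublayer(x))


# Usage: one encoder block with a Fourier mixing sub-layer and a skippable
# feed-forward sub-layer.
d_model = 512
block = nn.Sequential(
    SkippableSubLayer(FourierMixing(), d_model, p_skip=0.1),
    SkippableSubLayer(nn.Sequential(nn.Linear(d_model, 2048),
                                    nn.GELU(),
                                    nn.Linear(2048, d_model)), d_model, p_skip=0.1),
)
x = torch.randn(8, 64, d_model)              # (batch, seq_len, d_model)
y = block(x)                                 # same shape as x
```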








Data availability
The data that support the findings of this study are openly available: the WMT18 News Commentary v13 dataset at https://www.aclweb.org/anthology/volumes/W18-64/, the OpenSubtitles2016 dataset at https://aclanthology.org/L16-1147/, and the WMT2017 dataset at https://aclanthology.org/volumes/W17-47/.