Abstract
Recurrent neural networks (RNNs) have been very successful at handling sequence data. However, understanding RNNs and finding best practices for RNN learning remains difficult, partly because there are many competing and complex hidden units, such as the long short-term memory (LSTM) unit and the gated recurrent unit (GRU). We propose a gated unit for RNNs, named the minimal gated unit (MGU), which contains only one gate and is therefore a minimal design among gated hidden units. The design of MGU draws on published evaluations of LSTM and GRU. Experiments on various sequence data show that MGU achieves accuracy comparable to GRU while having a simpler structure, fewer parameters, and faster training. MGU is therefore well suited to RNN applications. Its simple architecture also makes it easier to evaluate and tune, and in principle its properties should be easier to study both theoretically and empirically.
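To make the one-gate design concrete, the sketch below implements a single-gate recurrent cell of the kind the abstract describes: one forget gate both attenuates the previous hidden state inside the candidate activation and interpolates between the old state and the candidate, removing the need for GRU's separate update gate. This is an illustrative NumPy sketch written for this page, not the authors' reference implementation; the parameter names (Wf, Uf, Wh, Uh) and the uniform initialization are our own assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalGatedCell:
    """Sketch of a single-gate recurrent cell in the spirit of MGU.

    The forget gate f_t plays both GRU gate roles: it scales the
    previous state inside the candidate and mixes the old state with
    the candidate, so only two weight blocks are needed instead of
    GRU's three or LSTM's four.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Forget-gate parameters (names are illustrative).
        self.Wf = rng.uniform(-s, s, (hidden_size, input_size))
        self.Uf = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.bf = np.zeros(hidden_size)
        # Candidate-state parameters.
        self.Wh = rng.uniform(-s, s, (hidden_size, input_size))
        self.Uh = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.bh = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # Single gate: f_t = sigmoid(Wf x_t + Uf h_{t-1} + bf).
        f = sigmoid(self.Wf @ x_t + self.Uf @ h_prev + self.bf)
        # The candidate reuses the same gate to attenuate the old state.
        h_tilde = np.tanh(self.Wh @ x_t + self.Uh @ (f * h_prev) + self.bh)
        # Convex combination of old state and candidate.
        return (1.0 - f) * h_prev + f * h_tilde

# Usage: run the cell over a toy sequence of five 4-dimensional inputs.
cell = MinimalGatedCell(input_size=4, hidden_size=8)
h = np.zeros(8)
for x_t in np.random.default_rng(1).normal(size=(5, 4)):
    h = cell.step(x_t, h)
print(h.shape)  # (8,)

Counting weight blocks makes the "fewer parameters" claim tangible: a cell of this form has two input-to-hidden and two hidden-to-hidden matrices, versus three of each in GRU and four in LSTM, so for the same hidden size it carries roughly two thirds of GRU's gate parameters.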
Additional information
This work was supported by the National Natural Science Foundation of China (Nos. 61422203 and 61333014) and the National Key Basic Research Program of China (No. 2014CB340501).
Recommended by Associate Editor Yi Cao
Guo-Bing Zhou received the B.Sc. degree in computer science from Nanjing University, China in 2013. He is currently a postgraduate student at Nanjing University and will receive the M.Sc. degree in July 2016.
His research interest is machine learning.
ORCID iD: 0000-0001-9779-481X
Jianxin Wu received the Ph.D. degree in computer science from the Georgia Institute of Technology, USA in 2009. He is currently a professor in the Department of Computer Science and Technology at Nanjing University, China. He has served as an area chair for ICCV 2015 and as a senior PC member for AAAI 2016.
His research interests include computer vision and machine learning.
ORCID iD: 0000-0002-2085-7568
Chen-Lin Zhang is a B.Sc. candidate in the Department of Computer Science and Technology, Nanjing University, China.
His research interests include computer vision and machine learning.
Zhi-Hua Zhou is a professor, the standing deputy director of the National Key Laboratory for Novel Software Technology, and the founding director of the LAMDA Group at Nanjing University. He is a Fellow of the AAAI, IEEE, IAPR, IET/IEE, and CCF, and an ACM Distinguished Scientist.
His research interests include artificial intelligence, machine learning and data mining.
About this article
Cite this article
Zhou, G. B., Wu, J., Zhang, C. L., Zhou, Z. H. Minimal gated unit for recurrent neural networks. International Journal of Automation and Computing, vol. 13, pp. 226–234, 2016. https://doi.org/10.1007/s11633-016-1006-2