Abstract
Recurrent neural networks (RNNs) have been very successful at handling sequence data. However, understanding RNNs and finding best practices for RNN learning remains difficult, partly because there are many competing and complex hidden units, such as the long short-term memory (LSTM) unit and the gated recurrent unit (GRU). We propose a gated unit for RNNs, named the minimal gated unit (MGU), which contains only one gate and is therefore a minimal design among gated hidden units. The design of MGU draws on published evaluations of LSTM and GRU. Experiments on various sequence data show that MGU achieves accuracy comparable to GRU while having a simpler structure, fewer parameters, and faster training. MGU is therefore well suited to RNN applications. Its simple architecture also makes it easier to evaluate and tune, and in principle its properties should be easier to study both theoretically and empirically.
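To make the one-gate design concrete, the sketch below implements a single-gate recurrent cell of the kind the abstract describes: one forget gate both attenuates the previous hidden state inside the candidate activation and interpolates between the old state and the candidate, removing the need for GRU's separate update gate. This is an illustrative NumPy sketch written for this page, not the authors' reference implementation; the parameter names (Wf, Uf, Wh, Uh) and the uniform initialization are our own assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalGatedCell:
    """Sketch of a single-gate recurrent cell in the spirit of MGU.

    The forget gate f_t plays both GRU gate roles: it scales the
    previous state inside the candidate and mixes the old state with
    the candidate, so only two weight blocks are needed instead of
    GRU's three or LSTM's four.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Forget-gate parameters (names are illustrative).
        self.Wf = rng.uniform(-s, s, (hidden_size, input_size))
        self.Uf = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.bf = np.zeros(hidden_size)
        # Candidate-state parameters.
        self.Wh = rng.uniform(-s, s, (hidden_size, input_size))
        self.Uh = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.bh = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # Single gate: f_t = sigmoid(Wf x_t + Uf h_{t-1} + bf).
        f = sigmoid(self.Wf @ x_t + self.Uf @ h_prev + self.bf)
        # The candidate reuses the same gate to attenuate the old state.
        h_tilde = np.tanh(self.Wh @ x_t + self.Uh @ (f * h_prev) + self.bh)
        # Convex combination of old state and candidate.
        return (1.0 - f) * h_prev + f * h_tilde

# Usage: run the cell over a toy sequence of five 4-dimensional inputs.
cell = MinimalGatedCell(input_size=4, hidden_size=8)
h = np.zeros(8)
for x_t in np.random.default_rng(1).normal(size=(5, 4)):
    h = cell.step(x_t, h)
print(h.shape)  # (8,)

Counting weight blocks makes the "fewer parameters" claim tangible: a cell of this form has two input-to-hidden and two hidden-to-hidden matrices, versus three of each in GRU and four in LSTM, so for the same hidden size it carries roughly two thirds of GRU's gate parameters.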
Additional information
This work was supported by the National Natural Science Foundation of China (Nos. 61422203 and 61333014) and the National Key Basic Research Program of China (No. 2014CB340501).
Recommended by Associate Editor Yi Cao
Guo-Bing Zhou received the B.Sc. degree in computer science from Nanjing University, China in 2013. He is currently a postgraduate student at Nanjing University and will receive the M.Sc. degree in July 2016.
His research interest is machine learning.
ORCID iD: 0000-0001-9779-481X
Jianxin Wu received the Ph.D. degree in computer science from the Georgia Institute of Technology, USA in 2009. He is currently a professor in the Department of Computer Science and Technology at Nanjing University, China. He has served as an area chair for ICCV 2015 and as a senior PC member for AAAI 2016.
His research interests include computer vision and machine learning.
ORCID iD: 0000-0002-2085-7568
Chen-Lin Zhang is a B.Sc. candidate in the Department of Computer Science and Technology, Nanjing University, China.
His research interests include computer vision and machine learning.
Zhi-Hua Zhou is a professor, the standing deputy director of the National Key Laboratory for Novel Software Technology, and the founding director of the LAMDA Group at Nanjing University. He is a Fellow of the AAAI, IEEE, IAPR, IET/IEE, and CCF, and an ACM Distinguished Scientist.
His research interests include artificial intelligence, machine learning and data mining.
About this article
Cite this article
Zhou, G. B., Wu, J., Zhang, C. L., Zhou, Z. H. Minimal gated unit for recurrent neural networks. International Journal of Automation and Computing, vol. 13, pp. 226–234, 2016. https://doi.org/10.1007/s11633-016-1006-2