Abstract
Sequence labeling is a common machine-learning task that requires not only the most likely label for each local input but also the most suitable annotation for the input sequence as a whole. A model for this task must therefore handle both local spatial features and temporal-dependence features effectively. Furthermore, in tasks such as speech recognition and handwritten text recognition, the label sequence is often much shorter than the input sequence. In this paper, we propose a novel deep neural network architecture that combines convolutional, pooling, and recurrent layers in a unified framework, forming a convolutional recurrent neural network (CRNN) for sequence labeling tasks with variable-length inputs and outputs. Specifically, we design a novel CRNN that jointly extracts local spatial features and long-distance temporal-dependence features from the sequence, introduce pooling along the time axis to map a long input to a shorter output while also reducing the model's complexity, and adopt a Connectionist Temporal Classification (CTC) layer to enable end-to-end sequence labeling. Experiments on phoneme sequence recognition and handwritten character sequence recognition show that our method achieves strong performance with a simpler architecture and a more efficient training and labeling procedure.
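The architecture described above can be made concrete with a short sketch. Below is a minimal PyTorch illustration of a convolution + temporal-pooling + bidirectional-recurrent stack trained with a CTC loss; the layer sizes, kernel widths, pooling factors, and feature/label dimensions are illustrative assumptions, not the exact configuration reported in the paper.

    # Minimal PyTorch sketch of a CRNN with temporal pooling and a CTC loss.
    # Layer sizes, kernel widths, and pooling factors are illustrative assumptions.
    import torch
    import torch.nn as nn

    class CRNN(nn.Module):
        def __init__(self, n_features, n_classes, hidden=128):
            super().__init__()
            # 1-D convolutions extract local spatial features from each frame window.
            self.conv = nn.Sequential(
                nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.MaxPool1d(2),              # pooling along time: halves the sequence length
                nn.Conv1d(64, 128, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.MaxPool1d(2),              # total temporal reduction: 4x
            )
            # A bidirectional recurrent layer models long-distance temporal dependence.
            self.rnn = nn.LSTM(128, hidden, bidirectional=True, batch_first=True)
            # Linear projection to per-frame class scores (including the CTC blank).
            self.fc = nn.Linear(2 * hidden, n_classes + 1)

        def forward(self, x):                 # x: (batch, time, n_features)
            h = self.conv(x.transpose(1, 2))  # -> (batch, channels, time/4)
            h, _ = self.rnn(h.transpose(1, 2))
            return self.fc(h)                 # (batch, time/4, n_classes + 1)

    # Training step with CTC: log-probs must be shaped (time, batch, classes).
    model = CRNN(n_features=40, n_classes=61)       # 61 phoneme labels is an assumed example
    ctc = nn.CTCLoss(blank=61, zero_infinity=True)
    x = torch.randn(8, 200, 40)                     # 8 utterances, 200 frames, 40-dim features
    targets = torch.randint(0, 61, (8, 30))         # label sequences shorter than the input
    logits = model(x).log_softmax(dim=-1).transpose(0, 1)
    loss = ctc(logits, targets,
               input_lengths=torch.full((8,), logits.size(0), dtype=torch.long),
               target_lengths=torch.full((8,), 30, dtype=torch.long))
    loss.backward()

Pooling along time shortens the frame sequence before the recurrent layer, which is what lets a long input (e.g., acoustic frames) be aligned by CTC with a much shorter label sequence while also reducing computation in the recurrent layer.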
Rights and permissions
This is an open access article distributed under the CC BY-NC 4.0 license (https://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Huang, X., Qiao, L., Yu, W. et al. End-to-End Sequence Labeling via Convolutional Recurrent Neural Network with a Connectionist Temporal Classification Layer. Int J Comput Intell Syst 13, 341–351 (2020). https://doi.org/10.2991/ijcis.d.200316.001