

Unconstrained Offline Handwritten Word Recognition by Position Embedding Integrated ResNets Model

Xiangping Wu, Qingcai Chen, Member, IEEE, Jinghan You, and Yulun Xiao

Abstract—State-of-the-art methods usually integrate linguistic knowledge into the recognizer, which makes the models more complicated and hard to apply to resource-lacking languages. This letter proposes a new method for unconstrained offline handwritten word recognition that combines position embeddings with residual networks (ResNets) and bidirectional long short-term memory (BiLSTM) networks. First, ResNets are used to extract abundant features from the input image. Then, position embeddings are used as indices of the character sequence corresponding to a word. By combining the ResNets features with each position embedding, the model generates a different input to the BiLSTM networks for each character position. Finally, the state sequence of the BiLSTM is used to recognize the corresponding characters. Without additional language resources, the proposed model achieved the best character error rate on two public corpora: the 2017 ICDAR competition on word-level information extraction in historical handwritten records and the public RIMES dataset.

Index Terms—Position embedding, residual networks, bidirectional long short-term memory network, off-line handwritten word recognition.

Manuscript received October 13, 2018; revised January 3, 2019; accepted January 20, 2019. Date of publication January 29, 2019; date of current version March 12, 2019. This work was supported in part by the Natural Science Foundation of China (Grants 61473101, 61872113, and 61573118) and in part by the Strategic Emerging Industry Development Special Funds of Shenzhen (Grants JCYJ20170307150528934 and JCYJ20170811153836555). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yap-Peng Tan. (Corresponding author: Qingcai Chen.) The authors are with the Shenzhen Chinese Calligraphy Digital Simulation Engineering Laboratory, Harbin Institute of Technology (Shenzhen), Shenzhen University Town, Shenzhen 518055, China (e-mail: wxpleduole@gmail.com; qingcai.chen@hit.edu.cn; youjinghan2018@163.com; xiaoyulun@stu.hit.edu.cn). Digital Object Identifier 10.1109/LSP.2019.2895967

I. INTRODUCTION

OFF-LINE handwritten word or sentence recognition is still a very challenging problem, especially for languages that lack language resources. Traditionally, segmentation is one of the key tasks in word recognition [1]–[4]. Models based on the hidden Markov model (HMM) or the neural network hidden Markov model (NN-HMM) have been successfully applied to segmentation-free word recognition [5], [6]. The main issues of these traditional methods include overfitting and long-distance dependency [7], [8].

In recent years, deep learning has been introduced for recognition tasks, and outstanding performance has been reached on handwriting recognition [9]–[12] and scene text recognition [13]–[15]. In these tasks, a convolutional neural network (CNN) is usually used to extract low-, mid-, and high-level image features automatically. For example, Xie et al. [16] presented a multi-spatial-context fully convolutional recurrent network (MC-FCRN). Jaderberg et al. [17] developed a character sequence model using a CNN with multiple position-sensitive character classifiers; when an image contains a long sequence of characters, this approach needs to build many classifiers. To improve the capability of handling misalignment between inputs and target labels, the long short-term memory (LSTM) [18] network combined with connectionist temporal classification (CTC) [19] is used for sequence labeling. Building on the CTC model, Zhan et al. [20] used ResNets [21] to extract features; in their model, a recurrent neural network (RNN) models the contextual information and predicts the recognition sequences. Shi et al. [22] proposed a novel end-to-end scene text recognition architecture that uses a convolutional recurrent neural network (CRNN) with CTC. As an important deep learning mechanism, the attention model has also been successfully applied to text recognition [23], [24]. Shi et al. [25] proposed a flexible rectification mechanism based on the spatial transformer network (STN) for irregular text recognition, and Wojna et al. [26] presented an end-to-end approach with a spatial attention mask for scene text recognition.

To deal with the diversity of writing styles and the similarities between characters, neural networks for handwriting recognition usually rely on constructing additional features and lexicons. Chherawala et al. [27] achieved promising results with features such as histograms, direction distributions, and profiles. Almazán et al. [28] proposed a word spotting and recognition method that embeds both word images and text strings in a common vectorial subspace. Based on Almazán's work, Poznanski et al. [29] presented a CNN-N-Gram method that estimates a word's n-gram frequency profile by constructing a set of attributes. They utilized canonical correlation analysis (CCA) to match the predicted profiles to the true profiles of all words in a big lexicon. Their system has been applied to several handwriting recognition benchmarks and achieved a clear performance gain. The issue of this method is that it requires the construction of a large number of linguistic features, such as unigrams, bigrams, and trigrams. In [28] and [29], the recognition tasks are performed to some extent like retrieval systems that match the word label against existing dictionaries; such approaches are called lexicon-driven methods. To avoid constructing a large number of linguistic features and to reduce the dependency on a dictionary, a lexicon-verification process was employed by Stuner et al. [30].

They used a cohort of LSTMs together with a verification strategy, whose advantage is that out-of-vocabulary (OOV) words can also be recognized [30]. Although this greatly reduces the requirements on linguistic resources, the method of Stuner et al. still integrates a dictionary during modeling, which makes the model complicated.

To further reduce the dependency on linguistic resources, this letter proposes an unconstrained off-line handwritten word recognition model. The main contributions of this letter are: 1) a method that converts the position information of characters into position embeddings (PE), which helps the network automatically learn the character representation; 2) a novel handwritten word recognition model that integrates ResNets, position embeddings, and a BiLSTM network, and is greatly simplified while remaining segmentation-free and portable across languages; 3) experiments on public corpora for two languages, where competitive results are achieved. By adding a simple post-processing model, the proposed method reaches state-of-the-art CER performance on the 2017 ICDAR IEHHR competition and the RIMES dataset, respectively.

II. PROPOSED METHOD

A. The Model Structure

The architecture of the position embedding integrated model (denoted the PE-ResNets-BiLSTM model) is given in Fig. 1. First, the 101-layer ResNets [21] are used to generate a representative vector for the image of a handwritten word. Here we use ResNets rather than plain deep convolutional neural networks (DCNN) to produce more sophisticated features. Then, to indicate the position of each character contained in the word, a position embedding is assigned and combined with the output of the ResNets. Since the ResNets output is the same for all characters in the word, the PE plays the role of an attention mechanism: by combining the output with PEs corresponding to different positions, different parts of the ResNets output are emphasized. Finally, the ResNets output combined with the PEs is fed into the BiLSTM sequentially. Each hidden state of the BiLSTM is used as a recognition vector and is fed into fully connected multilayer perceptrons, after which a softmax layer classifies each handwritten character. A code sketch of this forward pass is given after Fig. 1.

Fig. 1. The architecture of the PE-ResNets-BiLSTM model.
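To make the data flow concrete, the following is a minimal PyTorch sketch of this forward pass. It is a sketch under assumptions, not the authors' released code: the letter does not give the embedding width, the BiLSTM hidden size, or the exact backbone variant, so `pe_dim`, `hidden`, and torchvision's `resnet101` are our illustrative choices.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

class PEResNetsBiLSTM(nn.Module):
    """Sketch of the PE-ResNets-BiLSTM forward pass described above."""
    def __init__(self, num_classes, max_len=15, pe_dim=64, hidden=256):
        super().__init__()
        backbone = resnet101()
        backbone.fc = nn.Identity()                 # keep the 2048-d global feature F_g
        self.backbone = backbone
        self.max_len = max_len                      # K: maximum characters per word
        self.pe = nn.Embedding(max_len, pe_dim)     # learned position embeddings P_i
        self.bilstm = nn.LSTM(2048 + pe_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)  # MLP + softmax head

    def forward(self, images):
        # Global feature F_g, identical for every character position.
        fg = self.backbone(images)                            # (B, 2048)
        fg = fg.unsqueeze(1).expand(-1, self.max_len, -1)     # (B, K, 2048)
        idx = torch.arange(self.max_len, device=images.device)
        pe = self.pe(idx).unsqueeze(0).expand(fg.size(0), -1, -1)
        # Concatenate F_g with each P_i to build the K BiLSTM inputs.
        x = torch.cat([fg, pe], dim=-1)                       # (B, K, 2048 + pe_dim)
        h, _ = self.bilstm(x)                                 # (B, K, 2 * hidden)
        return self.classifier(h)                             # per-position logits
```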

B. ResNets

In this letter, 101-layer ResNets are employed to learn the features of an image. The network architecture is based on the earlier work of He et al. [21]. Considering an input sample x (here x could have multiple color channels) and an output vector y, a building block of ResNets is defined as:

    y = F(x, {W_i}) + x                                   (1)

where the residual function F(x, {W_i}) represents the residual mapping to be learned, which can have multiple convolutional layers. When x and F have different dimensions, a linear projection W_s applied on the shortcut connection can be introduced to match the dimensions:

    y = F(x, {W_i}) + W_s x                               (2)

In this letter, we use a bottleneck building block that consists of a stack of three layers; its architecture is depicted in [21]. The 1 × 1 convolutions at the head and the end are used to reduce and then restore the dimensions, while the 3 × 3 convolution in the middle layer is a bottleneck with smaller input/output dimensions.
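For reference, here is a sketch of such a three-layer bottleneck block in PyTorch, following the standard layout popularized by [21]; the channel counts and batch-norm placement are conventional choices, not values taken from the letter.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck residual block: y = F(x, {W_i}) + W_s x, eqs. (1)/(2)."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),    # 1x1: reduce dimensions
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride,
                      padding=1, bias=False),           # 3x3 bottleneck layer
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),   # 1x1: restore dimensions
            nn.BatchNorm2d(out_ch),
        )
        # Projection W_s, used only when the dimensions of x and F differ.
        self.proj = None
        if stride != 1 or in_ch != out_ch:
            self.proj = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = x if self.proj is None else self.proj(x)
        return self.relu(self.f(x) + shortcut)
```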
C. Position Embedding

To avoid segmenting the handwritten word, a common approach is to split the feature representation of an image by column and feed each slice into an LSTM network sequentially to generate character classification vectors [31], [32]. Even with the CTC method, such hard segmentation may cause information loss or redundancy; it also cannot handle sequences that are too short or that contain more characters than the number of segmented slices.

To address these issues, this letter introduces position embeddings [33]. For a word of length n, the characters contained in it are indicated by their order in the sequence as i = 1, 2, ..., n (n ≤ K), where K denotes the maximum number of characters in an image. The features corresponding to a character are obviously related to its index, but there is no explicit boundary between two characters, especially after the feature extraction operations. To make full use of the order information of the characters while avoiding a hard decision on character boundaries, we define the position embeddings (PE) as a series of vectors that represent the positional index of a character in the image; that is, each index i is represented by a Q-dimensional position embedding P_i (i = 1, 2, ..., K).

The PE is then used to distinguish the part of the features that belongs to a given character in a handwritten word. Though there are many ways to combine a PE with the output of the ResNets, the simple and efficient way of concatenation is used. Let F_g denote the global feature vector output by the ResNets, which is the same for all characters in the given handwritten word. Then the feature of each character can be expressed as

    F_c^(i) = F_g ⊕ P_i ,   i = 1, 2, ..., K              (3)
where ⊕ denotes the concatenation operation. There are usually two ways to determine the value of each PE [33], [34]: one is to use randomly generated values; the other is to learn them dynamically in the model. This letter learns the PEs during recognizer training. As shown in Fig. 1, the character feature vector F_c^(i) is the ith input of the BiLSTM, and the ith hidden state corresponds to the ith classification vector used by the multilayer perceptrons and a softmax classifier.
Given a handwritten word containing the character sequence C = {c_1, c_2, ..., c_n}, n ≤ K, we assume A is the set of all character labels that the language needs to predict. The standard output label corresponding to the ith state is given as:

    c_i = char (char ∈ A)  for i ≤ n;   c_i = ∅  for n < i ≤ K        (4)

where ∅ represents the "null" label, a placeholder used when the real length of the given word is shorter than the maximum length K. Assuming that the K prediction results are denoted as S = {s_1, s_2, ..., s_K}, the conditional probability is defined as:

    p(C|I) = ∏_{i=1}^{n} p(s_i = c_i | I) · ∏_{j=n+1}^{K} p(s_j = ∅ | I)    (5)
where p(·) is the probability output of the softmax classifiers. We extract only the characters before the first ∅ label as the word prediction result; here n is the number of characters contained in the ground-truth label sequence. Given the training dataset X = {I^(d), C^(d)}, d = 1, 2, ..., |X|, where I^(d) is the dth input image and C^(d) is the ground-truth label sequence, let L be the label set of the model, consisting of the character set A and the ∅ label. y_i is the one-hot encoding of the character c_i, a vector of dimension |L| whose elements are given by

    y_ij = 1 if c_i = L[j];   y_ij = 0 if c_i ≠ L[j]                  (6)
Here L[j] denotes the jth class of the label set L. The loss function is defined as:

    L = -(1/|X|) ∑_{d=1}^{|X|} [ ∑_{i=1}^{n} ∑_{j=1}^{|L|} y_ij^(d) ln p_ij^(d)
        + ∑_{j=1}^{|L|} y_{n+1,j}^(d) ln p_{n+1,j}^(d) ]              (7)
where p_ij is the probability that c_i = L[j]. In (7), the second cross-entropy term corresponds to the "null" label of the handwritten character sequence. Since in the prediction stage we end the sequence as soon as the first "null" label is encountered, we count only the loss of the first "null" label in the training stage.
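The following is a minimal sketch of how (4)–(7) and the prediction rule could be implemented, assuming per-position logits of shape (B, K, |L|) and targets padded with the index of the null label; the function and variable names are ours, not the authors'.

```python
import torch
import torch.nn.functional as F

def pe_loss(logits, targets, lengths):
    """Cross-entropy of eq. (7): the n true characters plus the FIRST
    null position only; padding positions beyond n+1 are masked out.

    logits:  (B, K, |L|) per-position scores
    targets: (B, K) label indices, padded with the null index beyond n
    lengths: (B,) true word lengths n
    """
    B, K, _ = logits.shape
    log_p = F.log_softmax(logits, dim=-1)
    pos = torch.arange(K, device=logits.device).unsqueeze(0)    # (1, K)
    mask = (pos <= lengths.unsqueeze(1)).float()                # i <= n keeps the first null
    nll = -log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (B, K)
    return (nll * mask).sum() / B

def decode(logits, alphabet, null_idx):
    """Greedy prediction of eq. (5): argmax at each position, keeping
    only the characters before the first null label."""
    words = []
    for seq in logits.argmax(dim=-1).tolist():
        chars = []
        for j in seq:
            if j == null_idx:            # stop at the first null
                break
            chars.append(alphabet[j])
        words.append("".join(chars))
    return words
```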
D. The Model Training

In order to overcome overfitting, we add L2 regularization with a weight decay of 0.0001 to the loss function. The network is trained by stochastic gradient descent (SGD) with the momentum set to 0.9. The learning rate is initially set to 0.1 and is divided by 10 every 50 iterations.
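In PyTorch terms, this configuration corresponds to roughly the following sketch, where `model` refers to the hypothetical network from the earlier sketch; stepping the schedule by iteration is our reading of "per 50 iterations".

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-4)   # L2 regularization, decay 0.0001
# Divide the learning rate by 10 every 50 iterations.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
```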

Considering the huge parameter space of the complex networks, data enhancements are employed to expand the training samples. Each input image is rotated, sheared, and zoomed, with corresponding parameter ranges of [-5°, +5°], [-0.5, +0.5], and [0.8, 1.2]; by this method, each input image generates 12 additional images. Prediction-side data enhancement is performed in the same way: during the prediction stage, the 13 images are separately passed through the proposed network, the softmax results of the 13 images are averaged, and the sequence of characters with the highest confidence is taken as the predicted result of the test image.
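A sketch of how this 12-fold enhancement and the 13-image prediction averaging could look is given below. The letter does not say how the three transforms are combined into exactly 12 variants, so we simply sample them jointly; the conversion of the shear factor to the degree-based argument of torchvision's `affine` is likewise our assumption.

```python
import math
import random
import torch
import torchvision.transforms.functional as TF

def enhanced_copies(img, n=12):
    """Random rotation, shear, and zoom within the ranges given above."""
    copies = []
    for _ in range(n):
        angle = random.uniform(-5.0, 5.0)                        # rotation, degrees
        shear = math.degrees(math.atan(random.uniform(-0.5, 0.5)))
        scale = random.uniform(0.8, 1.2)                         # zoom factor
        copies.append(TF.affine(img, angle=angle, translate=[0, 0],
                                scale=scale, shear=[shear]))
    return copies

def predict_averaged(model, img):
    """Average the softmax outputs of the original image and its 12
    variants (13 images in total), as in the prediction stage above."""
    batch = torch.stack([img] + enhanced_copies(img))
    with torch.no_grad():
        probs = model(batch).softmax(dim=-1)                     # (13, K, |L|)
    return probs.mean(dim=0)                                     # averaged scores
```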


In the training process, neither segmentation techniques nor language resources are used. We use an official lexicon only in the post-processing stage, which further improves the recognition performance while requiring no additional language resources and keeping the recognition system simple. In this step, we first perform lexicon-free recognition and then select the closest word from the lexicon according to the edit distance metric; the maximal edit distance is set to 7 to limit the search complexity. A sketch of this step follows.
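A minimal sketch of this post-processing, under the assumption of a plain linear scan of the lexicon; the `levenshtein` helper is the textbook dynamic program and is reused by the CER sketch in Section III-B.

```python
def levenshtein(a, b):
    """Textbook edit-distance dynamic program between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def lexicon_correct(word, lexicon, max_dist=7):
    """Replace the raw prediction by the closest lexicon word, but only
    if that word lies within the maximal edit distance of 7."""
    best, best_d = word, max_dist + 1
    for cand in lexicon:
        d = levenshtein(word, cand)
        if d < best_d:
            best, best_d = cand, d
    return best if best_d <= max_dist else word
```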

III. EXPERIMENTS

A. Dataset

The experiments of this letter are conducted on two benchmarks. For the 2017 ICDAR IEHHR competition [35], 125 pages of the Esposalles dataset [36], [37] are used for handwriting recognition and named entity recognition (NER). This dataset consists of historical handwritten marriage records, written in old Catalan, from the archives of the Cathedral of Barcelona. The training set is composed of 968 marriage records with 31501 isolated word images; the test set is composed of 253 marriage records. Since the test set is not publicly available, we randomly divide the training word images into five equal parts and perform 5-fold cross-validation. The performance of our previous competition system on the test set is also given as a comparison: in that competition, we won and improved the performance from the baseline of 70.18% given by the competition organizer to 91.97% on the test set.

The RIMES dataset [38] was used in the ICDAR 2011 competition as an isolated word recognition task [39]. A dictionary composed of more than 5000 words is also provided. In this letter, we conduct experiments on its training and test sets.

B. Experimental Results

In this letter, we use the same character error rate (CER) measure as in [29], which is based on the Levenshtein distance.
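For reference, the CER reported below can be computed as in the following short sketch, reusing the `levenshtein` helper from the post-processing sketch above; normalizing by the ground-truth length is our reading of the standard definition.

```python
def cer(prediction, ground_truth):
    """Character error rate: edit distance normalized by the
    ground-truth length."""
    return levenshtein(prediction, ground_truth) / max(len(ground_truth), 1)
```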
TABLE I
COMPARISON TO EXISTING METHODS IN CHARACTER ERROR RATE (%) ON THE RIMES AND ESPOSALLES DATASETS. (The result is obtained on the unpublished test set, which is not directly comparable to our result. †The results listed in the 2nd to 6th rows are reported in [29].)

TABLE II
COMPARISON TO DIFFERENT VARIANTS OF THE FULL SYSTEM IN CHARACTER ERROR RATE (%)

The maximum character sequence length of each handwritten word is set to 15 on the Esposalles dataset and 20 on the RIMES dataset.

The state-of-the-art systems on the handwriting recognition benchmarks RIMES and Esposalles are compared in Table I. It shows that, on both benchmarks, the proposed method outperforms the compared systems on the CER measure. For the RIMES dataset, we apply the 12-fold data enhancement, while Poznanski et al. [29] used a 36-fold data enhancement. Compared to [29], we obtain an absolute performance gain of 0.11% on CER, i.e., a 5.79% relative error reduction on the RIMES dataset.

On the Esposalles dataset, to the best of our knowledge, only Toledo et al. [40] have published word-level handwritten recognition performance. Compared with their result, we obtain a performance gain of 0.34% on CER, which corresponds to a 40.96% relative error-rate reduction. It should be noted that we use 5-fold cross-validation on the published Esposalles dataset rather than the unpublished test set used by Toledo et al. [40].

In order to show the effectiveness of each part of our full system, we conducted further experiments on several variants of the proposed method; the results are shown in Table II. For each variant, we keep all other settings the same as in the full version and remove only the mentioned part. On the Esposalles dataset, we randomly selected one of the 5-fold cross-validation splits for this experiment. Table II shows the same trends of performance gain for each variant of the system on both datasets. The data enhancement and post-processing modules consistently improve performance. It is remarkable that, even without the lexicon for post-processing, the character error rate of the proposed method is 2.78%, which outperforms all compared methods except the method of [29]. The experiments show that the proposed method is suitable for language-independent handwritten word recognition. On the Esposalles dataset, we also give the performance of another baseline, the v6 model in Table II. This model won first place in the 2017 ICDAR word-level IEHHR competition; it uses only ResNets with position embeddings, and its position embeddings are randomly initialized once and are not changed during the whole training process.

Fig. 2. Examples of recognition. Left characters are ground truth and right are predictions. Red characters are recognized wrongly.

C. Error Analysis

Fig. 2 shows some wrongly recognized samples from the proposed method. Our error analysis shows that most of the errors are caused by uppercase/lowercase character confusion; Fig. 2(a) and Fig. 2(b) give such examples on the Esposalles and RIMES datasets, respectively. Most of these errors could be corrected by linguistic rules in a specific language. On the Esposalles dataset, as shown in Fig. 2(c), some erased characters are still recognized. Fig. 2(d) shows the wrong recognition of accented characters on the RIMES dataset.

IV. CONCLUSION

This letter presents a novel unconstrained off-line handwritten word recognition method named PE-ResNets-BiLSTM. Experiments conducted on handwriting recognition datasets of two languages show that, without any language resources, the model achieves nearly the same performance as existing language-resource-enriched models. By adding a simple post-processing step based only on a lexicon of the given language, the model reaches state-of-the-art CER performance on both languages. These results demonstrate the effectiveness and latent capability of the proposed method for handwriting recognition of other resource-lacking languages.

REFERENCES

[1] M.-Y. Chen, A. Kundu, and J. Zhou, "Off-line handwritten word recognition using a hidden Markov model type stochastic network," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 5, pp. 481–496, May 1994.
[2] C.-L. Liu, H. Sako, and H. Fujisawa, "Effects of classifier structures and training regimes on integrated segmentation and recognition of handwritten numeral strings," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1395–1407, Nov. 2004.
[3] M. Kumar, M. Jindal, and R. Sharma, "Segmentation of isolated and touching characters in offline handwritten Gurmukhi script recognition," Int. J. Inf. Technol. Comput. Sci., vol. 6, no. 2, pp. 58–63, 2014.
[4] Y. Wang, X. Ding, and C. Liu, "Topic language model adaption for recognition of homologous offline handwritten Chinese text image," IEEE Signal Process. Lett., vol. 21, no. 5, pp. 550–553, May 2014.
[5] T.-H. Su, T.-W. Zhang, D.-J. Guan, and H.-J. Huang, "Off-line recognition of realistic Chinese handwriting using segmentation-free strategy," Pattern Recognit., vol. 42, no. 1, pp. 167–182, 2009.
[6] Z.-R. Wang, J. Du, W.-C. Wang, J.-F. Zhai, and J.-S. Hu, "A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition," Int. J. Doc. Anal. Recognit., vol. 21, pp. 241–251, 2018.
[7] A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2009, pp. 545–552.
[8] T. Liu and J. Lemeire, "Efficient and effective learning of HMMs based on identification of hidden states," Math. Probl. Eng., vol. 2017, 2017, Art. no. 7318940.
[9] J. Sueiras, V. Ruiz, A. Sanchez, and J. F. Velez, "Offline continuous handwriting recognition using sequence to sequence neural networks," Neurocomputing, vol. 289, pp. 119–128, 2018.
[10] X. Xiao, L. Jin, Y. Yang, W. Yang, J. Sun, and T. Chang, "Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition," Pattern Recognit., vol. 72, pp. 72–81, 2017.
[11] X.-Y. Zhang, F. Yin, Y.-M. Zhang, C.-L. Liu, and Y. Bengio, "Drawing and recognizing Chinese characters with recurrent neural network," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 849–862, Apr. 2018.
[12] Q. Wang and Y. Lu, "A sequence labeling convolutional network and its application to handwritten string recognition," in Proc. 26th Int. Joint Conf. Artif. Intell., 2017, pp. 2950–2956.
[13] R. Wang, N. Sang, and C. Gao, "Scene text identification by leveraging mid-level patches and context information," IEEE Signal Process. Lett., vol. 22, no. 7, pp. 963–967, Jul. 2015.
[14] X. Bai, C. Yao, and W. Liu, "Strokelets: A learned multi-scale mid-level representation for scene text recognition," IEEE Trans. Image Process., vol. 25, no. 6, pp. 2789–2802, Jun. 2016.
[15] B. Su and S. Lu, "Accurate recognition of words in scenes without character segmentation using recurrent neural network," Pattern Recognit., vol. 63, pp. 397–405, 2017.
[16] Z. Xie, Z. Sun, L. Jin, H. Ni, and T. Lyons, "Learning spatial-semantic context with fully convolutional recurrent network for online handwritten Chinese text recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 8, pp. 1903–1917, Aug. 2018.
[17] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Synthetic data and artificial neural networks for natural scene text recognition," in Proc. NIPS Workshop Deep Learn., 2014.
[18] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[19] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in Proc. 23rd Int. Conf. Mach. Learn., 2006, pp. 369–376.
[20] H. Zhan, Q. Wang, and Y. Lu, "Handwritten digit string recognition by combination of residual network and RNN-CTC," in Proc. Int. Conf. Neural Inf. Process., 2017, pp. 583–591.
[21] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[22] B. Shi, X. Bai, and C. Yao, "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 11, pp. 2298–2304, Nov. 2017.
[23] F. Bai, Z. Cheng, Y. Niu, S. Pu, and S. Zhou, "Edit probability for scene text recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1508–1516.
[24] Z. Cheng, F. Bai, Y. Xu, G. Zheng, S. Pu, and S. Zhou, "Focusing attention: Towards accurate text recognition in natural images," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 5086–5094.
[25] B. Shi, M. Yang, X. Wang, P. Lyu, C. Yao, and X. Bai, "ASTER: An attentional scene text recognizer with flexible rectification," IEEE Trans. Pattern Anal. Mach. Intell., to be published.
[26] Z. Wojna et al., "Attention-based extraction of structured information from street view imagery," in Proc. 14th IAPR Int. Conf. Doc. Anal. Recognit., 2017, pp. 844–850.
[27] Y. Chherawala, P. P. Roy, and M. Cheriet, "Combination of context-dependent bidirectional long short-term memory classifiers for robust offline handwriting recognition," Pattern Recognit. Lett., vol. 90, pp. 58–64, 2017.
[28] J. Almazán, A. Gordo, A. Fornés, and E. Valveny, "Word spotting and recognition with embedded attributes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 12, pp. 2552–2566, Dec. 2014.
[29] A. Poznanski and L. Wolf, "CNN-N-gram for handwriting word recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2305–2314.
[30] B. Stuner, C. Chatelain, and T. Paquet, "Cascading BLSTM networks for handwritten word recognition," in Proc. 23rd Int. Conf. Pattern Recognit., 2016, pp. 3416–3421.
[31] A. Ul-Hasan, S. B. Ahmed, F. Rashid, F. Shafait, and T. M. Breuel, "Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks," in Proc. 12th Int. Conf. Doc. Anal. Recognit., 2013, pp. 1061–1065.
[32] D. Ko, C. Lee, D. Han, H. Ohk, K. Kang, and S. Han, "Approach for machine-printed Arabic character recognition: The state-of-the-art deep-learning method," Electron. Imag., vol. 2018, no. 2, pp. 1–8, 2018.
[33] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, "Convolutional sequence to sequence learning," in Proc. Int. Conf. Mach. Learn., 2017, pp. 1243–1252.
[34] A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008.
[35] A. Fornés et al., "ICDAR2017 competition on information extraction in historical handwritten records," in Proc. 14th IAPR Int. Conf. Doc. Anal. Recognit., 2017, vol. 1, pp. 1389–1394.
[36] D. Fernández-Mota, J. Almazán, N. Cirera, A. Fornés, and J. Lladós, "BH2M: The Barcelona historical, handwritten marriages database," in Proc. 22nd Int. Conf. Pattern Recognit., 2014, pp. 256–261.
[37] V. Romero et al., "The Esposalles database: An ancient marriage license corpus for off-line handwriting recognition," Pattern Recognit., vol. 46, no. 6, pp. 1658–1669, 2013.
[38] E. Augustin, M. Carré, E. Grosicki, J.-M. Brodin, E. Geoffrois, and F. Prêteux, "RIMES evaluation campaign for handwritten mail processing," in Proc. Int. Workshop Frontiers Handwriting Recognit., 2006, pp. 231–235.
[39] E. Grosicki and H. El-Abed, "ICDAR 2011: French handwriting recognition competition," in Proc. Int. Conf. Doc. Anal. Recognit., 2011, pp. 1459–1463.
[40] J. I. Toledo, S. Dey, A. Fornés, and J. Lladós, "Handwriting recognition by attribute embedding and recurrent neural networks," in Proc. 14th IAPR Int. Conf. Doc. Anal. Recognit., 2017, vol. 1, pp. 1038–1043.
[41] B. Stuner, C. Chatelain, and T. Paquet, "Self-training of BLSTM with lexicon verification for handwriting recognition," in Proc. 14th IAPR Int. Conf. Doc. Anal. Recognit., 2017, pp. 633–638.
