
CTC Decoding Algorithms with Language Model

Connectionist Temporal Classification (CTC) decoding algorithms are implemented as Python scripts. A minimalistic Language Model (LM) is used.

Run demo

Go to the src/ directory and run python main.py.

Expected results:

=====Mini example=====
TARGET       : "a"
BEST PATH    : ""
PREFIX SEARCH: "a"
BEAM SEARCH  : "a"
TOKEN        : "a"
PROB(TARGET) : 0.64
LOSS(TARGET) : 0.4462871026284195
=====Real example=====
TARGET        : "the fake friend of the family, like the"
BEST PATH     : "the fak friend of the fomly hae tC"
PREFIX SEARCH : "the fak friend of the fomcly hae tC"
BEAM SEARCH   : "the fak friend of the fomcly hae tC"
BEAM SEARCH LM: "the fake friend of the family, lie th"
TOKEN         : "the fake friend of the family fake the"
PROB(TARGET)  : 6.314726428865645e-13
LOSS(TARGET)  : 28.090721774903226
=====Real example: GPU=====
Compute for 1000 batch elements.
TARGET        : "the fake friend of the family, like the"
BEST PATH GPU : "the fak friend of the fomly hae tC"

Provided algorithms

  • Best Path Decoding: takes the best label per time-step to compute the best path, then removes repeated labels and CTC-blanks from this path (see the sketch after this list). File: BestPath.py for the CPU implementation and BestPathCL.py/BestPathCL.cl for the GPU implementation [1]
  • Prefix Search Decoding: best-first search through tree of labelings. File: PrefixSearch.py [1]
  • Beam Search Decoding: iteratively searches for best labeling, optionally uses a character-level LM. File: BeamSearch.py [2]
  • Token Passing: searches for most probable word sequence. The words are constrained to those contained in a dictionary. Can be extended to use a word-level LM. File: TokenPassing.py [1]
  • Word Beam Search: a TensorFlow implementation is available in the CTCWordBeamSearch repository
  • Loss: calculates probability and loss of a given text in the RNN output. File: Loss.py [1]
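The following is a minimal NumPy sketch of best path decoding as described in the first item above. The function name and the TxC matrix layout are assumptions of this sketch; the blank being the last class matches the testcases below.

```python
import numpy as np

def best_path_decode(mat: np.ndarray, labels: str) -> str:
    """Greedy (best path) CTC decoding: take the most probable label per
    time-step, then collapse repeated labels and remove CTC-blanks.
    mat is a TxC matrix of label probabilities per time-step; the last
    class is assumed to be the CTC-blank, and labels holds one character
    per non-blank class."""
    blank_idx = mat.shape[1] - 1
    best_path = np.argmax(mat, axis=1)        # most probable label per time-step
    decoded, prev = [], None
    for idx in best_path:
        if idx != prev and idx != blank_idx:  # collapse repeats, drop blanks
            decoded.append(labels[idx])
        prev = idx
    return ''.join(decoded)

# Mini example testcase: 2 time-steps, labels "ab" plus the blank
mat = np.array([[0.4, 0.0, 0.6],
                [0.4, 0.0, 0.6]])
print(best_path_decode(mat, 'ab'))  # -> "" (the best path is "--")
```

Note that summing over all paths of a labeling (as beam search, prefix search and token passing do) can give a different result than this greedy procedure, as the Mini example below shows.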

Choosing the right algorithm

A separate paper compares beam search decoding and token passing, and gives suggestions on when to use best path decoding, beam search decoding, or token passing.

Testcases

The ground-truth text of the Real example testcase is "the fake friend of the family, like the"; it is a sample from the IAM Handwriting Database [4]. The RNN output was generated by a partially trained TensorFlow model inspired by CRNN [3], which is essentially a larger version of the SimpleHTR model. The visualization below shows the input image and the RNN output matrix with 100 time-steps and 80 classes (the last one being the CTC-blank). Each column sums to 1 and each entry represents the probability of seeing a label at a given time-step.

[Figure: input image and RNN output matrix of the Real example]

Illustration of the Mini example testcase: the RNN output matrix contains 2 time-steps (t0 and t1) and 3 labels (a, b and - representing the CTC-blank). Best path decoding (see left figure) takes the most probable label per time-step, which gives the path "--" and therefore the recognized text "" with probability 0.6*0.6=0.36. Beam search, prefix search and token passing calculate the probability of labelings instead: for the labeling "a" these algorithms sum over the paths "-a", "a-" and "aa" (see right figure), giving probability 0.6*0.4+0.4*0.6+0.4*0.4=0.64. The only path which gives "" still has probability 0.36, therefore "a" is the result returned by beam search, prefix search and token passing.

[Figure: paths through the RNN output matrix of the Mini example (left: best path, right: paths for the labeling "a")]
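To make this path summation concrete, here is a small sketch that reproduces the numbers of the Mini example; the matrix values are taken from the description above.

```python
import numpy as np

# Mini example: rows are time-steps t0/t1, columns are P(a), P(b), P(blank).
mat = np.array([[0.4, 0.0, 0.6],
                [0.4, 0.0, 0.6]])
A, BLANK = 0, 2

def p_path(path):
    """Probability of a path: the product of its per-time-step probabilities."""
    return np.prod([mat[t, c] for t, c in enumerate(path)])

# The labeling "a" collapses from the paths "-a", "a-" and "aa".
p_a = p_path([BLANK, A]) + p_path([A, BLANK]) + p_path([A, A])
print(p_a)                     # 0.6*0.4 + 0.4*0.6 + 0.4*0.4 = 0.64
print(-np.log(p_a))            # 0.4463, the LOSS(TARGET) of the demo output

# The only path collapsing to "" is "--".
print(p_path([BLANK, BLANK]))  # 0.6*0.6 = 0.36
```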

Data files

  • data/rnnOutput.csv: output of the RNN layer (softmax not yet applied), containing 100 time-steps with 80 label scores each (see the loading sketch after this list).
  • data/corpus.txt: the text from which the language model is generated. For this testcase it simply contains the words from the ground-truth text in random order.
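As a small sketch of how this data might be consumed (the delimiter and the relative path are assumptions; adjust them to your setup):

```python
import numpy as np

def softmax(mat: np.ndarray) -> np.ndarray:
    """Apply softmax per time-step (row) so label scores become probabilities."""
    e = np.exp(mat - np.max(mat, axis=1, keepdims=True))  # numerically stabilized
    return e / np.sum(e, axis=1, keepdims=True)

# Load the 100x80 score matrix (path assumed relative to the repository root).
rnn_output = np.genfromtxt('data/rnnOutput.csv', delimiter=';')
probs = softmax(rnn_output)  # each time-step now sums to 1
```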

Notes

The provided Python scripts are intended for tests and experiments. For production use I recommend implementing these algorithms in C++, for performance reasons. A C++ implementation can easily be integrated into deep learning frameworks such as TensorFlow (see CTCWordBeamSearch for an example).

A GPU implementation is provided for best path decoding. It requires pyopencl to be installed and useGPU = True to be set in main.py.

The provided loss function is, of course, not a decoding algorithm by itself. However, as shown by Shi [3], the loss value can be used to decode words: first, best path decoding gives a first approximation; a dictionary is then queried for all word candidates similar to this approximation; finally, the loss is calculated for each candidate and the best-scoring word is taken as the result.
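A minimal sketch of this procedure, assuming a ctc_loss callable that stands in for the computation in Loss.py, and using difflib as one simple way to query similar words (both the interface and the similarity measure are assumptions of this sketch; best_path_decode refers to the sketch further above):

```python
import difflib

def decode_word_by_loss(mat, labels, dictionary, ctc_loss, num_candidates=5):
    """Decode a single word via the loss: approximate with best path
    decoding, query similar dictionary words, keep the lowest-loss one."""
    approx = best_path_decode(mat, labels)  # first approximation
    # Query word candidates that are similar to the approximation.
    candidates = difflib.get_close_matches(approx, dictionary,
                                           n=num_candidates, cutoff=0.0)
    if not candidates:
        return approx  # fall back to the raw approximation
    # Calculate the loss for each candidate; the best-scoring (lowest-loss)
    # word is the result.
    return min(candidates, key=lambda word: ctc_loss(mat, word, labels))
```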

References

[1] Graves - Supervised sequence labelling with recurrent neural networks

[2] Hwang - Character-level incremental speech recognition with recurrent neural networks

[3] Shi - CRNN: https://github.com/bgshih/crnn

[4] Marti - IAM dataset: http://www.fki.inf.unibe.ch/databases/iam-handwriting-database
