Use log probabilities in beam search. #23

a-sneddon · 2021-07-26T06:05:20Z

Implemented log-space operations in beam search to avoid underflow when dealing with regular probabilities.
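For illustration, a minimal sketch of the underflow this change avoids. The per-step probability of 1e-5 and the sequence length of 100 are made-up values, not taken from the decoder.

```python
import numpy as np

# Hypothetical values: 100 time-steps, each contributing a probability of 1e-5.
probs = np.full(100, 1e-5)

# Multiplying regular probabilities underflows: 1e-500 is not representable
# in float64, so the product collapses to exactly 0.0.
product = np.prod(probs)

# Summing log-probabilities keeps the same quantity representable.
log_product = np.sum(np.log(probs))

print(product)      # 0.0
print(log_product)  # roughly -1151.29
```

Once the product has collapsed to 0.0, all beams with a long enough history score the same, which is why the comparison is better done on the summed logs.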

githubharald · 2021-07-26T13:28:54Z

ctc_decoder/beam_search.py

-        child_beam.pr_text = parent_beam.pr_text * bigram_prob  # probability of char sequence
+        bigram_prob = lm_factor * np.log(lm.get_char_bigram(c1, c2))
+        if parent_beam.is_empty():
+            child_beam.pr_text = bigram_prob  # first char in beam


No special case is needed here (no if/else): an empty beam is initialized with pr_text = log(1) = 0, so parent_beam.pr_text + bigram_prob is also correct for the first char in a beam.

Thanks for picking this up.
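A small sketch of why the branch can go, assuming the text score of an empty beam starts at log(1) = 0. The function name `extend_pr_text` and the numeric values are hypothetical and only stand in for the update in beam_search.py.

```python
import numpy as np


def extend_pr_text(parent_pr_text: float, bigram: float, lm_factor: float) -> float:
    """Log-space text score of a child beam (hypothetical helper).

    The same addition is used for empty and non-empty parents.
    """
    bigram_prob = lm_factor * np.log(bigram)
    return parent_pr_text + bigram_prob


# Made-up values for illustration only.
lm_factor = 0.01
bigram = 0.2

empty_parent = np.log(1.0)  # 0.0, the neutral element of addition
first_char_score = extend_pr_text(empty_parent, bigram, lm_factor)

# Identical to what the special-cased branch would have assigned.
print(first_char_score == lm_factor * np.log(bigram))
```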

githubharald · 2021-07-26T13:29:28Z

ctc_decoder/beam_search.py

@@ -75,8 +81,8 @@ def beam_search(mat: np.ndarray, labels: str, beam_width: int = 25, lm: Optional
    last = BeamState()
    labeling = ()
    last.entries[labeling] = BeamEntry()
-    last.entries[labeling].pr_blank = 1
-    last.entries[labeling].pr_total = 1
+    last.entries[labeling].pr_blank = LOG_ZERO


This should be log(1) == 0 instead of LOG_ZERO, which is -inf.

That's true, that allows the below if statements to be cleaned up also, thanks for picking this up.

githubharald · 2021-07-26T13:31:30Z

ctc_decoder/beam_search.py

            # in case of non-empty beam
            if labeling:
                # probability of paths with repeated last char at the end
-                pr_non_blank = last.entries[labeling].pr_non_blank * mat[t, labeling[-1]]


"cannot add to -inf" -> I think this is due to initialization to log(0) instead of log(1).
-inf + x == -inf, which makes sense, because 0*x=0.
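The effect of the two initializations can be sketched as follows. The constant names mirror the thread, the emission probability 0.9 is made up, and the use of `np.logaddexp` to combine the blank and non-blank scores into a total is an assumption about how sums of probabilities are handled in log space, not a quote from the pull request.

```python
import numpy as np

LOG_ZERO = -np.inf  # log(0)
LOG_ONE = 0.0       # log(1)

log_p_blank = np.log(0.9)  # made-up emission probability at one time-step

# Starting from log(0): the score stays -inf whatever is added,
# just as 0 * x stays 0 in probability space.
from_log_zero = LOG_ZERO + log_p_blank

# Starting from log(1): the update behaves like 1 * x.
from_log_one = LOG_ONE + log_p_blank

# Initial entry for the empty labeling: all mass on "ends with blank".
pr_blank = LOG_ONE
pr_non_blank = LOG_ZERO
# Log-space counterpart of pr_blank + pr_non_blank (assumed here).
pr_total = np.logaddexp(pr_blank, pr_non_blank)

print(from_log_zero)  # -inf
print(from_log_one)   # log(0.9)
print(pr_total)       # 0.0
```

With the empty labeling initialized this way, the blank update can be a plain addition with no check against LOG_ZERO.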

githubharald · 2021-07-26T13:31:57Z

ctc_decoder/beam_search.py


            # probability of paths ending with a blank
-            pr_blank = last.entries[labeling].pr_total * mat[t, blank_idx]
+            if last.entries[labeling].pr_total == LOG_ZERO:


same as above

githubharald · 2021-07-26T13:32:58Z

ctc_decoder/beam_search.py


            # probability of paths ending with a blank
-            pr_blank = last.entries[labeling].pr_total * mat[t, blank_idx]
+            if last.entries[labeling].pr_total == LOG_ZERO:
+                pr_blank = np.log(mat[t, blank_idx]) # cannot add to -inf


Directly using np.log creates warning outputs for np.log(0).
Define some wrapper function which takes care of that:

    def log(x):
        with np.errstate(divide='ignore'):
            return np.log(x)
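A self-contained version of that wrapper, with a quick check that it stays silent. Turning warnings into errors through the `warnings` module is added here purely for demonstration and is not part of the suggestion.

```python
import warnings

import numpy as np


def log(x):
    """np.log that maps 0 to -inf without a divide-by-zero warning."""
    with np.errstate(divide='ignore'):
        return np.log(x)


with warnings.catch_warnings():
    # Any warning raised inside this block would become an exception.
    warnings.simplefilter("error")
    result = log(np.array([0.0, 0.5, 1.0]))

print(result[0])  # -inf
print(result[2])  # 0.0
```

The `errstate` context only suppresses the divide-by-zero report inside the wrapper, so other floating-point warnings elsewhere in the decoder remain visible.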

githubharald · 2021-07-26T13:36:16Z

Thanks for your contribution. Using log-probs is a good idea 👍.
I suggest changing a few minor things, to keep the number of if-else branches in the main decoding function as small as possible and to stop numpy from printing warning messages to the console.

githubharald · 2021-07-26T20:06:11Z

I'm going to push some changes to the repo anyway, so I'll integrate your changes and make the small modifications myself.
Thanks again for your contribution 👍 .

a-sneddon · 2021-07-27T00:36:05Z

Thanks @githubharald for accepting the pull request. I've taken a look at the changes you've implemented since and they look good, thanks for your improvements and for maintaining the repo.

Use log probabilities in beam search.

2e2ad04

githubharald reviewed Jul 26, 2021

githubharald merged commit 8766399 into githubharald:master Jul 26, 2021

a-sneddon deleted the log-probs branch July 27, 2021 00:36
