HMM Isolated Word Recognition
Recommended texts:
Spoken Language Processing, Chapter 8
Speech and Language Processing, Appendix A
Statistical recognition
• I. Recognition of isolated words
– Training: creation of a model for each word.
– Recognition: determine the best match between the models and the utterance
[Block diagram. Training: speech → Parameterizer → models W1, W2, ..., WN. Recognition: speech → Parameterizer → Classifier → Decision → recognized word]
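As a rough sketch of this train-then-classify architecture (assuming the hmmlearn package and precomputed MFCC-style feature matrices; the variable names and data layout are hypothetical), one HMM is trained per word and a test utterance is assigned to the model with the highest log-likelihood:

```python
import numpy as np
from hmmlearn import hmm   # assumed available: pip install hmmlearn

def train_word_models(train_data, n_states=5):
    """train_data: hypothetical dict mapping word -> list of (T_i, n_features) feature arrays."""
    models = {}
    for word, feats in train_data.items():
        X = np.vstack(feats)                       # all recordings of this word, concatenated
        lengths = [len(f) for f in feats]          # sequence boundaries for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)                          # Baum-Welch (EM) training
        models[word] = m
    return models

def recognize(models, feats):
    """Return the word whose model gives the highest log P(O | lambda)."""
    return max(models, key=lambda word: models[word].score(feats))
```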
Hidden Markov Model (HMM)
[Diagram: a 3-state Markov chain with self-transitions T11, T22, T33, forward transitions T12, T23, a skip transition T13, and a data distribution attached to each state]
HMM as a statistical model
• An HMM is a statistical model for a time-varying process
• The process is always in one of a countable number of states at any time
[Diagram: the HMM is assumed to be generating the data: a hidden state sequence is drawn from the Markov chain, and each state emits observations from its state distribution, producing the observation sequence]
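A minimal sketch of this generative view, with made-up toy parameters and discrete emissions: a hidden state sequence is drawn from the Markov chain, and each visited state emits an observation from its own distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state HMM with 4 discrete observation symbols.
pi = np.array([1.0, 0.0, 0.0])                    # initial state probabilities
A  = np.array([[0.6, 0.4, 0.0],                   # a_ij = P(q_t = j | q_t-1 = i)
               [0.0, 0.7, 0.3],
               [0.0, 0.0, 1.0]])
B  = np.array([[0.7, 0.2, 0.1, 0.0],              # b_j(k) = P(o_t = k | q_t = j)
               [0.1, 0.6, 0.2, 0.1],
               [0.0, 0.1, 0.3, 0.6]])

T = 10
states, obs = [], []
q = rng.choice(3, p=pi)                           # draw the initial state
for _ in range(T):
    states.append(q)
    obs.append(rng.choice(4, p=B[q]))             # emit from the state distribution
    q = rng.choice(3, p=A[q])                     # move along the Markov chain

print("hidden states:", states)                   # not observed in practice
print("observations :", obs)                      # what the recognizer actually sees
```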
HMMs are abstractions
• The states are not directly observed
– Here, the states of the process are analogous to configurations of the vocal tract that produce the signal
– We only hear the speech; we do not see the vocal tract
– i.e. the states are hidden
• At equispaced time intervals, the system may change its state with probability aij:
aij = P(qt = j | qt-1 = i),  aij ≥ 0,  Σj aij = 1
[Diagram: two HMM topologies. Ergodic: three fully connected states with transitions a11, a12, a13, a21, a22, a23, a31, a32, a33 (used, e.g., in phonotactic recognition). Left-to-right: states 1 → 2 → ... → n with transitions a12, a23, a34, ..., a(n-1)n]
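The two topologies differ only in which transitions are allowed; a small sketch with made-up numbers (not from the slides):

```python
import numpy as np

# Ergodic topology: every a_ij > 0 (3 states, toy values).
A_ergodic = np.array([[0.5, 0.3, 0.2],
                      [0.2, 0.5, 0.3],
                      [0.3, 0.2, 0.5]])

# Left-to-right topology: only self-loops a_ii and forward moves a_i,i+1
# (the last state absorbs), as typically used for isolated-word models.
A_left_right = np.array([[0.6, 0.4, 0.0, 0.0],
                         [0.0, 0.6, 0.4, 0.0],
                         [0.0, 0.0, 0.6, 0.4],
                         [0.0, 0.0, 0.0, 1.0]])

# Each row is a probability distribution over the next state.
assert np.allclose(A_ergodic.sum(axis=1), 1.0)
assert np.allclose(A_left_right.sum(axis=1), 1.0)
```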
Problems to be solved
• Evaluation: Given a sequence of observations O = o1 o2 ... oT and a model λ, what is the probability P(O|λ) that the model generates the observations?
• Decoding: Given a sequence of observations O = o1 o2 ... oT and a model λ, find the optimum sequence of states Q = q1, ..., qT.
• Training: Given a model λ = (π, A, B) and a set of training sequences, how do we adjust the model parameters λ to maximize the probability P(O|λ) of the training data?
Evaluation
Given the sequence O = o1 o2 ... oT and the model λ = (π, A, B), calculate P(O|λ).
Solution: suppose a state sequence Q = q1 q2 ... qT has generated the observations. Its probability is
P(Q|λ) = πq1 · aq1q2 · aq2q3 · ... · aqT-1qT
The observations are then generated with probability P(O|Q,λ) = bq1(o1) · bq2(o2) · ... · bqT(oT), so P(O|λ) = Σ_Q P(O|Q,λ) · P(Q|λ); the forward algorithm computes this sum efficiently.
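A minimal numpy sketch of the forward computation of P(O|λ), assuming discrete emissions and log-domain arithmetic to avoid underflow; the toy parameters in the usage example are made up.

```python
import numpy as np
from scipy.special import logsumexp

def log_forward(obs, pi, A, B):
    """Log P(O | lambda) for a discrete-emission HMM via the forward algorithm.
    obs : sequence of observation symbol indices (length T)
    pi  : initial state probabilities, shape (N,)
    A   : transition probabilities a_ij, shape (N, N)
    B   : emission probabilities b_j(k), shape (N, K)
    """
    with np.errstate(divide="ignore"):            # log(0) -> -inf for forbidden transitions
        log_pi, logA, logB = np.log(pi), np.log(A), np.log(B)
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    log_alpha = log_pi + logB[:, obs[0]]
    # Recursion: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
    for o_t in obs[1:]:
        log_alpha = logsumexp(log_alpha[:, None] + logA, axis=0) + logB[:, o_t]
    # Termination: P(O | lambda) = sum_j alpha_T(j)
    return logsumexp(log_alpha)

# Hypothetical toy model: 2 states, 3 observation symbols.
pi = np.array([0.8, 0.2])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.6, 0.3, 0.1],
               [0.1, 0.3, 0.6]])
print(log_forward([0, 1, 2, 2], pi, A, B))        # log P(O | lambda)
```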
• Viterbi’s algorithm
Initialization
[Trellis: states 1-4 on the vertical axis, signal frames 1, 2, ..., T on the horizontal axis]
For every state j we calculate δ1(j) = πj · bj(o1), the probability of generating the first observation in state j.
Recursion
[Trellis: transitions a12, a22, a32, a42 lead from δt-1(1), ..., δt-1(4) at time t-1 into state 2 at time t, where the emission b2(ot) is applied to obtain δt(2)]
At time t we calculate δt(n) = max_i [δt-1(i) · a_in] · bn(ot), the highest probability of arriving at state n from the other states and generating the observation ot. In ψt(n) we note down the state from which we achieved the highest probability.
Ending: backtracking
[Trellis: states 1-4 vs. signal frames 1, 2, ..., T, with the best path traced back from the last frame]
Termination: the best final state is qT* = argmax_j δT(j).
Backtracking: qt* = ψt+1(qt+1*),  t = T-1, T-2, ..., 1
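Putting the three steps together, here is a hedged numpy sketch of the Viterbi algorithm for a discrete-emission HMM, in the log domain; the parameter conventions match the forward sketch above.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Best state sequence q* and its log probability max_Q log P(Q, O | lambda).
    obs: observation symbol indices, pi: (N,), A: (N, N), B: (N, K)."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):            # log(0) -> -inf for forbidden transitions
        log_pi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta = np.empty((T, N))                      # delta_t(j): best log score ending in j at t
    psi = np.zeros((T, N), dtype=int)             # psi_t(j): best predecessor of state j at t
    # Initialization: delta_1(j) = log pi_j + log b_j(o_1)
    delta[0] = log_pi + logB[:, obs[0]]
    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) + log a_ij] + log b_j(o_t)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA     # scores[i, j]
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + logB[:, obs[t]]
    # Termination and backtracking: q_T* = argmax_j delta_T(j); q_t* = psi_{t+1}(q_{t+1}*)
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1, path[-1]]
```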
Training or parameter estimation
[Diagram: the input speech is evaluated against every word model, giving P(O|λ0), ..., P(O|λM)]
• Each HMM model is trained with its own set of recordings of isolated words.
• Each test recording has a single isolated word
• P(O|λi) can be computed by the forward algorithm. In practice, the Viterbi algorithm (max_Q P(Q,O|λi)) is faster and gives the same accuracy.
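To make the last bullet concrete, a small hedged example of the recognition decision: it reuses the log_forward and viterbi sketches given after the Evaluation and Viterbi slides above, and the two toy word models and the test sequence are entirely made up.

```python
import numpy as np

# Two hypothetical word models (pi, A, B): 2 states, 3 discrete observation
# symbols; in a real recognizer the parameters would come from training.
models = {
    "yes": (np.array([0.9, 0.1]),
            np.array([[0.7, 0.3], [0.2, 0.8]]),
            np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])),
    "no":  (np.array([0.9, 0.1]),
            np.array([[0.6, 0.4], [0.3, 0.7]]),
            np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])),
}
obs = [0, 0, 2, 2]                                             # hypothetical test utterance

fwd = {w: log_forward(obs, *m) for w, m in models.items()}     # log P(O | lambda_w)
vit = {w: viterbi(obs, *m)[1] for w, m in models.items()}      # max_Q log P(Q, O | lambda_w)

# The Viterbi score is only a lower bound on the forward score, but the model
# ranking, and hence the recognized word, is normally the same.
print("forward picks:", max(fwd, key=fwd.get))
print("viterbi picks:", max(vit, key=vit.get))
```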