
Hidden Markov Models I

Recognition of isolated words

Recommended texts:
Spoken Language Processing, Chapter 8
Speech and Language Processing, Appendix A
Statistical recognition
• I. Recognition of isolated words
– Training: creation of a model for each word.
– Recognition: determine the model that best matches the utterance

• II. Large Vocabulary Speech Recognition (LVSR)


– Subword units (phones) in context, e.g., m-a+r
– Phonetic dictionary
– Language model P(house | the white)
Isolated words. General scheme

[Block diagram: input speech → Parameterizer. During training, the parameterized speech is used to build one model per word (Model W1, Model W2, ..., Model WN). During recognition, a Classifier compares the utterance with all word models and a Decision stage outputs the recognized speech.]
Hidden Markov Model (HMM)
[Diagram: three-state HMM with self-transitions T11, T22, T33, forward transitions T12, T23, and a skip transition T13.]

• Generic representation of a statistical model for processes that
generate time series. The HMM is a sequence model.
• The “segments” in the time series are referred to as states: the
process passes through these states to generate the time series
• The entire structure may be viewed as a generalization of DTW
models
Bhiksha Raj and Rita Singh (CMU)
Hidden Markov Models

• A Hidden Markov Model consists of two components
– A state/transition backbone that specifies how many states there
are, and how they can follow one another
– A set of probability distributions, one for each state, which
specifies the distribution of all vectors in that state

[Diagram: the Markov chain (state/transition backbone) combined with the per-state data distributions forms the HMM.]
HMM as a statistical model
• An HMM is a statistical model for a time-varying process
• The process is always in one of a countable number of states at
any time

• When the process visits any state, it generates an
observation by a random draw from a distribution associated
with that state

• The process constantly moves from state to state. The
probability that the process will move to any state is
determined solely by the current state
– i.e. the dynamics of the process are Markovian

• The entire model represents a probability distribution over the
sequence of observations
– It has a specific probability of generating any particular sequence
– The probabilities of all possible observation sequences sum to 1
How an HMM models a process

[Diagram: the HMM assumed to be generating the data; a state sequence is drawn from the Markov chain, each state emits from its state distribution, and this produces the observation sequence.]
HMMs are abstractions
• The states are not directly observed
– Here states of the process are analogous to configurations of the vocal tract that
produces the signal
– We only hear the speech; we do not see the vocal tract
– i.e. the states are hidden

• The interpretation of states is not always obvious
– The vocal tract actually goes through a continuum of configurations
– The model represents all of these using only a fixed number of states

• The model abstracts the process that generates the data
– The system goes through a finite number of states
– When in any state it can either remain at that state, or go to another with some
probability
– When in any state it generates observations according to a distribution associated with
that state


HMM Parameters

[Diagram: three-state HMM with self-loop probabilities 0.6, 0.7 and 0.5, forward transitions 0.4 and 0.3, and a transition 0.5 back to the first state.]

• The topology of the HMM
– No. of states and allowed transitions
– E.g. here we have 3 states and cannot go from the blue state to the red
• The transition probabilities
– Often represented as a matrix, as here:
  $T = \begin{pmatrix} 0.6 & 0.4 & 0 \\ 0 & 0.7 & 0.3 \\ 0.5 & 0 & 0.5 \end{pmatrix}$
– $T_{ij}$ is the probability that when in state i, the process will move to j
• The probability of beginning at a particular state
• The state output distributions
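As a concrete illustration of these parameters, here is a minimal NumPy sketch; the transition matrix is the one from the slide, while the uniform initial-state distribution is an assumption added for illustration:

```python
import numpy as np

# Transition matrix T from the slide: row i gives P(next state = j | current state = i).
T = np.array([
    [0.6, 0.4, 0.0],   # from state 1: stay (0.6) or move to state 2 (0.4)
    [0.0, 0.7, 0.3],   # from state 2: stay (0.7) or move to state 3 (0.3)
    [0.5, 0.0, 0.5],   # from state 3: stay (0.5) or go back to state 1 (0.5)
])

# Initial state probabilities (assumed uniform here; the slide only says they exist).
pi = np.full(3, 1.0 / 3.0)

# Each row of T and the vector pi must sum to 1.
assert np.allclose(T.sum(axis=1), 1.0) and np.isclose(pi.sum(), 1.0)
```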


HMM state output distributions
• The state output distribution represents the distribution of data produced from
any state. We can have:
• Discrete probabilities (DHMM), e.g., Vector Quantization (k-means)
• Continuous probabilities, e.g., Gaussian Mixture Model (GMM), DNN

Bhiksha Raj and Rita Singh (CMU)
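As an illustration of a continuous output distribution, here is a minimal sketch of evaluating the state output likelihood with a diagonal-covariance GMM; the mixture size and all parameter values are made-up assumptions:

```python
import numpy as np

def gmm_log_likelihood(o, weights, means, variances):
    """Log b_j(o) for one state whose output distribution is a
    diagonal-covariance Gaussian mixture: sum_m w_m N(o; mu_m, sigma_m^2)."""
    # Per-mixture log N(o; mu_m, diag(sigma_m^2))
    log_norm = -0.5 * (np.log(2 * np.pi * variances) + (o - means) ** 2 / variances).sum(axis=1)
    # Log-sum-exp over the mixture components, weighted by w_m
    return np.logaddexp.reduce(np.log(weights) + log_norm)

# Toy example: 2 mixture components over 3-dimensional feature vectors (values made up).
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 1.0, -1.0], [2.0, 0.0, 0.5]])
variances = np.ones((2, 3))
print(gmm_log_likelihood(np.array([0.5, 0.5, 0.0]), weights, means, variances))
```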


Discrete Hidden Markov models

[Diagram: three-state ergodic HMM with transition probabilities a11, a12, a13, a21, a22, a23, a31, a32, a33 and output distributions b1(ot), b2(ot), b3(ot).]

• N states {1...N}
• At each time instant t, the system is in a specific state $q_t$
• At equally spaced time intervals, the system may change its state with probability $a_{ij}$:
  $a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad a_{ij} \ge 0$
• Every time a transition occurs, the system generates an observation from a
finite alphabet that depends on the state to which it has moved:
  $b_j(O_t) = P(O_t \mid q_t = j)$
• There is no longer a one-to-one correspondence between the observation
sequence and the state sequence, so you cannot uniquely determine the
state sequence from a given observation sequence; i.e., the state sequence is
not observable and therefore hidden.
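A minimal sketch of this generative process for a discrete HMM is shown below; the parameter values are toy assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete HMM: 3 states, alphabet of 4 symbols (all values made up).
A  = np.array([[0.6, 0.4, 0.0],
               [0.0, 0.7, 0.3],
               [0.5, 0.0, 0.5]])          # A[i, j] = P(q_t = j | q_{t-1} = i)
B  = np.array([[0.7, 0.1, 0.1, 0.1],
               [0.1, 0.7, 0.1, 0.1],
               [0.1, 0.1, 0.4, 0.4]])     # B[j, k] = P(o_t = k | q_t = j)
pi = np.array([1.0, 0.0, 0.0])            # start in state 0

def sample(T=10):
    """Generate a (state sequence, observation sequence) pair of length T."""
    states, obs = [], []
    q = rng.choice(3, p=pi)
    for _ in range(T):
        obs.append(rng.choice(4, p=B[q]))   # emit from the current state
        states.append(q)
        q = rng.choice(3, p=A[q])           # move according to the Markov chain
    return states, obs

print(sample())
```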
Types of models

• Ergodic: used e.g. for phonotactic recognition.
[Diagram: three-state fully connected HMM with transitions a11, a12, a13, a21, a22, a23, a31, a32, a33.]

• Left-right (or Bakis): used for modeling phonetic units.
[Diagram: n-state left-right HMM with self-loops a11, a22, a33, a44, ..., ann and forward transitions a12, a23, a34, ..., a(n-1)n.]
Problems to be solved
• Evaluation: Given a sequence of observations O = o1 o2...oT and a model λ,
what is the probability P(O|λ) that the model generates the observations?
• Decoding: Given a sequence of observations O = o1 o2...oT and a model λ,
find the optimum sequence of states Q = q1,...,qT
• Training: Given the model λ = (π, A, B) and a set of training sequences,
how to adjust the model parameters λ to maximize the probability P(O|λ)?
Evaluation
Given the sequence O = o1 o2...oT and the model
λ = (π, A, B), calculate P(O)
Solution: Let's suppose a sequence of states
Q = q1 q2...qT that has created the observations.

$P(Q) = \pi_{q_1} a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$

$P(O, Q) = \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$

$P(O) = \sum_{\text{all } Q} \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$
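A direct, brute-force evaluation of this sum, enumerating every possible state sequence Q, can be sketched as follows (the toy parameters are assumptions); it is only feasible for tiny N and T, which motivates the efficient solution on the next slide:

```python
import itertools
import numpy as np

# Toy discrete model: A transitions, B discrete outputs, pi initial probabilities.
A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
O  = [0, 1, 1, 0]                      # an observation sequence of length T = 4
N, T = len(pi), len(O)

# P(O) = sum over all N^T state sequences Q of P(Q) * P(O | Q)
p_O = 0.0
for Q in itertools.product(range(N), repeat=T):
    p = pi[Q[0]] * B[Q[0], O[0]]
    for t in range(1, T):
        p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]
    p_O += p
print(p_O)
```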
Efficient solution
• The direct method requires on the order of 2T·N^T operations
– 1 s of speech (100 observations) and 5 states: ≈10^72 calculations

• Forward-backward algorithm
– Accumulates operations over sequences with shared partial paths
– #operations: N^2·T = 2500 operations
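A minimal sketch of the forward pass (the α recursion) for a discrete HMM follows; the parameters are the same kind of toy assumption as above, so the result can be checked against the brute-force sum:

```python
import numpy as np

def forward_prob(O, A, B, pi):
    """P(O | lambda) via the forward algorithm: alpha[t, j] = P(o_1..o_t, q_t = j)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]    # induction: sum over previous states
    return alpha[-1].sum()                          # termination: sum over final states

# Toy check against the brute-force enumeration above (same made-up parameters).
A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(forward_prob([0, 1, 1, 0], A, B, pi))
```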
Most likely sequence of states

• Given a sequence of observations, find, at every instant t, the most likely
sequence of states up to that instant

• Viterbi’s algorithm

• #operations: N^2·T (multiplications or additions)


Viterbi’s algorithm
Sequence of states with the highest probability:

$\delta_t(j) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \ldots q_{t-1}, q_t = j, o_1 o_2 \ldots o_t)$

Initialization: $\delta_1(i) = \pi_i b_i(o_1)$, $\psi_1(i) = 0$, $1 \le i \le N$

Recursion: $\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(o_t)$, $2 \le t \le T$, $1 \le j \le N$

$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]$

Ending: $P^* = \max_{1 \le i \le N} [\delta_T(i)]$, $q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]$

Sequence (backtracking): $q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, T-2, \ldots, 1$
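The same recursion as a minimal sketch for a discrete HMM; the toy parameters are assumptions, and the indices are 0-based here whereas the slide uses 1-based notation:

```python
import numpy as np

def viterbi(O, A, B, pi):
    """Most likely state sequence and its probability for a discrete HMM."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))             # delta[t, j] = best score ending in state j at time t
    psi   = np.zeros((T, N), dtype=int)  # psi[t, j]  = best predecessor of state j at time t
    delta[0] = pi * B[:, O[0]]           # initialization
    for t in range(1, T):
        scores = delta[t-1][:, None] * A            # scores[i, j] = delta[t-1, i] * a_ij
        psi[t] = scores.argmax(axis=0)              # best previous state for each j
        delta[t] = scores.max(axis=0) * B[:, O[t]]  # recursion
    # Termination and backtracking
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[-1].max()

A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi([0, 1, 1, 0], A, B, pi))
```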
Initialization (trellis view)

[Trellis diagram: states 1–4 on the vertical axis, observation times 1, 2, ..., T on the horizontal axis.]

Initialization: for every state j, we calculate the probability $\pi_j b_j(O_1)$ of generating the first observation.
Recursion (trellis view)

[Trellis diagram: at time t, state 2 receives candidate scores $\delta_{t-1}(i)\, a_{i2}$ from states i = 1...4, and the best one is multiplied by $b_2(O_t)$ to give $\delta_t(2)$.]

At time t, we calculate the highest probability of arriving at state n from the other states and generating the observation $O_t$. In $\psi_t(n)$ we note down the state from which we achieve the highest probability.
Ending: backtracking (trellis view)

[Trellis diagram: states 1–4 on the vertical axis, times 1, 2, ..., T on the horizontal axis; the best final state at time T is selected and the stored back-pointers are followed back to time 1.]

Sequence (backtracking): $q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, T-2, \ldots, 1$
Training or parameter estimation

• Given the sequence O = o1 o2...oT, adjust the model
λ = (π, A, B) to maximize P(O|λ)
• Iterative solution: Baum-Welch
– start from an initial model λ
– pass the training sequence and re-estimate the model
parameters, obtaining λ̄
– $P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$
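For reference, the standard Baum-Welch re-estimation step (not written out on these slides) uses the state posteriors $\gamma_t(i) = P(q_t = i \mid O, \lambda)$ and transition posteriors $\xi_t(i,j) = P(q_t = i, q_{t+1} = j \mid O, \lambda)$, both obtained with the forward-backward algorithm, and re-estimates a discrete model as:

$\bar{\pi}_i = \gamma_1(i), \qquad \bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \bar{b}_j(k) = \dfrac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$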
Initialization
• Manual
• Automatic
– Uniform matrix A
– Segmentation of the training sequences into the N states
– Grouping of the spectra in every state
– Estimation of B
– Iterate until convergence
• Iterating with the Viterbi algorithm to segment the training sequences into states
improves the initialization (see the sketch below)
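A minimal sketch of the automatic ("flat-start") initialization for a discrete HMM: a uniform transition matrix, uniform segmentation of each training sequence into the N states, and per-state symbol counts to estimate B. The function name, the add-one smoothing, and the assumption that all transitions are allowed are illustrative choices, not from the slides:

```python
import numpy as np

def flat_start_init(train_seqs, N, K):
    """Initialize a discrete HMM with N states and a K-symbol alphabet."""
    # Uniform transition matrix A (here assuming all transitions are allowed).
    A = np.full((N, N), 1.0 / N)
    # Estimate B by uniformly segmenting each sequence into N equal chunks
    # and counting which symbols fall into each state's chunk.
    counts = np.ones((N, K))                 # add-one smoothing to avoid zeros
    for seq in train_seqs:
        bounds = np.linspace(0, len(seq), N + 1).astype(int)
        for j in range(N):
            for o in seq[bounds[j]:bounds[j + 1]]:
                counts[j, o] += 1
    B = counts / counts.sum(axis=1, keepdims=True)
    pi = np.full(N, 1.0 / N)
    return pi, A, B

# Toy usage: two short symbol sequences, 3 states, alphabet of 4 symbols.
print(flat_start_init([[0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 2, 3]], N=3, K=4))
```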
Isolated words. General scheme

[Block diagram: input speech → Parameterizer → O = o1 o2...oT, which is scored against every word model HMM λ0, λ1, ..., λM, producing P(O|λ0), P(O|λ1), ..., P(O|λM); the word whose model gives the highest score is chosen.]

• Each HMM is trained with its own set of recordings of isolated words.
• Each test recording contains a single isolated word.
• P(O|λi) can be computed with the forward algorithm. In practice, the Viterbi
algorithm ( max_Q P(Q,O|λi) ) is faster and gives the same accuracy.
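A minimal sketch of this decision rule; word_models, the word labels, and score_fn are placeholders, with score_fn standing for a scoring function such as the forward_prob sketched earlier:

```python
import numpy as np

def recognize(O, word_models, score_fn):
    """Pick the word whose HMM gives the highest score for observation sequence O.

    word_models maps a word label to its (pi, A, B) parameters; score_fn is a
    scoring function such as forward_prob from the earlier sketch."""
    best_word, best_score = None, -np.inf
    for word, (pi, A, B) in word_models.items():
        score = score_fn(O, A, B, pi)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score

# Usage (assuming forward_prob from the earlier sketch and trained word models):
# word, score = recognize(O, {"yes": (pi_yes, A_yes, B_yes), "no": (pi_no, A_no, B_no)}, forward_prob)
```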
