Speech Recognition Architecture

This document discusses the key components of an automatic speech recognition system, including pre-processing of acoustic signals, feature extraction, acoustic modeling, language modeling, pattern classification, and part-of-speech tagging. It describes how speech is converted to digital signals and analyzed frame-by-frame to extract features before being matched to acoustic models using techniques like hidden Markov modeling. Language models then incorporate structural constraints to distinguish words with similar pronunciations.

Uploaded by

Dhrumil Das


Speech Recognition Architecture
• The fundamental task of speech recognition is the translation of sound into text and commands.
• Speech recognition is the process by which a computer maps an acoustic speech signal to some form of abstract meaning of the speech.

Automatic Speech Recognition System

• Pre-processing/Digital Processing:
• The recorded acoustic signal is an analog signal.
• An analog signal cannot be passed directly to an ASR system.
• The speech signal must first be converted to a digital signal; only then can it be processed.
• The digital signal is passed through a first-order filter to spectrally flatten it.
• This step (pre-emphasis) increases the energy of the signal at higher frequencies.
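The first-order filtering step can be sketched in a few lines of NumPy. This is a minimal illustration, not the deck's own implementation: the filter form y[n] = x[n] - alpha * x[n-1] is the usual pre-emphasis filter, and the coefficient alpha = 0.97 is an assumed, commonly used value that the source does not specify.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    signal = np.asarray(signal, dtype=float)
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

# A constant (0 Hz) signal is almost cancelled, while a signal that alternates
# every sample (the highest representable frequency) is amplified.
low = np.ones(8)
high = np.array([1.0, -1.0] * 4)
print(pre_emphasis(low)[1:])   # small values
print(pre_emphasis(high)[1:])  # values close to +/-2
```

Comparing the two printed arrays shows the effect described above: low-frequency content is attenuated and high-frequency content is boosted.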
• Feature Extraction
• The feature extraction step finds a set of parameters of the utterance that are acoustically correlated with the speech signal; these parameters are computed by processing the acoustic waveform.
• These parameters are known as features.
• The main goal of the feature extractor is to keep the relevant information and discard the irrelevant.
• To do this, the feature extractor divides the acoustic signal into frames of 10-25 ms.
• The data in each frame is multiplied by a window function.
• Many window functions can be used, such as Hamming, rectangular, Blackman, Welch, or Gaussian. Features are then extracted from every windowed frame.
• There are several methods for feature extraction, such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC), Perceptual Linear Prediction (PLP), wavelets, and RASTA-PLP (Relative Spectral Transform) processing.
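The framing and windowing steps can be sketched as follows. The 25 ms frame length sits inside the 10-25 ms range given above; the 10 ms step between frames (so that frames overlap), the 16 kHz sample rate, and the synthetic sine input are all assumptions made for illustration. The log power spectrum computed at the end is only a simple per-frame feature and an early stage of MFCC-style processing, not a full MFCC pipeline.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25, step_ms=10):
    """Split a signal into overlapping frames and apply a Hamming window to each."""
    frame_len = int(sample_rate * frame_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    # Only complete frames are kept; trailing samples are dropped.
    n_frames = 1 + (len(signal) - frame_len) // step
    frames = np.stack([signal[i * step : i * step + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)

sample_rate = 16000  # assumed sampling rate
signal = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)  # 1 s test tone

frames = frame_signal(signal, sample_rate)
# Simple per-frame feature: log power spectrum (small constant avoids log(0)).
log_power = np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)

print(frames.shape)     # (number of frames, samples per frame)
print(log_power.shape)  # (number of frames, frequency bins)
```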
• Acoustic Modeling
• The acoustic model establishes the connection between the acoustic information and phonetics.
• The acoustic model plays an important role in the performance of the system and is responsible for the computational load.
• Training establishes the correlation between the basic speech units and the acoustic observations.
• Training the system requires creating a representative pattern for the features of each class from one or more patterns that correspond to speech sounds of that class.
• Many models are available for acoustic modeling; of these, the Hidden Markov Model (HMM) is widely used and accepted because efficient algorithms exist for its training and recognition.
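One of the efficient HMM algorithms is the forward algorithm, which computes the likelihood of an observation sequence under a model. The sketch below uses a hypothetical two-state HMM with two discrete observation symbols; all probabilities are made up for illustration. Real acoustic models score continuous feature vectors rather than discrete symbols, so this only shows the recursion itself.

```python
import numpy as np

def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations | model) for a discrete-observation HMM."""
    # alpha[i] = P(observations so far, current state = i)
    alpha = initial * emission[:, observations[0]]
    for symbol in observations[1:]:
        # Propagate through the transitions, then weight by the emission probability.
        alpha = (alpha @ transition) * emission[:, symbol]
    return float(alpha.sum())

# Hypothetical two-state model with observation symbols 0 and 1.
initial = np.array([0.6, 0.4])
transition = np.array([[0.7, 0.3],
                       [0.4, 0.6]])
emission = np.array([[0.9, 0.1],
                     [0.2, 0.8]])

likelihood = forward_likelihood(initial, transition, emission, [0, 1])
print(f"P([0, 1] | model) = {likelihood:.3f}")
```

In recognition, the same computation would be run against the model of each candidate speech unit and the best-scoring model chosen.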
• Language Modeling
• A language model captures the structural constraints of the language in order to generate probabilities of occurrence.
• It gives the probability of a word occurring after a given word sequence.
• The language model distinguishes words and phrases that sound similar.
• For example, in American English, the phrases "recognize speech" and "wreck a nice beach" have nearly the same pronunciation but mean very different things.
• These ambiguities are easier to resolve when evidence from the language model is combined with the pronunciation model and the acoustic model.
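A bigram model is one simple way to estimate the probability of a word given the preceding word. The sketch below is an illustration under stated assumptions: the four-sentence corpus is invented, and add-one smoothing (with one extra vocabulary slot for unseen words) is an assumed choice so that unseen word pairs do not get zero probability.

```python
import math
from collections import Counter

# Invented toy corpus; a real model is trained on far more text.
corpus = [
    "we recognize speech with a computer",
    "machines recognize speech quickly",
    "they recognize speech and text",
    "a nice beach is far away",
]

def train_bigram(sentences):
    """Count contexts and word pairs, with sentence-boundary markers."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, len(vocab) + 1  # +1 slot for unseen words

def log_prob(sentence, model):
    """Add-one smoothed log probability of a sentence under the bigram model."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )

model = train_bigram(corpus)
for phrase in ("recognize speech", "wreck a nice beach"):
    print(f"{phrase!r}: log P = {log_prob(phrase, model):.2f}")
```

On this corpus the model scores "recognize speech" higher, which is the kind of evidence a recognizer would combine with its acoustic scores.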
• Pattern Classification
• Pattern classification (or recognition) is the process of comparing the unknown test pattern with each sound class reference pattern and computing a measure of similarity between them.
• Once the system has been trained, test patterns are classified at test time to recognize the speech.
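The comparison against reference patterns can be sketched as a nearest-reference classifier. Everything concrete here is an assumption for illustration: the three-dimensional feature vectors and the class labels are invented, each reference pattern is taken as the mean of its class's training patterns, and Euclidean distance stands in for the similarity measure (smaller distance means more similar).

```python
import numpy as np

# Invented training patterns: two feature vectors per sound class.
training_patterns = {
    "yes": np.array([[1.0, 0.2, 0.1], [0.8, 0.3, 0.2]]),
    "no": np.array([[0.1, 0.9, 0.8], [0.2, 1.0, 0.7]]),
}

# Training: one representative (mean) pattern per class.
references = {label: patterns.mean(axis=0) for label, patterns in training_patterns.items()}

def classify(test_pattern, references):
    """Return the label of the closest reference pattern and all distances."""
    distances = {
        label: float(np.linalg.norm(test_pattern - ref))
        for label, ref in references.items()
    }
    return min(distances, key=distances.get), distances

label, distances = classify(np.array([0.9, 0.25, 0.2]), references)
print(label, distances)
```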
• Part of Speech Tagging
• Part-of-speech (POS) tagging is a process in natural language processing (NLP) where each
word in a text is labeled with its corresponding part of speech.
• This can include nouns, verbs, adjectives, and other grammatical categories.
• POS tagging is useful for a variety of NLP tasks, such as information extraction, named entity
recognition, and machine translation.
• It can also be used to identify the grammatical structure of a sentence and to disambiguate
words that have multiple meanings.
• POS tagging is typically performed using machine learning algorithms, which are trained on a
large annotated corpus of text.
• The algorithm learns to predict the correct POS tag for a given word based on the context in
which it appears.
• For example:
• Text: “The cat sat on the mat.”
• POS tags:
• The: determiner
• cat: noun
• sat: verb
• on: preposition
• the: determiner
• mat: noun
• Use of Parts of Speech Tagging in NLP
• To understand the grammatical structure of a sentence
• To disambiguate words with multiple meanings
• To improve the accuracy of NLP tasks
• To facilitate research in linguistics
• Steps Involved in the POS tagging
• Collect a dataset of annotated text
• Preprocess the text
• Divide the dataset into training and testing sets
• Train the POS tagger
• Test the POS tagger
• Fine-tune the POS tagger
• Use the POS tagger
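The steps above can be sketched with a most-frequent-tag baseline, one of the simplest trainable taggers. This is an assumed stand-in, not the method the slides prescribe: the tiny annotated dataset is invented, the tag names follow the Universal POS style (DET, NOUN, VERB, ADP), unknown words default to NOUN, and the fine-tuning step is omitted.

```python
from collections import Counter, defaultdict

# Step 1: a (tiny, invented) dataset of annotated sentences.
tagged_sentences = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")],
    [("the", "DET"), ("dog", "NOUN"), ("sat", "VERB"), ("on", "ADP"), ("the", "DET"), ("rug", "NOUN")],
    [("a", "DET"), ("cat", "NOUN"), ("slept", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("slept", "VERB"), ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")],
]

# Step 3: divide into training and testing sets.
train, test = tagged_sentences[:3], tagged_sentences[3:]

def train_tagger(sentences):
    """Step 4: remember the most frequent tag of each (lowercased) word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:  # Step 2 (preprocessing) is just lowercasing here.
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(model, words, default="NOUN"):
    """Tag each word; unknown words get the assumed default tag."""
    return [(word, model.get(word.lower(), default)) for word in words]

def accuracy(model, sentences):
    """Step 5: fraction of test tokens whose predicted tag matches the gold tag."""
    pairs = [(tag(model, [w])[0][1], gold) for s in sentences for w, gold in s]
    return sum(pred == gold for pred, gold in pairs) / len(pairs)

model = train_tagger(train)
print("accuracy:", accuracy(model, test))
print(tag(model, "The cat sat on the mat".split()))  # Step 7: use the tagger.
```

Because it ignores context, a baseline like this cannot disambiguate words with several tags; context-aware taggers trained on large corpora address that.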
• Implement part-of-speech tagging using spaCy in Python
pip install spacy
python -m spacy download en_core_web_sm

import spacy

# Load the small English pipeline installed by the commands above
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is planning to buy Indian startup for $1 billion")
for token in doc:
    # Coarse tag, its description, fine-grained tag, and its description
    print(token, "|", token.pos_, "|", spacy.explain(token.pos_), "|", token.tag_,
          spacy.explain(token.tag_))
• token.pos_ gives the coarse-grained POS tag of the token; token.tag_ gives the fine-grained tag.
• Output:
• Apple | PROPN | proper noun | NNP noun, proper singular
• is | AUX | auxiliary | VBZ verb, 3rd person singular present
• planning | VERB | verb | VBG verb, gerund or present participle
• to | PART | particle | TO infinitival "to"
• buy | VERB | verb | VB verb, base form
• Indian | ADJ | adjective | JJ adjective (English), other noun-modifier (Chinese)
• startup | NOUN | noun | NN noun, singular or mass
• for | ADP | adposition | IN conjunction, subordinating or preposition
• $ | SYM | symbol | $ symbol, currency
• 1 | NUM | numeral | CD cardinal number
• billion | NUM | numeral | CD cardinal number
• Defining a tag set
• We have to define an inventory of labels for the word classes (i.e., the tag set).
- Most taggers rely on models that have to be trained on annotated (tagged) corpora. Evaluation also requires annotated corpora.
- Since human annotation is expensive and time-consuming, the tag sets used in a few existing labeled corpora become the de facto standard.
- Tag sets need to capture semantically or syntactically important distinctions that can easily be made by trained human annotators.
• Word classes
- Open classes: nouns, verbs, adjectives, adverbs
- Closed classes: auxiliaries and modal verbs, prepositions, conjunctions, pronouns, determiners, particles, numerals
• Defining a tag set
• Tag sets have different granularities:
- Brown corpus (Francis and Kucera 1982): 87 tags
- Penn Treebank (Marcus et al. 1993): 45 tags. Simplified version of the Brown tag set (de facto standard for English now).
  NN: common noun (singular or mass): water, book
  NNS: common noun (plural): books
- Prague Dependency Treebank (Czech): 4452 tags. Complete morphological analysis:
  AAFP3----3N----: nejnezajímavějším
  Adjective Regular Feminine Plural Dative….Superlative
• How much ambiguity is there?
• Most word types are unambiguous.
[Table: number of tags per word type]
• NB: These numbers are based on the word/tag combinations in the corpus. Many combinations that don't occur in the corpus are equally correct.
• But a large fraction of word tokens are ambiguous. Original Brown corpus: 40% of tokens are ambiguous.
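The difference between type ambiguity and token ambiguity can be made concrete with a small count. The tagged tokens below are invented for illustration (they are not Brown corpus data), and the tag names follow the Universal POS style as an assumption. A word type is counted as ambiguous if it appears with more than one tag; a token is ambiguous if its word type is.

```python
from collections import defaultdict

# Invented tagged tokens; "back" appears with four different tags.
tagged_tokens = [
    ("the", "DET"), ("back", "ADJ"), ("door", "NOUN"),
    ("on", "ADP"), ("my", "PRON"), ("back", "NOUN"),
    ("win", "VERB"), ("the", "DET"), ("voters", "NOUN"), ("back", "ADV"),
    ("promised", "VERB"), ("to", "PART"), ("back", "VERB"), ("the", "DET"), ("bill", "NOUN"),
]

# Collect the set of tags observed for each word type.
tags_per_type = defaultdict(set)
for word, tag in tagged_tokens:
    tags_per_type[word].add(tag)

ambiguous_types = {word for word, tags in tags_per_type.items() if len(tags) > 1}
type_rate = len(ambiguous_types) / len(tags_per_type)
token_rate = sum(word in ambiguous_types for word, _ in tagged_tokens) / len(tagged_tokens)

print(f"ambiguous word types:  {type_rate:.0%}")
print(f"ambiguous word tokens: {token_rate:.0%}")
```

Even in this toy sample the token rate is well above the type rate, because the ambiguous word is also a frequent one.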
• Qualitative evaluation
• Generate a confusion matrix (for development data): how often was a word with tag i mistagged as tag j?
• See which errors are causing problems:
- Noun (NN) vs. proper noun (NNP) vs. adjective (JJ)
- Preterite (VBD) vs. participle (VBN) vs. adjective (JJ)
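A confusion matrix of this kind can be built by counting (gold tag, predicted tag) pairs. The gold and predicted tag sequences below are invented for illustration; in practice they would come from running a tagger on annotated development data.

```python
from collections import Counter

# Invented gold-standard and predicted tags for eight development tokens.
gold      = ["NN", "NNP", "JJ", "VBD", "VBN", "NN", "VBD", "JJ"]
predicted = ["NN", "NN",  "JJ", "VBN", "VBN", "JJ", "VBD", "JJ"]

# confusion[(i, j)] = how often a word with gold tag i was tagged as j.
confusion = Counter(zip(gold, predicted))
errors = {pair: count for pair, count in confusion.items() if pair[0] != pair[1]}

# Print the matrix with gold tags as rows and predicted tags as columns.
tags = sorted(set(gold) | set(predicted))
print("gold\\pred " + " ".join(f"{t:>4}" for t in tags))
for i in tags:
    print(f"{i:>9} " + " ".join(f"{confusion[(i, j)]:>4}" for j in tags))

print("errors:", errors)
```

The off-diagonal cells are the mistaggings; sorting them by count shows which tag pairs deserve attention first.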
