
Unit 2

Sound/Audio System
Syllabus:
Concept of sound system, music and speech, speech analysis, speech
transformation

Prepared by: Er. Hemanta Bohora


Introduction
Sound is a type of energy that travels through matter in waves, usually through air, but
also through liquids and solids. At its core, sound is produced when an object vibrates,
causing the surrounding molecules to vibrate as well. This vibration travels in waves until
it reaches a listener's ear, where it’s perceived as sound.
• Definition: Sound is energy that moves in waves through a medium like air, water, or solids.
• Frequency: Determines pitch; higher frequency means a higher pitch, and lower frequency
means a lower pitch.
• Creation: It’s generated by vibrations in an object, which cause surrounding molecules to vibrate
and transmit the sound wave.
• Amplitude: Relates to volume; higher amplitude results in louder sounds, while lower amplitude
results in quieter sounds.
• Human Perception: Sound waves reach the ear, causing the eardrum to vibrate, which the
brain interprets as sound.
• Medium Requirement: Sound needs a medium (like air or water) to travel; it cannot move
through a vacuum.
• Field of Study (Acoustics): Acoustics is the science of sound, exploring its production,
transmission, and effects in various applications.
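The two properties above, frequency (pitch) and amplitude (loudness), can be made concrete with a short sketch that samples a pure tone. This is an illustrative example, not part of the original notes; the function name and the 8 kHz sample rate are arbitrary choices.

```python
import math

def sine_wave(freq_hz, amplitude, duration_s, sample_rate=8000):
    """Generate samples of a pure tone.

    freq_hz   -> perceived pitch (higher frequency = higher pitch)
    amplitude -> perceived loudness (higher amplitude = louder)
    """
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]

# A 440 Hz tone (concert pitch A) at half amplitude, 10 ms long:
tone = sine_wave(440, 0.5, 0.01)
```

Changing `freq_hz` shifts the pitch up or down; changing `amplitude` makes the tone louder or quieter without altering its pitch.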
Speech Generation
• Speech Generation is the process of creating spoken language from
text or other forms of input.
• It encompasses technologies like text-to-speech (TTS) systems,
conversational AI, and voice cloning, and is critical in applications such
as virtual assistants, accessibility tools, and robotics.
• The field has seen significant advancements with the introduction of
neural network-based models like WaveNet and Tacotron, which
produce highly natural, expressive speech.
• Challenges remain in making speech generation more natural, context-
sensitive, and scalable, especially for multilingual or emotionally varied
speech. Despite these challenges, speech generation continues to
improve, enhancing human-computer interaction and accessibility
across numerous industries.
Techniques Used in Speech Generation
1. Concatenative Synthesis:
   - A popular method for generating speech where prerecorded human speech segments (like
     syllables or phonemes) are stitched together to form sentences.
   - Limitations: While this approach can produce highly natural-sounding speech, it requires a
     large database of recorded voice data and may sound robotic if not done correctly.
2. Parametric Synthesis:
   - Speech generation models that create audio waves based on parameters like pitch,
     duration, and voice quality. Formant synthesis and HMM-based synthesis are examples of
     parametric synthesis methods.
   - Limitations: Though efficient, the speech generated is often less natural-sounding
     compared to concatenative synthesis.
3. Neural Network-Based Models:
   - WaveNet (by DeepMind): A deep neural network that directly generates raw audio
     waveforms, achieving much higher naturalness in speech than traditional methods.
   - Tacotron and Tacotron 2: Text-to-speech systems that convert text into spectrograms,
     which are then turned into speech waveforms by another model (such as a vocoder).
   - Prosody Prediction: Neural networks are also employed to predict and generate
     natural-sounding prosody (intonation, pitch, rhythm) in generated speech.
4. End-to-End Models:
   - Recent advances involve end-to-end neural models that directly generate speech from
     text without separate steps like phonetic transcription, linguistic analysis, or
     waveform generation. These models learn the entire process in one unified pipeline.
   - Example: FastSpeech and FastSpeech 2 are end-to-end systems that produce speech with
     high efficiency and naturalness.
5. Voice Cloning:
   - Voice cloning generates speech that mimics a specific person's voice using a limited
     amount of recorded data. This is typically done with deep learning models that capture
     the unique characteristics of a person's voice and speech patterns.
   - Applications: Personalized virtual assistants, content creation, and accessibility
     tools.
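The core idea of concatenative synthesis (technique 1 above) can be sketched in a few lines: look up prerecorded unit waveforms and stitch them end to end. The phoneme labels and sample values below are made up purely for illustration; a real system would store thousands of recorded units and smooth pitch and duration at the joins.

```python
# Toy "unit database": each phoneme maps to a placeholder recorded waveform.
UNIT_DB = {
    "k":  [0.1, 0.3, 0.2],
    "ae": [0.5, 0.6, 0.5, 0.4],
    "t":  [0.2, 0.1],
}

def synthesize(phonemes):
    """Concatenate prerecorded unit waveforms into one output waveform."""
    out = []
    for p in phonemes:
        out.extend(UNIT_DB[p])  # a real system would crossfade/smooth here
    return out

wave = synthesize(["k", "ae", "t"])  # units for the word "cat"
```

The limitation noted in the text is visible even here: output quality depends entirely on having the right units recorded, and abrupt joins between units are what make naive concatenation sound robotic.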
Basic Notations:
- The lowest periodic spectral component of the speech signal is called the
  fundamental frequency. It is present in voiced sounds.
- A phone is the smallest speech unit, such as the m of mat and the b of bat in
  English.
- Allophones mark the variants of a phone. For example, the aspirated p of pit and
  the unaspirated p of spit are allophones of the English phoneme /p/.
- A morph is the smallest unit that carries a meaning by itself.
- A voiced sound is generated through the vocal cords; m, v, and l are examples
  of voiced sounds. The pronunciation of a voiced sound depends strongly on the
  individual speaker.
- During the generation of an unvoiced sound, the vocal cords are open; f and s
  are examples of unvoiced sounds.
- Vowels: speech sounds created by the relatively free passage of breath through
  the larynx and oral cavity. Examples: a, e, i, o, and u.
- Consonants: speech sounds produced by a partial or complete obstruction of the
  air stream by any of the various constrictions of the speech organs.
  Examples: m in mother, ch in chew.
Speech Analysis:
Speech analysis/input deals with the research areas which are as follows:

(1) Who?
- Human speech has certain characteristics determined by the speaker. Hence
speech analysis can serve to analyze who is speaking, i.e., to recognize a
speaker for identification and verification.
(2) What?
- Another main task of speech analysis is to analyze what has been said, i.e., to
recognize and understand the speech signal itself.
(3) How?
- Another area of speech analysis researches speech patterns with respect to
how a certain statement was said.
Figure: - Speech recognition system: task division into system components, using the
basic principle “Data Reduction through Property Extraction”
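The principle "data reduction through property extraction" can be illustrated with two classic frame-level features used in early speech analysis: short-time energy and zero-crossing rate. The function names and the sample frame below are illustrative choices, not from the original notes.

```python
def frame_energy(frame):
    """Short-time energy: high for voiced/loud frames, near zero for silence."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign.
    Tends to be higher for unvoiced (noise-like) sounds than voiced ones."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

# A short frame of (made-up) samples reduced to just two numbers:
frame = [0.2, -0.1, 0.3, -0.4, 0.1, 0.0, -0.2]
features = (frame_energy(frame), zero_crossing_rate(frame))
```

A whole frame of samples is reduced to a couple of descriptive properties, which is exactly the data reduction the figure's caption refers to: later recognition stages work on features, not raw samples.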

Speech Transmission:
- The area of speech transmission deals with efficient coding of the speech
signal to allow speech/sound transmission at low bit rates over networks.
- The goal is to provide the receiver with the same speech/sound quality as was
generated at the sender side.
Some Techniques for Speech Transmission:
(1) Pulse Code Modulation:
A straightforward technique for digitizing an analog signal is pulse code
modulation (PCM). It meets the quality demands of stereo audio signals at the data
rate used for CDs: 44,100 samples/s × 2 bytes/sample × 2 channels = 176,400 bytes/s.
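The CD data rate quoted above follows directly from the PCM parameters; a quick check:

```python
# CD-quality PCM data rate = sample_rate × bytes_per_sample × channels
sample_rate = 44_100     # samples per second (44.1 kHz)
bytes_per_sample = 2     # 16-bit resolution
channels = 2             # stereo

rate = sample_rate * bytes_per_sample * channels
print(rate)  # 176400 bytes/s, matching the figure in the text
```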
(2) Source Encoding:

Figure: - Component of a speech transmission system using source encoding


Source encoding exploits the fact that the original signal has certain
characteristics that can be used for compression before transmission.
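One simple example of exploiting a signal's characteristics, offered here as an illustration rather than the method the notes describe, is delta coding: successive speech samples are similar, so storing sample-to-sample differences yields small numbers that can be coded with fewer bits.

```python
def delta_encode(samples):
    """Store the first sample, then only sample-to-sample differences.
    Speech changes slowly, so the differences are small."""
    out = [samples[0]]
    out.extend(b - a for a, b in zip(samples, samples[1:]))
    return out

def delta_decode(deltas):
    """Invert delta_encode by running-summing the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

samples = [100, 102, 105, 104, 101]
encoded = delta_encode(samples)        # [100, 2, 3, -1, -3]
assert delta_decode(encoded) == samples  # lossless round trip
```

Real speech codecs (e.g., DPCM and its adaptive variants) build on exactly this idea: encode what is predictable cheaply, and spend bits only on the residual.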
(3) Recognition-Synthesis Method:

Figure: - Component of a recognition Synthesis for speech transmission

This method performs speech analysis at the sender and speech synthesis at the
receiver during reconstruction. The recognized speech elements are encoded as bits
and transmitted over the multimedia system; the data rate defines the quality.
Example:
Calculate the file size in bytes for a 60-second recording at 44.1 kHz, 8-bit
resolution, stereo sound.
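The example can be worked out from the same formula used for the PCM data rate, now multiplied by the duration:

```python
# File size = sample_rate × bytes_per_sample × channels × duration
sample_rate = 44_100    # 44.1 kHz
bytes_per_sample = 1    # 8-bit resolution
channels = 2            # stereo
duration = 60           # seconds

size_bytes = sample_rate * bytes_per_sample * channels * duration
print(size_bytes)  # 5292000 bytes, i.e. about 5.05 MB
```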
Sound Types and Their Number of Channels
- Mono: 1 channel
- Stereo: 2 channels
- Quadraphonic: 4 channels
- 5.1 Surround: 6 channels
