Chapter 2
Basic concept of sound
• Sound is a physical phenomenon produced by the vibration of matter
and transmitted as waves.
• The perception of sound by human beings involves three systems:
• the source, which emits the sound.
• the medium through which the sound propagates.
• the detector, which receives and interprets the sound.
• The simplest sound we can hear is a sine wave.
fig: sine wave (air pressure plotted against time, showing the amplitude and one period)
Attributes of sound
• Period: the interval at which a periodic signal repeats regularly.
• Pitch: a perception of sound by human beings.
• measures how 'high' the sound is as perceived by a listener.
• Frequency: measures a physical property of the wave.
• it is the reciprocal of the period: f = 1/p.
• the unit is Hertz (Hz) or kilohertz (kHz).
• Amplitude: the measure of the displacement of the air pressure wave
from its mean state.
• it is perceived as loudness.
• Threshold of feeling: at an intensity of about 1 W/m² we start to feel
the sound, and the ear may be damaged. This is known as the threshold of feeling.
• Threshold of hearing: at an intensity of about 10⁻¹² W/m² we are just
able to hear the sound.
• this is known as the threshold of hearing.
• Bandwidth and dynamic range:
• Dynamic range means the change in sound levels.
• e.g. a large orchestra can reach 130 dB at its climax and drop to as low as 30 dB,
so its dynamic range is 100 dB.
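The thresholds and the orchestra example above can be sketched in a few lines; the decibel formula below is the standard sound-intensity-level definition, not taken from these notes.

```python
import math

threshold_of_hearing = 1e-12  # W/m^2, intensity we are just able to hear
threshold_of_feeling = 1.0    # W/m^2, intensity where the ear may be damaged

def intensity_to_db(intensity):
    """Sound intensity level in dB relative to the threshold of hearing."""
    return 10 * math.log10(intensity / threshold_of_hearing)

# The span between the two thresholds is 120 dB.
print(intensity_to_db(threshold_of_feeling))

# Dynamic range of the orchestra example: climax minus quietest passage.
dynamic_range = 130 - 30
print(dynamic_range)  # → 100 dB
```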
Computer Representation of Sound
• A continuous sound wave cannot be directly represented in a
computer.
• A computer measures the amplitude of the waveform at regular time
intervals to produce a series of numbers. Each of these measurements is
a sample.
Sampling Rate
• 1- ab The rate at which a continuous waveform is sampled is called
sampling rate.
Quantization
• Also called resolution, it is the number of bits used to represent each sample.
• The lower the quantization, the lower the quality of the sound.
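Sampling and quantization together can be sketched as below; the tone frequency, sampling rate, and bit depth are illustrative values chosen for the example, not figures from the notes.

```python
import math

frequency = 440.0      # Hz, tone to digitize (illustrative)
sampling_rate = 8000   # samples per second
bits = 8               # quantization resolution
duration = 0.001       # seconds of audio

levels = 2 ** bits     # number of discrete amplitude steps

samples = []
for n in range(int(duration * sampling_rate)):
    t = n / sampling_rate                              # time of the n-th sample
    amplitude = math.sin(2 * math.pi * frequency * t)  # continuous waveform in [-1, 1]
    # Quantize: map [-1, 1] onto `levels` integer steps.
    quantized = round((amplitude + 1) / 2 * (levels - 1))
    samples.append(quantized)

print(samples)  # the series of numbers the computer actually stores
```

Raising the sampling rate captures the waveform more often; raising `bits` makes each step finer, which is exactly the quality trade-off described above.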
MUSIC (MIDI Basic Concepts)
• MIDI is a set of specifications used in building instruments so that
instruments of different manufacturers can, without difficulty, communicate
musical information with one another.
• MIDI stands for Musical Instrument Digital Interface.
• A MIDI interface has two different components:
• Hardware
• connects equipment.
• specifies the MIDI cable.
• deals with electronic signals.
• stipulates the MIDI port that is built into an instrument.
• Data format
• encodes the information travelling through the hardware.
• the encoding includes:
• instrument specifications.
• notation of the beginning and end of a note.
• basic frequency and sound volume.
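The data-format component can be made concrete with the Note On/Note Off messages of the MIDI 1.0 specification: three bytes encoding the channel, the note (which fixes the basic frequency), and the velocity (volume). A minimal sketch:

```python
def note_on(channel, note, velocity):
    """Encode a MIDI Note On message: status byte 0x90+channel, note, velocity."""
    return bytes([0x90 | channel, note & 0x7F, velocity & 0x7F])

def note_off(channel, note):
    """Encode a MIDI Note Off message (velocity 0 used here by convention)."""
    return bytes([0x80 | channel, note & 0x7F, 0])

# Middle C (note number 60) on channel 0 at moderate velocity.
msg = note_on(0, 60, 64)
print(msg.hex())  # → "903c40"
```

Any instrument that understands this byte layout can play the note, regardless of manufacturer, which is the whole point of the data-format specification.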
MIDI Devices
• A musical instrument satisfying both components is called a MIDI device,
e.g. a synthesizer.
• Components of a synthesizer:
• sound generators
• microprocessors
• keyboard
• control panel
• auxiliary controllers
• memory
Speech
• Speech signals have two properties which can be used in speech processing:
• voiced speech signals show almost periodic behavior during certain time
intervals.
• the spectrum of audio signals shows characteristic maxima, mostly in 3-5
frequency bands.
• Generated speech must be understandable and must sound natural.
• A speech output system should be able to transfer text into speech automatically,
without any lengthy preprocessing.
Speech Generation
• Basic notions:
• Phone: the smallest speech unit, such as the m of mat and the b of bat in
English, that distinguishes one word from another in a language.
• Speech can be generated in the following ways:
• a. Reproduced speech output:
• the easiest method of speech generation/output is to use prerecorded speech and play it back
in a timely fashion.
• speech can be stored as PCM (Pulse Code Modulation) samples.
• b. Time-dependent sound concatenation:
• speech generation/output can also be achieved by sound concatenation in a timely fashion:
speech units are composed like building blocks, where composition can occur at different levels.
• in the simplest case, the individual phones are understood as speech units.
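A toy sketch of time-dependent sound concatenation with phones as the building blocks; the phone names and sample values below are invented for illustration (real systems must also smooth the transitions between units, i.e. handle co-articulation).

```python
# Hypothetical prerecorded PCM fragments, one per phone of the word "crumb".
phone_samples = {
    "k": [0, 3, 5],
    "r": [7, 6, 4],
    "^": [2, 1, 2],
    "m": [4, 4, 3],
}

def concatenate(phones):
    """Compose a word by appending the sample blocks of its phones in order."""
    word = []
    for p in phones:
        word.extend(phone_samples[p])
    return word

print(concatenate(["k", "r", "^", "m"]))  # sample series for "crumb"
```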
fig: phone sound concatenation vs. diphone concatenation, illustrated with the word
"crumb" (phones k, r, ^, m); composition can occur at the levels of text, sentence,
part, word, and syllable.
Frequency-Dependent Sound Concatenation
• Speech generation/output can also be based on frequency-dependent sound concatenation.
• Formants are frequency maxima in the spectrum of the speech signal.
• Formant synthesis simulates the vocal tract through filters.
• The characteristic values are the filters' middle frequencies and their bandwidths. Individual
speech elements (e.g. phones) are defined through the characteristic values of the formants.
• The transitions, known as co-articulation, present the most critical problem.
• Additionally, the respective prosody has to be determined.
• The method used for sound synthesis to simulate human speech is called the linear
predictive coding (LPC) method.
• Using speech synthesis, an existing text can be transformed into an acoustic signal.
• The figure shows a typical system.
• Formant: a concentration of acoustic energy.
• Co-articulation: the idea that a speech sound is affected by the other speech sounds around
it, and each sound changes slightly according to its environment.
fig: components of a speech synthesis system with time-dependent sound concatenation
(text → transcription → sound script → sound transfer/synthesis → speech; transcription
uses letter-to-phone rules and a dictionary of exceptions)
• In the first step, transcription is performed: the text is translated
into a sound script.
• Most transcription methods work with letter-to-phone rules and a
dictionary of exceptions stored in a library.
• In the second step, the sound script is translated into a speech signal.
• Time- or frequency-dependent concatenation can follow.
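The transcription step can be sketched as a lookup: the dictionary of exceptions wins, otherwise simple letter-to-phone rules apply. The rules and dictionary entries below are invented for illustration only.

```python
# Hypothetical dictionary of exceptions: words the naive rules would get wrong.
exceptions = {"one": ["w", "^", "n"]}

# Hypothetical naive one-letter-to-one-phone rules.
letter_to_phone = {"m": "m", "a": "ae", "t": "t", "b": "b"}

def transcribe(word):
    """Translate a word of text into a sound script (a list of phone symbols)."""
    if word in exceptions:          # dictionary of exceptions takes priority
        return exceptions[word]
    return [letter_to_phone[c] for c in word]

print(transcribe("mat"))  # → ['m', 'ae', 't'], via the rules
print(transcribe("one"))  # → ['w', '^', 'n'], via the exception dictionary
```

The resulting sound script would then feed the second step (synthesis by time- or frequency-dependent concatenation).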
Speech Analysis
• Speech analysis (speech input) deals with the research areas shown in
the figure below:
fig: research areas of speech analysis
• Human speech has certain characteristics determined by the speaker.
• Speech analysis can therefore serve to analyze who is speaking, i.e. to
recognize a speaker for identification and verification.
• Another task is to analyze what has been said, i.e. to recognize and
understand the speech signal itself.
• Another area of speech analysis researches speech patterns with
respect to how a certain statement was said.
• e.g. a spoken sentence sounds different when the speaker is angry or happy.
fig: components of speech recognition and understanding (acoustic and phonetic
analysis → syntactical analysis → semantic analysis → understood speech, drawing
on sound patterns/word models, syntax, and semantics)
• A speech recognition and understanding system applies the
principle of "data reduction through extraction" several times, as
follows:
• In the first step, the principle is applied to a sound pattern and/or word
model: an acoustical and phonetic analysis is performed.
• In the second step, the speech units go through syntactical analysis; errors
from the previous step can be recognized here.
• In the third step, the semantics of the previously recognized language is dealt
with; decision errors from the previous steps can be recognized and corrected here.
Problems in Speech Recognition & Understanding
• Room acoustics with environmental noise.
• Word boundaries must be determined.
• For the comparison of speech elements to the existing patterns,
time normalization is necessary:
• the same word can be spoken quickly or slowly.
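Time normalization is commonly handled with dynamic time warping (DTW), a standard textbook technique for comparing a spoken word to a stored pattern even when one was spoken faster; the notes do not prescribe a specific algorithm, so this is a generic sketch.

```python
def dtw_distance(a, b):
    """Minimal cumulative distance aligning sequences a and b by warping time."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance between samples
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match one-to-one
    return cost[n][m]

fast = [1, 3, 4, 2]             # a word spoken quickly (toy feature values)
slow = [1, 1, 3, 3, 4, 4, 2]    # the same word spoken slowly
print(dtw_distance(fast, slow))  # → 0.0, despite the different lengths
```

A plain sample-by-sample comparison of `fast` and `slow` would fail outright because the sequences have different lengths; DTW absorbs the tempo difference.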
Speech Transmission
• The area of speech transmission deals with efficient coding of the
speech signal to allow speech/sound transmission at low transmission
rates over networks.
• The goal is to provide the receiver with the same speech/sound
quality as was generated at the sender side.
• This section covers some principles that are connected to speech
generation and recognition.
• Signal-form coding:
• this kind of coding considers no speech-specific properties or parameters.
• the goal is to achieve the most efficient coding of the audio signal.
• the data rate of a PCM-coded stereo audio signal at CD quality is:
rate = 2 channels × 44,100 samples/s × 16 bits/sample ÷ 8 bits/byte
= 176,400 bytes/s.
• Source coding:
• parameterized systems work with source-coding algorithms.
• here, speech-specific characteristics are used for data reduction.
• a channel vocoder is an example of such a parameterized system.
• the channel vocoder is an extension of sub-channel coding.
• during speech analysis, the signal is divided into a set of frequency channels, because
only certain frequency maxima are relevant to speech.
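The CD-quality PCM data-rate calculation from the signal-form-coding example above can be sketched as:

```python
channels = 2            # stereo
sampling_rate = 44100   # samples per second (CD quality)
bits_per_sample = 16

# bits per second across both channels, divided by 8 bits per byte
rate_bytes = channels * sampling_rate * bits_per_sample // 8
print(rate_bytes)  # → 176400 bytes/s
```

This is the uncompressed baseline that source coding (e.g. the channel vocoder) tries to undercut by exploiting speech-specific characteristics.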
fig: source coding in a parameterized system: components of a speech transmission
system (speech analysis → coded speech signal → reconstruction)
• Recognition/synthesis methods:
• there have been attempts to reduce the transmission rate further using pure
recognition/synthesis methods.
• speech analysis (recognition) is performed on the sender side of the speech
transmission system, and speech synthesis (generation) on the receiver side.
fig: recognition/synthesis method: components of a speech transmission system
(analog speech signal → speech analysis at the sender → speech synthesis at the receiver)
• Achieved quality:
• the essential question regarding speech and audio transmission in
multimedia systems is how to achieve the minimal data rate for a given quality.