Chapter 2
Basic concept of sound
• Sound is a physical phenomenon produced by the vibration of matter
and transmitted as waves.
• The perception of sound by human beings involves three systems:
• the source, which emits the sound.
• the medium through which the sound propagates.
• the detector, which receives and interprets the sound.
• The simplest sound we can hear is a sine wave.
fig: sine wave (air pressure plotted against time, showing the amplitude and one period)
Attributes of sound
• Period: the interval at which a periodic signal repeats regularly.
• Pitch: a perception of sound by human beings.
• measures how 'high' the sound is as perceived by a listener.
• Frequency: measures a physical property of the wave.
• it is the reciprocal of the period: f = 1/p.
• the unit is Hertz (Hz) or kilohertz (kHz).
• Amplitude: the measure of the displacement of the air pressure wave
from its mean state.
• it is perceived as loudness.
• Threshold of feeling: at an intensity of about 1 W/m² we start to feel
the sound, and the ear may be damaged. This is known as the threshold of feeling.
• Threshold of hearing: at an intensity of about 10⁻¹² W/m² we are just
able to hear the sound.
• this is known as the threshold of hearing.
• Bandwidth and dynamic range:
• Dynamic range means the change in sound levels.
• e.g. a large orchestra can reach 130 dB at its climax and drop to as low as 30 dB,
so its dynamic range is 100 dB.
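The thresholds and the orchestra example above can be sketched in a few lines; the decibel formula below is the standard sound-intensity-level definition, not taken from these notes.

```python
import math

threshold_of_hearing = 1e-12  # W/m^2, intensity we are just able to hear
threshold_of_feeling = 1.0    # W/m^2, intensity where the ear may be damaged

def intensity_to_db(intensity):
    """Sound intensity level in dB relative to the threshold of hearing."""
    return 10 * math.log10(intensity / threshold_of_hearing)

# The span between the two thresholds is 120 dB.
print(intensity_to_db(threshold_of_feeling))

# Dynamic range of the orchestra example: climax minus quietest passage.
dynamic_range = 130 - 30
print(dynamic_range)  # → 100 dB
```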
Computer Representation of Sound
• A continuous sound wave cannot be directly represented in a
computer.
• A computer measures the amplitude of the waveform at regular time
intervals to produce a series of numbers. Each of these measurements is
a sample.
Sampling Rate
• 1- ab The rate at which a continuous waveform is sampled is called
sampling rate.
Quantization
• Also called resolution, it is the number of bits used to represent each sample.
• The lower the quantization, the lower the quality of the sound.
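Sampling and quantization together can be sketched as below; the tone frequency, sampling rate, and bit depth are illustrative values chosen for the example, not figures from the notes.

```python
import math

frequency = 440.0      # Hz, tone to digitize (illustrative)
sampling_rate = 8000   # samples per second
bits = 8               # quantization resolution
duration = 0.001       # seconds of audio

levels = 2 ** bits     # number of discrete amplitude steps

samples = []
for n in range(int(duration * sampling_rate)):
    t = n / sampling_rate                              # time of the n-th sample
    amplitude = math.sin(2 * math.pi * frequency * t)  # continuous waveform in [-1, 1]
    # Quantize: map [-1, 1] onto `levels` integer steps.
    quantized = round((amplitude + 1) / 2 * (levels - 1))
    samples.append(quantized)

print(samples)  # the series of numbers the computer actually stores
```

Raising the sampling rate captures the waveform more often; raising `bits` makes each step finer, which is exactly the quality trade-off described above.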
MUSIC (MIDI Basic Concepts)
• MIDI is a set of specifications used in building instruments so that
instruments of different manufacturers can, without difficulty, communicate
musical information with one another.
• MIDI stands for Musical Instrument Digital Interface.
• A MIDI interface has two different components:
• Hardware
• connects equipment.
• specifies the MIDI cable.
• deals with electronic signals.
• stipulates the MIDI port that is built into an instrument.
• Data format
• encodes the information travelling through the hardware.
• the encoding includes:
• instrument specifications.
• notation of the beginning and end of a note.
• basic frequency and sound volume.
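The data-format component can be made concrete with the Note On/Note Off messages of the MIDI 1.0 specification: three bytes encoding the channel, the note (which fixes the basic frequency), and the velocity (volume). A minimal sketch:

```python
def note_on(channel, note, velocity):
    """Encode a MIDI Note On message: status byte 0x90+channel, note, velocity."""
    return bytes([0x90 | channel, note & 0x7F, velocity & 0x7F])

def note_off(channel, note):
    """Encode a MIDI Note Off message (velocity 0 used here by convention)."""
    return bytes([0x80 | channel, note & 0x7F, 0])

# Middle C (note number 60) on channel 0 at moderate velocity.
msg = note_on(0, 60, 64)
print(msg.hex())  # → "903c40"
```

Any instrument that understands this byte layout can play the note, regardless of manufacturer, which is the whole point of the data-format specification.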
MIDI Devices
• A musical instrument satisfying both components is called a MIDI device,
e.g. a synthesizer.
• Components of a synthesizer:
• sound generators
• microprocessors
• keyboard
• control panel
• auxiliary controllers
• memory
Speech
• Speech signals have two properties which can be used in speech processing:
• voiced speech signals show almost periodic behavior during certain time
intervals.
• the spectrum of audio signals shows characteristic maxima, mostly in 3-5
frequency bands.
• Generated speech must be understandable and must sound natural.
• A speech output system should be able to transfer text into speech automatically,
without any lengthy preprocessing.
Speech Generation
• Basic notions:
• Phone: the smallest speech unit, such as the m of mat and the b of bat in
English, that distinguishes one word from another in a language.
• Speech can be generated in the following ways:
• a. Reproduced speech output:
• the easiest method of speech generation/output is to use prerecorded speech and play it back
in a timely fashion.
• speech can be stored as PCM (Pulse Code Modulation) samples.
• b. Time-dependent sound concatenation:
• speech generation/output can also be achieved by sound concatenation in a timely fashion:
speech units are composed like building blocks, where composition can occur at different levels.
• in the simplest case, the individual phones are understood as speech units.
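A toy sketch of time-dependent sound concatenation with phones as the building blocks; the phone names and sample values below are invented for illustration (real systems must also smooth the transitions between units, i.e. handle co-articulation).

```python
# Hypothetical prerecorded PCM fragments, one per phone of the word "crumb".
phone_samples = {
    "k": [0, 3, 5],
    "r": [7, 6, 4],
    "^": [2, 1, 2],
    "m": [4, 4, 3],
}

def concatenate(phones):
    """Compose a word by appending the sample blocks of its phones in order."""
    word = []
    for p in phones:
        word.extend(phone_samples[p])
    return word

print(concatenate(["k", "r", "^", "m"]))  # sample series for "crumb"
```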
fig: phone sound concatenation vs. diphone concatenation, illustrated with the word
"crumb" (phones k, r, ^, m); composition can occur at the levels of text, sentence,
part, word, and syllable.
Frequency-Dependent Sound Concatenation
• Speech generation/output can also be based on frequency-dependent sound concatenation.
• Formants are frequency maxima in the spectrum of the speech signal.
• Formant synthesis simulates the vocal tract through filters.
• The characteristic values are the filters' middle frequencies and their bandwidths. Individual
speech elements (e.g. phones) are defined through the characteristic values of the formants.
• The transitions, known as co-articulation, present the most critical problem.
• Additionally, the respective prosody has to be determined.
• The method used for sound synthesis to simulate human speech is called the linear
predictive coding (LPC) method.
• Using speech synthesis, an existing text can be transformed into an acoustic signal.
• The figure shows a typical system.
• Formant: a concentration of acoustic energy.
• Co-articulation: the idea that a speech sound is affected by the other speech sounds around
it, and each sound changes slightly according to its environment.
fig: components of a speech synthesis system with time-dependent sound concatenation
(text → transcription → sound script → sound transfer/synthesis → speech; transcription
uses letter-to-phone rules and a dictionary of exceptions)
• In the first step, transcription is performed: the text is translated
into a sound script.
• Most transcription methods work with letter-to-phone rules and a
dictionary of exceptions stored in a library.
• In the second step, the sound script is translated into a speech signal.
• Time- or frequency-dependent concatenation can follow.
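The transcription step can be sketched as a lookup: the dictionary of exceptions wins, otherwise simple letter-to-phone rules apply. The rules and dictionary entries below are invented for illustration only.

```python
# Hypothetical dictionary of exceptions: words the naive rules would get wrong.
exceptions = {"one": ["w", "^", "n"]}

# Hypothetical naive one-letter-to-one-phone rules.
letter_to_phone = {"m": "m", "a": "ae", "t": "t", "b": "b"}

def transcribe(word):
    """Translate a word of text into a sound script (a list of phone symbols)."""
    if word in exceptions:          # dictionary of exceptions takes priority
        return exceptions[word]
    return [letter_to_phone[c] for c in word]

print(transcribe("mat"))  # → ['m', 'ae', 't'], via the rules
print(transcribe("one"))  # → ['w', '^', 'n'], via the exception dictionary
```

The resulting sound script would then feed the second step (synthesis by time- or frequency-dependent concatenation).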
Speech Analysis
• Speech analysis (speech input) deals with the research areas shown in
the figure below:
fig: research areas of speech analysis
• Human speech has certain characteristics determined by the speaker.
• Speech analysis can therefore serve to analyze who is speaking, i.e. to
recognize a speaker for identification and verification.
• Another task is to analyze what has been said, i.e. to recognize and
understand the speech signal itself.
• Another area of speech analysis researches speech patterns with
respect to how a certain statement was said.
• e.g. a spoken sentence sounds different when the speaker is angry or happy.
fig: components of speech recognition and understanding (acoustic and phonetic
analysis → syntactical analysis → semantic analysis → understood speech, drawing
on sound patterns/word models, syntax, and semantics)
• A speech recognition and understanding system applies the
principle of "data reduction through extraction" several times, as
follows:
• In the first step, the principle is applied to a sound pattern and/or word
model: an acoustical and phonetic analysis is performed.
• In the second step, the speech units go through syntactical analysis; errors
from the previous step can be recognized here.
• In the third step, the semantics of the previously recognized language is dealt
with; decision errors from the previous steps can be recognized and corrected here.
Problems in Speech Recognition & Understanding
• Room acoustics with environmental noise.
• Word boundaries must be determined.
• For the comparison of speech elements to the existing patterns,
time normalization is necessary:
• the same word can be spoken quickly or slowly.
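Time normalization is commonly handled with dynamic time warping (DTW), a standard textbook technique for comparing a spoken word to a stored pattern even when one was spoken faster; the notes do not prescribe a specific algorithm, so this is a generic sketch.

```python
def dtw_distance(a, b):
    """Minimal cumulative distance aligning sequences a and b by warping time."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance between samples
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match one-to-one
    return cost[n][m]

fast = [1, 3, 4, 2]             # a word spoken quickly (toy feature values)
slow = [1, 1, 3, 3, 4, 4, 2]    # the same word spoken slowly
print(dtw_distance(fast, slow))  # → 0.0, despite the different lengths
```

A plain sample-by-sample comparison of `fast` and `slow` would fail outright because the sequences have different lengths; DTW absorbs the tempo difference.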
Speech Transmission
• The area of speech transmission deals with efficient coding of the
speech signal to allow speech/sound transmission at low transmission
rates over networks.
• The goal is to provide the receiver with the same speech/sound
quality as was generated at the sender side.
• This section covers some principles that are connected to speech
generation and recognition.
• Signal-form coding:
• this kind of coding considers no speech-specific properties or parameters.
• the goal is to achieve the most efficient coding of the audio signal.
• the data rate of a PCM-coded stereo audio signal at CD quality is:
rate = 2 channels × 44,100 samples/s × 16 bits/sample ÷ 8 bits/byte
= 176,400 bytes/s.
• Source coding:
• parameterized systems work with source-coding algorithms.
• here, speech-specific characteristics are used for data reduction.
• a channel vocoder is an example of such a parameterized system.
• the channel vocoder is an extension of sub-channel coding.
• during speech analysis, the signal is divided into a set of frequency channels, because
only certain frequency maxima are relevant to speech.
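The CD-quality PCM data-rate calculation from the signal-form-coding example above can be sketched as:

```python
channels = 2            # stereo
sampling_rate = 44100   # samples per second (CD quality)
bits_per_sample = 16

# bits per second across both channels, divided by 8 bits per byte
rate_bytes = channels * sampling_rate * bits_per_sample // 8
print(rate_bytes)  # → 176400 bytes/s
```

This is the uncompressed baseline that source coding (e.g. the channel vocoder) tries to undercut by exploiting speech-specific characteristics.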
fig: source coding in a parameterized system: components of a speech transmission
system (speech analysis → coded speech signal → reconstruction)
• Recognition/synthesis methods:
• there have been attempts to reduce the transmission rate further using pure
recognition/synthesis methods.
• speech analysis (recognition) is performed on the sender side of the speech
transmission system, and speech synthesis (generation) on the receiver side.
fig: recognition/synthesis method: components of a speech transmission system
(analog speech signal → speech analysis at the sender → speech synthesis at the receiver)
• Achieved quality:
• the essential question regarding speech and audio transmission in
multimedia systems is how to achieve the minimal data rate for a given quality.