
Sound/ Audio System

Chapter 2

1
Basic concept of sound
• Sound is a physical phenomenon produced by the vibration of matter
and transmitted as waves.
• The perception of sound by human beings involves three systems:
• the source, which emits the sound.
• the medium through which the sound propagates.
• the detector, which receives and interprets the sound.
• The simplest sound we can hear is a sine wave.

2
fig: sine wave (air pressure plotted against time; the amplitude and one period are marked)
3
Attributes of sound
• Period: the interval at which a periodic signal repeats.
• Pitch: a perception of sound by human beings.
• measures how 'high' the sound is as it is perceived by a listener.
• Frequency: measures a physical property of the wave.
• it is the reciprocal of the period: f = 1/p.
• the unit is hertz (Hz) or kilohertz (kHz).
• Amplitude: the amplitude of a sound is the measure of the displacement of
the air pressure wave from its mean state.
• it is perceived as loudness.
• Threshold of feeling: if the intensity of a sound reaches 1 watt/m², we start
to feel the sound and the ear may be damaged.
• This is known as the threshold of feeling.

4
• Threshold of hearing: if the intensity is 10⁻¹² watt/m², we are just able
to hear the sound.
• this is known as the threshold of hearing.
• Bandwidth and dynamic range:
• Dynamic range means the change in sound levels.
• e.g. a large orchestra can reach 130 dB at its climax and drop to as low as 30 dB,
so the range is 100 dB.

• Bandwidth is the range of frequencies a device can produce or a human can hear.

5
Computer Representation of Sound
• A continuous sound wave cannot be directly represented in a
computer.
• A computer measures the amplitude of the waveform at regular time
intervals to produce a series of numbers. Each of these measurements is
a sample.

6
Sampling Rate
• The rate at which a continuous waveform is sampled is called the
sampling rate.

7
Quantization
• Also called resolution: the number of bits used to represent each sample.
• The lower the quantization, the lower the quality of the sound.
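The two steps above, sampling at a fixed rate and quantizing each sample to a fixed number of bits, can be combined in a short sketch. This is an illustrative Python example, not from the slides; all names are ours.

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate, bits):
    """Sample a sine wave at regular time intervals and quantize each
    sample to the given number of bits (illustrative sketch)."""
    levels = 2 ** bits                       # number of quantization levels
    n_samples = int(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                  # sampling instant
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # value in [-1, 1]
        # map [-1, 1] onto the integer levels 0 .. levels-1
        q = round((amplitude + 1) / 2 * (levels - 1))
        samples.append(q)
    return samples

# one period of a 1 kHz tone sampled at 8 kHz with 3-bit resolution
print(sample_and_quantize(1000, 0.001, 8000, 3))
```

Raising `bits` increases the number of levels (and hence the quality); lowering `sample_rate` below twice the signal frequency loses the waveform entirely.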

8
MUSIC (MIDI Basic Concepts)
• MIDI is a set of specifications used in building instruments so that
instruments from different manufacturers can, without difficulty, communicate
musical information with one another.
• MIDI - Musical Instrument Digital Interface.
• The MIDI interface has two different components:
• Hardware
• connects equipment.
• specifies the MIDI cable.
• deals with electronic signals.
• stipulates that a MIDI port is built into an instrument.
• Data format
• encodes the information travelling through the hardware.
• the encoding includes:
• instrument specification.
• notation of the beginning and end of a note.
• basic frequency and sound volume.
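As a small illustration of the data format, the standard MIDI Note On and Note Off channel messages are each three bytes: a status byte (message type in the high nibble, channel in the low nibble) followed by two data bytes for key number and velocity. The byte layout below follows the MIDI standard; the function names are our own.

```python
def note_on(channel, key, velocity):
    """Build the three bytes of a MIDI Note On message.
    Status nibble 0x9 marks Note On; the low nibble is the channel (0-15).
    Data bytes (key, velocity) are 7-bit values."""
    return bytes([0x90 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

def note_off(channel, key, velocity=0):
    """Note Off uses status nibble 0x8."""
    return bytes([0x80 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

# middle C (key 60) on channel 0, medium velocity
print(note_on(0, 60, 64).hex())   # -> "903c40"
```

This is why any two MIDI devices interoperate: both sides agree on the same byte-level encoding of "note begins/ends", key (basic frequency), and velocity (volume).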

9
MIDI Devices
• A musical instrument satisfying both components is called a MIDI device,
e.g. a synthesizer.
• Components of a synthesizer:
• Sound generators
• Microprocessors
• Keyboard
• Control panel
• Auxiliary controllers
• Memory

10
Speech
• Speech signals have two properties which can be used in speech processing:
• Voiced speech signals show almost periodic behavior during certain time
intervals.
• The spectrum of voiced signals shows characteristic maxima, mostly in 3-5
frequency bands.
• Generated speech must be understandable and should sound natural.
• A speech output system should be able to transform text into speech
automatically, without any lengthy preprocessing.

11
Speech Generation
• Basic notions:
• Phone: the smallest speech unit, such as the m of mat and the b of bat in
English, that distinguishes one word from another in a language.
• Speech can be generated in the following ways:
• a. Reproduced speech output:
• The easiest method of speech generation/output is to use prerecorded speech and play it back
in a timely fashion.
• Speech can be stored as PCM (Pulse Code Modulation) samples.
• b. Time-dependent sound concatenation:
• Speech generation/output can also be achieved by sound concatenation in a timely fashion.
• Speech units are composed like building blocks, where composition can occur at different levels.
• In the simplest case, the individual phones are used as the speech units.

12
fig: Phone sound concatenation; Diphone concatenation; Syllable sound
concatenation; Word sound concatenation (example word "crumb", sound
script k r ^ m; the units range from phones through syllables and words
up to parts of sentences and whole sentences)

13
• The phone sound concatenation shows the problem of transitions between individual phones.
• This is called co-articulation, which is the mutual sound influence across several sounds.
• To overcome this problem, diphones are considered.
• Two phones constitute a diphone.
• The second figure shows the word "crumb", which consists of an ordered set of diphones.
• At the next level, speech is generated through syllables to ease the transition problem.
• The third figure shows the syllable sounds of the word "crumb".
• The best pronunciation of a word is achieved through storage of the whole word.
• This leads towards synthesis of the speech sequence, as shown in the last figure.
• Additionally, prosody should be considered during speech generation/output.
• Prosody means the stress and melody course.
• e.g. the pronunciation of a question differs strongly from that of a statement.

14
Frequency-Dependent Sound Concatenation
• Speech generation/output can also be based on frequency-dependent sound concatenation.
• Formants are frequency maxima in the spectrum of the speech signal.
• Formant synthesis simulates the vocal tract through filters.
• The characteristic values are the filters' center frequencies and their bandwidths. Individual
speech elements (e.g. phones) are defined through the characteristic values of the formants.
• The transitions, known as co-articulation, present the most critical problem.
• Additionally, the respective prosody has to be determined.
• The method used for sound synthesis in order to simulate human speech is called the linear
predictive coding (LPC) method.
• Using speech synthesis, an existing text can be transformed into an acoustic signal.
• The figure shows a typical system.
• Formant: a concentration of acoustic energy in the speech signal.
• Co-articulation: the idea that a speech sound is affected by the other speech sounds around it,
and each sound changes slightly according to its environment.
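A single formant filter can be sketched as a two-pole digital resonator, a common textbook form for formant synthesis. The coefficient formulas and names below are our illustrative choice, not from the slides: the pole radius is set by the bandwidth and the pole angle by the center frequency.

```python
import math

def resonator_coeffs(center_hz, bandwidth_hz, sample_rate):
    """Coefficients of a two-pole resonator that concentrates energy
    around one formant frequency (illustrative sketch)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius < 1
    theta = 2 * math.pi * center_hz / sample_rate        # pole angle
    b1 = 2 * r * math.cos(theta)
    b2 = -r * r
    a0 = 1 - b1 - b2                                     # simple gain normalization
    return a0, b1, b2

def resonate(signal, center_hz, bandwidth_hz, sample_rate):
    """Apply the resonator: y[n] = a0*x[n] + b1*y[n-1] + b2*y[n-2]."""
    a0, b1, b2 = resonator_coeffs(center_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a0 * x + b1 * y1 + b2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# impulse response: a damped oscillation near the 500 Hz formant
response = resonate([1.0] + [0.0] * 99, 500, 80, 8000)
```

A full formant synthesizer chains several such resonators (one per formant band) and drives them with a voiced (periodic) or unvoiced (noise) excitation source.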

15
fig: components of a speech synthesis system with time-dependent sound
concatenation (text → transcription → sound script → sound synthesis →
speech; the transcription step uses letter-to-phone rules and a
dictionary of exceptions)

16
• In the first step, transcription is performed: the text is translated
into a sound script.
• Most transcription methods work with letter-to-phone rules and a
dictionary of exceptions stored in a library.
• In the second step, the sound script is translated into a speech signal.
• Time- or frequency-dependent concatenation can follow.
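The transcription step can be illustrated with a toy example: look a word up in a dictionary of exceptions first, and fall back to letter-to-phone rules otherwise. Both the rule table and the exception entry below are hypothetical, chosen to match the "crumb" example from the earlier slides.

```python
# hypothetical mini-transcriber (illustrative data, not a real rule set)
EXCEPTIONS = {"crumb": "k r ^ m"}   # dictionary of exceptions: final 'b' is silent
RULES = {"c": "k", "r": "r", "u": "^", "m": "m", "b": "b"}  # letter-to-phone rules

def transcribe(word):
    """Translate text into a sound script (greatly simplified)."""
    if word in EXCEPTIONS:          # step 1: consult the dictionary of exceptions
        return EXCEPTIONS[word]
    # step 2: apply letter-to-phone rules letter by letter
    return " ".join(RULES.get(ch, ch) for ch in word)

print(transcribe("crumb"))   # exception entry: "k r ^ m"
print(transcribe("rum"))     # rule-based:      "r ^ m"
```

Real systems use context-sensitive rules rather than a per-letter table, but the exception-dictionary-first control flow is the same.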

17
Speech Analysis
• Speech analysis or input deals with the research areas shown in
the figure below:

18
fig: research areas of speech analysis (WHO? → speaker verification and
identification; WHAT? → speech recognition and understanding; HOW? →
analysis of how a statement was said)
19
• Human speech has certain characteristics determined by the speaker.
• So speech analysis can serve to analyze who is speaking, i.e. to
recognize a speaker for his/her identification and verification.
• Another task is to analyze what has been said, i.e. to recognize and
understand the speech signal itself.
• Another area of speech analysis researches speech patterns with
respect to how a certain statement was said.
• e.g. a speaker's sentence sounds different if the person is angry or happy.

20
fig: components of speech recognition and understanding (acoustic and
phonetic analysis → syntactical analysis → semantic analysis →
understood speech; the steps draw on sound patterns and word models,
syntax, and semantics)

21
• The speech recognition and understanding system applies the
principle of "data reduction through extraction" several times, as
follows:
• In the first step, the principle is applied to a sound pattern and/or word
model: an acoustical and phonetic analysis is performed.
• In the second step, the speech units go through syntactical analysis; here,
errors of the previous step can be recognized.
• In the third step, the semantics of the previously recognized language is dealt
with; here, decision errors of the previous steps can be recognized and corrected.

22
Problems in speech recognition & understanding
• Room acoustics with environmental noise.
• Word boundaries must be determined.
• For the comparison of speech elements to existing patterns,
time normalization is necessary.
• The same word can be spoken quickly or slowly.

23
Speech Transmission
• The area of speech transmission deals with efficient coding of the
speech signal to allow speech/sound transmission at low transmission
rates over networks.
• The goal is to provide the receiver with the same speech/sound
quality as was generated at the sender side.
• This section includes some principles that are connected to speech
generation and recognition.

24
• Signal form coding:
• this kind of coding considers no speech-specific properties or parameters.
• here the goal is to achieve the most efficient coding of the audio signal.
• the data rate of a PCM-coded stereo audio signal with CD-quality requirements is:
rate = 2 channels × 44100 samples/s × 16 bits / 8 bits-per-byte
= 176400 bytes/s.
• Source coding:
• parameterized systems work with source coding algorithms.
• here, speech-specific characteristics are used for data reduction.
• the channel vocoder is an example of such a parameterized system.
• the channel vocoder is an extension of sub-band coding.
• the signal is divided into a set of frequency channels during speech analysis, because
only certain frequency maxima are relevant to speech.
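The PCM data-rate figure for signal form coding can be checked directly (a trivial sketch; variable names are ours):

```python
# CD-quality PCM: 2 channels, 44.1 kHz sampling rate, 16 bits per sample
channels = 2
sample_rate = 44_100        # samples per second, per channel
bits_per_sample = 16

bytes_per_second = channels * sample_rate * bits_per_sample // 8
print(bytes_per_second)     # 176400 bytes/s, matching the slide
```

At roughly 172 KB/s, uncoded CD-quality audio motivates the source-coding approaches that follow: a parameterized representation transmits far less than the raw waveform.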

25
fig: source coding in a parameterized system: components of a speech
transmission system (sender: analog speech signal → A/D → speech
analysis → coded speech signal; receiver: coded speech signal →
reconstruction → D/A → analog speech signal)
26
• Recognition/synthesis method:
• there have been attempts to reduce the transmission rate using pure
recognition/synthesis methods.
• speech analysis (recognition) is performed on the sender side of the speech
transmission system and speech synthesis (generation) on the receiver side.

27
fig: recognition/synthesis system: components of a speech transmission
system (analog speech signal → speech recognition → coded speech
signal → speech synthesis → analog speech signal)

28
• Achieved quality:
• the essential question regarding speech and audio transmission with respect
to multimedia systems is how to achieve the minimal data rate for a given quality.

29
