Codec

This document discusses various speech coding techniques used for voice over IP (VoIP). It introduces efficient speech coding as using digital streams to represent voice at lower bandwidths, with lower quality at lower bandwidths. It describes common coding techniques like ADPCM, CELP, and LD-CELP that balance quality and bitrate. It also discusses measuring voice quality through Mean Opinion Scores and standards like G.711, G.723.1, and G.728 for different bitrate VoIP codecs.

Uploaded by

suntosh_14
Copyright
© Attribution Non-Commercial (BY-NC)

Speech-Coding Techniques

Chapter 3
Introduction
 Efficient speech-coding techniques
 Advantages for VoIP
 Digital streams of ones and zeros
 The lower the bandwidth, the lower the quality
 RTP payload types
 Processing power
 Better quality (for a given bandwidth) requires a
more complex algorithm
 A balance between quality and cost

Internet Telephony 3-2


Voice Quality
 Bandwidth is easily quantified
 Voice quality is subjective
 MOS, Mean Opinion Score
 ITU-T Recommendation P.800
 Excellent – 5
 Good – 4
 Fair – 3
 Poor – 2
 Bad – 1
 A minimum of 30 people
 Listen to voice samples or in conversations



 P.800 recommendations
 The selection of participants
 The test environment
 Explanations to listeners
 Analysis of results
 Toll quality
 A MOS of 4.0 or higher
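
As a rough illustration of how a MOS is derived from a listening panel, the sketch below averages per-listener ratings on the 1 to 5 scale and compares the result with the toll-quality threshold. The function names and the sample panel are hypothetical; P.800 specifies far more about test conditions than is shown here.

```python
from statistics import mean

TOLL_QUALITY_MOS = 4.0   # threshold quoted on the slide
MIN_LISTENERS = 30       # minimum panel size quoted on the slide


def mean_opinion_score(ratings):
    """Average a panel's ratings (1 = Bad ... 5 = Excellent) into a MOS."""
    if len(ratings) < MIN_LISTENERS:
        raise ValueError("need at least 30 listener ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


def is_toll_quality(mos):
    """True when the score reaches the toll-quality threshold."""
    return mos >= TOLL_QUALITY_MOS


# Hypothetical panel: 12 listeners say Excellent, 15 Good, 3 Fair.
ratings = [5] * 12 + [4] * 15 + [3] * 3
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}, toll quality: {is_toll_quality(mos)}")
```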



 Subjective and objective quality-testing
techniques
 PSQM – Perceptual Speech Quality
Measurement
 ITU-T P.861
 faithfully represent human judgement and
perception
 algorithmic comparison between the output signal
and a known input
 type of speaker, loudness, delay, active/silence
frames, clipping, environmental noise



A Little About Speech
 Speech
 Air pushed from the lungs past the vocal cords
and along the vocal tract
 The basic vibrations – vocal cords
 The sound is altered by the disposition of the
vocal tract (tongue and mouth)
 Model the vocal tract as a filter
 The shape changes relatively slowly
 The vibrations at the vocal cords
 The excitation signal



Speech sounds
 Voiced sound
 The vocal cords vibrate open and close
 Interrupt the air flow
 Quasi-periodic pulses of air
 The rate of the opening and closing – the pitch
 A high degree of periodicity at the pitch period
 2-20 ms
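
The periodicity of voiced speech can be illustrated with a simple autocorrelation search over lags between 2 ms and 20 ms. This is a textbook-style sketch, not the pitch estimator of any codec in these slides; the function name and the synthetic test signal are assumptions.

```python
import math

FS = 8000  # sampling rate in Hz (telephony rate used later in the slides)


def estimate_pitch_period(samples, fs=FS, min_ms=2.0, max_ms=20.0):
    """Return the lag (in seconds) with the highest autocorrelation."""
    lo = int(fs * min_ms / 1000)   # shortest candidate lag in samples
    hi = int(fs * max_ms / 1000)   # longest candidate lag in samples
    best_lag, best_corr = lo, float("-inf")
    for lag in range(lo, hi + 1):
        corr = sum(samples[n] * samples[n - lag]
                   for n in range(lag, len(samples)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag / fs


# Synthetic "voiced" signal: 100 Hz fundamental plus one harmonic, 100 ms long.
signal = [math.sin(2 * math.pi * 100 * n / FS)
          + 0.5 * math.sin(2 * math.pi * 200 * n / FS)
          for n in range(800)]
period = estimate_pitch_period(signal)
print(f"pitch period = {period * 1000:.1f} ms, pitch = {1 / period:.0f} Hz")
```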



[Figure: voiced speech and its power spectrum density]



 Unvoiced sounds
 Forcing air at high velocities through a constriction
 The glottis is held open
 Noise-like turbulence
 Show little long-term periodicity
 Short-term correlations still present



[Figure: unvoiced speech and its power spectrum density]



 Plosive sounds
 A complete closure in the vocal tract
 Air pressure is built up and released suddenly
 A vast array of sounds
 The speech signal is relatively predictable over
time
 The reduction of transmission bandwidth can be
significant



Voice Sampling
 A-to-D
 discrete samples of the waveform and represent
each sample by some number of bits
 A signal can be reconstructed if it is sampled at
least twice as fast as its maximum frequency
 Human speech
 300-3800 Hz
 8000 samples per second

 Each sample is encoded into an 8-bit PCM code word (e.g. 01100101)
 8000 samples/s x 8 bits = 64 kbit/s
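
The sampling arithmetic can be checked in a few lines. The helper names are hypothetical; the figures (3800 Hz upper speech frequency, 8000 samples per second, 8 bits per sample) come from the slide.

```python
def nyquist_rate(max_freq_hz):
    """Minimum sampling rate that still allows reconstruction."""
    return 2 * max_freq_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of a PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample


SPEECH_MAX_HZ = 3800    # upper edge of the speech band on the slide
SAMPLE_RATE_HZ = 8000   # telephony sampling rate

min_rate = nyquist_rate(SPEECH_MAX_HZ)
print(f"minimum rate: {min_rate} Hz, used: {SAMPLE_RATE_HZ} Hz")
print(f"8-bit PCM: {pcm_bit_rate(SAMPLE_RATE_HZ, 8)} bit/s")
```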
Quantization
 How many bits are used to represent each sample
 Quantization noise
 The difference between the actual level of the
input analog signal and its quantized value
 More bits reduce the noise
 Diminishing returns
 Uniform quantization levels
 Louder talkers sound better
 11.2/11 v.s. 2.2/2



 Non-uniform quantization
 Smaller quantization steps at smaller signal levels
 Spread signal-to-noise ratio more evenly
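
One way to obtain smaller steps at low levels is to compress the signal before uniform quantization and expand it afterwards. The sketch below uses the continuous mu-law curve with mu = 255 on samples normalized to [-1, 1]. It is an illustration only and not the segmented, bit-exact encoding of G.711; the function names are hypothetical.

```python
import math

MU = 255.0  # companding constant commonly quoted for mu-law


def compress(x):
    """Continuous mu-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress()."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def quantize(v, bits=8):
    """Uniform quantizer on [-1, 1] with a sign and (bits - 1) magnitude bits."""
    levels = 2 ** (bits - 1) - 1
    return round(v * levels) / levels


def companded(x):
    """Compress, quantize uniformly, then expand."""
    return expand(quantize(compress(x)))


def relative_error(x, codec):
    return abs(codec(x) - x) / abs(x)


quiet = 0.01  # a low-level sample
print(f"uniform:   {relative_error(quiet, quantize):.3%}")
print(f"companded: {relative_error(quiet, companded):.3%}")
```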



DTX and Comfort Noise
 DTX is Discontinuous Transmission
 Voice activity detector (VAD) detects if there is
active speech or not.
 When there is no active speech different DTX
procedures can be used:
 No Transmission at all
 Comfort Noise (CN) using RFC 3389
 Codec built-in CN, such as AMR SID (Silence Descriptor)
 Frequency of Comfort Noise packets varies but
is usually some fraction of normal packet rate
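
A minimal sketch of the DTX decision, assuming a plain energy threshold as the VAD and a comfort-noise update every eighth silent frame. Real VADs use several signal parameters; the threshold, the interval, and the labels are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def dtx_decisions(frames, threshold=1e-3, sid_interval=8):
    """Label each frame 'SPEECH', 'SID' (comfort-noise update) or 'NO_TX'."""
    decisions = []
    silent_run = 0  # consecutive silent frames seen so far
    for frame in frames:
        if frame_energy(frame) >= threshold:
            silent_run = 0
            decisions.append("SPEECH")
        else:
            # Send a descriptor at the start of silence, then periodically.
            is_update = silent_run % sid_interval == 0
            decisions.append("SID" if is_update else "NO_TX")
            silent_run += 1
    return decisions


loud = [0.5, -0.5] * 80     # hypothetical active-speech frame
quiet = [0.001, -0.001] * 80  # hypothetical background-noise frame
labels = dtx_decisions([loud] * 2 + [quiet] * 10 + [loud])
print(labels)
```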



Type of Speech Coders
 Waveform codecs
 Sample and code
 High-quality and not complex
 Large amount of bandwidth
 Source codecs (vocoders)
 Match the incoming signal to a math model
 Linear-predictive filter model of the vocal tract
 A voiced/unvoiced flag for the excitation
 The information is sent rather than the signal
 Low bit rates, but sounds synthetic
 Higher bit rates do not improve much



 Hybrid codecs
 Attempt to provide the best of both
 Perform a degree of waveform matching
 Utilize the sound production model
 Quite good quality at low bit rate



G.711
 The most commonplace codec
 Used in circuit-switched telephone network
 PCM, Pulse-Code Modulation
 If uniform quantization
 12 bits * 8 k/sec = 96 kbps
 Non-uniform quantization
 64 kbps DS0 rate
 mu-law
 North America
 A-law
 Other countries, a little friendlier to lower signal levels
 An MOS of about 4.3



DPCM
 DPCM, Differential PCM
 Only transmit the difference between the predicted value and
the actual value
 Voice changes relatively slowly
 It is possible to predict the value of a sample based on the
values of previous samples
 The receiver performs the same prediction
 The simplest form
 No prediction
 No algorithmic delay
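
A minimal DPCM sketch, assuming the prediction is simply the previously reconstructed sample and a fixed quantizer step of 4. Both choices are assumptions for illustration. The encoder tracks the decoder's reconstruction so that the two stay in step.

```python
def dpcm_encode(samples, step=4):
    """Encode samples as quantized differences from the running prediction."""
    codes, prediction = [], 0
    for s in samples:
        code = round((s - prediction) / step)  # quantized prediction error
        codes.append(code)
        prediction += code * step  # same value the decoder will rebuild
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild samples by accumulating the dequantized differences."""
    out, prediction = [], 0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


samples = [0, 10, 18, 25, 30, 28, 20]  # hypothetical slowly varying signal
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
print(codes)
print(decoded)
```

Because speech changes slowly, the transmitted codes stay small compared with the samples themselves, which is where the bit saving comes from.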



ADPCM

 ADPCM, Adaptive DPCM


 Predicts sample values based on
 Past samples
 Factoring in some knowledge of how speech varies over
time
 The error is quantized and transmitted
 Fewer bits required
 G.721
 32 kbps
 G.726
 A-law/mu-law PCM -> 16, 24, 32, 40 kbps
 An MOS of about 4.0 at 32 kbps
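
The adaptive idea can be sketched by letting the quantizer step grow after large codes and shrink after small ones. This toy scheme is not the G.721 or G.726 algorithm; the adaptation rule and all names are assumptions. With 4-bit codes at 8000 samples per second it would produce 32 kbps.

```python
def _adapt(step, code, max_code):
    """Double the step after near-saturated codes, halve it after small ones."""
    if abs(code) >= max_code - 1:
        return step * 2.0
    if abs(code) <= 1:
        return max(1.0, step / 2.0)
    return step


def adpcm_encode(samples, bits=4):
    max_code = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    codes, prediction, step = [], 0.0, 1.0
    for s in samples:
        code = round((s - prediction) / step)
        code = max(-max_code, min(max_code, code))  # clamp to the code range
        codes.append(code)
        prediction += code * step
        step = _adapt(step, code, max_code)
    return codes


def adpcm_decode(codes, bits=4):
    max_code = 2 ** (bits - 1) - 1
    out, prediction, step = [], 0.0, 1.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
        step = _adapt(step, code, max_code)  # mirrors the encoder exactly
    return out


samples = [0] * 5 + [1000] * 40  # hypothetical abrupt level change
codes = adpcm_encode(samples)
decoded = adpcm_decode(codes)
print(f"final decoded value: {decoded[-1]}, bit rate: {8000 * 4} bit/s")
```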



Analysis-by-Synthesis (AbS) Codecs
 Hybrid codec
 Fill the gap between waveform and source codecs
 The most successful and commonly used
 Time-domain AbS codecs
 Not a simple two-state, voiced/unvoiced
 Different excitation signals are attempted
 Closest to the original waveform is selected
 MPE, Multi-Pulse Excited
 RPE, Regular-Pulse Excited
 CELP, Code-Excited Linear Predictive
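
The analysis-by-synthesis loop can be sketched as follows: each candidate excitation is passed through the synthesis filter, scaled by its best-fitting gain, and compared with the target; the closest match wins. The tiny codebook, the one-tap filter, and the plain squared-error measure are assumptions for illustration, not taken from any standard.

```python
def synthesize(excitation, lpc, gain=1.0):
    """All-pole filter: y[n] = gain * e[n] + sum_k lpc[k] * y[n - 1 - k]."""
    out = []
    for n, e in enumerate(excitation):
        y = gain * e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                y += a * out[n - 1 - k]
        out.append(y)
    return out


def abs_search(target, codebook, lpc):
    """Return (index, gain, error) of the excitation that best matches target."""
    best = None
    for index, vector in enumerate(codebook):
        shaped = synthesize(vector, lpc)
        energy = sum(v * v for v in shaped)
        if energy == 0:
            continue
        # Least-squares gain for this candidate.
        gain = sum(t * v for t, v in zip(target, shaped)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, shaped))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


codebook = [[1, 0, 0, 0, 0], [0, 1, 0, 0, -1],
            [1, -1, 1, -1, 1], [0, 0, 1, 1, 0]]  # hypothetical excitations
lpc = [0.9]                                      # hypothetical filter tap
target = synthesize(codebook[2], lpc, gain=0.5)  # stand-in for input speech
index, gain, error = abs_search(target, codebook, lpc)
print(index, round(gain, 3))
```

Only the chosen index and gain (plus filter parameters) would need to be sent, rather than the waveform itself.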



G.728 LD-CELP
 CELP codecs
 A filter; its characteristics change over time
 A codebook of acoustic vectors
 A vector = a set of elements representing various
characteristics of the excitation
 Transmit
 Filter coefficients, gain, a pointer to the vector chosen
 Low Delay CELP
 Backward-adaptive coder
 Use previous samples to determine filter coefficients
 Operates on five samples at a time
 Delay < 1 ms
 Only the pointer is transmitted



 1024 vectors in the codebook
 10-bit pointer (index)
 16 kbps
 LD-CELP encoder
 Minimize a frequency-weighted mean-square error
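
The 16 kbps figure follows from the numbers on these slides: one 10-bit index for every block of five samples at 8000 samples per second. The helper names below are hypothetical.

```python
SAMPLE_RATE_HZ = 8000
SAMPLES_PER_VECTOR = 5
CODEBOOK_SIZE = 1024


def index_bits(codebook_size):
    """Bits needed to address every entry of the codebook."""
    return (codebook_size - 1).bit_length()


def bit_rate_bps(sample_rate_hz, samples_per_vector, bits_per_index):
    """Bit rate when one index is sent per block of samples."""
    vectors_per_second = sample_rate_hz // samples_per_vector
    return vectors_per_second * bits_per_index


bits = index_bits(CODEBOOK_SIZE)
rate = bit_rate_bps(SAMPLE_RATE_HZ, SAMPLES_PER_VECTOR, bits)
block_ms = 1000 * SAMPLES_PER_VECTOR / SAMPLE_RATE_HZ
print(f"{bits}-bit index, {rate} bit/s, {block_ms} ms per block")
```

Five samples span 0.625 ms, which is consistent with the sub-millisecond delay quoted for the coder.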



 LD-CELP decoder

 An MOS score of about 3.9


 One-quarter of G.711 bandwidth



G.723.1 ACELP
 6.3 or 5.3 kbps
 Both mandatory
 Can change from one to another during a
conversation
 The coder
 A band-limited input speech signal
 Sampled at 8 kHz, 16-bit uniform PCM quantization
 Operate on blocks of 240 samples at a time
 A look-ahead of 7.5 ms
 A total algorithmic delay of 37.5 ms + other delays
 A high-pass filter to remove any DC component



 Various operations to determine the appropriate
filter coefficients
 5.3 kbps, Algebraic Code-Excited Linear Prediction
 6.3 kbps, Multi-pulse Maximum Likelihood
Quantization
 The transmission
 Linear prediction coefficients
 Gain parameters
 Excitation codebook index
 24-octet frames at 6.3 kbps, 20-octet frames at 5.3 kbps



 G.723.1 Annex A
 Silence Insertion Descriptor (SID) frames of size
four octets
 The two lsbs of the first octet
 00: 6.3 kbps, 24 octets/frame
 01: 5.3 kbps, 20 octets/frame
 10: SID frame, 4 octets/frame
 An MOS of about 3.8
 At least 37.5 ms delay
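
A receiver can use the two least significant bits of the first octet to find frame boundaries in a stream of concatenated frames. The sketch below relies only on the three codes listed above; the function names and the sample payload are hypothetical, and the value 11 is rejected because the slide does not describe it.

```python
FRAME_TYPES = {
    0b00: ("6.3 kbps", 24),  # (label, frame size in octets)
    0b01: ("5.3 kbps", 20),
    0b10: ("SID", 4),
}


def classify_frame(first_octet):
    """Map the two LSBs of a frame's first octet to (label, size), or None."""
    return FRAME_TYPES.get(first_octet & 0b11)


def split_frames(payload):
    """Split concatenated frames using the type bits of each first octet."""
    frames, pos = [], 0
    while pos < len(payload):
        kind = classify_frame(payload[pos])
        if kind is None:
            raise ValueError("frame type bits 11 are not described here")
        label, size = kind
        if pos + size > len(payload):
            raise ValueError("truncated frame")
        frames.append((label, payload[pos:pos + size]))
        pos += size
    return frames


# Hypothetical payload: one frame of each kind, filler octets after the first.
payload = (bytes([0x00] + [0xAA] * 23) + bytes([0x01] + [0xAA] * 19)
           + bytes([0x02, 0, 0, 0]))
for label, frame in split_frames(payload):
    print(label, len(frame))

frame_ms = 240 * 1000 / 8000  # 240 samples at 8 kHz
print(f"frame {frame_ms} ms + look-ahead 7.5 ms = {frame_ms + 7.5} ms")
```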



G.729
 8 kbps
 Input frames of 10 ms, 80 samples for 8 kHz
sampling rate
 5 ms look-ahead
 Algorithmic delay of 15 ms
 An 80-bit frame for 10 ms of speech
 A complex codec
 G.729A (Annex A), a number of simplifications
 Same frame structure
 Encoder and decoder interoperate between G.729 and G.729A
 Slightly lower quality



 G.729B (Annex B)
 VAD, Voice Activity Detection
 Based on analysis of several parameters of the input
 The current frame plus the two preceding frames
 DTX, Discontinuous Transmission
 Send nothing or send an SID frame
 SID frame contains information to generate comfort
noise
 CNG, Comfort Noise Generation
 G.729, an MOS of about 4.0
 G.729A an MOS of about 3.7



 G.729 Annex D
 a lower-rate extension
 6.4 kbps; 10 ms speech samples, 64 bits/frame
 MOS ≈ 6.3 kbps G.723.1
 G.729 Annex E
 a higher bit rate enhancement
 the linear prediction filter of G.729 has 10 coefficients
 that of G.729 Annex E has 30 coefficients
 the codebook of G.729 has 35 bits
 that of G.729 Annex E has 44 bits
 118 bits/frame; 11.8 kbps
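
The bit rates of the G.729 variants follow directly from the number of bits in each 10 ms frame, and the algorithmic delay from the frame length plus the look-ahead. The dictionary and helper below are hypothetical names around the figures given on these slides.

```python
FRAME_MS = 10      # frame length shared by the variants listed here
LOOKAHEAD_MS = 5   # look-ahead quoted for G.729

BITS_PER_FRAME = {
    "G.729": 80,
    "G.729 Annex D": 64,
    "G.729 Annex E": 118,
}


def bit_rate_bps(bits_per_frame, frame_ms=FRAME_MS):
    """Bits per frame converted to bits per second."""
    return bits_per_frame * 1000 // frame_ms


for name, bits in BITS_PER_FRAME.items():
    print(f"{name}: {bit_rate_bps(bits) / 1000} kbps")
print(f"algorithmic delay: {FRAME_MS + LOOKAHEAD_MS} ms")
```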



Other Codecs
 CDMA QCELP defined in IS-733
 Variable-rate coder
 Two most common rates
 The high rate, 13.3 kbps
 A lower rate, 6.2 kbps
 Silence suppression
 For use with RTP, RFC 2658



 GSM Enhanced Full-Rate (EFR)
 GSM 06.60
 An enhanced version of GSM Full-Rate
 ACELP-based codec
 The same bit rate and the same overall packing
structure
 12.2 kbps
 Support discontinuous transmission
 For use with RTP, RFC 1890



 GSM Adaptive Multi-Rate (AMR) codec
 20 ms coding delay
 Eight different modes
 4.75 kbps to 12.2 kbps
 12.2 kbps, GSM EFR
 7.4 kbps, IS-641 (TDMA cellular systems)
 Change the mode at any time
 Offer discontinuous transmission
 The SID (Silence Descriptor) is sent in every 8th frame
and is 5 bytes in size
 The coding choice of many 3G wireless networks



 The MOS values are for laboratory conditions
 G.711 does not deal with lost packets
 G.729 can accommodate a lost frame by
interpolating from previous frames
 But cause errors in subsequent speech frames
 Processing Power
 G.728 or G.729, 40 MIPS
 G.726 10 MIPS



iLBC
 a FREE codec for robust VoIP
 13.33 kbit/s with a 30 ms encoding frame, or
15.20 kbit/s with a 20 ms encoding frame
 Computational complexity in a range of G.729A



Speex
 Open-source patent-free speech codec
 CELP (code-excited linear prediction) codec
 operating modes:
 narrowband (8 kHz sampling rate)
 2.15 – 24.6 kb/s
 delay of 30 ms
 wideband (16 kHz sampling rate)
 4-44.2 kb/s
 delay of 34 ms
 ultra-wideband (32 kHz sampling rate)
 intensity stereo encoding
 variable bit rate (VBR) possible
 voice activity detection (VAD)



 Cascaded Codecs
 E.g., G.711 stream -> G.729 encoder/decoder
 Might not even come close to G.729
 Each coder only generates an approximation of
the incoming signal
 Audio samples
 http://www.cs.columbia.edu/~hgs/audio/codecs.html



Effects of packetization



Tones, Signal, and DTMF Digits
 The hybrid codecs are optimized for human
speech
 Other data may need to be transmitted
 Tones: fax tones, dialing tone, busy tone
 DTMF digits for two-stage dialing or voice-mail
 G.711 is OK
 G.723.1 and G.729 can be unintelligible
 The ingress gateway needs to intercept
 The tones and DTMF digits
 Use an external signaling system



 Easy at the start of a call
 Difficult in the middle of a call
 Encode the tones differently from the speech
 Send them along the same media path
 An RTP packet provides the name of the tone and the
duration
 Or, a dynamic RTP profile; an RTP packet containing the
frequency, volume and the duration
 RFC 2198
 An RTP payload format for redundant audio data
 Sending both types of RTP payload



 RTP Payload Format for DTMF Digits
 An Internet Draft
 Both methods described before
 A large number of tones and events
 DTMF digits, a busy tone, a congestion tone, a ringing
tone, etc.
 The named events
 E: the end of the tone, R: reserved
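
A named-event payload can be sketched as a 4-octet structure. The layout assumed here is the one later published as RFC 2833: an 8-bit event code, the E bit, the R bit, a 6-bit volume, and a 16-bit duration in timestamp units. The draft version referred to on the slide may differ in detail, and the function names are hypothetical.

```python
import struct

# Event codes for DTMF digits as assigned in RFC 2833 (assumed layout).
EVENT_CODES = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def pack_event(event, end, volume, duration):
    """Build the 4-octet payload: event | E R volume | duration."""
    if not 0 <= event <= 255:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0) | volume  # R bit (0x40) is left at zero
    return struct.pack("!BBH", event, flags, duration)


def unpack_event(payload):
    """Parse a 4-octet named-event payload into its fields."""
    event, flags, duration = struct.unpack("!BBH", payload)
    return {
        "event": event,
        "end": bool(flags & 0x80),
        "reserved": bool(flags & 0x40),
        "volume": flags & 0x3F,
        "duration": duration,
    }


payload = pack_event(EVENT_CODES["5"], end=True, volume=10, duration=800)
print(payload.hex(), unpack_event(payload))
```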



 Payload format

