
MP3 Format

MPEG audio standards such as MP3 use lossy compression to reduce audio file sizes. MPEG-1 Audio Layer III (MP3) became popular for PC and Internet applications. AAC, now part of MPEG-4 Audio, improved on MP3 with more sampling frequencies, arbitrary bit rates, a simpler filter bank, and better compression efficiency. Both use psychoacoustic modeling to remove the parts of the audio that human listeners cannot perceive.

MP3 and MP4 Audio

MPEG-1 Audio (MPEG = Moving Picture Experts Group)
 Lossy compression of audio
 In the late 1980s, ISO's MPEG group began standardizing compressed audio for:
 TV broadcasting
 Audio on CD-ROM (later DVD)

 MPEG-1 Audio: 1992
 MPEG-2 Audio: 1994
 MPEG-1 Audio Layer I, II, III
MPEG-1 Audio Layer II
 Called MP2
 Dominant standard for audio broadcasting
 DAB digital radio and DVB digital television
 Came out of MUSICAM codecs with bit rates of 64-196 kbps
 MUSICAM audio coding - basis for MPEG-1 and MPEG-2 audio
 Sampling rates: 32, 44.1, 48 kHz
 Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps
 Format: mono, stereo, dual channel, …
 MP2 – sub-band audio encoder in time domain
MPEG-1 Audio Layer III
 MPEG-1 Layer III is called MP3 format
 Popular for PC and Internet applications
 Target bit rate of 128 kbps; higher or lower bit rates are possible, with correspondingly higher or lower quality
 Uses psychoacoustics
 Psychoacoustics: the scientific study of sound perception
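To show what the 128 kbps target means in practice, the sketch below compares it with uncompressed PCM. It assumes CD-quality stereo input (44.1 kHz, 16 bits per sample, two channels), which the slides do not specify. The function names are hypothetical.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(source_bps: float, coded_bps: float) -> float:
    """How many times smaller the coded stream is than the source."""
    return source_bps / coded_bps


# Assumed source: CD-quality stereo (44.1 kHz, 16-bit, 2 channels).
cd_bps = pcm_bitrate_bps(44_100, 16, 2)      # 1,411,200 bit/s
ratio = compression_ratio(cd_bps, 128_000)  # roughly 11:1 at the 128 kbps target
print(f"{cd_bps} bit/s -> 128000 bit/s, ratio {ratio:.1f}:1")
```

Under this assumption, coding at 128 kbps shrinks the audio by a factor of about 11.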
MPEG Audio – MP3
 The first psychoacoustic masking codec was proposed in 1979 at AT&T Bell Labs, Murray Hill
 MP3 is based on OCF (Optimum Coding in the Frequency Domain) and PXFM (Perceptual Transform Coding)
 MPEG-1 Audio Layer III: public release 1993
 MPEG-2 Audio Layer III: public release 1995
MPEG-1 Audio Encoding
 Characteristics
 Precision: 16 bits
 Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz
 3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)
 Layer 3: 32-320 kbps, target 64 kbps
 Layer 2: 32-384 kbps, target 128 kbps
 Layer 1: 32-448 kbps, target 192 kbps

MPEG Audio Encoding Steps

CS 414 - Spring 2012


MPEG Audio Filter Bank
 The filter bank divides the input into multiple sub-bands (32 equal-width frequency sub-bands)
 Sub-band i is defined as

$$S_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} \cos\left(\frac{(2i+1)(k-16)\pi}{64}\right) C[k+64j] \, x[k+64j]$$

 where $i \in [0, 31]$, $S_t[i]$ is the filter output sample for sub-band $i$ at time $t$, $C[n]$ is one of the 512 analysis-window coefficients, and $x[n]$ is an audio input sample from the 512-sample buffer
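The sub-band formula can be evaluated directly, as in the minimal sketch below. It first folds the windowed 512-sample buffer into 64 partial sums and then applies the cosine matrixing, which is algebraically the same double sum. The 512 window coefficients C[n] are tabulated in the standard and are not reproduced here, so the example substitutes a hypothetical toy window (a single nonzero tap) only to exercise the code. Real encoders also use a fast algorithm rather than this direct loop.

```python
import math


def polyphase_analysis(x: list[float], C: list[float]) -> list[float]:
    """Return the 32 sub-band samples S_t[i] for one time step t.

    x: the 512 buffered input samples, indexed as in the formula.
    C: the 512 analysis-window coefficients (tabulated in the standard; not included here).
    """
    if len(x) != 512 or len(C) != 512:
        raise ValueError("x and C must each contain 512 values")
    # Window the buffer and fold it: Y[k] = sum over j of C[k + 64j] * x[k + 64j].
    Y = [sum(C[k + 64 * j] * x[k + 64 * j] for j in range(8)) for k in range(64)]
    # Matrixing: S_t[i] = sum over k of cos((2i + 1)(k - 16) * pi / 64) * Y[k].
    return [
        sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * Y[k] for k in range(64))
        for i in range(32)
    ]


# Toy check with a hypothetical window that has a single nonzero tap at n = 16.
toy_window = [1.0 if n == 16 else 0.0 for n in range(512)]
S = polyphase_analysis([1.0] * 512, toy_window)
print(S[:4])  # [1.0, 1.0, 1.0, 1.0]
```

With that toy window and an all-ones buffer, only Y[16] is nonzero, and its cosine factor is cos(0) = 1 for every i. All 32 outputs are therefore 1.0.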
MPEG Audio Psychoacoustic Model
 MPEG audio compresses by removing acoustically irrelevant parts of audio signals
 Takes advantage of the human auditory system's inability to hear quantization noise under auditory masking
 Auditory masking: occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible
Loudness and Pitch
(Review of Psychoacoustic Effects)

 Humans are more sensitive to loudness at mid frequencies than at other frequencies
 Intermediate frequencies: 500 Hz to 5000 Hz
 Range of human hearing: 20 Hz to 20,000 Hz
 The perceived loudness of a sound changes with the frequency of that sound
 The basilar membrane reacts more to intermediate frequencies than to other frequencies
Fletcher-Munson Contours

[Figure: Fletcher-Munson equal-loudness contours]

Each contour connects sounds of equal perceived loudness.

Perception sensitivity (loudness) is not linear across frequencies and intensities.
Masking Effects
(Review of Psychoacoustic Effects)

[Figures: frequency masking; temporal masking]

MPEG/audio divides the audio signal into frequency sub-bands that approximate critical bands. It then quantizes each sub-band according to how audible the quantization noise is within that band.
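MPEG-1's filter bank splits the range from 0 Hz to half the sampling rate into 32 equal-width sub-bands, so each sub-band is fs/64 wide. The small sketch below lists the sub-band edges. The function name is hypothetical. At 44.1 kHz each sub-band is about 689 Hz wide. At low frequencies, critical bands are only around 100 Hz wide, so a single low sub-band spans several of them. For this reason the sub-bands only approximate critical bands.

```python
def subband_edges_hz(sample_rate_hz: float, n_subbands: int = 32) -> list[tuple[float, float]]:
    """(low, high) edges in Hz of equal-width sub-bands covering 0 .. sample_rate / 2."""
    width = sample_rate_hz / 2 / n_subbands
    return [(i * width, (i + 1) * width) for i in range(n_subbands)]


edges = subband_edges_hz(44_100)
print(edges[0])   # (0.0, 689.0625)
print(edges[-1])  # (21360.9375, 22050.0)
```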
MPEG Audio Bit Allocation
 This process determines the number of code bits allocated to each sub-band, based on information from the psychoacoustic model
 Algorithm:
1. Compute the mask-to-noise ratio for each sub-band: MNR = SNR - SMR, where SMR is the signal-to-mask ratio from the psychoacoustic model
 The standard provides tables that estimate the SNR resulting from quantizing to a given number of quantizer levels
2. Search for the sub-band with the lowest MNR
3. Allocate code bits to this sub-band
 When a sub-band is allocated more code bits, look up the new SNR estimate, recompute its MNR, and repeat from step 2 until no more code bits can be allocated
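The greedy allocation loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the standard's exact procedure. The SNR-per-level table, the bit cost of each level, and the SMR values are hypothetical inputs: in a real encoder the SMRs come from the psychoacoustic model and the SNR estimates come from the standard's tables. A real encoder also has to budget for side information.

```python
def allocate_bits(smr_db, snr_db_by_level, bits_by_level, budget_bits):
    """Greedy MNR-driven bit allocation over sub-bands (simplified sketch).

    smr_db:          signal-to-mask ratio of each sub-band, in dB.
    snr_db_by_level: estimated SNR (dB) at each allocation level (hypothetical table).
    bits_by_level:   total bits a sub-band uses at each allocation level (hypothetical).
    budget_bits:     bits available for the whole frame.
    Returns the allocation level chosen for each sub-band.
    """
    levels = [0] * len(smr_db)
    used = 0
    while True:
        best = None
        for band, smr in enumerate(smr_db):
            level = levels[band]
            if level + 1 >= len(snr_db_by_level):
                continue  # sub-band is already at its finest quantizer
            extra = bits_by_level[level + 1] - bits_by_level[level]
            if used + extra > budget_bits:
                continue  # the next step would overrun the budget
            mnr = snr_db_by_level[level] - smr  # MNR = SNR - SMR
            if best is None or mnr < best[0]:
                best = (mnr, band, extra)
        if best is None:
            return levels  # no sub-band can take more bits
        _, band, extra = best
        levels[band] += 1  # give bits to the sub-band with the lowest MNR
        used += extra      # its MNR is recomputed from the new level next pass


# Hypothetical inputs: 3 sub-bands, about 6 dB of SNR per level, 12 bits per level.
print(allocate_bits([20.0, 5.0, 10.0], [0.0, 7.0, 13.0, 19.0, 25.0], [0, 12, 24, 36, 48], 72))
# [3, 1, 2]
```

The sub-band with the largest SMR (the one whose noise is least masked) ends up with the most bits, which is the intent of the algorithm.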
Audio Quality
 Bit rate
 At too low a bit rate, compression artifacts appear:
 Ringing
 Pre-echo: a sound is heard before it occurs. It is most noticeable with impulsive sounds from percussion instruments such as cymbals
 Occurs in transform-based audio compression algorithms
 Quality of the encoder and the encoding parameters
 Constant bit rate (CBR) encoding
 Variable bit rate (VBR) encoding
MP3 Audio Format

Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg
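An MP3 file is essentially a sequence of frames, each beginning with a header. For MPEG-1 Layer III, each frame carries 1152 samples per channel. The frame length in bytes therefore follows from the bit rate and sampling rate as floor(144 × bitrate / sample rate), where 144 = 1152 / 8. One extra byte is added when the header's padding bit is set. The sketch below assumes MPEG-1 Layer III only, since other layers and MPEG-2 use different constants. The function names are hypothetical.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def layer3_frame_bytes(bitrate_bps: int, sample_rate_hz: int, padding: bool = False) -> int:
    """Length of one MPEG-1 Layer III frame in bytes, header included."""
    return SAMPLES_PER_FRAME // 8 * bitrate_bps // sample_rate_hz + (1 if padding else 0)


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playback time covered by one frame, in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate_hz


print(layer3_frame_bytes(128_000, 44_100))        # 417
print(layer3_frame_bytes(128_000, 44_100, True))  # 418
print(layer3_frame_bytes(128_000, 48_000))        # 384
print(round(frame_duration_ms(44_100), 2))        # 26.12
```

At 44.1 kHz the frame size does not divide evenly, which is why encoders set the padding bit on some frames to keep the average bit rate on target.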
MPEG Audio Comments
 A precision of 16 bits per sample is needed to get a good SNR
 The noise in question is quantization noise from the digitization process
 Each added bit improves the SNR by about 6 dB
 Masking means that we can raise the noise floor around a strong sound, because the noise will be masked
 Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
Successor of MP3
 Advanced Audio Coding (AAC), now part of MPEG-4 Audio
 Supports up to 48 full-bandwidth audio channels
 Default or standard audio format on platforms such as iPhone, iPad, Nintendo, PlayStation, Nokia, Android, BlackBerry
 Introduced in 1997 as MPEG-2 Part 7
 In 1999, updated and included in MPEG-4
AAC’s Improvements over MP3
 More sampling frequencies (8-96 kHz)
 Arbitrary bit rates and variable frame length
 Higher efficiency and a simpler filter bank
 Uses a pure MDCT (modified discrete cosine transform)
 The MDCT is also used in Windows Media Audio
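For reference, the MDCT maps a block of 2N input samples to N coefficients: X[k] = Σ_{n=0}^{2N-1} x[n] cos[(π/N)(n + 1/2 + N/2)(k + 1/2)]. The sketch below evaluates that definition directly in O(N²) time. It omits the windowing and 50% frame overlap that a real AAC encoder applies, and practical implementations use FFT-based algorithms.

```python
import math


def mdct(x: list[float]) -> list[float]:
    """Direct MDCT: 2N real input samples -> N real coefficients."""
    two_n = len(x)
    if two_n == 0 or two_n % 2:
        raise ValueError("input length must be a positive even number (2N)")
    n_half = two_n // 2
    return [
        sum(x[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


print(mdct([1.0, 0.0, 0.0, 0.0]))  # two coefficients: cos(0.375*pi) and cos(1.125*pi)
```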
MPEG-4 Audio
 Variety of applications
 General audio signals
 Speech signals
 Synthetic audio
 Synthesized speech (structured audio)
MPEG-4 Part 3 (Audio)
 Includes a variety of audio coding technologies
 Lossy speech coding (e.g., CELP)
 CELP (code-excited linear prediction): speech coding
 General audio coding (AAC)
 Lossless audio coding
 Text-to-Speech interface
 Structured Audio (e.g., MIDI)
MPEG-4 Part 14
 Called MP4, with file extension .mp4
 Multimedia container format
 Stores digital video and audio streams and allows streaming over the Internet
 Container or wrapper format
 A meta-file format whose specification describes how different data elements and metadata coexist in a computer file
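MP4 files follow the ISO base media file format (ISO/IEC 14496-12), which organizes data into "boxes" (also called atoms). Each box starts with a 32-bit big-endian size and a four-character type such as ftyp, moov, or mdat. A size of 1 means a 64-bit size follows the type, and a size of 0 means the box runs to the end of the file. The sketch below walks only the top-level boxes of a byte string and does not descend into nested boxes. The function name and the sample bytes are hypothetical.

```python
import struct


def top_level_boxes(data: bytes) -> list[tuple[str, int]]:
    """Return (type, size) for each top-level box in an MP4 / ISO BMFF byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the four-character type.
            if offset + 16 > len(data):
                raise ValueError("truncated largesize field")
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_len = 16
        elif size == 0:
            # Size 0: the box extends to the end of the data.
            size = len(data) - offset
        if size < header_len:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        boxes.append((box_type.decode("latin-1"), size))
        offset += size
    return boxes


# Hypothetical sample: an 'ftyp' box, an empty 'free' box, and an 'mdat' box
# whose size field is 0 (it runs to the end of the data).
sample = (
    struct.pack(">I4s4sI4s", 20, b"ftyp", b"isom", 0x200, b"isom")
    + struct.pack(">I4s", 8, b"free")
    + struct.pack(">I4s", 0, b"mdat") + b"abc"
)
print(top_level_boxes(sample))  # [('ftyp', 20), ('free', 8), ('mdat', 11)]
```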
MPEG-4 Audio
 Bit rates: 2-64 kbps
 Scalable for variable rates
 MPEG-4 defines a set of coders:
 Parametric coding techniques: low bit rates of 2-6 kbps, 8 kHz sampling frequency
 Code-excited linear prediction (CELP): medium bit rates of 6-24 kbps, 8 and 16 kHz sampling rates
 Time-frequency techniques: high-quality audio at 16 kbps and higher, sampling rate > 7 kHz
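The three bit-rate ranges above can be read as a lookup table. The sketch below uses only the ranges listed on this slide and ignores the sampling-rate conditions. It returns every coder family whose range covers a requested bit rate. The function name is hypothetical. Because the ranges overlap, some rates have more than one candidate.

```python
# Bit-rate ranges in kbps from the slide; None means "and higher".
CODER_RANGES_KBPS = {
    "parametric": (2, 6),
    "CELP": (6, 24),
    "time-frequency": (16, None),
}


def candidate_coders(bitrate_kbps: float) -> list[str]:
    """Names of the coder families whose listed range includes bitrate_kbps."""
    return [
        name
        for name, (low, high) in CODER_RANGES_KBPS.items()
        if bitrate_kbps >= low and (high is None or bitrate_kbps <= high)
    ]


print(candidate_coders(4))   # ['parametric']
print(candidate_coders(20))  # ['CELP', 'time-frequency']
print(candidate_coders(64))  # ['time-frequency']
```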
Conclusion
 MPEG Audio is an integral part of the MPEG standards and is meant to be considered together with video
 MPEG-4 Audio is a major extension of MPEG-1 Audio's capabilities
