
Wk9_MPEG_Part2

The document discusses techniques for encoding motion vectors in video compression, focusing on differential coding and motion estimation methods such as Sum of Absolute Differences (SAD) and Full Search. It covers various search strategies, including logarithmic and hierarchical motion estimation, and the use of B-frames for improved coding efficiency in MPEG standards. Additionally, it outlines the evolution of MPEG formats and their impact on video quality and compression efficiency.


Encoding motion vectors

• Differential Coding of Motion Vectors


• Motion vectors tend to be highly correlated between neighbouring macroblocks
• The horizontal component is compared with the most recent valid horizontal motion vector component, and only the difference is coded
• The same difference is calculated for the vertical component
• The differences are then coded with a variable-length code (e.g., Huffman) for maximum compression efficiency
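
The differencing step above can be sketched in a few lines of Python. This is a minimal illustration, not the MPEG bitstream syntax: the function names are hypothetical, the predictor for the first macroblock is assumed to be (0, 0), and the variable-length coding stage is left out.

```python
def encode_mv_differences(vectors):
    """Replace each (dx, dy) motion vector by its difference from the previous one."""
    prev_x, prev_y = 0, 0  # assumed predictor for the first macroblock
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - prev_x, dy - prev_y))
        prev_x, prev_y = dx, dy
    return diffs


def decode_mv_differences(diffs):
    """Invert encode_mv_differences by accumulating the differences."""
    prev_x, prev_y = 0, 0
    vectors = []
    for ddx, ddy in diffs:
        prev_x, prev_y = prev_x + ddx, prev_y + ddy
        vectors.append((prev_x, prev_y))
    return vectors


vectors = [(3, -1), (3, -1), (4, -1), (4, 0)]
diffs = encode_mv_differences(vectors)
print(diffs)  # [(3, -1), (0, 0), (1, 0), (0, 1)]
```

Because neighbouring vectors are similar, most differences are zero or close to it, which is what makes a variable-length code effective afterwards.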

(c) Patrick Denny 2024 66


Recap: P-Frame coding summary



Estimating the motion vectors
• So how do we find the motion?
• The basic idea is to search the reference frame for the best match to each macroblock
• The search is restricted to a window of +/- n x m pixels around the macroblock's position
• For each candidate position in the window, compute the Sum of Absolute Differences (SAD) or the Mean Absolute Error (MAE)
• Choose the position where the SAD or MAE is a minimum
• If the encoder decides that no acceptable match exists, it has the option of
• Coding that particular macroblock as an intra macroblock
• Even though it may be in a P frame!
• In this manner, high-quality video is maintained at a slight cost to coding efficiency



Sum of absolute differences (SAD)
• SAD is computed by
• $SAD(i,j) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \left| C(x+k,\, y+l) - R(x+k+i,\, y+l+j) \right|$

• N : size of macroblock window, typically 16 or 32 pixels


• (x,y) : the position of the upper-left corner of the original macroblock in the target frame C
• R : the reference frame region against which the SAD is computed
• C(x+k,y+l) : pixels in the macroblock with upper-left corner (x,y) in the target
• R(x+k+i,y+l+j) : pixels in the macroblock with upper-left corner (x+i,y+j) in the reference
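
As a rough illustration of the definition above, the SAD can be written directly as a double loop. This sketch assumes frames are stored as lists of rows, so a pixel at column x and row y is `frame[y][x]`; the function name and the tiny test frames are made up for the example.

```python
def sad(target, reference, x, y, i, j, n):
    """SAD between the n x n target block at (x, y) and the reference block at (x+i, y+j)."""
    total = 0
    for l in range(n):        # row offset inside the block
        for k in range(n):    # column offset inside the block
            total += abs(target[y + l][x + k] - reference[y + l + j][x + k + i])
    return total


# A 2 x 2 patch sits at column 1, row 1 in the reference and at (0, 0) in the target
reference = [[0, 0, 0, 0],
             [0, 9, 8, 0],
             [0, 7, 6, 0],
             [0, 0, 0, 0]]
target = [[9, 8, 0, 0],
          [7, 6, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]

print(sad(target, reference, x=0, y=0, i=1, j=1, n=2))  # 0  (perfect match)
print(sad(target, reference, x=0, y=0, i=0, j=0, n=2))  # 27 (no displacement)
```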



Sum of squared differences (SSD)
• Alternatively, a sum of squared differences can be used
• $SSD(i,j) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \left( C(x+k,\, y+l) - R(x+k+i,\, y+l+j) \right)^2$
• Goal is to find a vector (i,j) such that SAD(i,j) or SSD(i,j) is minimum



Full search
• Search exhaustively the whole (2R+1) x (2R+1) window in the reference frame
• A macroblock centred at each of the positions within the window is compared to the macroblock in the target frame pixel by pixel and their respective SAD (or MAE) is computed
• The vector (i,j) that offers the least SAD (or MAE) is designated as the motion vector for the macroblock in the target frame
• Full search is very costly
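
The exhaustive procedure above might look like the following sketch. It assumes frames are lists of rows (`frame[y][x]`), uses the SAD as the matching cost, and simply skips candidates that would fall outside the reference frame; real encoders handle frame borders in their own ways. All names are illustrative.

```python
def sad(target, reference, x, y, i, j, n):
    """SAD between the n x n target block at (x, y) and the reference block at (x+i, y+j)."""
    return sum(
        abs(target[y + l][x + k] - reference[y + l + j][x + k + i])
        for l in range(n)
        for k in range(n)
    )


def full_search(target, reference, x, y, n, r):
    """Test every displacement in [-r, r] x [-r, r]; return (best_vector, best_sad)."""
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = None, None
    for j in range(-r, r + 1):
        for i in range(-r, r + 1):
            # Skip candidates whose block would leave the reference frame
            if not (0 <= x + i <= width - n and 0 <= y + j <= height - n):
                continue
            cost = sad(target, reference, x, y, i, j, n)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (i, j), cost
    return best_vector, best_cost


def make_frame(size, top, left):
    """Black frame with a distinctive 2 x 2 patch whose upper-left corner is (left, top)."""
    frame = [[0] * size for _ in range(size)]
    frame[top][left], frame[top][left + 1] = 9, 8
    frame[top + 1][left], frame[top + 1][left + 1] = 7, 6
    return frame


reference = make_frame(6, top=2, left=3)
target = make_frame(6, top=1, left=1)
print(full_search(target, reference, x=1, y=1, n=2, r=2))  # ((2, 1), 0)
```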



Complexity of full search
• Assumptions
• Block size N x N and image size S = M1 x M2
• Search step size is 1 pixel
• Search range is +/- R pixels both horizontally and vertically
• Computational complexity
• Candidate matching blocks = $(2R+1)^2$
• Operations for computing MAD for one block = $O(N^2)$
• Operations for motion vector estimation per block = $O((2R+1)^2 N^2)$
• Blocks = $S/N^2$
• Total operations for entire frame = $O((2R+1)^2 S)$
• i.e., overall computation load is independent of block size!
• Example:
• M1 = M2 = 512, N = 16, R = 16, 30 fps
• Approximately $8.56 \times 10^9$ operations per second (over 8.5 gigaops!)
• Real time estimation is difficult
• Speed up with GPU?
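
The arithmetic behind the example can be checked with a short script. This sketch counts one operation per pixel comparison (an assumption; counting the subtract, absolute value and add separately would scale the result by a constant) and assumes the block size divides the frame dimensions.

```python
def full_search_ops_per_second(width, height, n, r, fps):
    candidates = (2 * r + 1) ** 2          # displacements tested per block
    ops_per_block = candidates * n * n     # one operation per pixel per candidate
    blocks = (width * height) // (n * n)   # assumes n divides both dimensions
    return ops_per_block * blocks * fps


ops = full_search_ops_per_second(512, 512, n=16, r=16, fps=30)
print(f"{ops:.3e}")  # 8.564e+09

# The block size cancels out: n = 8 gives the same total as n = 16
print(ops == full_search_ops_per_second(512, 512, n=8, r=16, fps=30))  # True
```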



Full search
• Advantages
• Guaranteed to find optimal motion vector within search range
• Disadvantages
• Can only search among finitely many candidates. What if the motion is a fractional number of pixels?
• High computational complexity: $O((2R+1)^2 S)$
• How to improve?
• Accuracy
• Consider fractional translations
• This requires interpolation (e.g., bilinear interpolation in H.263)
• Speed
• Try to avoid checking unlikely candidates



Bilinear interpolation
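
In place of the slide's figure, here is a minimal sketch of bilinear interpolation for sampling a frame at a fractional position, as needed for sub-pixel motion vectors. It assumes frames are lists of rows (`frame[y][x]`) and that the requested position lies inside the frame; it is a generic illustration, not the exact half-pixel rule of any particular standard.

```python
import math


def bilinear_sample(frame, x, y):
    """Interpolate the frame value at fractional column x and row y."""
    x0, y0 = math.floor(x), math.floor(y)
    # Clamp the right/bottom neighbours so positions on the last row/column work
    x1 = min(x0 + 1, len(frame[0]) - 1)
    y1 = min(y0 + 1, len(frame) - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom


frame = [[10, 20],
         [30, 40]]
print(bilinear_sample(frame, 0.5, 0.5))  # 25.0 (average of all four pixels)
print(bilinear_sample(frame, 0.5, 0.0))  # 15.0 (halfway along the top row)
```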



Logarithmic search
• This approach takes several iterations, akin to a binary search
• Computationally cheaper, suboptimal, but usually effective
• Initially only nine locations in the search window are used as seeds for a SAD-based search (marked as ‘1’)
• After locating the one with the minimal SAD, the centre of the new search region is moved to it and the step-size (“offset”) is reduced to half
• In the next iteration, the nine new locations are marked as ‘2’ and the process repeats
• If L iterations are applied, then out of altogether $9^L$ positions, only 9L positions are checked
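
The iteration described above can be sketched as follows. To keep the example short and checkable, the function takes a `cost(i, j)` callable standing in for the SAD of displacement (i, j); the synthetic cost used in the demonstration is an assumption chosen so the minimum is known in advance. The initial step of ceil(r / 2) is one common choice, not something the slides specify.

```python
import math


def logarithmic_search(cost, r):
    """2D logarithmic search over displacements in [-r, r] x [-r, r].

    cost(i, j) returns the matching error (e.g. the SAD) for displacement (i, j).
    Returns (best_vector, best_cost, evaluations).
    """
    centre = (0, 0)
    step = math.ceil(r / 2)
    evaluations = 0
    while True:
        # Centre first so that ties favour staying put, then its 8 neighbours
        candidates = [centre] + [
            (centre[0] + a * step, centre[1] + b * step)
            for b in (-1, 0, 1) for a in (-1, 0, 1) if (a, b) != (0, 0)
        ]
        candidates = [(i, j) for i, j in candidates if abs(i) <= r and abs(j) <= r]
        costs = [cost(i, j) for i, j in candidates]
        evaluations += len(costs)
        best = min(range(len(costs)), key=costs.__getitem__)
        centre = candidates[best]
        if step == 1:
            return centre, costs[best], evaluations
        step = math.ceil(step / 2)  # halve the offset for the next iteration


# Synthetic, well-behaved cost with its minimum at displacement (3, -2)
result = logarithmic_search(lambda i, j: abs(i - 3) + abs(j + 2), r=7)
print(result)  # ((3, -2), 0, 27)
```

With r = 7 the step sizes are 4, 2 and 1, so three iterations check 27 positions, whereas a full search over the same range would check 225. On real images the cost surface is not always this well behaved, which is why the method is suboptimal.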



Logarithmic search



Hierarchical motion estimation
• Form several low-resolution versions of the target and reference pictures
• Find the best match motion vector in the lowest resolution version
• Modify the motion vector level by level when going up


Hierarchical motion estimation



Performance comparison
• Operations required for 720 x 480 video at 30 frames per second (in giga-operations per second), where p is the search range

Search method    p = 15    p = 7
Full search      29.890    6.990
Logarithmic       1.020    0.778
Hierarchical      0.507    0.399
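
The full-search and logarithmic rows can be roughly reproduced with a short calculation. Two assumptions are made here that the slides do not state: each pixel comparison costs 3 operations (subtract, absolute value, add), and the logarithmic search checks 9 positions in its first iteration and 8 new ones in each later iteration, for ceil(log2 p) iterations. The hierarchical row depends on details of the pyramid that are not given, so it is not attempted.

```python
import math


def gops_full(width, height, fps, p, ops_per_pixel=3):
    """Giga-operations per second for full search with range +/- p."""
    return (2 * p + 1) ** 2 * width * height * fps * ops_per_pixel / 1e9


def gops_logarithmic(width, height, fps, p, ops_per_pixel=3):
    """Giga-operations per second for logarithmic search with range +/- p."""
    positions = 8 * math.ceil(math.log2(p)) + 1
    return positions * width * height * fps * ops_per_pixel / 1e9


for p in (15, 7):
    full = gops_full(720, 480, 30, p)
    log = gops_logarithmic(720, 480, 30, p)
    print(f"p = {p}: full = {full:.3f}, logarithmic = {log:.3f}")
```

Under these assumptions the results land within about 1% of the tabulated figures, which suggests, but does not confirm, that the table was derived in a similar way.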



Selecting intra/inter frame coding
• Based upon the motion estimation result, a decision is made on whether to use intra or inter coding for the macroblock
• To determine intra versus inter mode we do the following calculation
• $MB_{mean} = \frac{1}{N^2} \sum_{i=0,\,j=0}^{N-1} C(i,j)$
• $A = \sum_{i=0,\,j=0}^{N-1} \left| C(i,j) - MB_{mean} \right|$
• If $A < SAD - 2N^2$ then intra mode is chosen; otherwise inter mode is used
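
A small sketch of this decision rule, assuming the macroblock is given as a list of N rows of pixel values and that the minimum SAD from motion estimation is already known. The function name and sample blocks are illustrative only.

```python
def choose_mode(block, best_sad):
    """Return 'intra' or 'inter' for an N x N macroblock given its best-match SAD."""
    n = len(block)
    mb_mean = sum(sum(row) for row in block) / (n * n)
    # A measures how much the block deviates from its own mean
    a = sum(abs(pixel - mb_mean) for row in block for pixel in row)
    return "intra" if a < best_sad - 2 * n * n else "inter"


flat = [[100] * 4 for _ in range(4)]  # A = 0
busy = [[0, 200, 0, 200] for _ in range(4)]  # mean 100, A = 1600

print(choose_mode(flat, best_sad=500))  # intra: flat block, poor match
print(choose_mode(busy, best_sad=40))   # inter: textured block, good match
```

Intuitively, A estimates the cost of coding the block on its own, so intra mode wins only when the block is simple and the best inter match is poor.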



MPEG compression
• MPEG stands for
• Moving Picture Experts Group, established in 1988 to create standards for the delivery of audio and video
• MPEG-1 (1991): Target VHS quality on a CD-ROM (320 x 240 + CD audio @1.5 Mbits/sec)
• MPEG-2 (1994): Target Television Broadcast
• MPEG-3: intended for HDTV, but subsumed into an extension of MPEG-2
• MPEG-4 (1998): Very Low Bitrate Audio-Visual Coding, later MPEG-4 Part 10 (H.264) for a wide range of bitrates and better compression quality
• MPEG-7 (2001) “Multimedia Content Description Interface”
• MPEG-21 (2002) “Multimedia Framework”



Three parts to MPEG
• The MPEG standard has three parts

• Video
• based on H.261 and JPEG
• Audio
• based on MUSICAM (Masking pattern adapted Universal Subband Integrated Coding and Multiplexing) technology
• System
• Control interleaving of streams



MPEG video
• MPEG compression is essentially an attempt to overcome some shortcomings of H.261 and JPEG
• Recall H.261 dependencies
• We’ve seen the power and use of P and I frames. Are there any other tricks we can use?



Bidirectional search
• A problem is that many macroblocks need information that is not in the reference frame
• The example in the figure shows this
• Occlusion by objects affects differencing
• It is difficult to track occluded objects, etc.
• MPEG uses forward/backward interpolated prediction



MPEG B-frames
• The MPEG solution is to add a third frame type, the bidirectional frame, or B-frame
• B-frames search for matching macroblocks in both past and future frames
• A typical pattern is IBBPBBPBB IBBPBBPBB IBBPBBPBB
• The actual pattern is up to the specific encoder and need not be regular



Example: I, P and B frames
• Consider a group of pictures that lasts for 6 frames
• Given I,B,P,B,P,B,I,B,P,B,P,B,…
• I frames are coded spatially only (as before in H.261)
• P frames are forward predicted based on previous I and P frames (as before in H.261)
• B frames are coded based on a forward prediction from a previous I or P frame, as well as a backward prediction from a succeeding I or P frame



Bidirectional prediction



Example: I, P and B frames
• 1st B frame is predicted from the 1st I frame and 1st P frame
• 2nd B frame is predicted from the 1st and 2nd P frames
• 3rd B frame is predicted from the 2nd and 3rd P frames
• 4th B frame is predicted from the 3rd P frame and the 1st I frame of the next group of pictures



Bidirectional prediction



Backward prediction implications
• Note: backward prediction requires that the future frames used for backward prediction be encoded and transmitted first, i.e., out of order
• This process is summarised in the figure
• Consider the implications that this has for memory accesses and latency, both for the encoder and the decoder
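
The reordering described above can be illustrated with a short sketch that converts a display-order sequence into a transmission order in which every anchor (I or P) frame precedes the B frames that depend on it. The frame labels and function name are made up for the example, and real encoders also have to manage buffering and timestamps, which this ignores.

```python
def transmission_order(display_order):
    """Move each anchor (I or P) frame ahead of the B frames that precede it in display order."""
    out, pending_b = [], []
    for frame in display_order:
        if frame[0] == "B":
            pending_b.append(frame)   # hold back until the next anchor is sent
        else:
            out.append(frame)         # send the anchor first
            out.extend(pending_b)     # then the B frames that needed it
            pending_b = []
    out.extend(pending_b)  # trailing B frames with no later anchor in this list
    return out


display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(transmission_order(display))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

The held-back B frames are the source of the extra memory and latency: the encoder must buffer them until the following anchor has been coded, and the decoder must keep two anchors available while it reconstructs them.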



Backward prediction implications
• No defined limit to the number of consecutive B frames that may be used in a group of pictures
• Optimal number is application dependent
• Most broadcast quality applications, however, have tended to use 2 consecutive B frames (I,B,B,P,B,B,P,..) as the ideal trade-off between compression efficiency and video quality
• MPEG suggests some standard groupings



Advantages of using B-frames
• Coding efficiency
• Most B frames use fewer bits
• Quality can also be improved in the case of moving objects that reveal hidden areas within a video sequence
• Limited error propagation: because B frames are not used to predict other frames, errors in them are not propagated further within the sequence
• Disadvantages
• Frame reconstruction memory buffers within the encoder and decoder must be doubled in size to accommodate the 2 anchor frames
• More delay in real-time applications



Frame sizes
• From a system point of view, particularly in embedded real-time systems, a stable frame size is preferred as this leads to very efficient video pipelines
• The figure shows the mixture of frame sizes that can occur during a standard MPEG transmission



Random Access Points
• The MPEG standard also puts some constraints on where a video stream can be randomly entered



MPEG-2, MPEG-3 and MPEG-4
• MPEG-2 differences from MPEG-1
• Search on fields, not just frames
• 4:2:2 and 4:4:4 macroblocks
• Frame sizes as large as 16383 x 16383
• Scalable modes: Temporal, Progressive,…
• Non-linear macroblock quantization factor
• A bunch of minor fixes
• MPEG-3
• Originally intended for HDTV (1920 x 1080), but folded into MPEG-2
• MPEG-4
• Very low bit-rate communication (4.8 to 64 kbit/sec)
• Organised around objects, not frames

