Video Compression
The video application determines whether compression can be applied to a transmitted video stream, and to what level. In the broadcasting industry, transmitted video needs to be of very high quality when it is used for video editing or pre-production processing, but can be of slightly lower quality for distribution to the consumer. In video surveillance, the image quality must be sufficient to present the operator with a good enough image to identify what is happening in the scene; this does not require a broadcast standard of video.
Analogue CCTV cameras produce either PAL or NTSC video signals, depending on geographical region. A PAL
image comprises two separate video fields of 720x288 pixels and both fields are interlaced together on the
monitor to produce a single video frame, which is displayed 25 times per second. NTSC works in a similar
manner but uses a frame of 720x480 pixels and displays 30 frames per second. The diagram below shows the
composition of the PAL video image.
[Diagram: composition of the PAL video image – two interlaced fields (Field 0 and Field 1) of 720x288 pixels each, combining to give a single 720x576 pixel frame]
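To put the need for compression into context, the short Python sketch below estimates the raw, uncompressed data rate of a PAL signal; the 16 bits per pixel figure is an assumed value corresponding to typical 4:2:2 colour sampling.

# Illustrative calculation of the raw (uncompressed) data rate of PAL video,
# assuming 16 bits per pixel (typical 4:2:2 sampling).
width, height, fps, bits_per_pixel = 720, 576, 25, 16
raw_bps = width * height * fps * bits_per_pixel
print(f"raw PAL video: {raw_bps / 1e6:.0f} Mbit/s")   # roughly 166 Mbit/s before compression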
There are four main digital compression techniques used today within the Digital Video Surveillance marketplace:
Motion-JPEG, MPEG-2, H.263 and Wavelet.
Motion-JPEG
Motion-JPEG is a development of the still photographic digital compression technique known as JPEG, the abbreviation of the Joint Photographic Experts Group which defined the standard. Motion-JPEG is an extension of this standard in which many individual images are sent per second, giving the visual illusion of motion.
[Diagram: Motion-JPEG operation – each video field is individually compressed and transmitted, giving a video sequence at 25 full frames per second]
M-JPEG transmission systems start by performing an analogue-to-digital conversion on the analogue video image. This digitised input is fed into a JPEG compression engine, which takes blocks of pixels within the picture (typically 8x8 pixels) and subjects them to a mathematical process known as the ‘Discrete Cosine Transform’. The result of this process is a set of digital data that represents not just the brightness and colour of each pixel, but the changes in brightness and colour between the pixels in the block. If every pixel is the same colour and brightness, the amount of data generated will be low, as the part of the data representing the colour and brightness only needs to exist once for all 64 pixels in the 8x8 block. If every pixel is different, the data generated can actually be more than the original data. The encoder contains a set of “rules”, known as the “Q” or quantisation tables, which determine how much of this detail is discarded in order to reach the required compression level.
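A minimal Python sketch of this idea is shown below. The DCT implementation, the flat example quantisation table and the test blocks are illustrative assumptions rather than any particular encoder's code; the point is simply that a uniform 8x8 block collapses to a single significant coefficient, while a detailed block does not.

import numpy as np

def dct2(block):
    # Orthonormal 2-D DCT-II of a square block (the transform JPEG applies to 8x8 blocks).
    n = block.shape[0]
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    C = scale[:, None] * basis
    return C @ block @ C.T

q_table = np.full((8, 8), 16.0)                   # illustrative flat quantisation ("Q") table

flat = np.full((8, 8), 128.0)                     # every pixel identical
noisy = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)  # every pixel different

for name, block in (("flat", flat), ("noisy", noisy)):
    coeffs = np.round(dct2(block) / q_table)      # quantise: small coefficients become zero
    print(name, "non-zero coefficients:", int(np.count_nonzero(coeffs)))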
Standardisation
The Joint Photographic Experts Group developed the JPEG encoding technique, which resulted in the standards ITU-T T.81 (1992) and ISO/IEC 10918-1. At the time there was no standardisation for the transmission of multiple images, leaving manufacturers free to implement their own mechanisms.
JPEG-2000 is an emerging standard for still image compression, and attempts to standardise the editing,
processing and targeting of images between different devices and applications.
Key characteristics of M-JPEG transmission are:
· Relatively high bandwidth, because each complete video field is transmitted regardless of any change in content
· Lower bandwidth connections require either a reduced number of fields to be transmitted per second or more image compression (see the sketch below)
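The following sketch illustrates that trade-off; the compressed field size of 15 kbytes is an illustrative assumption, not a figure from any particular codec.

# Illustrative M-JPEG bandwidth: it scales directly with field rate and compressed field size.
compressed_field_bytes = 15 * 1024                # assumed size of one compressed video field
for fields_per_second in (50, 25, 12, 6):
    mbps = compressed_field_bytes * 8 * fields_per_second / 1e6
    print(f"{fields_per_second:2d} fields/s -> {mbps:.1f} Mbit/s")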
MPEG
There is actually no such thing as a single ‘MPEG’ standard: MPEG consists of a set of standards for motion
video compression, with many different options allowed even inside one umbrella standard such as MPEG–2.
MPEG-2 was originally designed for the broadcast TV industry, and is now one of the compression techniques
commonly used within the video surveillance industry.
MPEG-2 encodes video using three frame types: ‘I’ (intra-coded), ‘P’ (predicted) and ‘B’ (bi-directionally predicted) frames. These frame types are typically used in the following formats:
· ‘I’ Frame only
· ‘IP’
· ‘IBBP’
‘I’ Frame Only Sequence
[Diagram: ‘I’ frame only sequence, in which frames 1 to 6 are each transmitted as complete ‘I’ frames]
This transmission sequence provides the lowest MPEG-2 latency, as the encoder simply digitises and compresses every frame, but it uses more bandwidth because every frame is transmitted in its entirety.
‘IP’ Sequence
The first frame of an ‘IP’ sequence is a complete ‘I’ frame. The ‘P’ frames that follow contain the movements predicted from this frame:
[Diagram: ‘IP’ sequence showing frames 1 to 6, a complete ‘I’ frame followed by predicted ‘P’ frames]
‘IBBP’ Sequence
The ‘IBBP’ sequence consists of a similar Group of Pictures to ‘IP’, but ‘B’ frames are also used; these contain the changes based on analysis of both the preceding and succeeding frames. The transmitting codec sends the images in the order in which they need to be received at the decoder, as shown in the diagram below.
[Diagram: ‘IBBP’ sequence showing frames 1 to 6 transmitted out of display order, so that each anchor frame arrives before the ‘B’ frames that depend on it]
The use of the additional ‘B’ frames not only provides a more accurate reproduction of the original image, but also gives a significant bandwidth saving over ‘IP’ sequences, because the ‘B’ frames contain only a very small amount of data. The penalty for lower bandwidth and better quality is additional latency, due to the additional buffering and frame re-ordering required at the encoder and decoder.
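A minimal Python sketch of this re-ordering is shown below; the Group of Pictures string and the re-ordering rule are a simplified illustration of the principle, not the MPEG-2 specification itself.

def transmission_order(gop):
    # Re-order a display-order GOP string (e.g. "IBBPBB") into transmission order:
    # each anchor frame ('I' or 'P') is sent before the 'B' frames that reference it.
    out, pending_b = [], []
    for idx, ftype in enumerate(gop, start=1):
        if ftype == "B":
            pending_b.append((idx, ftype))
        else:                          # anchor frame ('I' or 'P')
            out.append((idx, ftype))
            out.extend(pending_b)      # the 'B' frames follow the anchor they depend on
            pending_b = []
    return out + pending_b             # any trailing 'B' frames (simplified open GOP)

print(transmission_order("IBBPBB"))
# display order 1..6 -> transmitted as [(1,'I'), (4,'P'), (2,'B'), (3,'B'), (5,'B'), (6,'B')]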
MPEG-2 transmission is a trade-off between latency, bit rate and quality. All MPEG-2 transmission systems have a minimum latency of one video frame in both the encoder and the decoder, but when the more complex frame sequences are used, the process of analysing and buffering the video stream adds further latency.
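The arithmetic below sketches the implication for PAL video; the assumption that an ‘IBBP’ encoder buffers two ‘B’ frames before each anchor is purely illustrative, and real codecs add further analysis and network buffering on top of this.

# Minimum codec latency implied by the 25 frames/second PAL frame rate (illustrative).
frame_ms = 1000 / 25                               # one PAL frame period = 40 ms
i_only_ms = frame_ms + frame_ms                    # one frame in the encoder, one in the decoder
ibbp_ms = (frame_ms + 2 * frame_ms) + frame_ms     # plus two buffered 'B' frames for re-ordering
print(f"'I' frame only: ~{i_only_ms:.0f} ms   'IBBP': ~{ibbp_ms:.0f} ms (before any network delay)")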
Whilst MPEG-2 transmission is attractive for a low bit-rate connection when there is little or limited movement within an image, if the camera is moved (PTZ) by an operator then every pixel within the image is likely to change, which requires significantly more bandwidth than the still image.
Video quality is determined by the Profile and Level, which together state the quality and resolution of the video. Within MPEG-2 there is a wide variety of profiles to cater for the broadcast TV, tele-medicine, video-conferencing and surveillance industries, each of which has its own requirements in terms of pixel resolution and how heavily the image is compressed. In general, the video-surveillance industry uses “Main Profile at Main Level”, which provides PAL and NTSC compression with a level of colour detail that is more than acceptable for video surveillance. Main Profile at Main Level is also the profile normally used to transmit digital television to the home.
Originally developed for the broadcast video market, the MPEG-2 standard provides a toolkit for a wide range of digital video compression and transmission. Broadcast video requires the full range of MPEG-2 features, such as multiplexing of video, audio and data into a single transport stream, transport stream multiplexing, lip-syncing and higher quality video support using extended profiles. As the compression and transmission of MPEG-2 video streams are standardised, it is possible for a video stream to be encoded and decoded by different manufacturers' codecs.
MPEG-4 provides a mechanism that allows multiple media types, including natural video, computer-generated video and audio, to be generated, edited, distributed and stored using a standardised range of tools. MPEG-4 is targeted at the non-real-time, low bit-rate environment, so it is not seen to compete with MPEG-2 in the video surveillance marketplace. MPEG-4 can also be used for transmission between video editing suites, but at very high resolution and bit rate, so again it is not relevant to surveillance.
MPEG-7 is being developed to provide a mechanism for identifying and describing a wide range of multimedia formats such as still pictures, graphics, 3D models, audio and video data. The ability to identify the content of a file without decoding it allows automated processes to transfer, store and transmit this information. MPEG-7 itself is not a compression algorithm, and will build on existing standards such as MPEG-1, MPEG-2 and MPEG-4 to re-use standardised components wherever possible.
MPEG-21 is an enhancement to all the MPEG standards mentioned so far, and will provide an umbrella concept
of a digital item that will identify all parts of a multimedia file.
Key characteristics of MPEG-2 for surveillance are:
· More complex encoding techniques mean it is currently relatively more expensive per encoder port
· Higher bandwidth if used in ‘I’ frame only mode (comparable to M-JPEG)
· Higher latency if using predictive methods (typically upwards of 300 milliseconds)
H.263
The International Telecommunications Union (ITU) has defined a digital video compression standard within its set of videoconferencing standards. The main ITU videoconferencing standards, H.320 for ISDN lines and H.323 for local area networks (LANs), both specify the use of H.261 compression. H.263 has a coding algorithm similar to H.261, but with improvements that allow transmission at higher resolutions and provide better error recovery.
This compression technique was designed specifically for low bandwidth links, and is not intended to provide TV quality. Specifically, it can only handle pictures of about 75,000 pixels, roughly one quarter of TV standard resolution. The subjective effect is that it can produce quite good-looking still and semi-still pictures (the head of a person talking set against a bland, single-colour background), but it will produce jerky motion and blocking effects if too much motion is attempted. The standard is a reasonable technical compromise when used for low bandwidth video-surveillance applications, but it cannot compete with M-JPEG or MPEG for high quality video transmission.
Wavelet
A number of companies are heralding Wavelet compression as the answer to all video transmission
requirements: Wavelets can produce reasonable pictures at low data rates and good quality pictures at higher
data rates.
Wavelet compression has been heralded as a breakthrough in compression technology, but is really a development of an established technique known as sub-band coding (SBC). Only recently has the chip technology been available to implement wavelet compression, so until now the video community has largely disregarded it.
The operation is mathematically very similar to M-JPEG (although produced by entirely different means), except that the encoding is done using digital filters on data from an entire frame, not just a block of pixels. The data sets that result have the basic structure of the picture in one set and the detailed information in others, so the picture can be rebuilt to a higher quality depending on the amount of data available at the receiver. Quality tables are used to cut away surplus information. Wavelet compression schemes can also use temporal compression, which analyses the content of the image to determine changes to the image and only resends those changes.
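A minimal Python sketch of the underlying idea, using a single level of the simple Haar wavelet on one line of pixel values, is shown below; the input values are illustrative, and real codecs apply longer filters over the whole frame.

import numpy as np

def haar_step(signal):
    # One level of a Haar wavelet decomposition of a 1-D signal of even length.
    # The approximation carries the basic structure of the signal; the detail
    # carries the fine information that the quality tables can cut away.
    pairs = signal.reshape(-1, 2)
    approx = pairs.mean(axis=1)                   # low-pass: local averages
    detail = (pairs[:, 0] - pairs[:, 1]) / 2.0    # high-pass: local differences
    return approx, detail

line = np.array([10.0, 10.0, 10.0, 12.0, 50.0, 52.0, 10.0, 10.0])  # one line of pixel values
approx, detail = haar_step(line)
print("structure:", approx)   # [10. 11. 51. 10.]
print("detail:   ", detail)   # [ 0. -1. -1.  0.] - small values can be discarded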
Wavelet transforms can be used at high and low data rates, and at any resolution, which makes the transform quite attractive when designing systems to run over different types of transmission network. The downside of wavelet transforms is that even at quite high data rates the received pictures are subjectively “soft”, and they have a relatively high latency. Higher bandwidth wavelet compression can reduce the latency, but the minimum latency for end-to-end wavelet transmission is still just under 200 milliseconds, which may be an issue when used within PTZ systems.
Wavelet transforms function well at low and medium data rates, where they offer subjectively good picture quality with less annoying “block” effects than an H.261 transform. At higher data rates, wavelet transforms are more demanding to perform than M-JPEG, and offer no advantage in return for the additional bandwidth used.
Standardisation
Until recently, wavelet compression has had very little standardisation. The JPEG-2000 standard provides guidance for the implementation of wavelets, but many options are still implemented using proprietary techniques.
The decision as to which video compression system to adopt depends upon the application to which it will be applied. The following key areas should be considered when deciding which compression technology to use.
Latency.
Most CCTV projects require security operators to be able to pan, tilt and zoom the CCTV cameras. If this is the case, the feedback loop between the operator moving the joystick and the video moving on the screen must be minimal. Discussions with operators have shown that a round-trip latency of 100-200ms is acceptable, but anything more than this causes problems in using the system.
This is shown most readily when an operator is trying to follow a suspect through, for example, a departure lounge. As the operator tries to zoom in to identify that person's actions, the camera must respond readily to the commands sent to it. If the latency of the whole system is in excess of 200ms, the system becomes unmanageable for the operator: every time he wants to stop the camera, the system latency means that the camera has actually moved on from the image he is seeing before it stops. The operator then has to move the camera again in order to home in on the subject. This becomes very frustrating, very quickly.
[Diagram: end-to-end system latency, showing the video and control paths between the CCTV camera, the codecs, the network and the control room]
Note that it is the whole system latency that can cause problems for the operator. The surveillance solution must be designed from a systems perspective, as the overall performance is a combination of the performance of the network, codec, control system and remote equipment.
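The sketch below illustrates such a systems-level latency budget; every component figure in it is an illustrative assumption, and a real design would substitute measured values.

# Round-trip latency budget for PTZ control (all figures are illustrative assumptions).
latency_ms = {
    "control system + joystick": 20,
    "control path across network": 10,
    "camera/telemetry response": 30,
    "encoder (one PAL frame)": 40,
    "video path across network": 10,
    "decoder and display": 40,
}
total = sum(latency_ms.values())
print(f"estimated round-trip latency: {total} ms")
print("within the 200 ms target" if total <= 200 else "too slow for comfortable PTZ control")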
Network Bandwidth
Network bandwidth becomes an issue when there are a large number of images to be transmitted across the network simultaneously, especially if the images are being sent across the wide area using a service provider. Whilst using MPEG-2 can reduce the bandwidth needed, it may result in additional latency that is unacceptable to the operator.
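As a rough illustration, the sketch below aggregates per-camera bit rates across a site; the per-stream figures are assumptions chosen only to show the scale of the difference between the techniques.

# Aggregate bandwidth for simultaneous camera streams (per-stream rates are assumptions).
cameras = 32
rate_mbps = {"M-JPEG, 25 fields/s": 4.0, "MPEG-2 'I' frame only": 4.0, "MPEG-2 'IBBP'": 1.5}
for codec, rate in rate_mbps.items():
    print(f"{codec:22s}: {cameras * rate:6.1f} Mbit/s for {cameras} cameras")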
Video Switching
Using traditional CCTV technology, all images are continuously streamed to a CCTV control room and connections are then made on a local matrix, so there is very little delay in switching video images onto a monitor.
Networked video provides a mechanism for only making video connections as and when required. Therefore, the time taken between the security operator selecting a camera and the image appearing on the screen must be kept as low as possible.
Conclusion.
Video transmission is only one part of a networked video surveillance solution. The solution for any project can only be designed once the Key Performance Indicators (KPIs), Service Level Agreements (SLAs) and applications have been decided. This information will indicate to the designer what type of network, camera, codec and control system needs to be proposed.
The CellStack Centauri provides a flexible architecture that allows the user to customise the chassis
configuration depending on what network, what codec and what control system is specified. Using simple design
rules, combinations of M-JPEG, MPEG-2, IP and ATM network interfaces can all be used to provide an
integrated surveillance solution that really delivers.