Image Compression
Images can be sampled and quantized sufficiently finely so that a binary data stream can
represent the original data to an extent that is satisfactory to the most discerning eye. Since we
can represent a picture by anything from a thousand to a million bytes of data, we should be
able to apply the techniques studied earlier directly to the task of compressing that data for
storage and transmission. First, we consider the following points:
1. High quality images are represented by very large data sets. A photographic quality image
may require 40 to 100 million bits for representation. These large file sizes drive the need for
extremely high compression ratios to make storage and transmission (particularly of movies)
practical.
2. Applications that involve imagery, such as television, movies, computer graphical user
interfaces, and the World Wide Web, must execute and transmit images across distribution
networks quickly, particularly when moving images are involved, if they are to be acceptable
to the human eye.
3. Imagery is characterised by higher redundancy than other kinds of data. For example, a pair
of adjacent horizontal lines in an image is nearly identical, while two adjacent lines of text in a
book are generally different.
The first two points indicate that the highest level of compression technology needs to be used
for the movement and storage of image data. The third point indicates that high compression
ratios should be achievable. It also suggests that special compression techniques may be able
to take advantage of the structure and properties of image data. The close
relationship between neighbouring pixels in an image can be exploited to improve the
compression ratios. This has a very important implication for the task of coding and decoding
image data for real-time applications.
Another interesting point to note is that the human eye is highly tolerant of approximation errors
in an image. Thus, it may be possible to compress the image data in a manner in which the less
important details (to the human eye) can be ignored. That is, by trading off some of the quality
of the image we might obtain a significantly reduced data size. This technique is called Lossy
Compression, as opposed to the Lossless Compression techniques.
Such liberty cannot be taken with, say, financial or textual data! Lossy Compression can only be
applied to data such as images and audio, where the deficiencies are masked by the tolerance of
the human senses of sight and hearing.
The Joint Photographic Experts Group (JPEG) was formed jointly by two standards
organisations: the CCITT (the international telecommunication standards committee, now part
of the ITU) and the
International Standards Organisation (ISO). Let us now consider the lossless compression
option of the JPEG Image Compression Standard, which is a description of 29 distinct coding
systems for compression of images. Why are there so many approaches? It is because the needs
of different users vary so much with respect to quality versus compression and compression
versus computation time that the committee decided to provide a broad selection from which
to choose. We shall briefly discuss here two methods that use entropy coding.
The two lossless JPEG compression options discussed here differ only in the form of the
entropy code that is applied to the data. The user can choose either a Huffman Code or an
Arithmetic Code. We will not treat the Arithmetic Code concept in much detail here. However,
we will summarize its main features:
Arithmetic Code, like Huffman Code, achieves compression in transmission or storage by
using the probabilistic nature of the data to render the information with fewer bits than used in
the source data stream. Its primary advantage over the Huffman Code is that it comes closer to
the Shannon entropy limit of compression for data streams that involve a relatively small
alphabet. The reason is that Huffman codes work best (highest compression ratios) when the
probabilities of the symbols are negative powers of two. The Arithmetic code construction is
not closely tied to these particular values in the way the Huffman code is. The
computation of coding and decoding Arithmetic codes is costlier than that of Huffman codes.
Typically a 5 to 10% reduction in file size is seen with the application of Arithmetic codes over
that obtained with Huffman coding.
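As a small illustration (not part of the JPEG standard, and using made-up probabilities), the
sketch below computes the Shannon entropy and the Huffman average code length for two
alphabets: one whose probabilities are negative powers of two, where Huffman coding meets
the entropy limit exactly, and a skewed one, where it cannot; that residual gap is what an
Arithmetic code can largely recover.

```python
import heapq
from math import log2

def shannon_entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_code_lengths(probs):
    """Return the Huffman code length assigned to each symbol."""
    # Each heap entry: (probability, unique id, indices of symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    uid = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node gains one bit of code length.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, uid, s1 + s2))
        uid += 1
    return lengths

def average_length(probs):
    return sum(p * l for p, l in zip(probs, huffman_code_lengths(probs)))

# Dyadic probabilities: Huffman meets the entropy limit exactly.
dyadic = [0.5, 0.25, 0.125, 0.125]
# Skewed, non-dyadic probabilities: Huffman must spend at least one bit per
# symbol, while the entropy (approachable by an Arithmetic code) is far lower.
skewed = [0.9, 0.05, 0.03, 0.02]

for name, probs in [("dyadic", dyadic), ("skewed", skewed)]:
    print(f"{name}: entropy = {shannon_entropy(probs):.3f} bits, "
          f"Huffman average = {average_length(probs):.3f} bits")
```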
Some compression can also be achieved if we can predict the next pixel from the previous
pixels. In this way we only have to transmit the prediction error (the difference between the
actual and the predicted value) instead of the entire pixel value. The predictive process that
is used in the lossless JPEG coding schemes to form the innovations data is also variable.
However, in this case the variation is not based on the user's choice; rather, it is made for each
image on a line-by-line basis. The prediction method chosen for a line is the one that yields the
best prediction overall for that entire line.
There are eight prediction methods available in the JPEG coding standards. One of the eight
(which is the no prediction option) is not used for the lossless coding option that we are
examining here. The other seven may be divided into the following categories (a short code
sketch of these categories follows the list):
1. Predict the next pixel on the line as having the same value as the last one.
2. Predict the next pixel on the line as having the same value as the pixel in this position on the
previous line (that is, above it).
3. Predict the next pixel on the line as having a value related to a combination of the values of
the previous pixel, the pixel above, and the pixel previous to the one above (that is, above and
to the left). One such combination is simply the average of the previous and the above pixel
values.
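As a rough sketch (the function names and the integer averaging are our own illustrative
choices, not the standard's notation), the three categories can be written as follows, where
left, above and above_left are the neighbouring pixel values already known to both the coder
and the decoder:

```python
def predict_from_left(left, above, above_left):
    """Category 1: predict the next pixel as equal to the previous pixel on the line."""
    return left

def predict_from_above(left, above, above_left):
    """Category 2: predict the next pixel as equal to the pixel directly above it."""
    return above

def predict_from_combination(left, above, above_left):
    """Category 3: predict from a combination of the neighbours; here, the
    average of the previous and the above pixel values (integer arithmetic)."""
    return (left + above) // 2
```

The value actually transmitted for each pixel is then the innovation, that is, the difference
between the true pixel value and this prediction.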
The differential encoding used in the JPEG standard consists of the differences between the
actual image pixel values and the predicted values. As a result of the smoothness and
redundancy present in most pictures, these differences give rise to relatively small positive and
negative numbers that represent the small typical error in the prediction. Hence, the
probabilities associated with these values are large for the small innovation values and quite
small for large ones. This is exactly the kind of data stream that compresses well with an
entropy code.
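The effect can be seen on a toy example. The sketch below (synthetic data, simplest
previous-pixel predictor; not taken from a real image) compares the empirical entropy of a
smoothly varying scanline with that of its prediction differences.

```python
from collections import Counter
from math import log2, sin

def empirical_entropy(values):
    """Empirical entropy (bits/sample) of a sequence of integer values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A synthetic, smoothly varying scanline standing in for one image row.
row = [int(128 + 100 * sin(i / 40)) for i in range(512)]

# Innovations: the difference between each pixel and its predicted value,
# here using the simplest predictor (the previous pixel on the line).
innovations = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

print(f"raw pixel entropy:  {empirical_entropy(row):.2f} bits/pixel")
print(f"innovation entropy: {empirical_entropy(innovations):.2f} bits/pixel")
# The innovations cluster around small values, so their entropy is much lower,
# which is why they compress well with a Huffman or an Arithmetic code.
```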
The typical lossless compression for natural images is 2:1. While this is substantial, it does not
in general solve the problem of storing or moving large sequences of images as encountered in
high quality video.
The JPEG standard includes a set of sophisticated lossy compression options developed after
a study of image distortion acceptable to human senses. The JPEG lossy compression algorithm
consists of an image simplification stage, which removes the image complexity at some loss of
fidelity, followed by a lossless compression step based on predictive filtering and Huffman or
Arithmetic coding.
The lossy image simplification step, which we will call the image reduction, is based on the
exploitation of an operation known as the Discrete Cosine Transform (DCT), defined as
follows.
Y(k,l) = \sqrt{\frac{2}{N}}\,\sqrt{\frac{2}{M}}\; C(k)\,C(l) \sum_{i=0}^{N-1}\sum_{j=0}^{M-1} y(i,j)\,
\cos\!\left[\frac{(2i+1)k\pi}{2N}\right] \cos\!\left[\frac{(2j+1)l\pi}{2M}\right],
\qquad C(0) = \frac{1}{\sqrt{2}},\quad C(m) = 1 \text{ for } m > 0 \qquad (3.8)
where the input image is N pixels by M pixels, y(i, j) is the intensity of the pixel in row i and
column j, Y(k,l) is the DCT coefficient in row k and column l of the DCT matrix. All DCT
multiplications are real. This lowers the number of required multiplications, as compared to
the Discrete Fourier Transform. For most images, much of the signal energy lies at low
frequencies, which appear in the upper left corner of the DCT. The lower right values represent
higher frequencies, and are often small (usually small enough to be neglected with little visible
distortion).
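A direct, deliberately unoptimised evaluation of Eq. (3.8) can be sketched as follows; the
smooth 8 by 8 block used here is made-up data, chosen so that the energy compaction is easy
to see.

```python
from math import cos, pi, sqrt

def dct_2d(block):
    """Direct evaluation of the 2-D DCT of Eq. (3.8) for an N x M block."""
    n, m = len(block), len(block[0])

    def c(u):
        # Normalisation term: 1/sqrt(2) for the zero-frequency index, else 1.
        return 1 / sqrt(2) if u == 0 else 1.0

    coeffs = [[0.0] * m for _ in range(n)]
    for k in range(n):
        for l in range(m):
            s = 0.0
            for i in range(n):
                for j in range(m):
                    s += (block[i][j]
                          * cos((2 * i + 1) * k * pi / (2 * n))
                          * cos((2 * j + 1) * l * pi / (2 * m)))
            coeffs[k][l] = sqrt(2 / n) * sqrt(2 / m) * c(k) * c(l) * s
    return coeffs

# A smooth, made-up 8 x 8 block: almost all of its energy should end up in
# the low-frequency (upper left) corner of the coefficient matrix.
block = [[100 + i + j for j in range(8)] for i in range(8)]
Y = dct_2d(block)
print(f"DC coefficient Y(0,0):             {Y[0][0]:8.1f}")   # proportional to the block average
print(f"high-frequency coefficient Y(7,7): {Y[7][7]:8.3f}")   # essentially zero for smooth data
```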
In the JPEG image reduction process, the DCT is applied to 8 by 8 pixel blocks of the image.
Hence, if the image is 256 by 256 pixels in size, we break it into 32 by 32 square blocks of 8
by 8 pixels and treat each one independently. The 64 pixel values in each block are transformed
by the DCT into a new set of 64 values. These new 64 values, also known as the DCT
coefficients, form a whole new way of representing an image. The DCT coefficients represent
the spatial frequency of the image sub-block. The upper left corner of the DCT matrix has low
frequency components and the lower right corner the high frequency components (see Fig. 3.5).
The top left coefficient is called the DC coefficient. Its value is proportional to the average
value of the 8 by 8 block of pixels. The rest are called the AC coefficients. So far we have not
obtained any reduction simply by taking the DCT. However, due to the nature of most natural
images, most of the energy (information) lies in the low frequencies as opposed to the high frequencies.
We can represent the high frequency components coarsely, or drop them altogether, without
strongly affecting the quality of the resulting image reconstruction. This is where the (lossy)
compression comes from. The JPEG lossy compression algorithm does the following operations:
1. Divide the image into 8 by 8 pixel blocks and apply the DCT to each block.
2. Quantise the DCT coefficients, representing the high-frequency coefficients coarsely or
dropping them altogether.
3. Compress the quantised coefficients losslessly using a Huffman or an Arithmetic code.
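Step 2 above is where information is actually discarded. The toy sketch below quantises a
made-up corner of a coefficient block with a step size that simply grows with frequency; the
standard itself uses tabulated quantisation matrices, so this rule is purely illustrative.

```python
# Illustrative 4 x 4 corner of a DCT coefficient block (made-up numbers):
# large values at low frequencies, small ones at high frequencies.
coeffs = [
    [856.0,  42.3,  -9.1,  1.2],
    [ 38.7,  11.4,   2.0, -0.6],
    [ -7.9,   1.8,   0.5,  0.2],
    [  1.1,  -0.4,   0.3, -0.1],
]

def quantise(c, base_step=4.0):
    """Divide each coefficient by a step that grows with frequency, then round.
    (A stand-in for the tabulated quantisation matrices used by the standard.)"""
    return [[round(v / (base_step * (1 + k + l))) for l, v in enumerate(row)]
            for k, row in enumerate(c)]

for row in quantise(coeffs):
    print(row)
# Most of the high-frequency entries round to zero, which is where the lossy
# compression comes from: only the few non-zero values need to be coded.
```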
Two other widely used image formats deserve a brief mention here:
a. The Graphics Interchange Format (GIF): The GIF format, developed by CompuServe,
involves simply an application of the Lempel-Ziv-Welch (LZW) universal coding algorithm to
the image data (a minimal LZW sketch is given at the end of this section).
b. The Tagged Image File Format (TIFF): TIFF is a container format for storing images,
including photographs and line art. It is now under the control of Adobe. Originally created by
the company Aldus for use in what was then called "desktop publishing," TIFF remains a
popular format for colour and black and white images.
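For completeness, a minimal generic LZW compressor is sketched below. The variant used
inside GIF additionally employs variable-width codes and special clear/end codes; only the
core dictionary-building idea is shown here, on made-up pixel data.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Greedy LZW: emit a code for the longest dictionary match, then extend
    the dictionary with that match plus the next symbol."""
    # Start with all single-byte strings (codes 0..255).
    dictionary = {bytes([b]): b for b in range(256)}
    next_code = 256
    result = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            result.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        result.append(dictionary[current])
    return result

# Runs of identical pixels, common in simple graphics, compress well:
pixels = bytes([7] * 20 + [3] * 20)
codes = lzw_compress(pixels)
print(f"{len(pixels)} input bytes -> {len(codes)} output codes")
```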