CST 446 DATA COMPRESSION TECHNIQUES
SYLLABUS
Module-1
Modelling and Types of Compression
Introduction to Compression Techniques- Lossy compression &
Lossless compression, Measures of Performance, Modeling and coding.
Mathematical modelling for Lossless and lossy compression - Physical
models and probability models.
Data compression
• Data compression algorithms are used to reduce the number of bits
required to represent text, images, video sequences, or music
• The art or science of representing information in a compact form.
• compact representations are created by identifying and using structures
that exist in the data.
• early example of data compression is Morse code, developed by Samuel
Morse in the mid-19th century.
• Letters sent by telegraph are encoded with dots and dashes.
• Morse assigned shorter sequences to letters that occur more frequently, such
as e (·) and a (· −), and longer sequences to letters that occur less
frequently, such as q (− − · −) and j (· − − −).
Compression Techniques
• A compression algorithm takes an input x and generates a representation xc
that requires fewer bits
• A reconstruction algorithm operates on the compressed representation xc
to generate the reconstruction y
Compression Techniques
• Based on the requirements of reconstruction, data compression
schemes can be divided into two broad classes:
• lossless compression schemes, in which y is identical to x, and
• lossy compression schemes, which generally provide much higher
compression than lossless compression but allow y to be different
from x
Compression Techniques
• Lossless Compression
• involve no loss of information
• the original data can be recovered exactly from the compressed data
• used for applications that cannot tolerate any difference between
the original and reconstructed data
• Text compression is an important area for lossless compression
• Consider the sentences “Do not send money” and “Do now send
money.” A single changed character reverses the meaning, so even a tiny
reconstruction error in text can be unacceptable
• a radiological image
• Data obtained from satellites
Compression Techniques
• Lossy Compression
• involve some loss of information
• data cannot be recovered or reconstructed exactly
• higher compression ratios than in lossless compression
• Applications in which lack of exact reconstruction is not a
problem
• when storing or transmitting speech, the exact value of each sample is
not necessary
• reconstruction of a video sequence
Compression Techniques
• Measures of Performance
• A compression algorithm can be evaluated in a number of ways:
• the relative complexity of the algorithm
• the memory required to implement the algorithm
• how fast the algorithm performs on a given machine
• the amount of compression
• how closely the reconstruction resembles the original
Compression Techniques
• Measures of Performance
• compression ratio
• ratio of the number of bits required to represent the data before
compression to the number of bits required to represent the data after
compression
• Storing an image made up of a square array of 256×256 pixels requires
65,536 bytes.
• the compressed version requires 16,384 bytes.
• the compression ratio is 4:1.
• We can also express the compression achieved as the reduction in the amount
of data required, stated as a percentage of the size of the original data. In
this particular example, the size has been reduced by 75%.
Compression Techniques
• Measures of Performance
• Rate
• average number of bits required to represent a single sample
• if the original image uses 8 bits (one byte) per pixel, the average number of
bits per pixel in the compressed representation is 16,384 × 8 / 65,536 = 2.
Thus, we would say that the rate is 2 bits per pixel.
• Bit rate measures the average number of bits required to
represent one unit of information. A lower bit rate indicates
more efficient compression.
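Since the ratio, the percentage reduction, and the rate all come from the same
arithmetic, a short Python sketch of the 256×256 example may help (the
variable names are ours, not from the slides):

original_bytes = 256 * 256        # 65,536 bytes, one byte per pixel
compressed_bytes = 16_384

ratio = original_bytes / compressed_bytes                      # 4.0, i.e. 4:1
reduction_pct = 100 * (1 - compressed_bytes / original_bytes)  # 75.0 %
rate_bpp = compressed_bytes * 8 / (256 * 256)                  # 2.0 bits per pixel

print(f"compression ratio {ratio:.0f}:1, "
      f"reduction {reduction_pct:.0f}%, rate {rate_bpp:.1f} bits/pixel")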
Compression Techniques
Measures of Performance
• Time Complexity:
• Time complexity is a measure of the amount of time an algorithm takes to
complete as a function of the size of its input.
• It provides an estimation of how the algorithm's running time grows with the size
of the input.
• Lower time complexity is generally desirable, especially for real-time
applications.
• Space Complexity:
• This measures the amount of memory required by the compression algorithm.
Lower space complexity is preferable, particularly in resource-constrained
environments.
Compression Techniques
• Measures of Performance
• In lossy compression, the reconstruction differs from the original data
• To determine the efficiency of a compression algorithm, we have to quantify the difference
• The difference between the original and the reconstruction is called the distortion.
• In compression of speech and video, the final arbiter of quality is human.
• Approximate measures of distortion are used to determine the quality of the
reconstructed waveforms
• Fidelity and quality are two terms used to describe distortion
• When we say that the fidelity or quality of a reconstruction is high, we mean that the
difference between the reconstruction and the original is small
• Fidelity specifically relates to the accuracy of the compressed representation compared to
the original, while quality encompasses a broader set of criteria, including fidelity, and
may involve subjective judgments based on human perception and preferences.
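The slides do not name a specific approximate distortion measure; mean squared
error (MSE) is one common choice, sketched below under that assumption:

def mse(original, reconstruction):
    # Mean squared error: the average squared sample-wise difference.
    assert len(original) == len(reconstruction)
    return sum((x - y) ** 2 for x, y in zip(original, reconstruction)) / len(original)

# A toy lossy scheme: round every sample to the nearest even value.
original = [3, 7, 2, 9, 4]
reconstruction = [4, 8, 2, 8, 4]
print(mse(original, reconstruction))  # 0.6 -- small distortion, high fidelity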
Compression Techniques
• Modeling and Coding
• The exact compression scheme we use will depend on the characteristics of
the data to be compressed, in particular on the redundancies inherent in
the data.
• The development of data compression algorithms for a variety of data can be
divided into two phases.
• The first phase is modeling.
• extract information about any redundancy that exists in the data and describe the redundancy
in the form of a model.
• The second phase is called coding.
• A description of the model and a “description” of how the data differ from the model are
encoded, using a binary alphabet.
• The difference between the data and the model is referred to as the residual
Compression Techniques
• Modeling and Coding
• Example 1
• Consider a sequence of numbers x₁, x₂, x₃, …. If successive values are close
to one another, the encoder can send the first value followed by the
difference between each value and the one before it (the residual).
• The decoder adds each received value to the previous decoded value to obtain
the reconstruction, as the sketch after this list shows.
• Techniques that use the past values of a sequence to predict the current value
and then encode the error in prediction, or residual, are called predictive
coding schemes.
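A minimal Python sketch of this idea, using simple differencing as the
predictor (the example sequence is illustrative, not from the slides):

def delta_encode(xs):
    # Send the first value as-is, then each difference from its predecessor.
    return [xs[0]] + [cur - prev for prev, cur in zip(xs, xs[1:])]

def delta_decode(residuals):
    # Add each received value to the previous decoded value.
    xs = [residuals[0]]
    for r in residuals[1:]:
        xs.append(xs[-1] + r)
    return xs

xs = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]
enc = delta_encode(xs)          # residuals are small, so they need fewer bits
assert delta_decode(enc) == xs  # lossless: the reconstruction is exact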
Compression Techniques
• Example 3:
• Suppose we have the following sequence:
abarayaranbarraybranbfarbfaarbfaaarbaway
• sequence is made up of eight different symbols
• to represent eight symbols, we need to use 3 bits per symbol
• Some symbols occur more often than others
• Assign binary codes of different lengths to different symbols.
Compression Techniques
• If we substitute the codes for each symbol, we will use 106 bits to encode the
entire sequence. As there are 41 symbols in the sequence, this works out to
approximately 2.58 bits per symbol. Compared with the 41 × 3 = 123 bits needed by
the fixed-length code, this gives a compression ratio of about 1.16:1. A sketch of
one way to build such a code follows.
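The slides do not show the code table that was used; a Huffman code is one
standard way to construct such variable-length codes. A minimal sketch (the
exact bit counts depend on the code table and on the sequence as printed):

import heapq
from collections import Counter
from itertools import count

def huffman_code(text):
    # Start with one single-symbol "tree" per symbol, weighted by frequency.
    tie = count()  # unique tiebreaker so the heap never compares dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in Counter(text).items()]
    heapq.heapify(heap)
    # Repeatedly merge the two lightest trees, prefixing 0/1 to their codewords.
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]

seq = "abarayaranbarraybranbfarbfaarbfaaarbaway"
code = huffman_code(seq)
bits = sum(len(code[s]) for s in seq)
print(f"{bits} bits total, {bits / len(seq):.2f} bits/symbol, "
      f"ratio {3 * len(seq) / bits:.2f}:1 versus a 3-bit fixed code")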
Mathematical modelling for Lossless
Compression
• Physical Models
• Physics of the data generation process
• In speech-related applications, knowledge about the physics of speech
production can be used to construct a mathematical model for the sampled
speech process
• Models for certain telemetry data can also be obtained through knowledge of
the underlying process
• If residential electrical meter readings taken at hourly intervals were to be
coded, knowledge about the living habits of the populace could be used to
predict when electricity usage would be high and when it would be low.
• Instead of the actual readings, the difference (residual) between the actual
readings and those predicted by the model could be coded.
Mathematical modelling for Lossless
Compression
• Probability Models
• Ignorance model
• assume that each letter that is generated by the source is independent
of every other letter, and each occurs with the same probability
• Probability model
• assume that each letter that is generated by the source is independent
of every other letter, and each occurs with a different probability
• For a source that generates letters from an alphabet A = {a₁, a₂, …, a_M},
we can have a probability model P = {P(a₁), P(a₂), …, P(a_M)}.
Mathematical modelling for Lossy
Compression
• When modeling sources in order to design or analyze lossy
compression schemes, we look more to the general rather than exact
correspondence.
• Certain probability distribution functions are more analytically
tractable than others, and we try to match the distribution of the source
with one of these “nice” distributions.
Mathematical modelling for Lossy
Compression
• Probability Models
• method for characterizing a particular source
• we look more to the general rather than exact correspondence
• Four probability models
• Uniform distribution
• Gaussian distribution
• Laplacian distribution
• Gamma distribution
Mathematical modelling for Lossy
Compression
• Uniform Distribution:
• This is an ignorance model.
• If we do not know anything about the distribution of the source output, except
possibly the range of values, we can use the uniform distribution to model the
source.
• For example, a random variable X may have a uniform distribution over (−2, 2).
• The probability density function for a random variable uniformly distributed
between a and b is
f_X(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise
Mathematical modelling for Lossy
Compression
• Gaussian Distribution
• most commonly used probability models
• it is mathematically tractable
• Most data points cluster toward the middle of the range, while the rest taper
off symmetrically toward either extreme.
• The middle of the range is also known as the mean of the distribution.
• The probability density function for a random variable with a Gaussian
distribution, mean μ, and variance σ² is
f_X(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²))
Mathematical modelling for Lossy
Compression
• Laplacian Distribution
• Many sources that we deal with have distributions that are quite peaked at
zero.
• For example, speech consists mainly of silence. Therefore, samples of speech
will be zero or close to zero with high probability.
• In images there is a high degree of correlation among neighboring pixels.
Therefore, a large number of the pixel-to-pixel differences will have values
close to zero.
• The probability density function for a zero-mean random variable with a
Laplacian distribution and variance σ² is
f_X(x) = (1/(√2 σ)) e^(−√2|x|/σ)
Mathematical modelling for Lossy
Compression
• Gamma Distribution
• A distribution that is even more peaked at zero, and considerably less
tractable, than the Laplacian.
• The probability density function for a Gamma-distributed random variable with
zero mean and variance σ² is
f_X(x) = (3^(1/4) / √(8πσ|x|)) e^(−√3|x|/(2σ))
Mathematical modelling for Lossy
Compression
• The shapes of these four distributions, assuming a mean of zero and a
variance of one: the uniform density is flat over its range, the Gaussian is
bell-shaped, and the Laplacian and Gamma densities are increasingly peaked
at zero.
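A small Python sketch evaluating the four density functions given above,
assuming zero mean and unit variance (the Gamma density diverges at x = 0, so
it is evaluated only at nonzero points):

import math

def uniform_pdf(x, a=-math.sqrt(3), b=math.sqrt(3)):
    # (b - a)**2 / 12 = 1, so this uniform density has unit variance.
    return 1 / (b - a) if a <= x <= b else 0.0

def gaussian_pdf(x, sigma=1.0):
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def laplacian_pdf(x, sigma=1.0):
    return math.exp(-math.sqrt(2) * abs(x) / sigma) / (math.sqrt(2) * sigma)

def gamma_pdf(x, sigma=1.0):
    # Diverges at x = 0; only evaluate at nonzero x.
    return (3 ** 0.25 / math.sqrt(8 * math.pi * sigma * abs(x))
            * math.exp(-math.sqrt(3) * abs(x) / (2 * sigma)))

for x in (0.1, 0.5, 1.0, 2.0):
    print(x, uniform_pdf(x), gaussian_pdf(x), laplacian_pdf(x), gamma_pdf(x))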
Mathematical Preliminaries for Lossless and
Lossy Compression
• Information Theory
• Shannon defined a quantity called self-information
• Suppose we have an event A, which is a set of outcomes of some random experiment.
• If P(A) is the probability that the event A will occur, then the self-information
associated with A is given by
i(A) = log₂(1/P(A)) = −log₂ P(A) bits
• The average self-information associated with the source output is called the
first-order entropy:
H = −Σᵢ P(aᵢ) log₂ P(aᵢ) bits per symbol
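A minimal sketch estimating letter probabilities from a sequence and computing
self-information and first-order entropy (the sequence is the one from Example 3):

import math
from collections import Counter

def first_order_entropy(text):
    # H = -sum_i P(a_i) * log2 P(a_i), estimated from letter frequencies.
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

seq = "abarayaranbarraybranbfarbfaarbfaaarbaway"
p_a = seq.count("a") / len(seq)
print(f"i(a) = {-math.log2(p_a):.2f} bits")               # self-information of 'a'
print(f"H = {first_order_entropy(seq):.2f} bits/symbol")  # first-order entropy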
• Markov Models
• the probability of the next letter is heavily influenced by the preceding letters
• It is assumed that future states depend only on the current state
• The use of the Markov model does not require the assumption of linearity. For example,
consider a binary image.
• The image has only two types of pixels, white pixels and black pixels.
• We know that the appearance of a white pixel as the next observation depends, to some
extent, on whether the current pixel is white or black.
• For models used in lossless compression, we use a specific type of Markov
process called a discrete time Markov chain. A sequence is said to follow a
kth-order Markov model if
P(xₙ | xₙ₋₁, …, xₙ₋ₖ) = P(xₙ | xₙ₋₁, …, xₙ₋ₖ, …)
• that is, knowledge of the past k symbols is equivalent to knowledge of the
entire past history of the process.
• Markov Models
• Suppose we have already processed the string precedin and are going to
encode the next letter.
• If we take no account of the context and treat each letter as a surprise, the
probability of the letter g occurring is relatively low.
• If we use a first-order Markov model or single-letter context (that is, we look
at the probability model given n), we can see that the probability of g would
increase substantially.
• As we increase the context size (go from n to in to din and so on), the
probability of the alphabet becomes more and more skewed, which results in
lower entropy.
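A small sketch of a single-letter-context (first-order Markov) model, estimated
by counting letter pairs; the sample string is ours, purely for illustration:

from collections import Counter, defaultdict

def conditional_probs(text):
    # Estimate P(next letter | previous letter) from adjacent-pair counts.
    pair_counts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        pair_counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(cnt.values()) for cur, n in cnt.items()}
        for prev, cnt in pair_counts.items()
    }

model = conditional_probs("preceding letters sharpen the prediction of g")
# P(g | n) in this tiny sample is much higher than P(g) overall,
# which is what makes context-based coding effective.
print(model.get("n", {}))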
• Composite Source Model
• In many applications, it is not easy to use a single model to describe the
source.
• In such cases, we can define a composite source, which can be viewed as a
combination or composition of several sources, with only one source being
active at any given time.
Thank You