Data Compression Techniques: Module 1 (KTU)

1. Introduction to Compression Techniques (In-Depth)

What is Data Compression?

Definition:
Data compression is the process of transforming information into a compact form so that it
requires fewer bits to represent.

It’s like summarizing a book: you keep the essential ideas and remove repetition or irrelevant
info.

Why Do We Need Compression?

1. Storage Efficiency
o A 4K video can take gigabytes of space. Compressing reduces size → more
files stored.
2. Faster Transmission
o Compressed files are faster to upload/download (especially critical for video
streaming, IoT devices, etc.).
3. Cost Savings
o Less bandwidth and storage = reduced costs for users and companies.
4. Real-Time Applications
o Voice calls (VoIP), video conferencing use compression to work with limited
bandwidth.

What Can Be Compressed?

Data Type | Example | Is Compression Effective?
Text | Articles, logs, code | Yes (patterns, symbols)
Images | Photos, screenshots | Yes (colors, structure)
Audio | Songs, voice calls | Yes (frequency patterns)
Video | Movies, surveillance footage | Yes (frame similarity)
Binary Files | Executables, databases | Sometimes (if redundant)

What’s Being Removed or Reduced?

• Redundancy: Repetition of data


e.g., "AAAAAA" becomes "6A" in RLE
• Irrelevance: Data not needed for perception
e.g., High-frequency audio we can't hear

How Does Compression Work?

Data compression typically works in two steps:

1. Modeling

• Understand patterns in the data.


• Predict or structure the data to expose redundancy.

2. Coding

• Represent the output of the model using fewer bits.


• Smart encoding techniques (like Huffman, Arithmetic).

Compression Trade-offs

Factor | Lossless Compression | Lossy Compression
Accuracy | Perfect reconstruction | Approximate reconstruction
Size Reduction | Moderate (2:1 to 5:1) | High (10:1 to 100:1)
Use Cases | Text, code, medical, financial | Video, images, audio

Historical Context & Examples

• Morse Code (1830s):


Short codes for frequent letters (e.g., E = “.”, Q = “––·–”) – early variable-length
encoding.
• Braille Compression:
Grade 2 Braille uses contractions for frequent words → 20% space savings.
• Vocoder (1930s):
Compresses speech by modeling the human vocal tract.

These historical ideas evolved into modern compression techniques like JPEG, MP3, and
ZIP.

Summary Points

• Compression = making data smaller.


• Removes redundancy and irrelevance.
• Used everywhere: files, music, video, text.
• Modeled, then encoded.
• Enables storage, streaming, communication, and efficiency.


2. Lossy Compression vs. Lossless Compression (In-Depth)

Compression techniques fall into two broad categories, each with different goals and use
cases:

A. Lossless Compression

Definition

Compression where no information is lost. After decompression, you get exactly the
original data.

Used for

• Text files (e.g., .txt, .xml)


• Executables (.exe, .class)
• Medical records or images
• Bank transactions, scientific data

How it works

• Identifies redundancy (repeating patterns, predictable structures).


• Builds efficient binary representations (e.g., shorter codes for frequent characters).
Common Techniques

Method | Description
Huffman Coding | Short codes for frequent symbols
Arithmetic Coding | Probabilistic coding → single number encoding
LZW (ZIP, GIF) | Dictionary-based compression
Run-Length Encoding (RLE) | Compress repeated characters

Example:

Original: AAAAABBBCC
Compressed: 5A3B2C

Restoring gives: AAAAABBBCC
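
As a rough sketch (not part of the original notes; the function names are made up), run-length encoding and decoding can be written in a few lines of Python:

```python
import re

def rle_encode(text):
    """Encode runs of repeated characters as <count><char>: 'AAAAABBBCC' -> '5A3B2C'."""
    if not text:
        return ""
    out = []
    run_char, run_len = text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1
        else:
            out.append(f"{run_len}{run_char}")
            run_char, run_len = ch, 1
    out.append(f"{run_len}{run_char}")
    return "".join(out)

def rle_decode(encoded):
    """Reverse the encoding: '5A3B2C' -> 'AAAAABBBCC'."""
    return "".join(ch * int(count) for count, ch in re.findall(r"(\d+)(\D)", encoded))

print(rle_encode("AAAAABBBCC"))               # 5A3B2C
assert rle_decode("5A3B2C") == "AAAAABBBCC"   # lossless: the round trip is exact
```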

B. Lossy Compression

Definition

Compression that discards some information. The decompressed output is an approximation of the original.

Used for

• Images (JPEG)
• Videos (MPEG, H.264)
• Audio (MP3, AAC)

Why lossy?

• Human perception has limits:


o Can’t hear very high or low sounds → remove them
o Can’t notice tiny color differences in images

So… if we can’t notice it, we don’t store it.

Common Techniques

Method | Description
Transform Coding | Convert data to frequency domain (DCT, FFT)
Quantization | Round off small values (loss of precision)
Psychoacoustic Modeling | Remove inaudible audio details
Motion Compensation | Reuse unchanged frames in video

Example: JPEG

Original: High-resolution photo


Compressed: Some fine textures lost, but looks the same to the eye
Restored: Slightly blurry zoom-in → not exact but acceptable

Comparison Table

Feature | Lossless | Lossy
Integrity | 100% original is restored | Approximate reconstruction
Data Type | Text, source code, logs, data | Images, audio, video
Compression Ratio | Low to moderate (2:1 to 5:1) | High (10:1 to 100:1+)
Reversible | Yes | No
User Experience | Critical for accuracy | OK as long as it "looks/sounds" right

Real-World Analogy

Context | Lossless Example | Lossy Example
Zip archive | .zip with documents | N/A
Audio | FLAC (CD-quality) | MP3 (streaming audio)
Image | PNG (pixel-perfect) | JPEG (compressed photo)
Video | Uncompressed AVI | MP4 (YouTube, Netflix)

Key Insight

If you care about accuracy → go with lossless.


If you care about size & speed, and small losses are acceptable → go with lossy.

3. Measures of Performance (In-Depth)
Compression algorithms are not just about making files smaller — we need metrics to
evaluate how well they perform.

Here are the key performance measures used to compare compression methods:

1️⃣ Compression Ratio (CR)

Definition:

The ratio between the size of the original data and the compressed data.

Compression Ratio (CR) = Size of Original Data / Size of Compressed Data

Example:

• Original = 100 KB
• Compressed = 25 KB
• CR = 100 / 25 = 4:1 → File is 4x smaller

2️⃣ Compression Factor

Sometimes referred to as the inverse of CR:

Compression Factor = Size of Compressed Data / Size of Original Data = 1 / CR

In the above example: 25 / 100 = 0.25

3️⃣ Bit Rate (or Rate)

Definition:

The average number of bits per symbol (or sample, pixel, etc.) after compression.
Rate = Total Bits in Compressed File / Number of Symbols

Example:

• 256×256 image = 65,536 pixels


• Compressed file = 131,072 bits
• Bit Rate = 131,072 / 65,536 = 2 bits/pixel

Useful for comparing file sizes of same-resolution images or same-length videos/audio.
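
A small Python sketch tying the three size metrics together (the numbers reuse the examples above; the helper names are just for illustration):

```python
def compression_ratio(original_size, compressed_size):
    """CR = original / compressed (both in the same units)."""
    return original_size / compressed_size

def compression_factor(original_size, compressed_size):
    """Inverse of CR, as defined above."""
    return compressed_size / original_size

def bit_rate(compressed_bits, num_symbols):
    """Average bits per symbol (pixel, sample, character, ...)."""
    return compressed_bits / num_symbols

print(compression_ratio(100, 25))     # 4.0  -> a 4:1 ratio
print(compression_factor(100, 25))    # 0.25
print(bit_rate(131_072, 256 * 256))   # 2.0 bits/pixel
```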

4️⃣ Distortion (only for lossy)

Definition:

Difference between the original and decompressed data.

Distortion = Original − Reconstructed

Common Metrics:

Measure | Description
MSE | Mean Squared Error between original and output
PSNR | Peak Signal-to-Noise Ratio (used in images/video)
SSIM | Structural Similarity Index (perceptual quality)

Note: Distortion = 0 in lossless compression!
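
For the objective metrics, a minimal NumPy sketch might look like this (assuming 8-bit samples with peak value 255; SSIM needs a dedicated library such as scikit-image, so it is omitted here):

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error between the two signals."""
    diff = original.astype(float) - reconstructed.astype(float)
    return float(np.mean(diff ** 2))

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (higher means closer to the original)."""
    err = mse(original, reconstructed)
    return float("inf") if err == 0 else 10 * np.log10(peak ** 2 / err)

original      = np.array([52, 55, 61, 66], dtype=np.uint8)
reconstructed = np.array([52, 54, 62, 66], dtype=np.uint8)   # lossy copy with small errors
print(mse(original, reconstructed))              # 0.5
print(round(psnr(original, reconstructed), 1))   # ~51.1 dB
```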

5️⃣ Fidelity / Quality Measures (subjective + objective)

Fidelity describes how close the reconstructed data is to the original, especially in perceptual
terms.

Categories:

• Objective Metrics: MSE, PSNR, SSIM, SNR


• Subjective Metrics: MOS (Mean Opinion Score — used in audio/video quality tests)

Example:

Two JPEG images of the same original photo:


• The one with the higher PSNR is, numerically, the closer reconstruction.
• But if that image still looks worse to the human eye, PSNR isn't the whole story. That's why SSIM, which models perceived structure, is often preferred.

6️⃣ Time & Memory Requirements

Factor | Why it Matters
Compression Time | How long it takes to compress data
Decompression Time | Speed of reading/unpacking the data
Memory Usage | RAM or buffer required during processing

For example, some methods compress well but run slowly (e.g., arithmetic coding).

Summary Table

Metric | Used In | Purpose
Compression Ratio | All | Measures size reduction
Bit Rate | All | Measures per-symbol efficiency
Distortion | Lossy only | Measures signal degradation
PSNR / SSIM | Lossy only | Image/video quality evaluation
Time & Memory | All | Efficiency of algorithm (esp. real-time apps)


4. Modeling and Coding (In-Depth)


Compression isn’t just about “shrinking” — it’s about understanding patterns (modeling)
and then using smart binary representations (coding). Both are core steps in nearly every
compression algorithm.

A. What is Modeling?

Definition:

Modeling is the process of analyzing the structure or predictable patterns in data to reduce
redundancy.

Goal:

Create a model that represents the data efficiently so that only the unexpected or new parts
need to be encoded.

Types of Redundancy Modeled

Redundancy Type | Example
Repetition | "AAAABBB" → repeated characters
Predictability | A number sequence like 1, 2, 3, 4...
Correlation | Neighboring pixels in an image
Statistical Bias | Letters like e occur more than z

Example of Modeling:

Sequence:
9 11 11 11 14 13 15 17 16 17 20 21

You can model this as a linear trend:

x̂_n = n + 8

Then encode just the residuals (the difference between actual and predicted):

e_n = x_n − x̂_n

If residuals are small (like -1, 0, 1), they can be encoded with fewer bits.
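
Here is that example worked as a short Python sketch (illustrative only): the model predicts x̂_n = n + 8, and only the small residuals would be handed to the coder.

```python
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

# Model: x̂_n = n + 8, with n starting at 1
predictions = [n + 8 for n in range(1, len(data) + 1)]

# Residuals e_n = x_n - x̂_n are all that needs to be encoded
residuals = [x - p for x, p in zip(data, predictions)]
print(residuals)    # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

# The decoder knows the model, so it recovers the data exactly
recovered = [p + e for p, e in zip(predictions, residuals)]
assert recovered == data
```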

Types of Models
1. Static Models

• Don’t change during compression.


• Assume fixed probabilities for symbols.
• Good for predictable data (e.g., known text files).

2. Adaptive Models

• Update probabilities dynamically based on past data.


• Useful when symbol frequencies change over time (e.g., live speech or real-time
data).

3. Physical Models

• Use knowledge of how data is generated (used in lossy compression).


• Example: Speech synthesis via vocal tract simulation (used in vocoders).

4. Probability Models

• Assign probabilities to each symbol based on how often it appears.


• More frequent symbols → shorter codes (leads into entropy coding).

B. What is Coding?

Definition:

Coding is the process of converting the modeled data (or residuals) into an efficient binary
format using fewer bits.

Types of Coding Techniques

Method | Idea | Lossy/Lossless
Prefix Codes | No code is a prefix of another (safe decoding) | Lossless
Huffman Coding | Short codes for frequent symbols (binary tree) | Lossless
Arithmetic Coding | Encodes the whole message as a fraction (0–1) | Lossless
Run-Length Coding | Encodes repeated symbols as (value, count) | Lossless
Golomb Coding | Efficient for geometric distributions | Lossless
Transform Coding | Converts data to frequency domain | Lossy
Vector Quantization | Maps vectors to codebook entries | Lossy

Example of Coding after Modeling


Let’s say your model outputs:
0 0 1 0 1 1 1 0 1 0 0 0

You notice: 0 appears 7 times (≈58%) and 1 appears 5 times (≈42%).

With only two symbols, Huffman coding by itself cannot do better than 1 bit per symbol, so in practice you would group symbols into blocks (extended Huffman coding) or use arithmetic coding; either way, the skewed probabilities let you store the sequence in fewer bits than a naïve fixed-length encoding.
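
For a larger alphabet the idea works directly. Below is a rough Huffman-coding sketch in Python (standard library only; the function name is made up for these notes), applied to the RLE example string from earlier:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate single-symbol input
        return {sym: "0" for sym in freq}
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = "AAAAABBBCC"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)          # A gets a 1-bit code, B and C get 2-bit codes
print(len(encoded))   # 15 bits, versus 80 bits as 8-bit ASCII
```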

Why Separate Modeling and Coding?

• Modeling reduces uncertainty (entropy) in the data.


• Coding turns that reduced uncertainty into fewer bits.

Together, they form a compression pipeline:

Raw Data → Modeling → Residual/Probability Output → Coding → Compressed Data

Summary

Step | What it Does | Example
Modeling | Identifies structure or pattern | Predict next pixel
Coding | Encodes efficiently in binary | Huffman, Arithmetic coding


5. Mathematical Modeling for Lossless & Lossy Compression

This topic dives into how we use mathematics to represent data more compactly — the
foundation of both lossless and lossy compression.

Why Use Mathematical Models?

• Real-world data has structure — it’s not random.


• If we understand the structure, we can encode data using fewer bits.
• Models help us predict, approximate, or represent data efficiently.

There are two major approaches:

A. Mathematical Modeling for Lossless Compression


Goal:

Build a model that allows perfect reconstruction of the original data after decoding.

Common Techniques:

1. Statistical (Probability) Models

These assign probabilities to symbols based on frequency of occurrence.

• More frequent symbols → higher probability → shorter code


• Example: Huffman and Arithmetic coding use this model.

Entropy (H) is a measure of how much uncertainty or “surprise” is in the data:

H = − Σ P(x_i) log₂ P(x_i)

Example:

Text: aaaaaaaabbbc

Frequencies:

• a → 8 times
• b → 3 times
• c → 1 time

Use a probability model:


• a = 0.667 → assign short code (e.g., 1)
• c = 0.083 → assign long code (e.g., 01101)

This is the idea behind entropy coding.
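
A quick sketch that computes this entropy with Python's standard library (the value is the lower bound, in bits per symbol, that an ideal entropy coder could approach):

```python
import math
from collections import Counter

def entropy(text):
    """H = -sum P(x) * log2 P(x), in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# a: 8, b: 3, c: 1  ->  about 1.189 bits/symbol (vs. 1.585 for 3 equally likely symbols)
print(round(entropy("aaaaaaaabbbc"), 3))
```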

2. Predictive Models (Lossless)

Use past values to predict current value:

x_n ≈ f(x_{n−1}, x_{n−2}, ...)

Then store only the prediction error (residual):

e_n = x_n − x̂_n

If residuals are small and repetitive → highly compressible!
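
As a tiny illustration (not from the original notes), the simplest predictive model just predicts each sample from the previous one and stores the differences:

```python
# Slowly varying samples, e.g. brightness values along one image row
samples = [100, 101, 101, 103, 104, 104, 106, 107]

# Predict each sample as the previous one; keep the first sample plus the residuals
residuals = [samples[0]] + [samples[i] - samples[i - 1] for i in range(1, len(samples))]
print(residuals)   # [100, 1, 0, 2, 1, 0, 2, 1] -> small, repetitive, easy to entropy-code

# Decoder: a running sum rebuilds the original exactly (lossless)
recovered = []
for r in residuals:
    recovered.append(r if not recovered else recovered[-1] + r)
assert recovered == samples
```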

Summary (Lossless Models):

Model Type | Used In | Key Benefit
Probability Model | Huffman, Arithmetic | Shorter codes for common data
Predictive Model | PNG, TIFF, JBIG | Exploits correlation

B. Mathematical Modeling for Lossy Compression


Goal:

Use a simplified model to represent the data approximately to get higher compression.

Here, some information is discarded, especially data that is perceptually unimportant.

Common Techniques:

1. Transform Models

Convert data into another domain (often frequency domain) to better reveal redundancies.

• DCT (Discrete Cosine Transform) – used in JPEG


• FFT (Fast Fourier Transform) – used in audio
• Wavelet Transform – used in JPEG 2000
The idea:

• Most of the energy is concentrated in a few coefficients.


• Keep only the important ones (high-energy), discard the rest.
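
A rough demonstration of this energy compaction, assuming SciPy is available for the DCT (scipy.fft.dct / idct): keep only the few largest-magnitude coefficients and reconstruct.

```python
import numpy as np
from scipy.fft import dct, idct

# A smooth 1-D signal, e.g. one row of an image
x = 100 * np.cos(np.linspace(0, np.pi, 32)) + 120

coeffs = dct(x, norm="ortho")           # transform to the frequency domain

kept = np.zeros_like(coeffs)            # keep only the 4 largest-magnitude coefficients
top = np.argsort(np.abs(coeffs))[-4:]
kept[top] = coeffs[top]

approx = idct(kept, norm="ortho")       # reconstruct from 4 of the 32 values
print(float(np.max(np.abs(x - approx))))  # small compared with the signal's range of ~200
```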

2. Quantization

Rounding off values to reduce precision → reduces number of bits.

Q(x) = round(x / Δ) × Δ

• Introduces loss, but if chosen wisely, loss is imperceptible.


• The heart of lossy compression.
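
A minimal uniform-quantizer sketch matching the formula above (Δ is the step size; the values and names are just illustrative):

```python
def quantize(x, step):
    """Uniform quantizer: Q(x) = round(x / Δ) * Δ."""
    return round(x / step) * step

values = [12.3, 12.7, 13.1, 55.6]
print([quantize(v, 5) for v in values])   # [10, 15, 15, 55]
# The exact values are gone (lossy), but the error never exceeds Δ/2 = 2.5.
```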

3. Perceptual Models

Use models of human perception to decide what to discard.

• Psychoacoustic models → in MP3, remove sounds humans can't hear


• Visual masking → in JPEG, discard details the eye won't notice

Summary (Lossy Models):

Model Type | Used In | Key Idea
Transform Model | JPEG, MP3 | Represent data in frequency space
Quantization | JPEG, MPEG | Reduce precision for compression
Perceptual Model | MP3, H.264 | Drop imperceptible data

Comparison: Lossless vs Lossy Modeling

Feature | Lossless | Lossy
Goal | Perfect reconstruction | Acceptable approximation
Method | Prediction, probabilities | Transforms, quantization, perception
Entropy-Based? | Yes | Sometimes used pre-quantization
Reversible | Always | Never fully reversible

Final Thought
The better the model captures the structure, the better the compression.

In lossless compression, the model predicts exactly.


In lossy compression, the model chooses what to keep, what to approximate, and what to
discard.
