Data Compression Techniques: Module 1 (KTU)
Definition:
Data compression is the process of transforming information into a compact form so that it
requires fewer bits to represent.
It’s like summarizing a book: you keep the essential ideas and remove repetition or irrelevant information.
Why Compression Matters
1. Storage Efficiency: A 4K video can take gigabytes of space; compressing it reduces the size, so more files can be stored.
2. Faster Transmission: Compressed files are faster to upload and download (especially critical for video streaming, IoT devices, etc.).
3. Cost Savings: Less bandwidth and storage means reduced costs for users and companies.
4. Real-Time Applications: Voice calls (VoIP) and video conferencing rely on compression to work within limited bandwidth.
Compression involves two phases:
1. Modeling: finding the structure and redundancy in the data.
2. Coding: representing the modeled data using as few bits as possible.
Compression Trade-offs
Better compression usually costs something: lossy methods trade quality for size, and stronger algorithms trade speed for compression ratio.
Early coding ideas (such as Morse code, which gives frequent letters shorter codes) evolved into modern compression techniques like JPEG, MP3, and ZIP.
2. Lossless vs. Lossy Compression
A. Lossless Compression
Definition
Compression where no information is lost. After decompression, you get exactly the
original data.
Used for: text, program files, and medical images (any data where exact reconstruction is required).
How it works
Method                     Description
Huffman Coding             Short codes for frequent symbols
Arithmetic Coding          Probabilistic coding; encodes the message as a single number
LZW (ZIP, GIF)             Dictionary-based compression
Run-Length Encoding (RLE)  Compresses runs of repeated characters
Example:
Original: AAAAABBBCC
Compressed: 5A3B2C
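The RLE example above can be sketched in a few lines (the function name is illustrative, not from any standard library):

```python
def rle_encode(s: str) -> str:
    """Run-length encode a string: 'AAAAABBBCC' -> '5A3B2C'."""
    if not s:
        return ""
    out = []
    count = 1
    # Compare each character with the previous one; emit a count
    # whenever the run of identical characters ends.
    for prev, ch in zip(s, s[1:]):
        if ch == prev:
            count += 1
        else:
            out.append(f"{count}{prev}")
            count = 1
    out.append(f"{count}{s[-1]}")
    return "".join(out)

print(rle_encode("AAAAABBBCC"))  # -> 5A3B2C
```

Note that RLE only pays off when runs are long; for data with no repeats it can even grow the output.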
B. Lossy Compression
Definition
Compression where some information is permanently discarded. After decompression, you get an approximation of the original data, not an exact copy.
Used for
• Images (JPEG)
• Videos (MPEG, H.264)
• Audio (MP3, AAC)
Why lossy? Because human senses cannot perceive every detail, discarding imperceptible information yields much higher compression ratios than lossless methods can achieve.
Common Techniques
Method                   Description
Transform Coding         Convert data to the frequency domain (DCT, FFT)
Quantization             Round off small values (loss of precision)
Psychoacoustic Modeling  Remove inaudible audio details
Motion Compensation      Reuse unchanged frames in video
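The quantization row in the table can be illustrated with uniform scalar quantization; the step size of 0.5 below is an arbitrary choice for the sketch:

```python
import numpy as np

def uniform_quantize(x, step):
    """Uniform (scalar) quantization: snap each value to the nearest
    multiple of `step`. Small values collapse together, which is
    exactly the 'loss of precision' the table describes."""
    return np.round(np.asarray(x, dtype=float) / step) * step

samples = np.array([0.12, 0.49, 0.51, 3.2, -1.7])
print(uniform_quantize(samples, 0.5))  # -> [ 0.   0.5  0.5  3.  -1.5]
```

A larger step gives stronger compression (fewer distinct levels to encode) at the cost of more distortion.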
Example: JPEG
JPEG splits an image into 8×8 pixel blocks, transforms each block with the DCT, quantizes the coefficients (discarding fine detail), and entropy-codes the result.
Comparison Table
Feature            Lossless         Lossy
Data loss          None             Some (irreversible)
Reconstruction     Exact            Approximate
Compression ratio  Lower            Much higher
Typical uses       Text, ZIP, PNG   JPEG, MP3, MPEG
Real-World Analogy
Lossless compression is like shorthand notes that can be expanded back word for word; lossy compression is like summarizing a book, where the essentials survive but the fine detail does not.
Key Insight
Use lossless when every bit matters (text, programs, medical data); use lossy when perceptual quality is enough (images, audio, video).
3. Measures of Performance (In-Depth)
Compression algorithms are not just about making files smaller — we need metrics to
evaluate how well they perform.
Here are the key performance measures used to compare compression methods:
A. Compression Ratio (CR)
Definition:
The ratio of the size of the original data to the size of the compressed data:
CR = (Original size) / (Compressed size)
Example:
• Original = 100 KB
• Compressed = 25 KB
• CR = 100 / 25 = 4:1 → File is 4x smaller
B. Rate
Definition:
The average number of bits per symbol (or sample, pixel, etc.) after compression.
Rate = (Total bits in compressed file) / (Number of symbols)
Example:
A 100-symbol message compressed to 200 bits has Rate = 200 / 100 = 2 bits per symbol.
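As a minimal sketch, both measures above can be computed directly (the function names are illustrative):

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Compression ratio: original size divided by compressed size."""
    return original_size / compressed_size

def bit_rate(total_bits: int, num_symbols: int) -> float:
    """Rate: average number of bits per symbol after compression."""
    return total_bits / num_symbols

# 100 KB file compressed to 25 KB -> 4:1 ratio
print(compression_ratio(100, 25))  # -> 4.0
# 200-bit compressed output representing 100 symbols
print(bit_rate(200, 100))          # -> 2.0 bits per symbol
```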
C. Distortion
Definition:
The difference between the original and the reconstructed data (relevant for lossy compression):
Distortion = Original − Reconstructed
Common Metrics:
Measure  Description
MSE      Mean Squared Error between original and output
PSNR     Peak Signal-to-Noise Ratio (used in images/video)
SSIM     Structural Similarity Index (perceptual quality)
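MSE and PSNR from the table can be sketched as follows (the sample pixel values are made up for illustration; 255 is the assumed peak value for 8-bit images):

```python
import numpy as np

def mse(original, reconstructed):
    """Mean Squared Error between original and reconstructed signals."""
    o = np.asarray(original, dtype=float)
    r = np.asarray(reconstructed, dtype=float)
    return np.mean((o - r) ** 2)

def psnr(original, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    err = mse(original, reconstructed)
    return float("inf") if err == 0 else 10 * np.log10(max_value ** 2 / err)

orig = np.array([52, 60, 61, 200], dtype=float)
recon = np.array([50, 60, 63, 198], dtype=float)
print(mse(orig, recon))   # -> 3.0
print(psnr(orig, recon))  # about 43.4 dB
```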
D. Fidelity
Fidelity describes how close the reconstructed data is to the original, especially in perceptual terms.
Categories:
• Objective fidelity: measured numerically (e.g., MSE, PSNR).
• Subjective fidelity: judged by human viewers or listeners.
Example:
Some methods compress well but take too long (e.g., arithmetic coding gives better ratios than Huffman coding but is slower).
Summary Table
Measure            What it tells you
Compression Ratio  How much smaller the file became
Rate               Average bits used per symbol
Distortion         How much the output differs from the original
Fidelity           How close the result is perceptually
4. Modeling and Coding
A. What is Modeling?
Definition:
Modeling is the process of analyzing the structure or predictable patterns in data to reduce
redundancy.
Goal:
Create a model that represents the data efficiently so that only the unexpected or new parts
need to be encoded.
Example of Modeling:
Sequence:
9 11 11 11 14 13 15 17 16 17 20 21
A simple model predicts each value as x̂ₙ = n + 8 (for n = 1, 2, …, 12).
Then encode just the residuals (the difference between actual and predicted):
0 1 0 -1 1 -1 0 1 -1 -1 1 1
If residuals are small (like -1, 0, 1), they can be encoded with fewer bits.
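The prediction-and-residual step above can be checked with a short sketch:

```python
def residuals(sequence):
    """Apply the model x̂_n = n + 8 (1-based n) and return the
    residuals: actual value minus predicted value."""
    return [x - (n + 8) for n, x in enumerate(sequence, start=1)]

seq = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]
print(residuals(seq))  # -> [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]
```

The residuals use only three distinct values, so they need far fewer bits per symbol than the raw sequence.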
Types of Models
1. Static Models: fixed before compression begins and never change.
2. Adaptive Models: updated on the fly as the data is processed.
3. Physical Models: based on knowledge of how the data was generated (e.g., models of speech production).
4. Probability Models: assign probabilities to symbols; frequent symbols get shorter codes.
B. What is Coding?
Definition:
Coding is the process of converting the modeled data (or residuals) into an efficient binary
format using fewer bits.
For example, applying Huffman coding to small residuals, frequent values like 0, 1, and -1 receive short codes, so the sequence is stored using fewer bits than naïve fixed-length binary.
Summary
Modeling captures the structure in the data; coding turns the modeled data into an efficient bit stream. Good compression needs both working together.
A. Modeling for Lossless Compression
Goal:
Build a model that allows perfect reconstruction of the original data after decoding.
Common Techniques: Huffman coding, arithmetic coding, run-length encoding, and dictionary methods (LZW).
Example:
Text: aaaaaaaabbbc
Frequencies:
• a → 8 times
• b → 3 times
• c → 1 time
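A Huffman code for these frequencies can be built with a small heap-based routine (a minimal sketch, not a full codec):

```python
import heapq

def huffman_codes(freqs):
    """Build Huffman codes from a {symbol: frequency} map."""
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prefix '0' onto one subtree's codes and '1' onto the other's
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"a": 8, "b": 3, "c": 1})
bits = "".join(codes[ch] for ch in "aaaaaaaabbbc")
print(len(bits))  # -> 16 bits, vs 24 bits at a fixed 2 bits per symbol
```

The frequent symbol `a` gets a 1-bit code while the rare `b` and `c` get 2-bit codes, which is exactly the "short codes for frequent symbols" idea.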
B. Modeling for Lossy Compression
Goal:
Use a simplified model to represent the data approximately to get higher compression.
Common Techniques:
1. Transform Models
Convert data into another domain (often frequency domain) to better reveal redundancies.
2. Quantization
Map a wide range of values to a smaller set of levels (rounding), trading precision for fewer bits.
3. Perceptual Models
Exploit the limits of human sight and hearing to discard details people cannot perceive.
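To illustrate the transform-model idea above, a naive DCT-II shows how a smooth signal's energy concentrates in a few coefficients (the sample signal is an assumption for the sketch):

```python
import numpy as np

def dct2(x):
    """Naive (unnormalized) DCT-II of a 1-D signal. For smooth data,
    most of the energy lands in the first few coefficients, so the
    rest can be coarsely quantized or dropped."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

signal = np.array([10.0, 11.0, 12.0, 13.0])  # a smooth ramp
print(dct2(signal))  # first coefficient dominates; the rest are small
```

Real codecs use fast, normalized DCTs (e.g., in 8×8 blocks for JPEG), but the energy-compaction effect is the same.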
Final Thought
The better the model captures the structure, the better the compression.