
Old and new coding techniques for data compression, error correction and nonstandard situations

Lecture I: data compression … data encoding

Efficient information encoding to:


- reduce storage requirements,
- reduce transmission requirements,
- often reduce access time for storage and transmission
(decompression is usually faster than an HDD read, e.g. 500 vs 100 MB/s)

Jarosław Duda
Nokia, Kraków
25. IV. 2016
1
Symbol frequencies from a version of the British
National Corpus containing 67 million words
( http://www.yorku.ca/mack/uist2011.html#f1 ):
Brute force: lg(27) ≈ 4.75 bits/symbol (𝑥 → 27𝑥 + 𝑠)
Huffman uses: 𝐻′ = ∑𝑖 𝑝𝑖 𝑟𝑖 ≈ 4.12 bits/symbol
Shannon: 𝐻 = ∑𝑖 𝑝𝑖 lg(1/𝑝𝑖) ≈ 4.08 bits/symbol
We can theoretically improve by ≈ 1% here: Δ𝐻 ≈ 0.04 bits/symbol
Order 1 Markov: ~3.3 bits/symbol (~gzip)
Order 2: ~3.1 bits/symbol, word-level: ~2.1 bits/symbol (~ZSTD)
Currently best compression: cmix-v9 (PAQ)
(http://mattmahoney.net/dc/text.html )
10⁹ bytes of text from Wikipedia (enwik9) into 123 874 398 bytes: ≈ 1 bit/symbol
(10⁸ → 15 627 636 bytes, 10⁶ → 181 476 bytes)
Hilberg conjecture: 𝐻(𝑛) ~ 𝑛^𝛽, 𝛽 < 0.9

… lossy video compression: > 100× reduction


2
Lossless compression – fully reversible,
e.g. gzip, gif, png, JPEG-LS, FLAC, lossless mode in h.264, h.265 (~50%)
Lossy – better compression at cost of distortion
Stronger distortion – better compression (rate distortion theory, psychoacoustics)

General purpose (lossless, e.g. gzip, rar, lzma, bzip, zpaq, Brotli, ZSTD) vs
specialized compressors (e.g. audiovisual, text, DNA, numerical, mesh,…)

For example audio (http://www.bobulous.org.uk/misc/audioFormats.html ):


Wave: 100%, FLAC: 67%, Monkey’s: 63%, MP3: 18%

720p video - uncompressed: ~1.5 GB/min, lossy h.265: ~ 10MB/min

Lossless compression is very data dependent, with a speed-ratio tradeoff


PDF: ~80%, RAW image: 30-80%, binary: ~20-40%, txt: ~12-30%,
source code: ~20%, 10000 URLs: ~20%, XML: ~6%, xls: ~2%, mesh: <1%
3
General scheme for Data Compression … data encoding:
1) Transformations, predictions
To decorrelate data, make it more predictable, encode only new information
Delta coding, YCrCb, Fourier, Lempel-Ziv, Burrows-Wheeler, MTF, RLC …

2) If lossy, quantize coefficients (e.g. 𝑐 → round(𝑐/𝑄))

3) Statistical modeling of final events/symbols – a sequence of (context_ID, symbol) pairs
Static, parametrized, stored in headers, context-dependent, adaptive
4) (Entropy) coding: use 𝐥𝐠(𝟏/𝒑𝒔 ) bits, 𝑯 = ∑𝒔 𝒑𝒔 𝐥𝐠(𝟏/𝒑𝒔 ) bits total
Prefix/Huffman: fast/cheap, but inaccurate (assumes 𝑝 ≈ 2^−𝑟)
Range/arithmetic: costly (multiplication), but accurate
Asymmetric Numeral Systems – fast and accurate
general (Apple LZFSE, Fb ZSTD), DNA (CRAM), games (LZNA, BitKnit), Google VP10, WebP
4
Part 1: Transformations, predictions, quantization
Delta coding: write relative values (differences) instead of absolute
e.g. time (1, 5, 21, 28, 34, 44, 60) → (1, 4, 16, 7, 6, 10, 16)
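A minimal C sketch of both directions (in-place, assuming a plain int array):

  // delta encoding: keep the first value, store differences (backwards to avoid overwriting)
  void delta_encode(int *v, int n) { for (int i = n - 1; i > 0; i--) v[i] -= v[i - 1]; }
  // delta decoding: prefix sums restore the absolute values
  void delta_decode(int *v, int n) { for (int i = 1; i < n; i++) v[i] += v[i - 1]; }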
the differences often have some parametric probability distribution:
Laplacian distribution (geometric): Pr(𝑥) = exp(−|𝑥 − 𝜇|/𝑏) / (2𝑏)
Gaussian distribution (normal): Pr(𝑥) = exp(−(𝑥 − 𝜇)²/(2𝜎²)) / (𝜎√(2π))
Pareto distribution: Pr(𝑥) ∝ 1/𝑥^(𝛼+1)

5
JPEG LS – simple prediction
https://en.wikipedia.org/wiki/Lossless_JPEG

- scan pixels line-by-line


- predict X (edge detection):

- find context determining coding parameters:


quantized gradients (𝑫 − 𝑩, 𝑩 − 𝑪, 𝑪 − 𝑨): (9³ + 1)/2 = 365 contexts
- (entropy) coding of differences using Golomb codes
(good prefix code for Laplace distribution)
with parameter m chosen according to the context
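A sketch of the prediction step in C (the LOCO-I/JPEG-LS median edge-detection predictor, with A = left, B = above, C = above-left neighbours of the current pixel X):

  // predict X from neighbours A (left), B (above), C (above-left)
  int jpegls_predict(int A, int B, int C) {
      int mn = A < B ? A : B, mx = A < B ? B : A;
      if (C >= mx) return mn;        // likely horizontal/vertical edge
      if (C <= mn) return mx;
      return A + B - C;              // smooth region: planar prediction
  }

The encoder then stores only the residual X − jpegls_predict(A, B, C) with the Golomb code of the current context.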
6
Color transformation to better exploit
spatial correlations

YCbCr:
Y – luminance/brightness
Cb, Cr – blue, red difference

Chroma subsampling
In lossy compression
(usually 4:2:0)

Color depth:
3x8 – true color
3x10,12,16 – deep color
7
Karhunen–Loève transform to decorrelate into independent variables
Often close to Fourier (DCT):
(energy preserving, frequency domain)
(stored quantized in JPEG)

Principal Component Analysis


8
Video: predict values (in blocks), encode only difference (residue)

intra-prediction (in a frame, choose and encode mode),


inter-prediction (between frames, use motion of objects)

9
DCT – fixed support, Wavelets – varying support, e.g. JPEG2000
Haar wavelets:

Progressive decoding: send low frequencies first, then further to improve quality
10
Lempel-Ziv (large family)
Replace repeats with
position and length
(decoding is just parsing)
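A toy C sketch of the decoder's core operation, assuming a hypothetical token stream of literals and (distance, length) matches:

  // copy a match of `len` bytes starting `dist` bytes back in the already
  // decoded output; overlapping copies (dist < len) must go byte by byte,
  // which is also how LZ reproduces long runs
  size_t lz_copy_match(unsigned char *out, size_t pos, size_t dist, size_t len) {
      for (size_t i = 0; i < len; i++) out[pos + i] = out[pos + i - dist];
      return pos + len;
  }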

https://github.com/inikep/lzbench
i5-4300U,
100MB total files (from W8.1):
method Compr.[MB/s] Decomp.[MB/s] Ratio
lz4fast r131 level 17 LZ 994 3172 74 %
lz4 r131 LZ 497 2492 62 %
lz5hc v1.4.1 level 15 LZ 2.29 724 44 %
Zlib (gzip) level 1 LZ + Huf 45 197 49 %
Zlib (gzip) level 9 LZ + Huf 7.5 216 45 %
ZStd v0.6.0 level 1 LZ + tANS 231 595 49 %
ZStd v0.6.0 level 22 LZ + tANS 2.25 446 37 %
Google Brotli level 0 LZ + o1 Huf 210 195 50 %
Google Brotli level 11 LZ + o1 Huf 0.25 170 36 %
11
Burrows – Wheeler Transform: sort lexicographically all cyclic shifts, take last column

Usually improves statistical properties for compression (e.g. text, DNA)
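A naive O(n² log n) C sketch of the forward transform (sorting all cyclic shifts explicitly; real implementations use suffix arrays and also store the index of the original rotation, which is needed for inversion):

  #include <stdlib.h>

  static const unsigned char *g_s;   // input, shared with the comparator
  static size_t g_n;

  static int cmp_rot(const void *a, const void *b) {
      size_t i = *(const size_t *)a, j = *(const size_t *)b;
      for (size_t k = 0; k < g_n; k++) {             // compare two cyclic shifts
          unsigned char ci = g_s[(i + k) % g_n], cj = g_s[(j + k) % g_n];
          if (ci != cj) return ci < cj ? -1 : 1;
      }
      return 0;
  }

  // writes the last column of the sorted rotations into out[0..n-1]
  void bwt_naive(const unsigned char *s, size_t n, unsigned char *out) {
      size_t *idx = malloc(n * sizeof *idx);
      for (size_t i = 0; i < n; i++) idx[i] = i;
      g_s = s; g_n = n;
      qsort(idx, n, sizeof *idx, cmp_rot);
      for (size_t i = 0; i < n; i++) out[i] = s[(idx[i] + n - 1) % n];
      free(idx);
  }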


https://sites.google.com/site/powturbo/entropy-coder benchmark:
enwik9 (1GB) enwik9 + BWT Speed (BWT, MB/s)
O1 rANS 55.7 % 21.8 % 193/403
TurboANX (tANS) 63.5 % 24.3 % 492/874
Zlibh (Huffman) 63.8 % 28.5 % 307/336
12
Run-Length Encoding (RLE) – encode repeat lengths
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWW
WWWWWWWWWBWWWWWWWWWWWWWW → 12W1B12W3B24W1B14W

Move-to-front (MTF, Bentley '86) – move the most recently used symbol to the front of the alphabet

Often used with BWT:


mississippi → (BWT) → pssmipissii → (MTF) → 23033313010
“To be or not to be…” : 7807 bits of entropy (symbol-wise)
7033 bits after MTF, 6187 bits after BWT+MTF (7807 after BWT only)
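A small C sketch of the MTF step over the full byte alphabet (the example above used only the 4-letter alphabet {i, m, p, s}):

  // output each byte's current position in the list, then move it to the front
  void mtf_encode(const unsigned char *in, unsigned char *out, size_t n) {
      unsigned char list[256];
      for (int i = 0; i < 256; i++) list[i] = (unsigned char)i;
      for (size_t k = 0; k < n; k++) {
          int pos = 0;
          while (list[pos] != in[k]) pos++;
          out[k] = (unsigned char)pos;
          for (int i = pos; i > 0; i--) list[i] = list[i - 1];  // shift towards the back
          list[0] = in[k];                                      // move to front
      }
  }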
13
Binning – most significant bits of floats often have characteristic distribution
Quantization, e.g. 𝑐 → round(𝑐/𝑄)
store only bin number
(rate distortion theory )
Vector quantization (many coordinates at once)
- (r, angle) for energy conservation (e.g. Daala )
- For more convenient representation
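A minimal sketch of the scalar case above (vector quantization replaces the rounding by a nearest-codeword search over several coordinates at once):

  #include <math.h>
  int    quantize(double c, double Q)  { return (int)lround(c / Q); }  // bin number, to be entropy coded
  double dequantize(int bin, double Q) { return bin * Q; }             // reconstruction at the bin centre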

14
Part 2: (Entropy) coding – the heart of data compression

Transformations, predictions, quantization etc.


have led to a sequence of symbols/events we finally need to encode

Optimal storage of symbol of probability 𝒑 uses 𝐥𝐠(𝟏/𝒑) bits


Using approximated probabilities means suboptimal compression ratio
We need to model these probabilities, then use (entropy) coder with them

For example unary coding: 𝟎 → 0, 𝟏 → 10, 𝟐 → 110, 𝟑 → 1110, 𝟒 → 11110, …


Is optimal if Pr(𝟎) = 0.5, Pr(𝟏) = 0.25, … , Pr(𝒌) = 2−𝑘−1 (geometric)

Encoding (𝑝𝑖 ) distribution with entropy coder optimal for (𝑞𝑖 ) distribution costs
Δ𝐻 = ∑𝑖 𝑝𝑖 lg(1/𝑞𝑖) − ∑𝑖 𝑝𝑖 lg(1/𝑝𝑖) = ∑𝑖 𝑝𝑖 lg(𝑝𝑖/𝑞𝑖) ≈ (1/ln(4)) ∑𝑖 (𝑝𝑖 − 𝑞𝑖)²/𝑝𝑖
more bits/symbol – the so-called Kullback-Leibler “distance” (not symmetric)
15
Imagine we want to store n values (e.g. hash) of m bits: directly 𝑚 ⋅ 𝑛 bits
Their order contains lg(𝑛!) ≈ lg((𝑛/𝑒)𝑛 ) = 𝑛 lg(𝑛) − 𝑛 lg(𝑒) bits of information
If order is not important, we could use only ≈ 𝑛(𝑚 − lg(𝑛) + 1.443) bits
It means halving storage for 178k of 32bit hashes, 11G of 64bit hashes

How to cheaply
approximate it?

Split the range (2^𝑚 positions) into 𝑛 equal size buckets

- Use the unary system to write the number of hashes in each bucket: 2𝑛 bits
- Write suffixes inside buckets: lg(2^𝑚/𝑛) bits for each hash:
2𝑛 + 𝑛 ⋅ lg(2^𝑚/𝑛) = 𝑛(𝑚 − lg(𝑛) + 2) bits total

Overhead can be cheaply reduced to lg(3) ≈ 1.585 = 1.443 + 0.142 bits


By using 1 trit coding instead of unary (James Dow Allen):
“0”, ”1”, ”2 or more”,
then store prefixes: 𝑎1 < 𝑎2 <. . < 𝑎𝑘−2 < 𝑎𝑘 > 𝑎𝑘−1
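Worked example (using the formulas above): for n = 2²⁰ ≈ 10⁶ hashes of m = 32 bits, direct storage costs 32n bits, while the bucket scheme costs n(32 − 20 + 2) = 14n bits with unary bucket counts, or about n(32 − 20 + 1.585) ≈ 13.6n bits with the trit variant – roughly a 2.3× reduction, close to the ≈ n(m − lg(n) + 1.443) ≈ 13.4n bit information-theoretic bound.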
16
We need 𝒏 bits of information to choose one of 𝟐𝒏 possibilities.
For length-𝑛 0/1 sequences with 𝑝𝑛 ones, how many bits do we need to choose one?

𝑛!/((𝑝𝑛)! ((1−𝑝)𝑛)!) ≈ (𝑛/𝑒)^𝑛 / ((𝑝𝑛/𝑒)^{𝑝𝑛} ((1−𝑝)𝑛/𝑒)^{(1−𝑝)𝑛}) = 2^{𝑛 lg(𝑛) − 𝑝𝑛 lg(𝑝𝑛) − (1−𝑝)𝑛 lg((1−𝑝)𝑛)} = 2^{𝑛(−𝑝 lg(𝑝) − (1−𝑝) lg(1−𝑝))}

Entropy coder: encodes sequence with (𝑝𝑠 )𝑠=0..𝑚−1 probability distribution


using asymptotically at least 𝑯 = ∑𝒔 𝒑𝒔 𝐥𝐠(𝟏/𝒑𝒔 ) bits/symbol (𝐻 ≤ lg(𝑚))

Seen as weighted average: symbol of probability 𝒑 contains 𝐥𝐠(𝟏/𝒑) bits.


17
Statistical modelling – where the probabilities come from
- Frequency counts – e.g. for each block (~140B/30kB) and written in header,
- Assume a parametric distribution, estimate its parameter – for example
Pr(𝑥) = exp(−|𝑥 − 𝜇|/𝑏)/(2𝑏), Pr(𝑥) = exp(−(𝑥 − 𝜇)²/(2𝜎²))/(𝜎√(2π)), Pr(𝑥) ∝ 1/𝑥^(𝛼+1)
- Context dependence, like neighboring pixels, similar can be grouped
e.g. order 1 Markov (context is the previous symbol): use Pr(𝑠𝑖 |𝑠𝑖−1 ),
- Prediction by partial matching (PPM) – varying number of previous
symbols as context (build and update suffix tree),
- Context mixing (CM, ultracompressors like PAQ) – combine predictions
from multiple contexts (e.g. PPM) using some machine learning.

Static or adaptive – modify probabilities depending on the already decoded symbols,
e.g. move the CDF toward the CDF of the recently processed data block
Enwik9: CM 12.4%, PPM 15.7%, BWT 16%, dict 16.5%, LZ+bin 19.3%, LZ+byt: 21.5%, LZ: 32%
Decoding: 100h cmix, 1h mons, 400s M03 , 15s glza , 55s LZMA , 2.2s ZSTD , 3.7s lz5
18
Adaptiveness – modify probabilities (dependent on decoded symbols), e.g.
for (int i = 1; i < m; i++) CDF[i] -= (CDF[i] - mixCDF[i]) >> rate;
where CDF[i] = ∑𝑗<𝑖 Pr(𝑗), mixCDF[] is CDF for recent data block

example: {0014044010001001404441000010010000000000000000000000
100000000000000000000000000000000000000000000000000}
{0,1,2,3,4} alphabet, length = 104, direct encoding: 104 ⋅ lg(5) ≈ 𝟐𝟒𝟏 bits
Static probability (stored?) : {89, 8, 0, 0, 7}/104 104 ⋅ ∑𝑖 𝑝𝑖 lg(1/𝑝𝑖 ) ≈ 77 bits
Symbol-wise adaptation starting from flat (uniform) or averaged initial distribution:

flat, rate = 3, −∑ lg(Pr) ≈ 𝟕𝟎 flat, rate = 4, −∑ lg(Pr) ≈ 𝟕𝟖 avg, rate = 3, −∑ lg(Pr) ≈ 𝟔𝟕
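A sketch of one common symbol-wise adaptation rule (an assumption here, not necessarily the exact rule behind the numbers above): after each symbol, pull the CDF toward the step-function CDF of that symbol, with the same rate parameter as above:

  // after processing symbol s, pull the CDF toward "all mass on s"
  // (CDF[0] = 0, CDF[m] = SCALE = 2^n; relies on arithmetic right shift of negatives;
  //  real coders additionally keep every symbol's frequency >= 1)
  void adapt_cdf(int *CDF, int m, int s, int rate, int SCALE) {
      for (int i = 1; i < m; i++) {
          int target = (i > s) ? SCALE : 0;
          CDF[i] += (target - CDF[i]) >> rate;
      }
  }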


19
Some universal codings, upper bound doesn’t need to be known
unary coding: 𝟎 → 0, 𝟏 → 10, 𝟐 → 110, 𝟑 → 1110, 𝟒 → 11110, …
Is optimal if Pr(𝟎) = 0.5, Pr(𝟏) = 0.25, … , Pr(𝒙) = 2−𝑥−1 (geometric)

Golomb coding: choose parameter 𝑀 (preferably a power of 2).

Write ⌊𝑥/𝑀⌋ with unary coding, then directly 𝑥 mod 𝑀
Needs ⌊𝑥/𝑀⌋ + 1 + lg(𝑀) bits,
optimal for a ~geometric distribution: Pr(𝑥) ≈ (1/(2𝑀)) ⋅ 2^{−⌊𝑥/𝑀⌋} ≈ (2^{1/𝑀})^{−𝑥}

Elias gamma coding: write 𝑁 = ⌊lg(𝑥)⌋ zeroes, then the 𝑁 + 1 bits of 𝑥
Uses 2⌊lg(𝑥)⌋ + 1 bits – optimal for Pareto: Pr(𝑥) ≈ 2^{−(2⌊lg(𝑥)⌋+1)} ≈ 𝑥^{−2}

Directly writing 𝑥 costs ⌊lg(𝑥)⌋ + 1 bits → Pr(𝑥) ≈ 1/𝑥, but the length also costs
Elias delta coding – lg(lg(𝑥)) with unary, optimal for Pr(𝑥) ≈ 1/(𝑥 lg²(𝑥))
Elias omega coding – recursive, optimal for Pr(𝑥) ≈ 1/(𝑥 ⋅ lg(𝑥) ⋅ lg(lg(𝑥)) ⋅ …)
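A C sketch of bit-level writers for unary, Golomb (power-of-two 𝑀, i.e. Rice) and Elias gamma codes, assuming a hypothetical MSB-first bit-stream interface put_bit/put_bits:

  // assumed bit-stream interface (hypothetical): MSB-first writers
  void put_bit(int b);
  void put_bits(unsigned v, int n);            // write the n low bits of v

  void put_unary(unsigned x) {                 // x -> 1..10 (x ones, then a zero)
      while (x--) put_bit(1);
      put_bit(0);
  }
  void put_golomb_rice(unsigned x, unsigned logM) {  // M = 2^logM
      put_unary(x >> logM);                          // quotient floor(x/M) in unary
      put_bits(x & ((1u << logM) - 1), logM);        // remainder x mod M directly
  }
  void put_elias_gamma(unsigned x) {           // x >= 1
      int N = 0;
      while ((x >> N) > 1) N++;                // N = floor(lg x)
      put_bits(0, N);                          // N zeroes
      put_bits(x, N + 1);                      // then the N+1 bits of x
  }

General (non power-of-two) 𝑀 would additionally need a truncated-binary code for the remainder.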
20
symbol of probability 𝒑
contains 𝐥𝐠(𝟏/𝒑) bits of information
symbol → length 𝑟 bit sequence
assumes: Pr(symbol) ≈ 2−𝑟

Huffman coding: perfect prefix code


- Sort symbols by frequencies
- Replace the two least frequent with a tree of them, and so on

generally: prefix codes defined by a tree
unique decoding: no codeword is a prefix of another
http://en.wikipedia.org/wiki/Huffman_coding
𝐴 → 0, 𝐵 → 100, 𝐶 → 101, 𝐷 → 110, 𝐸 → 111

21
Encoding (𝑝𝑖 ) distribution with entropy coder optimal for (𝑞𝑖 ) distribution costs
Δ𝐻 = ∑𝑖 𝑝𝑖 lg(1/𝑞𝑖) − ∑𝑖 𝑝𝑖 lg(1/𝑝𝑖) = ∑𝑖 𝑝𝑖 lg(𝑝𝑖/𝑞𝑖) ≈ (1/ln(4)) ∑𝑖 (𝑝𝑖 − 𝑞𝑖)²/𝑝𝑖
more bits/symbol – the so-called Kullback-Leibler “distance” (not symmetric).

Huffman inaccurate, terrible for skewed distributions (Pr(a) >> 0.5)


e.g. Pr(a) = 0.99, Pr(b) = 0.01, H ~ 0.08 bits/symbol, Huffman: 𝑎 → 0, 𝑏 → 1
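A quick check of these numbers in C (a sketch using the entropy formula above):

  #include <math.h>
  #include <stdio.h>
  int main(void) {
      double p = 0.99, q = 0.01;
      double H  = -p * log2(p) - q * log2(q);   // Shannon limit, ~0.081 bits/symbol
      double dH = 1.0 - H;                      // Huffman spends exactly 1 bit/symbol here
      printf("H = %.3f, Huffman overhead dH = %.3f bits/symbol (~%.0fx the limit)\n",
             H, dH, 1.0 / H);
      return 0;
  }
  // prints roughly: H = 0.081, Huffman overhead dH = 0.919 bits/symbol (~12x the limit)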

We can reduce Δ𝐻 by grouping 𝑚 symbols together (alphabet size 2^𝑚 or 3^𝑚):

Δ𝐻 ∝ ~1/𝑚 but the cost grows like 2^𝑚


22
23
Huffman vs ANS in compressors (LZ + entropy coder):
from Matt Mahoney benchmarks http://mattmahoney.net/dc/text.html
enwik8 (100,000,000 bytes): compressed size [bytes], encode time [ns/byte], decode time [ns/byte]
ZSTD 0.6.0 –22 --ultra 25,405,601 701 2.2
Brotli (Google Feb 2016) –q11 w24 25,764,698 3400 5.9
LZA 0.82b –mx9 –b7 –h7 26,396,613 449 9.7
lzturbo 1.2 –39 –b24 26,915,461 582 2.8
WinRAR 5.00 –ma5 –m5 27,835,431 1004 31
WinRAR 5.00 –ma5 –m2 29,758,785 228 30
lzturbo 1.2 –32 30,979,376 19 2.7
zhuff 0.97 –c2 34,907,478 24 3.5
gzip 1.3.5 –9 36,445,248 101 17
pkzip 2.0.4 –ex 36,556,552 171 50
WinRAR 5.00 –ma5 –m1 40,565,268 54 31
ZSTD 0.4.2 –1 40,799,603 7.1 3.6
zhuff, ZSTD (Yann Collet, Facebook): LZ4 + tANS (switched from Huffman)
lzturbo (Hamid Bouzidi): LZ + tANS (switched from Huffman)
LZA (Nania Francesco): LZ + rANS (switched from range coding)
saving time and energy in extremely frequent task
24
Apple LZFSE =
Lempel-Ziv + Finite State Entropy
Default in iOS9 and OS X 10.11
“matching the compression ratio of ZLIB
level 5, but with much higher energy
efficiency and speed (between 2x and 3x)
for both encode and decode operation”

Finite State Entropy is


Yann Collet’s (known from e.g. LZ4)
implementation of tANS

Default DNA compression: CRAM 3.0 of European Bioinformatics Institute


Format Size Encoding(s) Decoding(s)
SAM 5 579 036 306 46 -
CRAM v2 (LZ77+Huff) 1 053 744 556 183 27
CRAM v3 (order 1 rANS) 869 500 447 75 31
25
gzip used everywhere … but it’s ancient … what’s next?
Google – first Zopfli (gzip compatible)
Now Brotli (incompatible, Huffman) – Firefox support, all news:
“Google’s new Brotli compression algorithm is 26 percent better than existing
solutions” (22.09.2015) … now ZSTD (tANS, Yann Collet, Facebook):

general (Apple LZFSE, Fb ZSTD), DNA (CRAM), games (LZNA, BitKnit), Google VP10, WebP
26
Huffman coding (HC), prefix codes: most of everyday compression,
e.g. zip, gzip, cab, jpeg, gif, png, tiff, pdf, mp3…
Zlibh (the fastest generally available implementation):
Encoding ~ 320 MB/s (/core, 3 GHz), Decoding ~ 300-350 MB/s

Range coding (RC): large-alphabet arithmetic coding, needs multiplication,
e.g. 7-Zip, Google VP video codecs (e.g. YouTube, Hangouts).
Encoding ~ 100-150 MB/s, Decoding ~ 80 MB/s

(binary) Arithmetic coding (AC): H.264, H.265 video, ultracompressors e.g. PAQ
Encoding/decoding ~ 20-30 MB/s

Asymmetric Numeral Systems (ANS), tabled (tANS) – without multiplication
FSE implementation of tANS: Encoding ~ 350 MB/s, Decoding ~ 500 MB/s

Example of the Huffman penalty for a truncated 𝜌(1 − 𝜌)^𝑛 distribution
(Huffman can't use less than 1 bit/symbol: 1 byte → at least 1 bit, so ratio ≤ 8 here):
compression ratio:   8/H      zlibh    FSE
𝜌 = 0.5              4.001    3.99     4.00
𝜌 = 0.6              4.935    4.78     4.93
𝜌 = 0.7              6.344    5.58     6.33
𝜌 = 0.8              8.851    6.38     8.84
𝜌 = 0.9              15.31    7.18     15.29
𝜌 = 0.95             26.41    7.58     26.38
𝜌 = 0.99             96.95    7.90     96.43

RC → ANS: ~7x decoding speedup, no multiplication (switched e.g. in LZA compressor)
HC → ANS means better compression and ~1.5x decoding speedup (e.g. zhuff, lzturbo)
27
Operating on fractional number of bits

arithmetic coding: restricting range 𝑁 to a length-𝐿 subrange contains lg(𝑁/𝐿) bits;
adding a symbol of probability 𝑝 (containing lg(1/𝑝) bits): lg(𝑁/𝐿) + lg(1/𝑝) ≈ lg(𝑁/𝐿′) for 𝐿′ ≈ 𝑝 ⋅ 𝐿
ANS: a single number 𝑥 contains lg(𝑥) bits;
adding a symbol of probability 𝑝: lg(𝑥) + lg(1/𝑝) = lg(𝑥′) for 𝑥′ ≈ 𝑥/𝑝
28
Asymmetric numeral systems – redefine even/odd numbers:
symbol distribution: 𝑠: ℕ → 𝒜 (𝒜 – alphabet, e.g. {0,1})
( 𝑠(𝑥 ) = mod(𝑥, 𝑏) for base 𝑏 numeral system: 𝐶 (𝑠, 𝑥 ) = 𝑏𝑥 + 𝑠)
Should be still uniformly distributed – but with density 𝒑𝒔 :
# { 0 ≤ 𝑦 < 𝑥: 𝑠(𝑦) = 𝑠} ≈ 𝑥𝑝𝑠
then 𝒙 becomes 𝒙-th appearance of given symbol:
𝐶 (𝑠, 𝑥) = 𝑥 ′ ∶ 𝑠(𝑥 ′ ) = 𝑠, |{0 ≤ 𝑦 < 𝑥 ′ : 𝑠(𝑦) = 𝑠(𝑥 ′ )}| = 𝑥
𝐷 (𝑥′) = ( 𝑠(𝑥 ′ ), |{0 ≤ 𝑦 < 𝑥 ′ : 𝑠(𝑦) = 𝑠(𝑥 ′ )}|)
𝐶(𝐷(𝑥 ′ )) = 𝑥′ 𝐷(𝐶 (𝑠, 𝑥)) = (𝑠, 𝑥) 𝒙′ ≈ 𝒙/𝒑𝒔

𝑠(𝑥) = 0 if mod(𝑥, 4) = 0, else 𝑠(𝑥) = 1


29
example: range asymmetric binary systems (rABS)
Pr(0) = 1/4 Pr(1) = 3/4 - take base 4 system and merge 3 digits,
cyclic (0123) symbol distribution 𝑠 becomes cyclic (0111):

𝑠(𝑥) = 0 if mod(𝑥, 4) = 0, else 𝑠(𝑥) = 1


to decode or encode 1, localize quadruple (⌊𝑥/4⌋ or ⌊𝑥/3⌋)
if 𝑠(𝑥) = 0, 𝐷(𝑥) = (0, ⌊𝑥/4⌋) else 𝐷(𝑥) = (1,3⌊𝑥/4⌋ + mod(𝑥, 4) − 1)
𝐶 (0, 𝑥) = 4𝑥 𝐶 (1, 𝑥) = 4⌊𝑥/3⌋ + 1 + mod(𝑥, 3)
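A toy C sketch of this Pr(0) = 1/4, Pr(1) = 3/4 rABS (no renormalization, so the state just grows; decoding runs in the opposite order to encoding):

  #include <stdint.h>

  uint64_t C(int s, uint64_t x) {                       // encode symbol s into state x
      return s == 0 ? 4 * x : 4 * (x / 3) + 1 + x % 3;
  }
  int D(uint64_t *x) {                                  // decode: return symbol, update state
      uint64_t r = *x % 4;
      if (r == 0) { *x /= 4; return 0; }
      *x = 3 * (*x / 4) + r - 1;
      return 1;
  }
  // e.g. starting from x = 1 and encoding 1,1,0,1 gives x = 17;
  // four D() calls then return 1,0,1,1 (LIFO order) and restore x = 1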

30
rANS - range variant for large alphabet 𝒜 = {0, . . , 𝑚 − 1}
assume Pr(𝑠) = 𝑓𝑠/2^𝑛, 𝑐𝑠 := 𝑓0 + 𝑓1 + ⋯ + 𝑓𝑠−1
start with the base-2^𝑛 numeral system and merge length-𝑓𝑠 ranges
for 𝑥 ∈ {0,1, … , 2^𝑛 − 1}: 𝒔(𝒙) = 𝐦𝐚𝐱{𝒔: 𝒄𝒔 ≤ 𝒙}, 𝑚𝑎𝑠𝑘 = 2^𝑛 − 1
encoding: 𝐶 (𝑠, 𝑥 ) = ⌊𝑥/𝑓𝑠 ⌋ ≪ 𝑛 + mod(𝑥, 𝑓𝑠 ) + 𝑐𝑠
decoding: 𝑠 = 𝑠(𝑥 & 𝑚𝑎𝑠𝑘 ) (e.g. tabled, alias method)
𝐷 (𝑥 ) = (𝑠, 𝑓𝑠 ⋅ (𝑥 ≫ 𝑛) + ( 𝑥 & 𝑚𝑎𝑠𝑘 ) − 𝑐𝑠 )

Plus renormalization to make for example 𝑥 ∈ {2^16, … , 2^32 − 1}, 𝑛 = 12:

Decoding step (mask = 2^𝑛 − 1):
  s = symbol[x & mask];
  writeSymbol(s);
  x = f[s] * (x >> n) + (x & mask) - c[s];
  if (x < (1 << 16)) x = (x << 16) + read16bits();
Encoding step (msk = 2^16 − 1):
  s = readSymbol();
  if (x > bound[s]) { write16bits(x & msk); x >>= 16; }
  x = ((x / f[s]) << n) + (x % f[s]) + c[s];
CRAM v3 (2015): order 1 rANS: 256 frequencies depending on previous symbol
31
Plus renormalization to make for example 𝑥 ∈ {2^16, … , 2^32 − 1}, 𝑛 = 12:
Decoding step (mask = 2^𝑛 − 1):
  s = symbol(x & mask);
  writeSymbol(s);
  x = f[s] * (x >> n) + (x & mask) - CDF[s];
  if (x < (1 << 16)) x = (x << 16) + read16bits();
Encoding step (msk = 2^16 − 1):
  s = readSymbol();
  if (x > bound[s]) { write16bits(x & msk); x >>= 16; }
  x = ((x / f[s]) << n) + (x % f[s]) + CDF[s];
Similar to Range Coding, but decoding has 1 multiplication (instead of 2), for
determining symbol range is fixed, and state is 1 number (instead of 2), making
it convenient for SIMD vectorization ( https://github.com/rygorous/ryg_rans ).
Various ways to handle symbol(𝑦) = 𝑠 : 𝐶𝐷𝐹[𝑠] ≤ 𝑦 < 𝐶𝐷𝐹[𝑠 + 1] for 0 ≤ 𝑦 < 2𝑛 :
Tabled(symbol[𝑦]), search (binary or SIMD - CDF only!), or alias method:
‘Alias’ method: rearrange probability distribution into 𝑚 buckets:
containing the primary symbol and eventually a single ‘alias’ symbol

32
uABS - uniform binary variant (𝒜 = {0,1}) - extremely accurate
Assume binary alphabet, 𝒑 ≔ 𝐏𝐫(𝟏), denote 𝑥𝑠 = |{𝑦 < 𝑥: 𝑠(𝑦) = 𝑠}| ≈ 𝑥 ⋅ 𝑝𝑠
For uniform symbol distribution we can choose:
𝒙𝟏 = ⌈𝒙𝒑⌉ 𝒙𝟎 = 𝒙 − 𝒙𝟏 = 𝒙 − ⌈𝒙𝒑⌉
𝑠(𝑥) = 1 if there is jump on next position: 𝑠 = 𝑠(𝑥) = ⌈(𝑥 + 1)𝑝⌉ − ⌈𝑥𝑝⌉
decoding function: 𝐷 (𝑥) = (𝑠, 𝑥𝑠 )
its inverse – coding function:
𝐶(0, 𝑥) = ⌈(𝑥 + 1)/(1 − 𝑝)⌉ − 1
𝐶(1, 𝑥) = ⌊𝑥/𝑝⌋

For 𝑝 = Pr(1) = 1 − Pr(0) = 0.3:
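A C sketch of these formulas, with 𝑝 kept as a 16-bit fixed-point fraction P/2^16 (e.g. P = 19661 ≈ 0.3 ⋅ 2^16); no renormalization, and x is assumed to stay below ~2^47 so the products fit in 64 bits:

  #include <stdint.h>
  #define PREC 16                                    // p = P / 2^PREC

  static uint64_t ceil_mul(uint64_t x, uint32_t P) { // ceil(x * p) in fixed point
      return (x * P + (1u << PREC) - 1) >> PREC;
  }
  int uabs_decode(uint64_t *x, uint32_t P) {         // D(x) = (s, x_s), returns s
      int s = (int)(ceil_mul(*x + 1, P) - ceil_mul(*x, P));
      *x = s ? ceil_mul(*x, P) : *x - ceil_mul(*x, P);
      return s;
  }
  uint64_t uabs_encode(int s, uint64_t x, uint32_t P) {   // C(s, x)
      uint64_t q = (1u << PREC) - P;                      // 1 - p in fixed point
      if (s) return (x << PREC) / P;                      // C(1,x) = floor(x / p)
      return (((x + 1) << PREC) + q - 1) / q - 1;         // C(0,x) = ceil((x+1)/(1-p)) - 1
  }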

33
Stream version – renormalization
Up to now: we encode using succeeding 𝐶 functions into a huge number 𝑥,
then decode (in opposite direction!) using succeeding 𝐷.
Like in arithmetic coding, we need renormalization to limit working precision -
enforce 𝒙 ∈ 𝑰 = {𝑳, … , 𝒃𝑳 − 𝟏} by transferring base-𝒃 youngest digits:
ANS decoding step from state 𝑥:
  (𝑠, 𝑥) = 𝐷(𝑥);
  useSymbol(𝑠);
  while 𝑥 < 𝐿: 𝑥 = 𝑏𝑥 + readDigit();
Encoding step for symbol 𝑠 from state 𝑥:
  while 𝑥 ≥ maxX[𝑠]   // = 𝑏𝐿𝑠
    { writeDigit(mod(𝑥, 𝑏)); 𝑥 = ⌊𝑥/𝑏⌋ };
  𝑥 = 𝐶(𝑠, 𝑥);
For unique decoding,
we need to ensure that there is a single way to perform above loops:
𝐼 = {𝐿, … , 𝑏𝐿 − 1}, 𝐼𝑠 = {𝐿𝑠 , … , 𝑏𝐿𝑠 − 1} where 𝐼𝑠 = {𝑥: 𝐶 (𝑠, 𝑥) ∈ 𝐼}

Fulfilled e.g. for


- rABS/rANS when 𝑝𝑠 = 𝑓𝑠/2^𝑛 has 1/𝐿 accuracy: 2^𝑛 divides 𝐿,
- uABS when 𝑝 has 1/𝐿 accuracy: 𝑏⌈𝐿𝑝⌉ = ⌈𝑏𝐿𝑝⌉,
- in tabled variants (tABS/tANS) it will be fulfilled by construction.
34
Binary decoding steps for uABS, rABS and arithmetic coding
(32-bit x, 16-bit renormalization, mask = 0xffff):

uABS:
  xp = (uint64_t)x * p0;
  out[i] = ((xp & mask) >= p0);
  xp >>= 16;
  x = out[i] ? xp : x - xp;
  if (x < 0x10000) { x = (x << 16) | *ptr++; }

rABS:
  xf = x & mask;
  xn = p0 * (x >> 16);
  if (xf < p0) { x = xn + xf; out[i] = 0; }
  else { x -= xn + p0; out[i] = 1; }
  if (x < 0x10000) { x = (x << 16) | *ptr++; }

arithmetic coding:
  split = low + ((uint64_t)(hi - low) * p0 >> 16);
  out[i] = (x > split);
  if (out[i]) { low = split + 1; } else { hi = split; }
  if ((low ^ hi) < 0x10000) { x = (x << 16) | *ptr++; low <<= 16; hi = (hi << 16) | mask; }
35
RENORMALIZATION to prevent 𝑥 → ∞
Enforce 𝑥 ∈ 𝐼 = {𝐿, … ,2𝐿 − 1} by
transmitting lowest bits to bitstream
“buffer” 𝒙 contains 𝐥𝐠(𝒙) bits of information
– produces bits when accumulated
Symbol of 𝑃𝑟 = 𝑝 contains lg(1/𝑝) bits:
lg(𝑥 ) → lg(𝑥 ) + lg(1/𝑝) modulo 1

Tabled variant (tABS/tANS) – put everything into a table:

36
37
38
Single step of stream version:
to get 𝑥 ∈ 𝐼 to 𝐼𝑠 = {𝐿𝑠 , … , 𝑏𝐿𝑠 − 1}, we need to transfer 𝑘 digits:
𝑥 → (𝐶(𝑠, ⌊𝑥/𝑏^𝑘⌋), mod(𝑥, 𝑏^𝑘)) where 𝑘 = ⌊log_𝑏(𝑥/𝐿𝑠)⌋
𝑘 = 𝑘𝑠 or 𝑘 = 𝑘𝑠 − 1 for 𝒌𝒔 = −⌊log_𝑏(𝒑𝒔)⌋ = −⌊log_𝑏(𝐿𝑠/𝐿)⌋
e.g.: 𝑝𝑠 = 13/66, 𝑏 = 2: 𝑘𝑠 = 3, 𝐿𝑠 = 13, 𝐿 = 66; state 115 = 2³ ⋅ 14 + 3 transfers 𝑘 = 3 digits, leaving 𝑥 = 14

Huffman: 𝑘 = −lg(𝑝𝑠 ) = 𝑘𝑠 , above lines are vertical


39
General picture:
encoder prepares before consuming succeeding symbol
decoder produces symbol, then consumes succeeding digits

Decoding is in opposite direction: we have stack of symbols (LIFO)


- encoding should be made in backward direction (buffer required),
- the final state has to be stored, but we can write information in initial state,
alternatively: fixing this state will make it a checksum – random after an error.
40
In single step (𝐼 = {𝐿, … , 𝑏𝐿 − 1}): 𝐥𝐠(𝒙) → ≈ 𝐥𝐠(𝒙) + 𝐥𝐠(𝟏/𝒑) modulo 𝐥𝐠(𝒃)
Three sources of unpredictability/chaotic behavior:
1) Asymmetry: behavior strongly dependent on chosen
symbol – small difference changes decoded symbol
and so the entire behavior.
2) Ergodicity: usually log 𝑏 (1/𝑝) is irrational –
succeeding iterations cover entire range.
3) Diffusivity: 𝐶(𝑠, 𝑥) is close but not exactly 𝑥/𝑝𝑠 – there
is additional ‘diffusion’ around expected value

So lg(𝒙) ∈ [lg(𝐿), lg(𝐿) + lg(𝑏)) has a nearly uniform distribution –
𝑥 has approximately a 𝐏𝐫(𝒙) ∝ 𝟏/𝒙 probability distribution – it contains
lg(1/Pr(𝑥)) ≈ lg(𝑥) + const bits of information.
41
What symbol spread should we choose? (link)
(using PRNG seeded with cryptkey for encryption)
for example: rate loss for 𝑝 = (0.04, 0.16, 0.16, 0.64)
𝐿 = 16 states, 𝑞 = (1, 3, 2, 10), 𝑞/𝐿 = (0.0625, 0.1875, 0.125, 0.625)
method              symbol spread      dH/H rate loss
-                   -                  ~0.011    penalty of the quantizer itself
Huffman             0011222233333333   ~0.080    would give a Huffman decoder
spread_range_i()    0111223333333333   ~0.059    increasing order
spread_range_d()    3333333333221110   ~0.022    decreasing order
spread_fast()       0233233133133133   ~0.020    fast
spread_prec()       3313233103332133   ~0.015    close to the quantization dH/H
spread_tuned()      3233321333313310   ~0.0046   better than the quantization dH/H due to using also 𝑝
spread_tuned_s()    2333312333313310   ~0.0040   L log L complexity (sort)
spread_tuned_p()    2331233330133331   ~0.0058   testing the 1/(p ln(1+1/i)) ~ i/p approximation
42
Precise initialization (heuristic)
𝑁𝑠 = {(0.5 + 𝑖)/𝑝𝑠 : 𝑖 = 0, … , 𝐿𝑠 − 1}
are uniformly spread – we need to shift them to natural numbers.
(priority queue with put, getmin)

for 𝑠 = 0 to 𝑛 − 1 do
put((0.5/𝑝𝑠 , 𝑠));
for 𝑋 = 0 to 𝐿 − 1 do
{(𝑣, 𝑠) = getmin;
put((𝑣 + 1/𝑝𝑠 , 𝑠));
𝑠𝑦𝑚𝑏𝑜𝑙 [𝑋] = 𝑠;
}
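A C sketch of this spread, replacing the priority queue by a simple linear minimum search (fine for small L; p[] and symbol[] are hypothetical caller-provided arrays):

  // precise initialization: symbol s wants positions near (0.5 + i)/p[s]
  // (p[s] = probabilities, n = alphabet size <= 256, L = number of states)
  void spread_prec(const double *p, int n, unsigned char *symbol, int L) {
      double next[256];                             // next preferred position per symbol
      for (int s = 0; s < n; s++) next[s] = 0.5 / p[s];
      for (int X = 0; X < L; X++) {
          int best = 0;
          for (int s = 1; s < n; s++)               // getmin by linear scan
              if (next[s] < next[best]) best = s;
          symbol[X] = (unsigned char)best;
          next[best] += 1.0 / p[best];              // put back, shifted by 1/p[s]
      }
  }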

approximately: Δ𝑯 ∝ (alphabet size / 𝑳)²

43
Tuning: 𝑝𝑠 ≈ 𝑞𝑠 /𝐿. Can we “tune” spread of 𝑞𝑠 symbols accordingly to 𝑝𝑠 ?
Shift symbols right when 𝒑𝒔 < 𝒒𝒔/𝑳, left otherwise
Assume Pr(𝑥) ≈ 1/(𝑥 ln(2)), so ∑_{𝑥=𝑎…𝑏} Pr(𝑥) ≈ lg(𝑏/𝑎)
𝑠 appears 𝑞𝑠 times: 𝑖 ∈ 𝐼𝑠 = {𝑞𝑠 , … , 2𝑞𝑠 − 1}
Pr(𝑖-th interval) ≈ lg((𝑖 + 1)/𝑖)
To fulfill the Pr(𝑥) assumption, 𝑥 for this interval should fulfill:
lg((𝑖 + 1)/𝑖) ⋅ 𝑝𝑠 ≈ 1/(𝑥 ln(2)), so
𝑥 ≈ 1/(𝑝𝑠 ln(1 + 1/𝑖))
is the preferred position for the 𝑖 ∈ 𝐼𝑠 = {𝑞𝑠 , … , 2𝑞𝑠 − 1} appearance of the 𝑝𝑠 ≈ 𝑞𝑠/𝐿 symbol
https://github.com/JarekDuda/AsymmetricNumeralSystemsToolkit
44
tABS
test all possible
symbol distributions
for binary alphabet

store tables for


quantized probabilities (p)
e.g. 16 ⋅ 16 = 256 bytes
← for 16 state, Δ𝐻 ≈ 0.001
tABS decoding step (cf. the H.264 “M decoder” arithmetic coding):
  t = decodingTable[p][X];
  X = t.newX + readBits(t.nbBits);
  useSymbol(t.symbol);

no branches,
no bit-by-bit renormalization
state is single number (e.g. SIMD)
45
Additional tANS advantage – simultaneous encryption
we can use huge freedom while initialization: choosing symbol distribution –
slightly disturb 𝒔(𝒙) using PRNG initialized with cryptographic key
ADVANTAGES compared to standard (symmetric) cryptography (standard, e.g. DES, AES vs ANS-based, initialized):
- operations: XOR, permutations vs highly nonlinear operations
- bit blocks: fixed length vs pseudorandomly varying lengths
- “brute force” or QC attacks: just start decoding to test a cryptokey vs initialization has to be performed first for each new cryptokey (can be fixed to need e.g. 0.1 s)
- speed: online calculation vs most calculations done during initialization
- entropy: operates on bits vs operates on any input distribution

46
tANS (2007) - fully tabled: Apple LZFSE, Facebook ZSTD, lzturbo
fast: no multiplication (FPGA!), less memory efficient (~8kB for 2048 states)
static in ~32kB blocks, costly to update (rather needs rebuilding),
allows for simultaneous encryption (PRNG to perturb symbol spread)
tANS decoding step:
  t = decodingTable[x];
  writeSymbol(t.symbol);
  x = t.newX + readBits(t.nbBits);
Encoding step (for symbol s):
  nbBits = (x + nb[s]) >> r;
  writeBits(x, nbBits);
  x = encodingTable[start[s] + (x >> nbBits)];
rANS (2013) – needs multiplication – CRAM (DNA), VP10 (video), LZNA, BitKnit
more memory effective – especially for large alphabet and precision (CDF only)
better for adaptation (𝐶𝐷𝐹 [𝑠] ≤ 𝑦 < 𝐶𝐷𝐹[𝑠 + 1] - tabled, alias, binary search/SIMD)
rANS decoding step (mask = 2^𝑛 − 1):
  s = symbol(x & mask); writeSymbol(s);
  x = f[s] * (x >> n) + (x & mask) - CDF[s];
  if (x < (1 << 16)) x = (x << 16) + read16bits();
Encoding step (for symbol s, msk = 2^16 − 1):
  if (x > bound[s]) { write16bits(x & msk); x >>= 16; }
  x = ((x / f[s]) << n) + (x % f[s]) + CDF[s];

i5-4300U: FSE/tANS: 295/467, rANS: 221/342, zlibh: 225/210 MB/s


47
General scheme for Data Compression … data encoding:
1) Transformations, predictions
To decorrelate data, make it more predictable, encode only new information
Delta coding, YCrCb, Fourier, Lempel-Ziv, Burrows-Wheeler, MTF, RLC …

2) If lossy, quantize coefficients (e.g. 𝑐 → round(𝑐/𝑄))

3) Statistical modeling of final events/symbols – a sequence of (context_ID, symbol) pairs
Static, parametrized, stored in headers, context-dependent, adaptive
4) (Entropy) coding: use 𝐥𝐠(𝟏/𝒑𝒔 ) bits, 𝑯 = ∑𝒔 𝒑𝒔 𝐥𝐠(𝟏/𝒑𝒔 ) bits total
Prefix/Huffman: fast/cheap, but inaccurate (assumes 𝑝 ≈ 2^−𝑟)
Range/arithmetic: costly (multiplication), but accurate
Asymmetric Numeral Systems – fast and accurate
general (Apple LZFSE, Fb ZSTD), DNA (CRAM), games (LZNA, BitKnit), Google VP10, WebP
48
