Vector Quantization
1 Introduction
Changing the quantization dimension from one (scalar) to many (vector) has several important implications. First of all, VQ no longer necessarily corresponds to rounding data values to coarse levels. The VQ stage produces indices that represent vectors formed by grouping samples. The output index, which is an integer, has little or no physical relation to the vector it represents, which is formed by grouping real- or complex-valued samples. The word "quantization" in VQ comes from the fact that similar vectors are represented by the same index. Therefore, many distinct vectors in the multi-dimensional space are quantized to a single vector, which is represented by the index. Each index corresponds to a previously decided vector.
In that respect, the number of distinct indices defines the number of quantization levels. It is reasonable to argue that the quantization index of a data vector should be selected according to the nearest vector in the set of previously decided vectors (which is called the VQ codebook). As an example, if the considered vector x is nearest to an element of the codebook, say vi, then the VQ output is simply i. In the de-quantization stage, the index i is reconstructed as the vector vi, so one can say that x is quantized to vi.
Assigning indices to a number of vectors has implications beyond coding [1]. Since vectors near vi are indexed as i and those near vj are indexed as j, the quantizer automatically provides clustering information around the codebook vectors. Clustering of vectors is commonly used in solving classification problems [2], and classification of data is a major element of pattern recognition. As a result, many VQ algorithms developed for signal coding have analogous counterparts in the pattern classification and recognition literature. ISODATA [3], k-Nearest Neighbor (k-NN) [4], and Self-Organizing feature Maps (SOM) [5] are popular clustering methods whose algorithms are very similar to those used for designing VQ codebooks, such as the Max-Lloyd [6],[7] and Linde-Buzo-Gray (LBG) [8] algorithms. Another common VQ application is "color reduction" for images. Many acquisition devices produce color images by allocating 8 bits to each of the red, green, and blue components of a pixel, for a total of 24 bits/pixel. Due to display buffer limitations or storage requirements, it is desirable to reduce the number of bits assigned to each pixel. This is done by vector quantizing the RGB components to a smaller number of indices [9],[10].
This chapter is organized as follows. First, the basic concepts of a vector quantizer are presented, together with the issue of distortion and several metrics used in the design and implementation of a VQ. Second, the properties of a minimum distortion VQ and the necessary equations for optimality are presented. In the third section, the basic iteration that optimizes the codebook for a given set of data is presented and several VQ codebook design techniques are introduced. Finally, some typical VQ applications are presented at the end of the chapter.
In this module, there are a number of distinct vectors (called the code vectors or codewords, vi) that form a set (called the codebook, C). The encoder module searches the codebook for the nearest match to the input vector. If the i-th code vector vi gives the nearest match, the encoder output is simply the index i.
Fig. 2. Comparing the input vector to the code vectors in the codebook.
Normally, the term “vector quantizer” (Q) is used for the combination of the
encoder and decoder modules. In terms of mathematical notations,
i = E(x),
vi = D(i), and (1)
vi = Q(x) = D(E(x)).
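To make this notation concrete, here is a minimal Python sketch (illustrative only, not from the chapter; the codebook values and function names are made up) of E, D, and Q = D(E(·)) under the Euclidean distance:

import numpy as np

def vq_encode(x, codebook):
    # E(x): index of the code vector nearest to x (Euclidean distance)
    dists = np.linalg.norm(codebook - x, axis=1)
    return int(np.argmin(dists))

def vq_decode(i, codebook):
    # D(i): reproduce the code vector addressed by the index
    return codebook[i]

def vq_quantize(x, codebook):
    # Q(x) = D(E(x))
    return vq_decode(vq_encode(x, codebook), codebook)

# Example with a hypothetical 2-D codebook of N = 4 code vectors
C = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]])
x = np.array([0.9, 1.2])
i = vq_encode(x, C)          # -> 1
print(i, vq_quantize(x, C))  # x is quantized to C[1] = [1.0, 1.0]

Only the integer i would be transmitted; the decoder recovers C[i] by a table look-up.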
Fig. 3. Generating a code vector according to the input index at the decoder.
A vector quantizer is characterized by two parameters:
• the vector dimension, k, and
• the codebook size, N.
Formally, the quantizer maps the k-dimensional Euclidean space onto the codebook:

  Q : R^k → C    (2)
For coding purposes, the codebook is usually known by both the transmitter
(encoder) and receiver (decoder) parts. Therefore, only the integer output (the
index) of the encoder is transmitted.
On the other hand, the codebook itself represents a useful partitioning of the k-dimensional Euclidean space R^k into N regions, Ri. Each region Ri is directly defined by the quantizer in such a way that, if the encoder produces index i for the input vector x, then x is in region Ri. These regions are also known as Voronoi cells. The i-th cluster is, therefore, determined as the set of all vectors in the data set closest to the vector vi. Mathematically,

  Ri = {x ∈ R^k | Q(x) = vi}    (3)

This means that Ri is the set of all points which are closer to vi than to all other code vectors. A region can be bounded (a granular cell) with finite k-dimensional volume, or unbounded (an overload cell).
In other words, if x is closer to vi than to any other code vector, then Q(x) = vi. Notice that this definition is very suitable when using VQ for grouping or clustering purposes [11]: the encoder index immediately specifies the cluster to which the input belongs.
We have been using the phrase "nearer to one of the code vectors than to the others" in the encoder stage since the beginning of the section, so what is meant by "nearer" should be clarified. For most practical purposes, the Euclidean distance between two vectors is used for measuring how near they are. On the other hand, we will see that several other distance measures can also be used (Chapter 2 of [12]). The only constraint on the definition of the distance is that it must be a proper metric 1. If the encoder and decoder use a proper metric for measuring the distance of the input vector to a code vector in C, then each region Ri must be convex 2.
Figure 4 shows a 2-D VQ region partitioning. The regions in the central por-
tions are bounded cells, and the ones that extend out of the center (dashed
line) are unbounded cells. Notice that this partitioning is proper, in the sense
that the regions are all convex.
The final remark about the general structure of VQ concerns its ability to optimize compression performance for inputs that are grouped to form vectors.
1 A proper metric is a distance measure D(·, ·) on a metric space which satisfies four properties for vectors a, b, and c:

  nonnegativity: D(a, b) ≥ 0
  reflexivity: D(a, b) = 0 ⇔ a = b
  symmetry: D(a, b) = D(b, a)
  triangle inequality: D(a, b) + D(b, c) ≥ D(a, c)    (4)

2 A Euclidean region is convex if the line segment connecting any two points in the region also lies entirely inside the region. A more general definition for convex sets is: if α and β are members of a convex set, then λα + (1 − λ)β is also a member of the set for 0 ≤ λ ≤ 1.
Shannon has shown that, if we have a coding system which maps input vectors into one of N indices with the best possible coding performance 3, a VQ can perform as well as this "best" encoder [13]. The way to reach this performance is through the optimization of the regions and code vectors, which will be described in the next section. Since the minimization is with respect to a distortion, several distortion metrics can be formulated, which all yield different optimization results [14],[15]. The most commonly used metrics are:
• Minkowski metric:

  d_L(x, v_i) = \left( \sum_{m=1}^{k} |x(m) - v_i(m)|^L \right)^{1/L}    (5)

• Manhattan (city-block) distance, the L = 1 special case:

  d_M(x, v_i) = \sum_{m=1}^{k} |x(m) - v_i(m)|    (7)

• Hamming distance:

  d_H(x, v_i) = \sum_{m=1}^{k} \left( 1 - \delta_{x(m), v_i(m)} \right),    (9)

  where

  \delta_{\alpha,\beta} = \begin{cases} 1, & \alpha = \beta \\ 0, & \alpha \neq \beta \end{cases}

• Mahalanobis distance:

  d_R(x, v_i) = (x - v_i)^T C_x^{-1} (x - v_i),    (10)

  where C_x denotes the covariance matrix of the input data.
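As a concrete illustration, the following Python functions (a minimal sketch with freely chosen names, not part of the chapter) compute the Minkowski, Manhattan, Hamming, and Mahalanobis distances between two vectors:

import numpy as np

def minkowski(x, v, L=2):
    # Eq. (5); L = 2 gives the Euclidean distance, L = 1 the Manhattan distance
    return np.sum(np.abs(x - v) ** L) ** (1.0 / L)

def manhattan(x, v):
    # Eq. (7)
    return np.sum(np.abs(x - v))

def hamming(x, v):
    # Eq. (9): number of positions in which the two vectors differ
    return int(np.sum(x != v))

def mahalanobis(x, v, Cx):
    # Eq. (10): Cx is the covariance matrix of the input data
    d = x - v
    return float(d @ np.linalg.inv(Cx) @ d)

x = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, 0.0, 4.0])
print(minkowski(x, v), manhattan(x, v), hamming(x, v))
print(mahalanobis(x, v, np.eye(3)))   # equals the squared Euclidean distance here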
Many other distortion metrics can be developed depending on the application. Using a distortion metric d(·, ·), the overall VQ diagram can be re-illustrated as in Figure 5.
[Fig. 5: encoder and decoder modules, each holding the same codebook C = {v1, v2, ..., vN}.]
3 Minimum Distortion VQ
In Section 2, it was pointed out that the index of the output vector is determined according to the minimum distance rule. Vector quantizers of this kind are known as "nearest neighbor" quantizers. The "nearest neighbor" rule is required for the optimality of the encoder.
The justification of the nearest neighbor rule for encoder optimality is quite simple: if a vector is quantized to a code vector which is not the nearest to the input vector, then the distortion is increased. For that reason, the optimal encoder must search the whole codebook for the smallest distance d(x, vi):

  d(x, Q(x)) = \min_{i = 1, \dots, N} d(x, v_i)    (12)
This minimum distance rule also statistically minimizes the expected (average) distortion.
The second optimality criterion is about the decoder part, satisfied by finding the optimum codebook. In other words, if we are given the clustering regions, we must find the best representing code vector for each region. Statistically, the code vector vi in a region Ri must be selected in such a way that the expected distortion it makes with any input vector x that lies inside region Ri is minimized:

  v_i = \arg\min_{v} E\{ d(x, v) \mid x \in R_i \}    (14)
The overall distortion can then be written as

  D = \sum_{i=1}^{N} \int_{R_i} d(x, v_i) f_X(x)\, dx = \sum_{i=1}^{N} P_i \int_{R_i} d(x, v_i) f_{X|i}(x)\, dx,    (16)

where Pi is the probability of x being in region Ri, and f_{X|i}(x) is the conditional pdf of x given x ∈ Ri. From the definition of the expected value,

  E\{ d(x, v) \mid x \in R_i \} = \int_{R_i} d(x, v) f_{X|i}(x)\, dx.    (17)
Since the centroid minimizes the expected value on the left of Equation 17 (for the squared-error distortion), it also minimizes the integral on the right. Therefore, each term of the sum in Equation 16, and hence the distortion, is minimized. The optimum code vector is thus the centroid of its region:

  v_i = \int_{R_i} x f_{X|i}(x)\, dx = \frac{\int_{R_i} x f_X(x)\, dx}{\int_{R_i} f_X(x)\, dx}.    (18)
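For the squared-error distortion, a one-line decomposition (not spelled out in the chapter, but standard) shows why the conditional mean is the optimal code vector:

  E\{\|x - v\|^2 \mid x \in R_i\} = E\{\|x - \bar{x}_i\|^2 \mid x \in R_i\} + \|\bar{x}_i - v\|^2,
  \qquad \text{where } \bar{x}_i = E\{x \mid x \in R_i\},

since the cross term 2(\bar{x}_i - v)^T E\{x - \bar{x}_i \mid x \in R_i\} vanishes. The second term is nonnegative and equals zero only when v = \bar{x}_i, so the centroid of Equation 18 is the unique minimizer.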
For the optimum VQ, the encoder and decoder optimality criteria must be satisfied simultaneously. For a given number of code vectors, say N, the goal is to achieve the minimum distortion by selecting the code vectors, and hence the corresponding regions. The properties and equations were described in Section 3. There are a number of methods proposed for obtaining the optimal or a sub-optimal quantizer from the given data. One of the most commonly used techniques is the Generalized Lloyd Algorithm [6],[7], which improves the codebook iteratively starting from an initial codebook. A one-dimensional version of this algorithm is used for scalar quantizer design. Furthermore, this algorithm is commonly referred to as k-means [4] or ISODATA [3] in the clustering literature.
Each Lloyd iteration consists of two steps: given the current codebook, the data vectors are first partitioned into regions using the nearest neighbor rule; then each code vector is replaced by the centroid of its region. For the Euclidean distance, the centroid is simply the arithmetic average of the data vectors assigned to a cluster. For other distance measures, the centroid calculation differs 4.
These two steps are performed iteratively until the overall distortion no longer decreases, or the distortion improvement falls below a certain threshold after an iteration. Notice that each iteration step must reduce or keep the distortion level. In some cases, empty regions may occur; in that case, a new code vector is assigned, or the codebook size N is reduced.
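The iteration can be sketched in Python as follows (an illustrative implementation under the Euclidean distance; the function name lloyd and the handling of empty regions by keeping the old code vector are my own choices, not the chapter's):

import numpy as np

def lloyd(data, codebook, max_iter=100, tol=1e-6):
    """Generalized Lloyd iteration: alternately re-partition the data
    (nearest neighbor rule) and re-compute centroids, until the average
    distortion stops improving."""
    prev_distortion = np.inf
    for _ in range(max_iter):
        # Step 1: assign each data vector to its nearest code vector
        dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        distortion = np.mean(np.min(dists, axis=1) ** 2)
        # Step 2: replace each code vector by the centroid of its region;
        # an empty region keeps its old code vector (one possible remedy)
        for i in range(len(codebook)):
            members = data[labels == i]
            if len(members) > 0:
                codebook[i] = members.mean(axis=0)
        if prev_distortion - distortion < tol:
            break
        prev_distortion = distortion
    return codebook, labels, distortion

# Toy usage: 2-D training data, initial codebook of N = 4 randomly chosen data vectors
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
init = data[rng.choice(len(data), 4, replace=False)].copy()
codebook, labels, D = lloyd(data, init)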
The Lloyd iteration is a very general method for optimization. However, there
are several more VQ design techniques, some of them relying on the Lloyd
iteration as intermediate steps. We will name a few of these methods and
indicate their basic ideas here.
(a) Random Coding: Over the whole set of data vectors, one chooses N of the vectors randomly and assigns them as the code vectors. This is a very empirical technique; however, if the data is strongly correlated, it may yield acceptable results.
(b) Pruning: In this case, the data vectors are examined sequentially and appended to the codebook depending on their distance to the code vectors already in the codebook. If a new vector is far from every existing code vector, it is added to the codebook [16].
(c) Pairwise Nearest Neighbor: The algorithm combines the clusters which have the nearest centroids, and continues combining as long as the number of clusters is more than the desired codebook size. Initially each data vector forms its own cluster, and the clusters grow iteratively, collecting more data vectors [17]. The combination of two clusters produces a new centroid corresponding to the weighted average of the combined centroids. Therefore, unlike in the previous methods, the code vectors do not necessarily correspond to data vectors.
(d) Product Codes: If the codebook size is represented as N = 2^{kR}, then a Cartesian product of k scalar quantizers with 2^R levels each can be used as the vector quantizer [18].
4 For a cluster containing M data vectors x_1, ..., x_M, the centroid corresponding to each distance measure is different; for example:

  v_{Euc} = \frac{1}{M} \sum_{k=1}^{M} x_k

  v_{Man}(i) = \{\, x \mid P(x_j(i) > x(i)) = P(x_j(i) < x(i)) \,\} \quad \text{(the componentwise median)}

  v_{Che}(i) = \left( \min_{j=1,\dots,M} x_j(i) + \max_{j=1,\dots,M} x_j(i) \right) / 2 \quad \text{(the componentwise midrange)}

  v_{Ham}(i) = \{\, x_k(i) \mid P(x_k(i)) > P(x_l(i)) \;\; \forall l \,\} \quad \text{(the most frequent value)}
  v_{Mah} = \left( \sum_{j=1}^{M} C_{x_j}^{-1} \right)^{-1} \sum_{j=1}^{M} C_{x_j}^{-1} x_j
(e) Lloyd Iteration with Stochastic Relaxation: A zero-mean noise is added to the centroids generated by each iteration of the Lloyd algorithm, and the noise power is decreased as the iterations proceed [19]. If the random noise is generated according to a temperature parameter, Tm, which is decreased as the iteration number m increases, then this technique is also considered a form of Simulated Annealing.
(f) Simulated Annealing: As a special case of the stochastic relaxation approach, noise is added to the centroids (which is called a perturbation) and the perturbed centroid is accepted with probability P = e^{−ΔH/T}, where ΔH is the increase in cost caused by the perturbation and T is the temperature, which is decreased as the iterations proceed [20],[21].
(g) Fuzzy Clustering: The inclusion of a data vector in a cluster is not assigned the binary values 0 and 1. Instead, a fuzzy membership value between 0 and 1 (Sj(xi): the degree of membership of xi in region Rj) is assigned [22]. In this way, the membership of a data vector in the considered region is only partial, and a new fuzzy distortion definition is used:

  D_F = \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{N} d(x_i, v_j) \, [S_j(x_i)]^q .

In the Lloyd iteration, the parameter q is initially selected as a large number (indicating high fuzziness) and decreased gradually down to 1.
(h) Linde-Buzo-Gray (splitting) Algorithm: Probably the most conventional method that utilizes the Lloyd iteration is the Linde-Buzo-Gray (LBG) algorithm [8]. The algorithm starts with a single code vector (normally the average of the data vectors). This code vector is split into two by adding and subtracting a vector of small magnitude, along the direction of maximum variation in the vector space. With these two new vectors, the Lloyd iteration is applied and optimum code vectors for a codebook of size 2 are obtained. The LBG algorithm iteratively splits each code vector into two using the above perturbation, then applies the Lloyd iteration again, until the desired number of code vectors is obtained (a sketch of this splitting procedure is given below). This is a very convenient method to design the complete codebook from scratch without the risk of obtaining empty or unbalanced clusters.
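As an illustration of the splitting idea in item (h), here is a minimal Python sketch (my own simplified version; perturbing along the coordinate of largest variance is one simple assumption for the "direction of maximum variation"):

import numpy as np

def refine(data, codebook, n_iter=20):
    # A few Lloyd iterations: nearest-neighbor partition, then centroid update
    for _ in range(n_iter):
        d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        for i in range(len(codebook)):
            members = data[labels == i]
            if len(members) > 0:
                codebook[i] = members.mean(axis=0)
    return codebook

def lbg(data, target_size, eps=1e-3):
    # Start from the global mean, then repeatedly split every code vector and refine
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < target_size:
        direction = np.zeros(data.shape[1])
        direction[np.argmax(data.var(axis=0))] = eps   # small perturbation vector
        codebook = np.vstack([codebook + direction, codebook - direction])
        codebook = refine(data, codebook)
    return codebook

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 2))
codebook = lbg(data, target_size=8)   # codebook grows 1 -> 2 -> 4 -> 8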
There are other variations on the quantizer design technique, too. For instance, depending on the general structure of the input, it may be desirable to set up a quantizer with a fixed structure. Lattice vector quantizers are popular for this purpose; their clusters are selected according to a geometrical grid, most commonly hexagonal.
Quantizer improvements are also studied in the literature. The most commonly
used improvements can be listed as:
• Gain-Shape VQ: If the input data shows significant dynamic range variations, the codebook needs to be very large to achieve a fairly small distortion. To remedy this situation, the input vectors can first be normalized and then vector quantized. The normalization factor needs to be encoded separately [1].
• Mean-Removed VQ: In many images, the vector segments may contain similar shape characteristics but, because of intensity variations, be quite far from each other according to the distance metrics. Quantizing such vectors to the same code vector improves the efficiency. This is possible if the mean of each vector is subtracted from it and the resulting vectors are quantized. Similar to the situation above, the mean values must be encoded separately.
• Classified VQ: If the input data contains multiple patterns that exhibit large spatial differences from each other, while vectors generated from the same pattern are quite similar, then designing a separate quantizer for each pattern and applying the appropriate quantizer to each vector improves the efficiency of the quantizer [23]. Usually, there is an overhead of transmitting the information about which codebook the encoder uses.
• Multistage VQ: This method significantly reduces encoder complexity and memory requirements [24]. The idea is to quantize the input coarsely at the first stage, and then iteratively quantize the difference between the signal and its coarsely quantized version. As an example, if we have three quantizers Q1, Q2, and Q3, and an input x, then (see the sketch after this list)

  y1 = Q1(x)
  y2 = Q2(x − Q1(x)) = Q2(x − y1)
  y3 = Q3(x − Q1(x) − Q2(x − Q1(x))) = Q3(x − y1 − y2)
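To show this residual structure concretely, the following Python toy (an assumption-laden sketch: the stages are simple scalar uniform quantizers with made-up step sizes, not designed codebooks) computes y1, y2, y3 and the reconstruction y1 + y2 + y3:

import numpy as np

def make_uniform_quantizer(step):
    # A simple uniform quantizer used as one stage
    return lambda x: step * np.round(x / step)

# Three stages with successively finer step sizes (arbitrary choices)
Q1, Q2, Q3 = (make_uniform_quantizer(s) for s in (1.0, 0.25, 0.0625))

x = np.array([0.73, -1.42, 2.18])
y1 = Q1(x)                 # coarse approximation
y2 = Q2(x - y1)            # quantized first-stage residual
y3 = Q3(x - y1 - y2)       # quantized second-stage residual

x_hat = y1 + y2 + y3       # decoder: sum of the stage outputs
print(x_hat, np.abs(x - x_hat).max())   # error is at most half of the last step size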
5 VQ Applications and Examples
5.1 Compression
The most widely used application of VQ is data compression [27]. The input data can be compressed by a VQ at the expense of distortion. From an information theoretical perspective, distortion and rate are inversely related quantities: if the compression is high, the rate decreases but the distortion increases. For a given source signal, the rate/distortion curve typically has the shape shown with dashed lines in Figure 6. This curve is obtained by evaluating the minimum distortion achieved by the best encoder at a given rate. On the other hand, the characteristic of a vector quantizer usually looks like the solid, staircase-like line. Using the optimization techniques described in the previous section, it is desired that the solid line touch the minimum rate/distortion curve at the given rate.
[Fig. 6: rate versus distortion; the dashed line is the minimum rate/distortion curve and the solid staircase is the characteristic of a practical vector quantizer.]
Commonly used de-correlating transforms include the Discrete Cosine Transform (DCT, the transform that is used in JPEG), Wavelet transforms, and input-specific transforms such as the Karhunen-Loeve Transform and Singular Value Decomposition. The common point of most of these transforms is that they reduce the correlation between the samples of the input vector, and hence compact most of the energy of the vector into only a few of the output vector elements. The operation is reversible by the use of the inverse transforms. Another de-correlating method is called predictive coding, where each element in a sequence is first predicted from the previous elements of the same sequence, and only the prediction difference is generated as the output. Using the same prediction algorithm, and given the previously decoded elements, the decoder can regenerate the same prediction and add the prediction difference to reconstruct the signal.
After either of these methods, the signal samples mostly contain small values, which can be safely quantized to zero. As an example, in the JPEG image compression standard, an image is typically divided into 8 by 8 segments and the DCT of these segments is computed. This transform causes a majority of the transform coefficients to have values that will be quantized to zero. In order to better understand the efficiency of transformation followed by quantization, consider the example of an input vector x = [1.2, 1.1, 0.9, 0.8]. Assume that, in order to achieve some compression, we want to quantize its elements by truncating the samples to the greatest smaller integer (⌊·⌋). If we apply this quantization without any transformation, the output vector would be x̃ = [1.0, 1.0, 0.0, 0.0], and the distortion would be

  \left( \frac{1}{4} \sum_{i=1}^{4} (x(i) - \tilde{x}(i))^2 \right)^{1/2} = 0.6124.

Now, instead of direct quantization, let us first apply the DCT to the input signal and obtain a new vector c = DCT{x} = [2.0, 0.3, 0, 0]. Applying the quantization to c, we get c̃ = [2, 0, 0, 0]. Taking the inverse DCT, we obtain x̃ = [1.0, 1.0, 1.0, 1.0], and the distortion is only 0.1581.
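The numbers in this example can be reproduced with a few lines of Python (a sketch; np.trunc, truncation toward zero, coincides with the floor operation for the retained values here, all of which are non-negative):

import numpy as np
from scipy.fft import dct, idct

def rms(a, b):
    # RMS distortion between the original and the quantized vector
    return np.sqrt(np.mean((a - b) ** 2))

x = np.array([1.2, 1.1, 0.9, 0.8])

# Direct quantization by truncation
x_direct = np.trunc(x)                       # [1, 1, 0, 0]
print(rms(x, x_direct))                      # ~0.6124

# DCT, quantize the coefficients, then inverse DCT
c = dct(x, norm='ortho')                     # approximately [2.0, 0.32, 0.0, -0.02]
c_q = np.trunc(c)                            # [2, 0, 0, 0]
x_tilde = idct(c_q, norm='ortho')            # [1, 1, 1, 1]
print(rms(x, x_tilde))                       # ~0.1581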
Fig. 7. (a) An 8 × 8 segment of an image, and (b) its DCT.
Fig. 8. DPCM (a) encoder, and (b) decoder.
In the following example, an image is divided into square blocks, and the LBG and Random Coding algorithms (described in Section 4) are used as the design methods.
Case 1: 4 × 4 blocks: First, let us consider the sixteen 4 × 4 code vectors generated by the LBG algorithm (shown in Figure 10(a)). It is quite interesting to see that all of the 4 × 4 sub-blocks of the original image could be represented quite efficiently by one of the vectors in this codebook. Indeed, the total reconstruction distortion is only 17.83 (corresponding to a PSNR of 23.11 dB). The image quantized with this codebook is shown in Figure 11(a). Perhaps more interesting is that the code vectors generated by the Random Coding algorithm (shown in Figure 10(b)) could also produce an acceptable performance of 22.98 dB PSNR (shown in Figure 11(b)). Note that Random Coding has significantly lower computational complexity. For both cases, the compression ratio is CR = (4 × 4 × 8) : (log2 16) = 128 : 4 = 32 : 1.
Fig. 10. Sixteen 4×4 codevectors generated by (a) LBG algorithm, and (b) Random
Coding method.
Case 2: 8 × 8 blocks: Finally, the same compression ratio of 32:1 could also
Fig. 11. Quantized images with N = 16 and block size of 4 × 4 using (a) LBG
algorithm, and (b) Random Coding method.
Fig. 12. (a) 32 code vectors generated by the LBG algorithm, (b) Quantized image
using this codebook.
Fig. 13. Thirtytwo 8 × 8 codevectors generated by (a) LBG algorithm, and (b)
Random Coding method.
Fig. 14. Quantized images with N = 32 and block size of 8 × 8 using (a) LBG
algorithm, and (b) Random Coding method.
It can be seen that using a vector block size of 4 × 4 (Case 1) produces better results than using 8 × 8 blocks (Case 2) at the same compression ratio. The reason is that 4 × 4 blocks exhibit higher inter-pixel correlations than 8 × 8 blocks do; therefore, 4 × 4 code vectors represent the input vectors more efficiently.
Fig. 15. Three types of objects in an image.
Fig. 16. Area/Perimeter scatter of the objects, and their classification regions.
This example also indicates that VQ over higher dimensions provides more efficient feature clustering. If both the area (a) and the perimeter (p) are used together to form the input vector x = (a, p), then the scatter plot depicted in Figure 16 is obtained. Running a simple 3-level VQ design produces the clustering drawn with the dashed lines; hence, an efficient classification is obtained.
  x_{m,n} ∈ Ω,    (20)

where x_{m,n} is an image pixel at an arbitrary location (m, n), and Ω is the set Ω = {(r, g, b) | 0 ≤ r, g, b ≤ 255}. In this representation, each image pixel is represented by three primary colors, r, g, and b, each with 8 bits. In a normal color image, the human eye does not distinguish this many colors.
Therefore, there is a representation redundancy. Similarly, many computer monitors (with frame buffers) or printers are not capable of reproducing 24-bit colors. For these reasons, it is desirable to reduce the number of colors from 2^24 to, for instance, 2^8 = 256 [9],[10].
[Figure: a table enumerating the possible (r, g, b) triplets (0,0,0), (0,0,1), (0,0,2), ..., each assigned an index i.]
Color reduction need not be applied only to true color images; sometimes, even 256-level gray scale images are color-reduced to 16 levels. Because fewer colors are produced, color reduction must be performed carefully according to visual perception parameters. Usually, colormap images tend to exhibit contouring effects around smooth regions (Fig. 19(b)). To produce images with better perceptual characteristics, the customary practice is dithering, where a pseudo-random noise is added to the colormap image (Fig. 19(c)).
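A color reduction of this kind can be sketched with SciPy's vq module (illustrative only; the array img is assumed to be an H × W × 3 uint8 RGB image loaded elsewhere, and using k-means to design the 256-entry colormap is one common choice):

import numpy as np
from scipy.cluster.vq import kmeans2, vq

def reduce_colors(img, n_colors=256, seed=0):
    # Treat every pixel as a 3-D vector (r, g, b) and design the colormap by k-means
    pixels = img.reshape(-1, 3).astype(np.float64)
    colormap, _ = kmeans2(pixels, n_colors, seed=seed, minit='++')
    # Encode: each pixel is replaced by the index of its nearest colormap entry
    indices, _ = vq(pixels, colormap)
    # Decode: look the indices back up in the colormap
    reduced = colormap[indices].reshape(img.shape)
    return np.clip(np.round(reduced), 0, 255).astype(np.uint8), colormap, indices

# reduced, colormap, indices = reduce_colors(img, n_colors=256)

The indices array is what a colormap (palette) image actually stores; the colormap itself plays the role of the VQ codebook.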
[Figure: each pixel of the quantized image stores only an index i; the decoder looks up the corresponding (r, g, b) code vector in the colormap to obtain the color of pixel (m, n).]
References
[5] T. Kohonen, Self Organization and Associative Memory, 3rd Ed., Springer-
Verlag, Berlin, 1989.
[8] Y. Linde, A. Buzo, R. M. Gray, “An algorithm for vector quantizer design,”
IEEE Trans. on Communications, Vol. 28, pp.84-95, January 1980.
[11] A. K. Jain, R. C. Dubes, Algorithms for Clustering Data, Prentice-Hall Inc., NJ, 1988.
[12] R. M. Gray, Source Coding Theory, Kluwer Academic Press, Boston, MA,
USA, 1990.
[17] W. H. Equitz, “A new vector quantization clustering algorithm,” IEEE Trans.
on A.S.S.P., pp.1568-1575, October 1989.
[18] M. J. Sabin, R. M. Gray, “Product code vector quantizers for waveform and
voice coding,” IEEE Trans. on A.S.S.P., Vol. 32, pp.474-488, June 1984.
[32] J. D. Murray, W. vanRyper, Encyclopedia of Graphics File Formats, O’Reilly
and Associates, Inc., Sebastopol, CA, 1994.