
Locally Optimized Product Quantization for Approximate Nearest Neighbor Search

Yannis Kalantidis and Yannis Avrithis


National Technical University of Athens
{ykalant, iavr}@image.ntua.gr

Abstract

We present a simple vector quantizer that combines low distortion with fast search and apply it to approximate nearest neighbor (ANN) search in high-dimensional spaces. Leveraging the very same data structure that is used to provide non-exhaustive search, i.e., inverted lists or a multi-index, the idea is to locally optimize an individual product quantizer (PQ) per cell and use it to encode residuals. Local optimization is over rotation and space decomposition; interestingly, we apply a parametric solution that assumes a normal distribution and is extremely fast to train. With a reasonable space and time overhead that is constant in the data size, we set a new state-of-the-art on several public datasets, including a billion-scale one.

Figure 1. Four quantizers of 64 centroids each, trained on a random set of 2D points following a mixture distribution: (a) k-means, (b) PQ, (c) OPQ, (d) LOPQ. (c) and (d) also reorder dimensions, which is not shown in 2D.

1. Introduction

Approximate nearest neighbor (ANN) search in high-dimensional spaces is not only a recurring problem in computer vision, but also undergoing significant progress. A large body of methods maintain all data points in memory and rely on efficient data structures to compute only a limited number of exact distances, that is ideally fixed [14]. At the other extreme, mapping data points to compact binary codes is not only efficient in space but may achieve fast exhaustive search in Hamming space [10, 16].

Product quantization (PQ) [12] is an alternative compact encoding method that is discrete but not binary and can be used for exhaustive or non-exhaustive search through inverted indexing or multi-indexing [3]. As is true for most hashing methods [11], better fitting to the underlying distribution is critical for search performance, and one such approach for PQ is optimized product quantization (OPQ) [9] and its equivalent Cartesian k-means [15].

How are such training methods beneficial? Different criteria are applicable, but the underlying principle is that all bits allocated to data points should be used sparingly. Since search can be made fast, such methods should ultimately be seen as (lossy) data compression targeting minimal distortion, with extreme examples being [1, 5].

As such, k-means, depicted in Fig. 1(a), is a vector quantization method where, by specifying k centroids, log2 k bits can represent an arbitrary data point in R^d for any dimension d; but naïve search is O(dk) and low distortion means very large k. By constraining centroids to an axis-aligned, m-dimensional grid, PQ achieves k^m centroids while keeping search at O(dk); but as illustrated in Fig. 1(b), many of these centroids remain without data support, e.g. if the distributions on the m subspaces are not independent.

OPQ allows the grid to undergo arbitrary rotation and reordering of dimensions to better align to the data and balance variance across subspaces to match a bit allocation that is also balanced. But as shown in Fig. 1(c), a strongly multi-modal distribution may not benefit from such alignment.
Our solution in this work is locally optimized product quantization (LOPQ). Following a quite common search option of [12], a coarse quantizer is used to index data by inverted lists, and residuals between data points and centroids are PQ-encoded. But within-cell distributions are largely unimodal; hence, as in Fig. 1(d), we locally optimize an individual product quantizer per cell. Under no assumptions on the distribution, practically all centroids are supported by data, contributing to a lower distortion.

LOPQ requires reasonable space and time overhead compared to PQ, both for offline training/indexing and online queries; but all overhead is constant in the data size. It is embarrassingly simple to apply and boosts performance on several public datasets. A multi-index is essential for large-scale datasets and combining it with LOPQ is less trivial, but we provide a scalable solution nevertheless.

2. Related work and contribution

Focusing on large datasets where index space is the bottleneck, we exclude e.g. tree-based methods like [14] that require all data uncompressed in memory. Binary encoding is the most compact representation, approaching ANN search via exhaustive search in Hamming space. Methods like spectral hashing [18], ITQ [10] or k-means hashing [11] focus on learning optimized codes on the underlying data distribution. Search in Hamming space is really fast but, despite learning, performance suffers.

Significant benefit is to be gained via multiple quantizers or hash tables as in LSH [6], at the cost of storing each point index multiple times. For instance, [17, 19] gain performance by multiple k-means quantizers via random re-initialization or partitioning of jointly trained centroids. Similarly, multi-index hashing [16] gains speed via multiple hash tables on binary code substrings. We still outperform all such approaches at only a fraction of the index space.

PQ [12] provides efficient vector quantization with less distortion than binary encoding. Transform coding [4] is a special case of scalar quantization that additionally allocates bits according to variance per dimension. OPQ [9] and Ck-means [15] generalize PQ by jointly optimizing rotation, subspace decomposition and sub-quantizers. Interestingly, the parametric solution of OPQ aims at the exact opposite of [4]: balancing variance given a uniform bit allocation over subspaces.

Although [12] provides the non-exhaustive variant IVFADC, based on a coarse quantizer and PQ-encoded residuals, [9, 15] are exhaustive. The inverted multi-index [3] achieves very fine space partitioning via one quantizer per subspace and is compatible with PQ-encoding, gaining performance at query times comparable to Hamming space search. On the other hand, the idea of space decomposition can be applied recursively to provide extremely fast codebook training and vector quantization [2].

The recent extension of OPQ [8] combines optimization with a multi-index and is the current state-of-the-art on a billion-scale dataset, but all optimizations are still global.

We observe that OPQ performs significantly better when the underlying distribution is unimodal, while residuals are much more unimodal than the original data. Hence we independently optimize per cell to distribute centroids mostly over the underlying data, despite the constraints of a product quantizer. In particular, we make the following contributions:
1. Partitioning data in cells, we locally optimize one product quantizer per cell on the residual distribution.
2. We show that training is practical since local distributions are easier to optimize via a simple OPQ variant.
3. We provide solutions for either a single or a multi-index, fitting naturally to existing search frameworks for state-of-the-art performance with little overhead.

A recent related work is [7], applying a local PCA rotation per centroid prior to VLAD aggregation. However, both our experiments and [9, 8] show that PCA without subspace allocation actually damages ANN performance.

3. Background

Vector quantization. A quantizer is a function q that maps a d-dimensional vector x ∈ R^d to vector q(x) ∈ C, where C is a finite subset of R^d, of cardinality k. Each vector c ∈ C is called a centroid, and C a codebook. Given a finite set X of data points in R^d, q induces distortion

E = \sum_{x \in X} \| x - q(x) \|^2.    (1)

According to Lloyd's first condition, regardless of the chosen codebook, a quantizer that minimizes distortion should map vector x to its nearest centroid, or

x \mapsto q(x) = \arg\min_{c \in C} \| x - c \|,    (2)

for x ∈ R^d. Hence, an optimal quantizer should minimize distortion E as a function of codebook C alone.

Product quantization. Assuming that dimension d is a multiple of m, write any vector x ∈ R^d as a concatenation (x^1, ..., x^m) of m sub-vectors, each of dimension d/m. If C^1, ..., C^m are m sub-codebooks in subspace R^{d/m}, each of k sub-centroids, a product quantizer [12] constrains C to the Cartesian product

C = C^1 \times \cdots \times C^m,    (3)

i.e., a codebook of k^m centroids of the form c = (c^1, ..., c^m) with each sub-centroid c^j ∈ C^j for j ∈ M = {1, ..., m}. An optimal product quantizer q should minimize distortion E (1) as a function of C, subject to C being of the form (3) [9]. In this case, for each x ∈ R^d, the nearest centroid in C is

q(x) = (q^1(x^1), \ldots, q^m(x^m)),    (4)
where q^j(x^j) is the nearest sub-centroid of sub-vector x^j in C^j, for j ∈ M [9]. Hence an optimal product quantizer q in d dimensions incurs m subproblems of finding m optimal sub-quantizers q^j, j ∈ M, each in d/m dimensions. We write q = (q^1, ..., q^m) in this case.
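As a concrete illustration of (3)-(4), the following sketch (our own code, not the authors' implementation; it assumes NumPy and scikit-learn, and all names are ours) trains the m sub-codebooks by independent k-means per subspace and encodes each vector as a tuple of m sub-centroid indices:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(X, m=8, k=256, seed=0):
    """Train m sub-codebooks of k centroids each by independent k-means
    per subspace, as in eq. (3)-(4); X is n x d with d divisible by m."""
    n, d = X.shape
    ds = d // m
    codebooks = []
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]
        km = KMeans(n_clusters=k, n_init=1, random_state=seed).fit(sub)
        codebooks.append(km.cluster_centers_)      # shape (k, d/m)
    return codebooks

def pq_encode(X, codebooks):
    """Encode each row of X as m sub-centroid indices, i.e. m*log2(k) bits."""
    n, d = X.shape
    m = len(codebooks)
    ds = d // m
    codes = np.empty((n, m), dtype=np.uint8)       # assumes k <= 256
    for j, C in enumerate(codebooks):
        sub = X[:, j * ds:(j + 1) * ds]
        # squared distances to all k sub-centroids of subspace j; keep the nearest
        d2 = ((sub[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(axis=1)
    return codes
```

With k = 256 each sub-centroid index fits in one byte, which is why 64-bit codes correspond to m = 8 in the experiments below.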
Optimized product quantization [9, 15] refers to optimizing the subspace decomposition apart from the centroids. Constraint (3) of the codebook is now relaxed to

C = \{ R\hat{c} : \hat{c} \in C^1 \times \cdots \times C^m,\ R^T R = I \},    (5)

where the orthogonal d × d matrix R allows for arbitrary rotation and permutation of vector components. Hence E should be minimized as a function of C, subject to C being of the form (5). Optimization with respect to R and C^1, ..., C^m can be either joint, as in Ck-means [15] and in the non-parametric solution OPQ_NP of [9], or decoupled, as in the parametric solution OPQ_P of [9].

Exhaustive search. Given a product quantizer q = (q^1, ..., q^m), assume that each data point x ∈ X is represented by q(x) and encoded as a tuple (i_1, ..., i_m) of m sub-centroid indices (4), each in index set K = {1, ..., k}. This PQ-encoding requires m log2 k bits per point.

Given a new query vector y, the (squared) Euclidean distance to every point x ∈ X may be approximated by

\delta_q(y, x) = \| y - q(x) \|^2 = \sum_{j=1}^{m} \| y^j - q^j(x^j) \|^2,    (6)

where q^j(x^j) ∈ C^j = {c^j_1, ..., c^j_k} for j ∈ M. Distances ‖y^j − c^j_i‖² are precomputed for i ∈ K and j ∈ M, so (6) amounts to only O(m) lookup and add operations. This is the asymmetric distance computation (ADC) of [12].
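A minimal sketch of this lookup-add computation, under the same assumptions and naming as the previous snippet (ours, for illustration only), is:

```python
import numpy as np

def adc_distances(y, codes, codebooks):
    """Asymmetric distance computation, eq. (6): precompute the k distances of
    each query sub-vector to the sub-centroids, then score every database code
    with m lookup-add operations."""
    m = len(codebooks)
    ds = codebooks[0].shape[1]
    # lookup table: luts[j, i] = ||y^j - c^j_i||^2
    luts = np.stack([((y[j * ds:(j + 1) * ds][None, :] - C) ** 2).sum(-1)
                     for j, C in enumerate(codebooks)])         # shape (m, k)
    # for each point, sum the m table entries selected by its code
    return luts[np.arange(m)[None, :], codes].sum(axis=1)       # shape (n,)
```

Building the tables costs O(dk) per query; after that, each stored point costs only m table lookups and additions, independently of d.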
Indexing. When quantizing point x ∈ R^d by quantizer q, its residual vector is defined as

r_q(x) = x - q(x).    (7)

Non-exhaustive search involves a coarse quantizer Q of K centroids, or cells. Each point x ∈ X is quantized to Q(x), and its residual vector r_Q(x) is quantized by a product quantizer q. For each cell, an inverted list of data points is maintained, along with PQ-encoded residuals.

A query point y is first quantized to its w nearest cells, and approximate distances between residuals are then found according to (6) only within the corresponding w inverted lists. This is referred to as IVFADC search in [12].

Re-ranking. Second-order residuals may be employed along with ADC or IVFADC, again PQ-encoded by m′ sub-quantizers. However, this requires full vector reconstruction, so it is only used for re-ranking [13].

Multi-indexing applies the idea of PQ to the coarse quantizer used for indexing. A second-order inverted multi-index [3] comprises two subspace quantizers Q^1, Q^2 over R^{d/2}, each of K sub-centroids. A cell is now a pair of sub-centroids. There are K^2 cells, which can be structured on a 2-dimensional grid, inducing a fine partition over R^d. For each point x = (x^1, x^2) ∈ X, sub-vectors x^1, x^2 ∈ R^{d/2} are separately (and exhaustively) quantized to Q^1(x^1), Q^2(x^2), respectively. For each cell, an inverted list of data points is again maintained.

Given a query vector y = (y^1, y^2), the (squared) Euclidean distances of each of the sub-vectors y^1, y^2 to all sub-centroids of Q^1, Q^2, respectively, are found first. The distance of y to a cell may then be found by a lookup-add operation, similarly to (6) for m = 2. Cells are traversed in increasing order of distance to y by the multi-sequence algorithm [3] (a form of distance propagation on the grid) until a target number T of points is collected. Different options exist for encoding residuals and re-ranking.
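The following is a minimal sketch of such a grid traversal (our own illustrative code, not the implementation of [3]); it assumes the per-subspace distances and the inverted lists are given, and all names are ours:

```python
import heapq
import numpy as np

def multi_sequence(d1, d2, inverted_lists, T):
    """Visit cells (i1, i2) of the K x K grid in increasing order of
    d1[i1] + d2[i2] until at least T point ids have been fetched.
    d1, d2: distances of the two query sub-vectors to the K sub-centroids
    of the two subspace quantizers; inverted_lists: dict mapping a cell
    (i1, i2) to the list of point ids stored in it."""
    o1, o2 = np.argsort(d1), np.argsort(d2)       # visit each axis in sorted order
    heap = [(d1[o1[0]] + d2[o2[0]], 0, 0)]        # (cell distance, rank on axis 1, rank on axis 2)
    seen, fetched = {(0, 0)}, []
    while heap and len(fetched) < T:
        _, a, b = heapq.heappop(heap)
        cell = (int(o1[a]), int(o2[b]))
        fetched.extend(inverted_lists.get(cell, []))
        for a2, b2 in ((a + 1, b), (a, b + 1)):   # push the two grid neighbours
            if a2 < len(o1) and b2 < len(o2) and (a2, b2) not in seen:
                seen.add((a2, b2))
                heapq.heappush(heap, (d1[o1[a2]] + d2[o2[b2]], a2, b2))
    return fetched
```

Pushing both neighbours of every popped cell with a visited set is a slightly more permissive bookkeeping than the strict multi-sequence of [3], but it yields the same non-decreasing visiting order of cells.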
4. Locally optimized product quantization

We investigate two solutions: ordinary inverted lists, and a second-order multi-index. Section 4.1 discusses LOPQ in the former case, which simply allocates data to cells and locally optimizes a product quantizer per cell to encode residuals. Optimization per cell is discussed in section 4.2, mostly following [9, 15]; the same process is used in section 4.4, discussing LOPQ in the multi-index case.

4.1. Searching on a single index

Given a set X = {x_1, ..., x_n} of n data points in R^d, we optimize a coarse quantizer Q, with associated codebook E = {e_1, ..., e_K} of K centroids, or cells. For i ∈ K = {1, ..., K}, we construct an inverted list L_i containing the indices of points quantized to cell e_i,

L_i = \{ j \in N : Q(x_j) = e_i \},    (8)

where N = {1, ..., n}, and collect their residuals in

Z_i = \{ x - e_i : x \in X, Q(x) = e_i \}.    (9)

For each cell i ∈ K, we locally optimize the PQ encoding of the residuals in set Z_i, as discussed in section 4.2, yielding an orthogonal matrix R_i and a product quantizer q_i. Residuals are then locally rotated by ẑ ← R_i^T z for z ∈ Z_i and PQ-encoded as q_i(ẑ) = q_i(R_i^T z).

At query time, the query point y is soft-assigned to its w nearest cells A in E. For each cell e_i ∈ A, the residual y_i = y − e_i is individually rotated by ŷ_i ← R_i^T y_i. Asymmetric distances δ_{q_i}(ŷ_i, ẑ_p) to residuals ẑ_p for p ∈ L_i are then computed according to (6), using the underlying local product quantizer q_i. The computation is exhaustive within list L_i, but is performed in the compressed domain.
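Putting the pieces of this section together, a possible single-index LOPQ sketch might look as follows. This is our own illustration, not the authors' code: it reuses pq_encode and adc_distances from the earlier snippets, and local_optimize is a hypothetical stand-in for the per-cell optimization of section 4.2, assumed to return a rotation matrix and local sub-codebooks.

```python
import numpy as np

def lopq_index(X, cells):
    """Index residuals per cell: for each coarse cell e_i, learn a local
    rotation R_i and local product quantizer q_i, then store rotated,
    PQ-encoded residuals (eq. (8)-(9))."""
    # brute-force coarse assignment, kept simple for clarity
    assign = ((X[:, None, :] - cells[None, :, :]) ** 2).sum(-1).argmin(1)
    index = {}
    for i in range(len(cells)):
        ids = np.where(assign == i)[0]
        Z = X[ids] - cells[i]                    # residuals of cell i
        R, codebooks = local_optimize(Z)         # hypothetical helper (section 4.2)
        codes = pq_encode(Z @ R, codebooks)      # rotate (z_hat = R^T z), then encode
        index[i] = (ids, R, codebooks, codes)
    return index

def lopq_query(y, cells, index, w=8):
    """Soft-assign the query to its w nearest cells and rank the points of
    their inverted lists by local asymmetric distances."""
    d2 = ((cells - y) ** 2).sum(-1)
    candidates = []
    for i in np.argsort(d2)[:w]:
        ids, R, codebooks, codes = index[i]
        y_hat = (y - cells[i]) @ R               # locally rotated query residual
        dists = adc_distances(y_hat, codes, codebooks)
        candidates.extend(zip(ids.tolist(), dists.tolist()))
    return sorted(candidates, key=lambda t: t[1])
```

The only per-query additions over IVFADC are the w residual rotations and the per-cell lookup tables, which is exactly the overhead analyzed next.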
Analysis. To illustrate the individual gain from the two optimized quantities, we investigate optimizing rotation alone with fixed sub-quantizers, as well as both rotation and sub-quantizers, referred to as LOR+PQ and LOPQ, respectively. In the latter case, there is an O(K(d^2 + dk)) space overhead, compared e.g. to IVFADC [12]. Similarly, local rotation of the query residual imposes an O(wd^2) time overhead.

4.2. Local optimization

Let Z ∈ {Z_1, ..., Z_K} be the set of residuals of data points quantized to some cell in E. Contrary to [12], we PQ-encode these residuals by locally optimizing both the space decomposition and the sub-quantizers per cell. Given m and k as parameters, this problem is expressed as minimizing distortion as a function of orthogonal matrix R ∈ R^{d×d} and sub-codebooks C^1, ..., C^m ⊂ R^{d/m} per cell,

minimize    \sum_{z \in Z} \min_{\hat{c} \in \hat{C}} \| z - R\hat{c} \|^2
subject to  \hat{C} = C^1 \times \cdots \times C^m,\quad R^T R = I,    (10)

where |C^j| = k for j ∈ M = {1, ..., m}. Given a solution R, C^1, ..., C^m, codebook C is found by (5). For j ∈ M, sub-codebook C^j determines a sub-quantizer q^j by

x \mapsto q^j(x) = \arg\min_{\hat{c}^j \in C^j} \| x - \hat{c}^j \|    (11)

for x ∈ R^{d/m}, as in (2); collectively, the sub-quantizers determine a product quantizer q = (q^1, ..., q^m) by (4). Local optimization can then be seen as a mapping Z ↦ (R, q).

Following [9, 15], there are two solutions that we briefly describe here, focusing more on OPQ_P.

Parametric solution (OPQ_P [9]) is the outcome of assuming a d-dimensional, zero-mean normal distribution N(0, Σ) of the residual data Z and minimizing the theoretical lower distortion bound as a function of R alone [9]. That is, R is optimized independently prior to codebook optimization, which can follow by independent k-means per subspace, exactly as in PQ.

Given the d × d positive definite covariance matrix Σ, empirically measured on Z, the solution for R is found in closed form, in two steps. First, rotating the data by ẑ ← R^T z for z ∈ Z should yield a block-diagonal covariance matrix Σ̂, with the j-th diagonal block being the sub-matrix Σ̂_jj of the j-th subspace, for j ∈ M. That is, subspace distributions should be pairwise independent. This is accomplished e.g. by diagonalizing Σ as UΛU^T.

Second, determinants |Σ̂_jj| should be equal for j ∈ M, i.e., variance should be balanced across subspaces. This is achieved by eigenvalue allocation [9]. In particular, a set B of m buckets B^j is initialized with B^j = ∅, j ∈ M, each of capacity d* = d/m. Eigenvalues in Λ are then traversed in descending order, λ_1 ≥ ... ≥ λ_d. Each eigenvalue λ_s, s = 1, ..., d, is greedily allocated to the non-full bucket B* of minimal variance, i.e., B* ← B* ∪ {s} with

B^* = \arg\min_{B \in \mathcal{B},\ |B| < d^*} \prod_{s \in B} \lambda_s,    (12)

until all buckets are full. Then, the buckets determine a re-ordering of dimensions: if vector b^j ∈ R^{d*} contains the elements of bucket B^j (in any order) for j ∈ M and b = (b^1, ..., b^m), then vector b is read off as a permutation π of the set {1, ..., d}. If P_π is the permutation matrix of π, then the matrix U P_π^T represents a re-ordering of the eigenvectors of Σ and is the final solution for R. In other words, Z is first PCA-aligned and then dimensions are grouped into subspaces exactly as eigenvalues are allocated to buckets.
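A sketch of this closed-form solution, as we read it, is given below (our own code, not the authors' implementation; eigenvalue products are accumulated in the log domain for numerical stability, and all names are ours):

```python
import numpy as np

def opq_parametric_rotation(Z, m):
    """Closed-form rotation of OPQ_P: PCA-align the residuals, then allocate
    eigenvalues greedily to m buckets so that the product of eigenvalues
    (variance) is balanced across subspaces, as in eq. (12)."""
    d = Z.shape[1]
    cap = d // m                                   # bucket capacity d* = d/m
    Sigma = np.cov(Z, rowvar=False)
    lam, U = np.linalg.eigh(Sigma)                 # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]                  # traverse in descending order
    lam, U = lam[order], U[:, order]
    buckets = [[] for _ in range(m)]
    log_prod = np.zeros(m)                         # log of each bucket's eigenvalue product
    for s in range(d):
        free = [j for j in range(m) if len(buckets[j]) < cap]
        j = min(free, key=lambda b: log_prod[b])   # non-full bucket of minimal variance
        buckets[j].append(s)
        log_prod[j] += np.log(max(lam[s], 1e-12))  # guard against zero eigenvalues
    perm = [s for j in range(m) for s in buckets[j]]
    return U[:, perm]                              # re-ordered eigenvectors: the final R
```

One eigendecomposition and one greedy pass per cell is all the training that R requires, which is what makes optimizing thousands of local quantizers practical.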
Non-parametric solution (OPQ_NP [9] or Ck-means [15]) is a variant of k-means, carried out in all m subspaces in parallel, interlacing in each iteration its two traditional steps, assign and update, with steps to rotate the data and optimize R, i.e., align centroids to data. OPQ_P is much faster than OPQ_NP in practice. Because we locally optimize thousands of quantizers, OPQ_NP training is impractical, so we only use it in one small experiment in section 5.2 and otherwise focus on OPQ_P, which we refer to as I-OPQ in the sequel.

4.3. Example

To illustrate the benefit of local optimization, we experiment on our synthetic dataset SYNTH1M, containing 1M 128-dimensional data points and 10K queries, generated by taking 1000 samples from each of 1000 components of an anisotropic Gaussian mixture distribution. All methods are non-exhaustive as in section 4.1, i.e. using a coarse quantizer, inverted lists and PQ-encoded residuals; however, all optimization variants are global except for LOPQ. For fair comparison here and in section 5, I-OPQ is our own non-exhaustive adaptation of [9]. IVFADC (PQ) [12] uses the natural order of dimensions and no optimization.

Figure 2. Recall@R performance on SYNTH1M; recall@R is defined in section 5.1. We use K = 1024 and w = 8 for all methods; for all product quantizers, we use m = 8 and k = 256. The methods compared are IVFADC, I-PCA+RP, I-PCA, I-OPQ and LOPQ; curves for IVFADC, I-OPQ and I-PCA+RP coincide everywhere.
Figure 2 shows results on ANN search. On this extremely multi-modal distribution, I-OPQ fails to improve over IVFADC. PCA-aligning all data and allocating dimensions in decreasing order of eigenvalues is referred to as I-PCA. This is even worse than natural order, because e.g. the largest d/m eigenvalues are allocated in a single subspace, contrary to the balancing objective of I-OPQ. Randomly permuting dimensions after global PCA alignment, referred to as I-PCA+RP, alleviates this problem. LOPQ outperforms all methods by up to 30%.
4.4. Searching on a multi-index

The case of a second-order multi-index is less trivial, as the space overhead is prohibitive to locally optimize per cell as in section 4.1. Hence, we separately optimize per cell of the two subspace quantizers and encode two sub-residuals. We call this product optimization, or Multi-LOPQ.

Product optimization. Two subspace quantizers Q^1, Q^2 of K centroids each are built as in [3], with associated codebooks E^j = {e^j_1, ..., e^j_K} for j = 1, 2. Each data point x = (x^1, x^2) ∈ X is quantized to cell Q(x) = (Q^1(x^1), Q^2(x^2)). An inverted list L_{i1 i2} is kept for each cell (e^1_{i1}, e^2_{i2}) on the grid E = E^1 × E^2, for i_1, i_2 ∈ K.

At the same time, Q^1, Q^2 are employed for residuals as well, as in Multi-D-ADC [3]. That is, for each data point x = (x^1, x^2) ∈ X, the residuals x^j − Q^j(x^j) for j = 1, 2 are PQ-encoded. However, because the codebook induced on R^d by Q^1, Q^2 is extremely fine (K^2 cells on the grid), locally optimizing per cell is not an option: the total space overhead e.g. would be O((d^2 + dk)K^2). What we do is separately optimize per subspace: similarly to (9), let

Z^j_i = \{ x^j - e^j_i : x \in X,\ Q^j(x^j) = e^j_i \}    (13)

contain the residuals of points x ∈ X whose j-th sub-vector is quantized to cell e^j_i, for i ∈ K and j = 1, 2. We then locally optimize each set Z^j_i as discussed in section 4.2, yielding a rotation matrix R^j_i and a product quantizer q^j_i.

Now, given a point x = (x^1, x^2) ∈ X quantized to cell (e^1_{i1}, e^2_{i2}) ∈ E, its sub-residuals z^j = x^j − e^j_{ij} are rotated and PQ-encoded as q^j_{ij}(ẑ^j) = q^j_{ij}((R^j_{ij})^T z^j) for j = 1, 2. That is, encoding is separately adjusted per sub-centroid i_1 (resp., i_2) in the first (resp., second) subspace.

Given a query y, the rotations ŷ^j_{ij} = (R^j_{ij})^T (y^j − e^j_{ij}) are lazy-evaluated for i_j = 1, ..., K and j = 1, 2, i.e. computed on demand by multi-sequence and stored for re-use. For each point index p fetched in cell (e^1_{i1}, e^2_{i2}) ∈ E with associated residuals ẑ^j_p for j = 1, 2, the asymmetric distance

\| \hat{y}^1_{i_1} - q^1_{i_1}(\hat{z}^1_p) \|^2 + \| \hat{y}^2_{i_2} - q^2_{i_2}(\hat{z}^2_p) \|^2    (14)

is computed. Points are ranked according to this distance.

When considering the entire space R^d, this kind of optimization is indeed local per cell, but more constrained than in section 4.2. For instance, the effective rotation matrix in cell (e^1_{i1}, e^2_{i2}) ∈ E is constrained to be block-diagonal with blocks R^1_{i1}, R^2_{i2}, keeping rotations within-subspace. By contrast, OMulti-D-OADC [8] employs an arbitrary rotation matrix that is, however, fixed for all cells.

Analysis. Compared to Multi-D-ADC [3], the space overhead remains (asymptotically) the same as in section 4.1, i.e., O(K(d^2 + dk)). The query time overhead is O(Kd^2) in the worst case, but much lower in practice.
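For illustration, a possible sketch of the distance computation (14) with lazily evaluated, memoized query rotations is given below. This is our own code under assumed data layouts: local1[i] and local2[i] hold the locally optimized (rotation, sub-codebooks) of sub-cell i in the first and second subspace, codes1 and codes2 are the stored sub-residual codes of the cell's points, and adc_distances is the earlier ADC snippet.

```python
import numpy as np

def multi_lopq_cell_distances(y, i1, i2, E1, E2, local1, local2,
                              codes1, codes2, cache):
    """Asymmetric distance of eq. (14) for all points fetched from the
    multi-index cell (i1, i2). E1, E2 hold the K sub-centroids of the two
    subspace quantizers; `cache` memoizes the rotated query sub-residuals so
    each of the (at most 2K) rotations is applied once per query."""
    h = y.shape[0] // 2
    sub_queries = (y[:h], y[h:])
    total = 0.0
    for j, (i, E, local, codes) in enumerate(
            ((i1, E1, local1, codes1), (i2, E2, local2, codes2))):
        R, books = local[i]
        if (j, i) not in cache:                   # lazy evaluation of the rotation
            cache[(j, i)] = (sub_queries[j] - E[i]) @ R
        total = total + adc_distances(cache[(j, i)], codes, books)
    return total                                  # one distance per fetched point
```

A driver would call this once per cell visited by the multi-sequence traversal, sharing `cache` across all visited cells of the same query.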
5. Experiments

5.1. Experimental setup

Datasets. We conduct experiments on four publicly available datasets. Three of them are popular in state-of-the-art ANN methods: SIFT1M, GIST1M [12] and SIFT1B [13] (available at http://corpus-texmex.irisa.fr/). SIFT1M contains 1 million 128-dimensional SIFT vectors and 10K query vectors; GIST1M contains 1 million 960-dimensional GIST vectors and 1000 query vectors; SIFT1B contains 1 billion SIFT vectors and 10K queries.

Given that LOPQ is effective on multi-modal distributions, we further experiment on MNIST (http://yann.lecun.com/exdb/mnist/) apart from our synthetic dataset SYNTH1M discussed in section 4.3. MNIST contains 70K images of handwritten digits, each represented as a 784-dimensional vector of raw pixel intensities. As in [9, 8], we randomly sample 1000 vectors as queries and use the remaining ones as the data.

Evaluation. As in the related literature [12, 9, 13, 3, 16, 15], we measure search performance via the recall@R measure, i.e. the proportion of queries having their nearest neighbor ranked in the first R positions. Alternatively, recall@R is the fraction of queries for which the nearest neighbor would be correctly found if we verified the R top-ranking vectors using exact Euclidean distances. Recall@1 is the most important, and is equivalent to the precision of [14].
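For reference, recall@R as used here reduces to a few lines (our own sketch, assuming ranked_ids[q] is the ranked id list returned for query q and ground_truth[q] its true nearest neighbor):

```python
def recall_at_R(ranked_ids, ground_truth, R):
    """Fraction of queries whose true nearest neighbor appears among the
    first R returned ids; recall@1 coincides with precision in this setup."""
    nq = len(ground_truth)
    hits = sum(ground_truth[q] in ranked_ids[q][:R] for q in range(nq))
    return hits / nq
```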
Re-ranking. Following [13], second-order residuals can be used for re-ranking along with the LOPQ variants, but for fair comparison we only apply it with a single index. This new variant, LOPQ+R, locally optimizes second-order sub-quantizers per cell. However, the rotation of second-order residuals is only optimized globally; otherwise there would be an additional query time overhead on top of [13].

Settings. We always perform search in a non-exhaustive manner, either with a single or a multi-index. In all cases, we use k = 256, i.e. 8 bits per sub-quantizer. Unless otherwise stated, we use 64-bit codes produced with m = 8. On SIFT1B we also use 128-bit codes produced with m = 16, except when re-ranking, where m = m′ = 8 is used instead, as in [13]. For all multi-index methods, T refers to the target number of points fetched by multi-sequence.


Compared methods (MNIST, SIFT1M, GIST1M). We compare against three of the methods discussed in section 4.3, all using a single index on a coarse quantizer and PQ-encoded residuals, with any optimization being global. In particular, IVFADC [12], our I-PCA+RP, and our non-exhaustive adaptation of OPQ [9], using either OPQ_P or OPQ_NP global optimization. These non-exhaustive variants are not only faster, but also superior. OPQ_NP is too slow to train, so it is only shown for MNIST; otherwise I-OPQ refers to OPQ_P. We do not consider transform coding [4] or ITQ [10] since they are outperformed by I-OPQ in [9].

Compared methods (SIFT1B). After some experiments on a single index comparing mainly to IVFADC and I-OPQ, we focus on using a multi-index, comparing against Multi-D-ADC [3] and its recent variant OMulti-D-OADC [8], currently the state-of-the-art. Both methods PQ-encode the residuals of the subspace quantizers. Additionally, OMulti-D-OADC uses OPQ_NP to globally optimize both the initial data prior to multi-index construction and the residuals. We also report results for IVFADC with re-ranking (IVFADC+R) [13], Ck-means [15], KLSH-ADC [17], multi-index hashing (Multi-I-Hashing) [16], and the very recent joint inverted indexing (Joint-ADC) [19].

Implementation. Results followed by a citation are reproduced from the corresponding publication. For the rest we use our own implementations in Matlab and C++ on an 8-core machine with 64GB RAM. For k-means and exhaustive nearest neighbor assignment we use yael (https://gforge.inria.fr/projects/yael).

5.2. Results on MNIST, SIFT1M, GIST1M

MNIST is considered first. This is the only case where we report results for OPQ_NP, since it is favored over OPQ_P in [9], and MNIST is small enough to allow for training. As suggested in [9], we run 100 iterations for OPQ_NP using the implementation provided by the authors.

Recall and distortion results are shown in Figure 3. Observe how the gain of OPQ_NP over OPQ_P is very limited now that global optimization is performed on the residuals. This can be explained by the fact that residuals are expected to follow a rather unimodal distribution, hence closer to the Gaussian assumption of OPQ_P. The performance of our simplified variant I-PCA+RP is very close to I-OPQ. Still, separately optimizing the residual distribution of each cell gives LOPQ a significant gain over all methods.

Figure 3. Recall@R on MNIST with K = 64, found to be optimal, and w = 8. Ē = E/n denotes average distortion per point; the measured Ē is 70.1 for IVFADC, 13.3 for I-PCA+RP, 12.6 for I-OPQ_P, 11.4 for I-OPQ_NP and 8.13 for LOPQ.

SIFT1M and GIST1M results are shown in Figures 4 and 5 respectively, only now OPQ is limited to OPQ_P. As in [9], we use the optimal dimension order for each dataset for the baseline method IVFADC [12], i.e. natural (resp. structured) order for SIFT1M (resp. GIST1M). In both cases, LOPQ clearly outperforms all globally optimized approaches. For SIFT1M its gain at R = 1, 10 is more than 8% over I-OPQ, which is close to the baseline. The gain is lower for GIST1M but still 5% for R = 10. This is where I-OPQ improves most, in agreement with [9, 8], so LOPQ has less to improve. This is attributed to GIST1M mostly being subject to one Gaussian distribution in [8]. I-PCA+RP is always slightly below I-OPQ.

Figure 4. Recall@R on SIFT1M with K = 1024, w = 8.

Figure 5. Recall@R on GIST1M with K = 1024, w = 16.

Figures 6 and 7 plot recall@10 versus bit allocation per point (through varying m) and soft-assignment neighborhood w, respectively. LOPQ enjoys superior performance in all cases, with the gain increasing with lower bit rates and more soft assignment. The latter suggests more precise distance measurements, hence lower distortion.
Figure 6. Recall@10 on SIFT1M versus bit allocation per point, with K = 1024 and w = 8. For 16, 32, 64 and 128 bits, m is respectively 2, 4, 8 and 16.

Figure 7. Recall@10 on SIFT1M versus w, with K = 1024 and m = 8.

Method            R = 1   R = 10   R = 100
Ck-means [15]     –       –        0.649
IVFADC            0.106   0.379    0.748
IVFADC [13]       0.088   0.372    0.733
I-OPQ             0.114   0.399    0.777
Multi-D-ADC [3]   0.165   0.517    0.860
LOR+PQ            0.183   0.565    0.889
LOPQ              0.199   0.586    0.909

Table 1. Recall@{1, 10, 100} on SIFT1B with 64-bit codes, K = 2^13 = 8192 and w = 64. For Multi-D-ADC, K = 2^14 and T = 100K. Rows including citations reproduce the authors' results.

5.3. Results on SIFT1B

64-bit code (m = 8) results are shown in Table 1, including I-OPQ, Ck-means [15], Multi-D-ADC [3] and IVFADC without re-ranking, since re-ranking does not improve performance at this bit rate [13]. All methods use a single index except Multi-D-ADC, which uses a multi-index, and Ck-means, which is exhaustive. For IVFADC we both reproduce the results of [13] and report on our re-implementation. To illustrate the individual gain from locally optimized rotation and sub-quantizers, we also include our sub-optimal variant LOR+PQ as discussed in section 4.1. Both LOR+PQ and LOPQ are clearly superior to all methods, with a gain of 18% over I-OPQ and 7% over Multi-D-ADC for recall@10, although the latter is using a multi-index.

128-bit code results are presented in Table 2 and Figure 8, with our solutions including a single index with re-ranking (LOPQ+R) and a multi-index (Multi-LOPQ). Of the re-ranking methods, LOPQ+R has a clear advantage over IVFADC+R, where we adopt m = m′ = 8 since this option is shown to be superior in [13]. All remaining methods use m = 16. Multi-I-Hashing [16], KLSH-ADC [17] and Joint-ADC [19] are all inferior at R = 100, although the latter two require 4 times more space.

T      Method                 R = 1   R = 10   R = 100
–      Multi-I-Hashing [16]   –       –        0.420
–      KLSH-ADC [17]          –       –        0.894
–      Joint-ADC [19]         –       –        0.938
20K    IVFADC+R [13]          0.262   0.701    0.962
20K    LOPQ+R                 0.350   0.820    0.978
10K    Multi-D-ADC [3]        0.304   0.665    0.740
10K    OMulti-D-OADC [8]      0.345   0.725    0.794
10K    Multi-LOPQ             0.430   0.761    0.782
30K    Multi-D-ADC [3]        0.328   0.757    0.885
30K    OMulti-D-OADC [8]      0.366   0.807    0.913
30K    Multi-LOPQ             0.463   0.865    0.905
100K   Multi-D-ADC [3]        0.334   0.793    0.959
100K   OMulti-D-OADC [8]      0.373   0.841    0.973
100K   Multi-LOPQ             0.476   0.919    0.973

Table 2. Recall@{1, 10, 100} on SIFT1B with 128-bit codes and K = 2^13 = 8192 (resp. K = 2^14) for single index (resp. multi-index). For IVFADC+R and LOPQ+R, m′ = 8, w = 64. Results for Joint-ADC and KLSH-ADC are taken from [19]. Rows including citations reproduce the authors' results.

Figure 8. Recall@R on SIFT1B with 128-bit codes and T = 100K, following Table 2.

The current state-of-the-art results come with the use of a multi-index, also boasting lower query times. The recent, highly optimized OMulti-D-OADC [8] outperforms Multi-D-ADC [3]. However, the performance of our product optimization Multi-LOPQ is unprecedented, setting a new state-of-the-art on SIFT1B at 128-bit codes and enjoying nearly 10% gain over OMulti-D-OADC on the most important measure of precision (recall@1). Multi-index cells are very fine, hence residuals are lower and local optimization yields lower distortion, although constrained.

5.4. Overhead analysis

Both space and time overhead are constant in the data size n. Space overhead on top of IVFADC (resp. Multi-D-ADC) refers to the local rotation matrices and sub-quantizer centroids per cell. For rotation matrices, this is Kd^2 (resp. 2K(d/2)^2) for a single index (resp. multi-index). In practice, this overhead is around 500MB on SIFT1B. For sub-quantizer centroids, the overhead is Kdk in all cases. In practice, this is 2GB on SIFT1B for Multi-LOPQ with K = 2^14.
Given that the index space for SIFT1B with 128-bit codes is 21GB in the worst case, this overhead is reasonable.
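As a back-of-the-envelope check of these figures (ours, assuming 4-byte floats and the Multi-LOPQ setting d = 128, k = 256, K = 2^14), the quoted overheads are reproduced as follows:

```python
# Space overhead of Multi-LOPQ on SIFT1B (assumed: 4-byte floats).
K, d, k, fbytes = 2 ** 14, 128, 256, 4

# 2K local rotation matrices of size (d/2) x (d/2), one per sub-cell
rotations = 2 * K * (d // 2) ** 2 * fbytes
# local sub-quantizer centroids: K cells per subspace, (d/2)*k floats each,
# over both subspaces, i.e. K*d*k floats in total
centroids = K * d * k * fbytes

print(rotations / 2 ** 20, "MiB")   # 512.0 MiB, i.e. the ~500MB quoted above
print(centroids / 2 ** 30, "GiB")   # 2.0 GiB, matching the quoted 2GB
```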
Query time overhead on top of IVFADC (resp. Multi-D-ADC) is the time needed to rotate the query for each soft-assigned cell, which amounts to w (2K in the worst case) multiplications of a d × d (resp. d/2 × d/2) matrix with a d-dimensional (resp. d/2-dimensional) vector for a single index (resp. multi-index). In practice, the multiplications for the multi-index are far fewer. The average overhead on SIFT1B as measured for Multi-LOPQ is 0.776ms, 1.92ms and 4.04ms for T = 10K, 30K and 100K, respectively. Multi-D-ADC takes 49ms for T = 100K [3], bringing the total query time to 53ms. Or, with 7ms for T = 10K in [3], Multi-LOPQ outperforms the previous state-of-the-art 128-bit precision on SIFT1B by 5% in less than 8ms.

6. Discussion

Beneath LOPQ lies the very simple idea that no single centroid should be wasted by not representing actual data, but rather each centroid should contribute to lowering distortion. Hence, to take advantage of PQ, one should attempt to use and optimize product quantizers over parts of the data only. This idea fits naturally with a number of recent advances, boosting large-scale ANN search beyond the state-of-the-art without significant cost.

It is straightforward to use LOPQ exhaustively as well, by visiting all cells. This of course requires computing K (for LOPQ) or 2K (for Multi-LOPQ) lookup tables and rotation matrices instead of just one (e.g. for OPQ). However, given the superior performance of residual-based schemes [3, 12], this overhead may still be acceptable. For large scale, exhaustive search is not an option anyway.

LOPQ resembles a two-stage fitting of a mixture distribution: component means followed by conditional densities via PQ. Joint optimization of the coarse and local quantizers would then seem like a possible next step, but such an attempt still eludes us due to the prohibitive training cost. It would also make sense to investigate the connection to tree-based methods to ultimately compress sets of points as in [1], while at the same time being able to search non-exhaustively without reducing dimensionality.

References

[1] R. Arandjelovic and A. Zisserman. Extremely low bit-rate nearest neighbor search using a set compression tree. Technical report, 2013.
[2] Y. Avrithis. Quantize and conquer: A dimensionality-recursive solution to clustering, vector quantization, and image retrieval. In ICCV, 2013.
[3] A. Babenko and V. Lempitsky. The inverted multi-index. In CVPR, 2012.
[4] J. Brandt. Transform coding for fast approximate nearest neighbor search in high dimensions. In CVPR, 2010.
[5] V. Chandrasekhar, Y. Reznik, G. Takacs, D. M. Chen, S. S. Tsai, R. Grzeszczuk, and B. Girod. Compressing feature sets with digital search trees. In ICCV Workshops, 2011.
[6] M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, pages 253-262, 2004.
[7] J. Delhumeau, P. Gosselin, H. Jégou, and P. Pérez. Revisiting the VLAD image representation. In ACM Multimedia, 2013.
[8] T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization. Technical report, 2013.
[9] T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization for approximate nearest neighbor search. In CVPR, 2013.
[10] Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In CVPR, 2011.
[11] K. He, F. Wen, and J. Sun. K-means hashing: An affinity-preserving quantization method for learning binary compact codes. In CVPR, 2013.
[12] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. PAMI, 33(1), 2011.
[13] H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg. Searching in one billion vectors: Re-rank with source coding. In ICASSP, 2011.
[14] M. Muja and D. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In ICCV, 2009.
[15] M. Norouzi and D. Fleet. Cartesian k-means. In CVPR, 2013.
[16] M. Norouzi, A. Punjani, and D. J. Fleet. Fast search in Hamming space with multi-index hashing. In CVPR, 2012.
[17] L. Paulevé, H. Jégou, and L. Amsaleg. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 31(11):1348-1358, 2010.
[18] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2008.
[19] Y. Xia, K. He, F. Wen, and J. Sun. Joint inverted indexing. In ICCV, 2013.
