A62 Vocabulary Tree
• Recognition can scale to very large databases using the Vocabulary Tree indexing approach [Nistér and Stewénius, CVPR 2006]. The Vocabulary Tree performs instance-level object recognition; it does not perform recognition at the category level.
• The Vocabulary Tree method follows three steps:
1. Organize the local descriptors of the database images in a tree using hierarchical k-means clustering; inverted files with scores are stored at each node (offline).
2. Generate a score for a given query image based on Term Frequency–Inverse Document Frequency (TF-IDF).
3. Find the images in the database that best match that score.
• The vocabulary tree supports very efficient retrieval: it only cares about the distance between a query feature and each node.
Building the Vocabulary Tree
• The vocabulary tree is a hierarchical set of cluster centers and their corresponding Voronoi regions:
− For each image in the database, extract MSER regions and calculate a set of feature point descriptors (e.g. 128-dimensional SIFT).
− Build the vocabulary tree using hierarchical k-means clustering (see the sketch after this list):
• Run k-means recursively on each of the resulting quantization cells, up to a maximum number of levels L (L = 6 suggested as a maximum).
• Nodes are the centroids; leaves are the visual words.
• k defines the branch factor of the tree, i.e. how fast the tree branches (k = 10 suggested as a maximum).
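A minimal sketch of the tree-building step in Python, assuming descriptors are stacked in a NumPy array and using scikit-learn's KMeans for each split; the Node class and function names are illustrative, not from the paper's implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

class Node:
    def __init__(self, centroid=None):
        self.centroid = centroid    # cluster center of this Voronoi cell
        self.children = []          # empty list => leaf (visual word)
        self.inverted_file = {}     # image_id -> term frequency (filled offline)

def build_tree(descriptors, k=10, L=6, level=0, centroid=None):
    """Recursively split the descriptors into k cells, up to L levels."""
    node = Node(centroid)
    # Stop at the maximum depth or when the cell is too small to split.
    if level == L or len(descriptors) < k:
        return node
    km = KMeans(n_clusters=k, n_init=4).fit(descriptors)
    for j in range(k):
        cell = descriptors[km.labels_ == j]   # points of the j-th child cell
        node.children.append(
            build_tree(cell, k, L, level + 1, km.cluster_centers_[j]))
    return node

# Example: a small tree over random 128-D descriptors (k=3, L=2 for speed)
descs = np.random.rand(5000, 128).astype(np.float32)
root = build_tree(descs, k=3, L=2)
```

With k = 10 and L = 6 this yields up to one million leaves, i.e. one million visual words.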
• A large number of elliptical regions are extracted from the image and warped to canonical positions. A descriptor vector is computed for each region. The descriptor vector is then hierarchically quantized by the vocabulary tree.
• With each node in the vocabulary tree there is an associated inverted file with references to the images containing an instance of that node.

[Embedded paper: "Scalable Recognition with a Vocabulary Tree", David Nistér and Henrik Stewénius, Center for Visualization and Virtual Environments, Department of Computer Science, University of Kentucky. http://www.vis.uky.edu/∼dnister/ , http://www.vis.uky.edu/∼stewe/]
Hierarchical k-means clustering

[Figure: hierarchical k-means clustering performed with branch factor k = 3, shown at levels L = 1 through L = 4. Slides from D. Nistér.]
• Adding an image to the database requires the following steps:
‒ Image feature descriptors are computed.
‒ Each descriptor vector is dropped down from the root of the tree and quantized into a path down the tree.
• In the online phase, each descriptor vector is propagated down the tree by comparing, at each level, the descriptor vector to the k candidate cluster centers (represented by the k children in the tree) and choosing the closest one.
• k dot products are performed at each level, resulting in a total of kL dot products, which is very efficient if k is not too large. The path down the tree can be encoded by a single integer and is then available for use in scoring (see the sketch below).
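A minimal sketch of the quantization step, reusing the Node class from the tree-building sketch above; Euclidean distance to the k children stands in for the paper's dot products against normalized centroids (equivalent up to normalization), and the path is packed into a single base-k integer:

```python
import numpy as np

def quantize(root, descriptor, k=10):
    """Propagate one descriptor down the tree; return visited nodes + path code."""
    path_nodes, code = [], 0
    node = root
    while node.children:
        # k comparisons per level, kL in total for a tree of depth L.
        dists = [np.linalg.norm(descriptor - child.centroid)
                 for child in node.children]
        j = int(np.argmin(dists))
        code = code * k + j          # append one base-k digit per level
        node = node.children[j]
        path_nodes.append(node)
    return path_nodes, code
```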
• The relevance of a database image to the query image is determined by how similar the paths down the vocabulary tree are for the descriptors from the database image and the query image. The scheme assigns weights to the tree nodes and defines relevance scores associated with images.
From the paper: the cost of quantizing a descriptor is logarithmic in the number of leaf nodes, while memory usage is linear in the number of leaf nodes $k^L$. The total number of descriptor vectors that must be represented is $\sum_{i=1}^{L} k^i = \frac{k^{L+1}-k}{k-1} \approx k^L$. For $D$-dimensional descriptors represented as char, the size of the tree is approximately $D k^L$ bytes. A tree with $D = 128$, $L = 6$ and $k = 10$, resulting in 1M leaf nodes, uses 143 MB of memory.

[Figure: paths down the tree for one image with 400 features. Figure 3 of the paper shows three levels of a vocabulary tree with branch factor 10.]
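A quick check of the memory estimate, in plain Python arithmetic, reproduces the figure quoted above:

```python
# D*k^L bytes for char-valued descriptors, counting one D-byte
# centroid per node of the tree.
D, k, L = 128, 10, 6                    # descriptor dim, branch factor, depth
n_nodes = (k**(L + 1) - k) // (k - 1)   # sum_{i=1..L} k^i
print(n_nodes)                          # 1111110, i.e. ~1M leaves plus inner nodes
print(D * n_nodes / 1e6)                # ~142.2 MB, matching the ~143 MB quoted
```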
Scoring

• At each node $i$ a weight $w_i$ is assigned, which can be defined according to one of several schemes:
− a constant weighting scheme: $w_i = k$
− an entropy weighting scheme (inverse document frequency): $w_i = \log\left(\frac{N}{N_i}\right)$,
where $N$ is the number of database images and $N_i$ is the number of images with at least one descriptor vector path through node $i$.
• It is possible to use stop lists, where $w_i$ is set to zero for the most frequent and/or infrequent symbols (see the weighting sketch below).
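A minimal sketch of the entropy (IDF) weighting with an optional stop list, assuming the tree and inverted files from the earlier sketches; the all_nodes helper and stop_fraction parameter are illustrative additions, not from the paper:

```python
import math

def all_nodes(root):
    """Collect every node of the tree by iterative traversal."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out

def assign_weights(root, N, stop_fraction=0.0):
    """N is the number of database images; assigns w_i = log(N / N_i)."""
    nodes = all_nodes(root)
    for node in nodes:
        Ni = len(node.inverted_file)    # images with a path through this node
        node.weight = math.log(N / Ni) if Ni else 0.0
    # Optional stop list: zero out the weights of the most frequent nodes.
    if stop_fraction > 0:
        by_freq = sorted(nodes, key=lambda n: len(n.inverted_file), reverse=True)
        for node in by_freq[:int(stop_fraction * len(nodes))]:
            node.weight = 0.0
```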
Node score
• Each database image is given a relevance score based on the L1-normalized difference between the query and database TF-IDF vectors: the query and database vectors $q$ and $d$ have components $q_i = n_i w_i$ and $d_i = m_i w_i$, where $n_i$ and $m_i$ are the number of descriptor vectors of the query and the database image, respectively, with a path through node $i$.
• Scores for the images in the database are accumulated; the winner is the database image that shares the most information with the query image (see the scoring sketch below).
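A minimal sketch of the relevance score under L1 normalization, with the TF-IDF vectors stored as sparse dicts mapping node id to $w_i$-weighted counts. For normalized vectors, only nodes present in both images move the distance away from its maximum of 2, which is what makes the inverted files effective here:

```python
def l1_normalize(v):
    """Normalize a non-empty sparse vector {node_id: value} to unit L1 norm."""
    s = sum(abs(x) for x in v.values())
    return {i: x / s for i, x in v.items()}

def relevance_score(q, d):
    """L1 distance between normalized sparse vectors; 0 = identical, 2 = disjoint."""
    q, d = l1_normalize(q), l1_normalize(d)
    s = 2.0
    for i in q:
        if i in d:   # only nodes shared by query and database image contribute
            s += abs(q[i] - d[i]) - abs(q[i]) - abs(d[i])
    return s
```

The database image with the lowest score is the best match.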
Inverted file index
• To implement scoring efficiently, an inverted file index is associated with each node of the vocabulary tree (the inverted file of an inner node is the concatenation of its children's inverted files).
• Inverted files at each node store the id-numbers of the images in which that particular node occurs, together with the term frequency for each image. When a new image is added, indexes back to it are appended to the relevant inverted files (see the sketch below).
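A minimal sketch of updating the inverted files when an image is added, reusing the quantize helper from the earlier sketch; each visited node records the image id and increments its term frequency:

```python
def add_image(root, image_id, descriptors, k=10):
    """Index a new database image: one tree path per descriptor."""
    for desc in descriptors:
        path_nodes, _ = quantize(root, desc, k)
        for node in path_nodes:
            # id-number of the new image and its term frequency at this node
            node.inverted_file[image_id] = node.inverted_file.get(image_id, 0) + 1
```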
• The performance of the vocabulary tree depends largely on its structure. The most important factors in making the method effective are:
− A large vocabulary tree (16M words, against the 10K of Video Google).
− Using informative features rather than uniform ones: compute the information gain of features and select the most informative ones to build the tree (i.e. features found in all images of a location, and features not found in any image of another location).
Performance figures on 6376 images

• Performance increases with the branch factor k.
• Performance increases when the amount of training data grows.
From Tommasi