A62 Vocabulary Tree

Vocabulary tree

• Recognition can scale to very large databases using the Vocabulary Tree indexing approach [Nistér and Stewénius, CVPR 2006]. Vocabulary Tree performs instance object recognition; it does not perform recognition at category level.

• The Vocabulary Tree method follows three steps:
  1. Organize the local descriptors of the database images in a tree using hierarchical k-means clustering. Inverted files with scores are stored at each node (offline).
  2. Generate a score for a given query image based on Term Frequency – Inverse Document Frequency (TF-IDF).
  3. Find the images in the database that best match that score.

• The vocabulary tree supports very efficient retrieval: it only needs the distance between a query feature and each node.
Building  the  Vocabulary  Tree

• The vocabulary tree is a hierarchical set of cluster centers and their corresponding Voronoi regions:
  ‒ For each image in the database, extract MSER regions and calculate a set of feature point descriptors (e.g. 128-D SIFT).
  ‒ Build the vocabulary tree using hierarchical k-means clustering:
    • run k-means recursively on each of the resulting quantization cells up to a maximum number of levels L (L = 6 max suggested);
    • nodes are the centroids; leaves are the visual words;
    • k defines the branch factor of the tree, which indicates how fast the tree branches (k = 10 max suggested).
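As an illustration, the recursive construction above can be sketched in a few lines. This is a toy sketch with a hand-rolled k-means on random vectors; a real system would cluster 128-D SIFT descriptors with an optimized library, and the function names and dict layout here are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch: build a vocabulary tree by running k-means recursively
# on each quantization cell, up to L levels.
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    """Minimal Lloyd's k-means; returns centers and point labels."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

def build_tree(points, k, L):
    """Recursively split the quantization cells up to L levels;
    nodes hold centroids, childless nodes are the visual words."""
    node = {"center": points.mean(axis=0), "children": []}
    if L == 0 or len(points) < k:
        return node  # leaf = visual word
    _, labels = kmeans(points, k)
    for j in range(k):
        cell = points[labels == j]
        if len(cell):
            node["children"].append(build_tree(cell, k, L - 1))
    return node

# toy usage: 200 random 8-D "descriptors", branch factor k = 3, depth L = 2
rng = np.random.default_rng(1)
tree = build_tree(rng.normal(size=(200, 8)), k=3, L=2)
```

With the suggested k = 10 and L = 6 the same recursion yields up to one million visual words.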
Scalable Recognition with a Vocabulary Tree
David Nistér and Henrik Stewénius
Center for Visualization and Virtual Environments
Department of Computer Science, University of Kentucky
http://www.vis.uky.edu/∼dnister/  http://www.vis.uky.edu/∼stewe/

• A large number of elliptical regions are extracted from the image and warped to canonical positions. A descriptor vector is computed for each region. The descriptor vector is then hierarchically quantized by the vocabulary tree.

• With each node in the vocabulary tree there is an associated inverted file with references to the images containing an instance of that node.
Hierarchical  k-­‐means  clustering  

 
Perform hierarchical k-means clustering (figure: branch factor k = 3, shown for levels L = 1 through L = 4).

Slide from D. Nister
Once the vocabulary tree has been built and is on-line, new images can be inserted into the database.

Slide from D. Nister
Adding  images  to  the  tree  

• Adding an image to the database requires the following steps:
  ‒ Image feature descriptors are computed.
  ‒ Each descriptor vector is dropped down from the root of the tree and quantized into a path down the tree.

Slide from D. Nister
Querying  with  Vocabulary  Tree  

 
• In the online phase, each descriptor vector is propagated down the tree by comparing, at each level, the descriptor vector to the k candidate cluster centers (represented by the k children in the tree) and choosing the closest one.

• k dot products are performed at each level, resulting in a total of kL dot products, which is very efficient if k is not too large. The path down the tree can be encoded by a single integer and is then available for use in scoring.

• The relevance of a database image to the query image is determined by how similar the paths down the vocabulary tree are for the descriptors from the database image and the query image. The scheme assigns weights to the tree nodes and defines relevance scores associated to images.
Quantizing a descriptor is logarithmic in the number of leaf nodes, while the memory usage is linear in the number of leaf nodes k^L. The total number of descriptor vectors that must be represented is Σ_{i=1}^{L} k^i = (k^{L+1} − k)/(k − 1) ≈ k^L. For D-dimensional descriptors represented as char, the size of the tree is approximately D k^L bytes. With D = 128, L = 6 and k = 10, resulting in 1M leaf nodes, the tree uses 143 MB of memory.

Paths down the tree for one image with 400 features.
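The 143 MB figure can be double-checked by summing the centroid vectors over all levels of the tree (a small arithmetic sketch):

```python
# Worked check of the memory figure: the number of descriptor vectors over
# all levels is sum_{i=1}^{L} k^i = (k^(L+1) - k) / (k - 1), D bytes each.
D, k, L = 128, 10, 6
n_vectors = (k**(L + 1) - k) // (k - 1)  # centroids below the root
size_bytes = D * n_vectors
print(n_vectors)            # 1111110, i.e. just over the 10^6 leaf nodes
print(size_bytes / 1e6)     # about 142 MB, matching the quoted ~143 MB
```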
Scoring  

• At each node i a weight wi is assigned, defined according to one of several schemes:
  ‒ a constant weighting scheme: wi = k
  ‒ an entropy weighting scheme (inverse document frequency): wi = log(N / Ni), where N is the number of database images and Ni is the number of images with at least one descriptor vector path through node i.

• It is possible to use stop lists, where wi is set to zero for the most frequent and/or infrequent symbols.

Node score example:
  N = 4, Ni = 2  →  w = log(2)
  N = 4, Ni = 1  →  w = log(4)

Image from D. Nister

• The query vector q (entries qi) and database vector d (entries di) are defined according to the assigned weights as:
  ‒ qi = mi wi
  ‒ di = ni wi
  where mi is the number of descriptor vectors of the query with a path through node i (with wi its weight), and ni is the number of descriptor vectors of each database image with a path through node i.

  For the node score example above, di = ni wi gives S = 2 log(2) and S = 2 log(4).

• Each database image is given a relevance score based on the L1 normalized difference between the query and the database vectors:
  s(q, d) = ‖ q/‖q‖ − d/‖d‖ ‖  (norms taken in L1)

• Scores for the images in the database are accumulated. The winner is the image in the database that shares the most information with the input image.
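The weighting and scoring above can be sketched as follows, using dictionaries keyed by node id. The function names and data layout are illustrative assumptions; only the formulas (wi = log(N/Ni), qi = mi wi, di = ni wi, L1-normalized difference) come from the method itself.

```python
# Sketch of TF-IDF scoring: entropy weights w_i = log(N / N_i), vectors
# q_i = m_i w_i and d_i = n_i w_i, score = L1 norm of the difference of
# the L1-normalized vectors (lower score = more relevant image).
import math

def idf_weights(node_image_counts, n_images):
    """node_image_counts: node id -> N_i (images whose paths touch node i)."""
    return {i: math.log(n_images / ni) for i, ni in node_image_counts.items()}

def score(query_counts, db_counts, w):
    """query_counts/db_counts: node id -> m_i / n_i descriptor path counts."""
    nodes = set(query_counts) | set(db_counts)
    q = {i: query_counts.get(i, 0) * w.get(i, 0.0) for i in nodes}
    d = {i: db_counts.get(i, 0) * w.get(i, 0.0) for i in nodes}
    qn = sum(abs(v) for v in q.values()) or 1.0  # L1 norms (guard against 0)
    dn = sum(abs(v) for v in d.values()) or 1.0
    return sum(abs(q[i] / qn - d[i] / dn) for i in nodes)

# toy example: N = 4 images; node "A" occurs in 2 of them, node "B" in 1
w = idf_weights({"A": 2, "B": 1}, 4)
print(score({"A": 3, "B": 1}, {"A": 3, "B": 1}, w))  # identical vectors -> 0.0
```

A database image whose descriptor paths exactly match the query's gets score 0; images sharing no nodes get the maximum score of 2.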
 
 
Inverted file index

• To implement scoring efficiently, an inverted file index is associated to each node of the vocabulary tree (the inverted file of an inner node is the concatenation of its children's inverted files).

• Inverted files at each node store the id-numbers of the images in which a particular node occurs, together with the term frequency of that node in each image. Indexes back to a newly added image are then added to the relevant inverted files.

  nid = number of times visual word i appears in document d
  nd = number of visual words in document d

Example (image from D. Nister): node D1 stores (Img1, t11 = 1/n1) and (Img2, t21 = 1/n2); node D2 stores (Img1, t11 = 2/n1).

Slide from D. Nister
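A minimal sketch of such an inverted file index; the class name and layout are illustrative, with the term frequency stored as nid/nd as defined above. Only database images that share at least one node with the query ever need to be touched during scoring.

```python
# Sketch of an inverted file index: each node keeps the ids of the images
# containing it, together with that image's term frequency n_id / n_d.
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.files = defaultdict(list)  # node id -> [(image id, tf)]

    def add_image(self, image_id, node_counts):
        """node_counts: node id -> n_id occurrences in this image."""
        n_d = sum(node_counts.values())  # total visual words in the image
        for node, n_id in node_counts.items():
            self.files[node].append((image_id, n_id / n_d))

    def candidates(self, query_nodes):
        """Images sharing at least one node with the query."""
        hits = set()
        for node in query_nodes:
            hits.update(img for img, _ in self.files[node])
        return hits

idx = InvertedIndex()
idx.add_image("Img1", {"D1": 1, "D2": 2, "other": 1})
idx.add_image("Img2", {"D1": 1, "other": 3})
print(idx.candidates({"D2"}))  # only Img1 contains node D2
```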
Performance considerations

• Performance of the vocabulary tree is largely dependent upon its structure. The most important factors that make the method effective are:
  ‒ a large vocabulary tree (16M words against the 10K of Video Google);
  ‒ using informative features rather than uniform sampling (compute the information gain of features and select the most informative to build the tree, i.e. features found in all images of a location, and features not in any image of another location).
Performance figures on 6376 images

• Performance increases significantly with the number of leaf nodes.
• Performance increases with the branch factor k.
• Performance increases when the amount of training data grows.

From Tommasi
