Rank Aggregation of Candidate Sets for Efficient Similarity Search

Novak, David; Zezula, Pavel

doi:10.1007/978-3-319-10085-2_4


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8645)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Many current applications need to organize data with respect to mutual similarity between data objects. Generic similarity retrieval in large data collections is a tough task that has been drawing researchers’ attention for two decades. A typical general strategy to retrieve the most similar objects to a given example is to access and then refine a candidate set of objects; the overall search costs (and search time) then typically correlate with the candidate set size. We propose a generic approach that combines several independent indexes by aggregating their candidate sets in such a way that the resulting candidate set can be one or two orders of magnitude smaller (while keeping the answer quality). This achievement comes at the expense of higher computational costs of the ranking algorithm, but experiments on two real-life datasets and one artificial dataset indicate that the overall gain can be significant.
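
The abstract does not state which aggregation function is used, so the following is only a minimal sketch of the general idea under stated assumptions: each index returns its candidates as a ranked list, an object qualifies once enough indexes have ranked it, and qualifying objects are ordered by the median of their ranks. The function names `aggregate_candidates` and `refine`, the `min_lists` threshold, and the median rule are illustrative choices, not the authors' algorithm.

```python
from statistics import median


def aggregate_candidates(rankings, min_lists, limit):
    """Merge per-index candidate rankings into one small candidate set.

    rankings  -- list of lists; each inner list holds object IDs ordered from
                 most to least promising according to one index (assumed to
                 contain no duplicates)
    min_lists -- an object qualifies only if at least this many indexes
                 ranked it (hypothetical parameter)
    limit     -- maximum size of the aggregated candidate set
    """
    positions = {}
    for ranking in rankings:
        for rank, obj in enumerate(ranking):
            positions.setdefault(obj, []).append(rank)

    # Keep objects seen by enough indexes; score each by its median rank.
    scored = [
        (median(ranks), obj)
        for obj, ranks in positions.items()
        if len(ranks) >= min_lists
    ]
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep first-seen order
    return [obj for _, obj in scored[:limit]]


def refine(query, candidates, distance, k):
    """Refinement step: evaluate the true distance only on the candidates."""
    return sorted(candidates, key=lambda obj: distance(query, obj))[:k]


if __name__ == "__main__":
    # Three hypothetical indexes, each proposing its own ranked candidates.
    rankings = [
        ["a", "b", "c", "d"],
        ["b", "a", "e", "c"],
        ["b", "c", "a", "f"],
    ]
    print(aggregate_candidates(rankings, min_lists=2, limit=3))
```

The trade-off described in the abstract is visible here: the aggregation pass touches every ranked entry of every index, which is extra ranking work, but `refine` then computes the (typically expensive) distance function on far fewer objects.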


Author information

Authors and Affiliations

Masaryk University, Brno, Czech Republic
David Novak & Pavel Zezula

Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, 46022, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Knowledge Management, LMU University of Munich, Leopoldstraße 13, 80802, Munich, Germany
Marcus Spies
FAW, University of Linz, Altenbergerstrasse 69, 4040, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Novak, D., Zezula, P. (2014). Rank Aggregation of Candidate Sets for Efficient Similarity Search. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8645. Springer, Cham. https://doi.org/10.1007/978-3-319-10085-2_4

DOI: https://doi.org/10.1007/978-3-319-10085-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10084-5
Online ISBN: 978-3-319-10085-2
eBook Packages: Computer Science (R0)
