Abstract
Many current applications need to organize data with respect to mutual similarity between data objects. Generic similarity retrieval in large data collections is a challenging task that has drawn researchers’ attention for two decades. A typical strategy for retrieving the objects most similar to a given example is to access a candidate set of objects and then refine it; the overall search cost (and search time) then typically correlates with the candidate set size. We propose a generic approach that combines several independent indexes by aggregating their candidate sets so that the resulting candidate set can be one or two orders of magnitude smaller while preserving the answer quality. This comes at the expense of higher computational costs of the ranking algorithm, but experiments on two real-life datasets and one artificial dataset indicate that the overall gain can be significant.
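The abstract does not spell out the ranking algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each independent index returns its candidates as a ranked list, merges those lists with a simple median-rank aggregation (a swapped-in illustrative rule, not necessarily the authors' method), and then refines only the short aggregated list with the actual distance function. The function names `aggregate_candidates` and `refine`, the penalty for objects missing from a list, and the toy data are all hypothetical.

```python
from statistics import median
from typing import Callable, Dict, Hashable, List, Sequence


def aggregate_candidates(ranked_lists: Sequence[Sequence[Hashable]], size: int) -> List[Hashable]:
    """Merge ranked candidate lists from independent indexes by median rank.

    Assumption (illustrative only): an object absent from a list is treated as
    ranked just past that list's end, i.e. at position len(list).
    """
    # Position of every object within each index's ranked list.
    rank_maps: List[Dict[Hashable, int]] = [
        {obj: pos for pos, obj in enumerate(lst)} for lst in ranked_lists
    ]
    # All distinct candidates, in order of first appearance (keeps ties deterministic).
    universe = list(dict.fromkeys(obj for lst in ranked_lists for obj in lst))

    def median_rank(obj: Hashable) -> float:
        return median(m.get(obj, len(lst)) for m, lst in zip(rank_maps, ranked_lists))

    # sorted() is stable, so objects with equal median rank keep first-appearance order.
    return sorted(universe, key=median_rank)[:size]


def refine(query, candidates, distance: Callable, k: int) -> list:
    """Refinement step: evaluate the true distance only on the (small) candidate set."""
    return sorted(candidates, key=lambda obj: distance(query, obj))[:k]


if __name__ == "__main__":
    # Hypothetical ranked candidate lists returned by three independent indexes.
    lists = [[3, 1, 7, 5, 9], [1, 3, 5, 2, 8], [3, 5, 1, 6, 4]]
    candidates = aggregate_candidates(lists, size=3)  # [3, 1, 5]
    print(refine(4.4, candidates, lambda a, b: abs(a - b), k=2))  # [5, 3]
```

The point of the aggregation step is that objects ranked highly by several independent indexes rise to the top, so the refinement step, which computes the (possibly expensive) real distance, can run on a much shorter list than the union of all candidate sets.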
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Novak, D., Zezula, P. (2014). Rank Aggregation of Candidate Sets for Efficient Similarity Search. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8645. Springer, Cham. https://doi.org/10.1007/978-3-319-10085-2_4
DOI: https://doi.org/10.1007/978-3-319-10085-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10084-5
Online ISBN: 978-3-319-10085-2
eBook Packages: Computer Science, Computer Science (R0)