Abstract
Matching users’ information needs with relevant documents is the basic goal of information retrieval systems. However, relevant documents do not necessarily contain the same terms as users’ queries. In this paper, we use semantics to better express users’ queries. We distinguish between two types of concepts: those extracted from a set of pseudo-relevant documents, and those extracted from a semantic resource such as an ontology. With this distinction in mind, we propose a Semantic Mixed query Expansion and Reformulation Approach (SMERA) that uses both types of concepts to improve web queries. The approach addresses several challenges, such as the selective choice of expansion terms, the treatment of named entities, and the reformulation of the query in a user-friendly way. We evaluate SMERA on four standard web collections from the INEX and TREC evaluation campaigns. Our experiments show that SMERA improves the performance of an information retrieval system compared to the original, unmodified queries. In addition, our approach provides a statistically significant improvement in precision over a competitive query expansion method while generating concept-based queries that are more comprehensive and easy to interpret.
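The distinction between the two concept sources can be illustrated with a minimal sketch. It is not the SMERA algorithm, whose details are not given in this abstract: it simply assumes that feedback terms are the most frequent non-query terms in a few top-ranked documents, and that the semantic resource is a small lookup table. The names `STOPWORDS`, `TOY_RESOURCE`, `feedback_terms`, `resource_terms`, and `expand_query` are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list and toy "semantic resource"; both are
# illustrative assumptions, not data from the chapter.
STOPWORDS = {"the", "a", "of", "in", "and", "is", "for", "to", "on"}
TOY_RESOURCE = {"car": ["automobile", "vehicle"], "film": ["movie"]}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def feedback_terms(query_terms, top_docs, k):
    """Pick the k most frequent non-query terms in the top-ranked documents."""
    counts = Counter(
        term
        for doc in top_docs
        for term in tokenize(doc)
        if term not in query_terms
    )
    return [term for term, _ in counts.most_common(k)]


def resource_terms(query_terms, resource):
    """Collect the related concepts the resource lists for each query term."""
    related = []
    for term in query_terms:
        for concept in resource.get(term, []):
            if concept not in query_terms and concept not in related:
                related.append(concept)
    return related


def expand_query(query, top_docs, resource=TOY_RESOURCE, k=2):
    """Return both groups of candidate expansion terms, kept separate."""
    query_terms = tokenize(query)
    return {
        "original": query_terms,
        "from_feedback": feedback_terms(query_terms, top_docs, k),
        "from_resource": resource_terms(query_terms, resource),
    }


# Stand-in for the top-ranked documents of an initial retrieval run.
docs = [
    "electric car battery charging",
    "car battery charging station",
    "battery recycling for electric vehicles",
]
expanded = expand_query("electric car", docs)
print(expanded)
```

Keeping the two groups separate in the returned dictionary mirrors the distinction drawn in the abstract and would let a later step weight or filter each source differently.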
Notes
- 1.
In this paper, we define an effective query as one that obtains good results on the standard measures used in evaluation campaigns, in particular precision measures in the case of web queries.
- 2.
LSI: Latent Semantic Indexing (Deerwester et al. 1990).
References
Audeh, B., Beaune, P., & Beigbeder, M. (2013). Recall-oriented evaluation for information retrieval systems. In Information Retrieval Facility Conference (IRFC), Limassol, Cyprus.
Barr, C., Jones, R., & Regelson, M. (2008). The linguistic structure of English web-search queries. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1021–1030). Association for Computational Linguistics.
Bendersky, M., & Croft, W. B. (2008). Discovering key concepts in verbose queries. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 491–498). ACM.
Bendersky, M., Metzler, D., & Croft, W. B. (2012). Effective query formulation with multiple information sources. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (pp. 443–452). ACM.
Bendersky, M., Metzler, D., & Croft, W. B. (2011). Parameterized concept weighting in verbose queries. In SIGIR. ACM Press.
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Brandao, W., Silva, A., Moura, E., & Ziviani, N. (2011). Exploiting entity semantics for query expansion. In IADIS International Conference WWW/Internet, Rio de Janeiro.
Cronen-Townsend, S., Zhou, Y., & Croft, W. B. (2002). Predicting query performance. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 299).
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391–407.
Deveaud, R., Bonnefoy, L., & Bellot, P. (2013). Quantification et identification des concepts implicites d’une requête. In CORIA 2013, La dixième édition de la COnférence en Recherche d’Information et Applications, Neuchâtel.
Gabrilovich, E., & Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In IJCAI.
Hoffart, J., Yosef, M. A., Bordino, I., Furstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., & Weikum, G. (2011). Robust disambiguation of named entities in text. In EMNLP 2011 Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782–792).
Huston, S., & Croft, W. B. (2010). Evaluating verbose query processing techniques. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 291–298). ACM.
Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: A study and analysis of user queries on the web. Information Processing and Management, 36, 207–227.
Kumaran, G., & Carvalho, V. R. (2009). Reducing long queries using query quality predictors. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 564). NY, USA: ACM Press.
Lavrenko, V., & Croft, W. B. (2001). Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 120–127). NY, USA: ACM Press.
Maxwell, K. T., & Croft, W. B. (2013). Compact query term selection using topically related text. In Proceedings of the 36th International ACM SIGIR (pp. 583–592).
Metzler, D., & Croft, W. B. (2004). Combining the language model and inference network approaches to retrieval. Information Processing and Management, 40, 735–750.
Metzler, D., & Croft, W. B. (2005). A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (p. 472). NY, USA: ACM Press.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–244.
Ponte, J. M., & Croft, W. B. (1998). A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 275–281). ACM.
Qiu, Y., & Frei, H. (1993). Concept based query expansion. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (Vol. 11, p. 212). NY: ACM.
Rocchio, J. J., & Salton, G. (1965). Information search optimization and iterative retrieval techniques. In Fall Joint Computer Conference (pp. 293–305).
Shah, C., & Croft, W. B. (2004). Evaluating high accuracy retrieval techniques. In SIGIR. ACM Press.
Strohman, T., Metzler, D., Turtle, H., & Croft, W. (2004). Indri: A language-model based search engine for complex queries. In Proceedings of the International Conference on Intelligence Analysis.
Suchanek, F. M., Kasneci, G., & Weikum, G. (2007). Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web (pp. 697–706). ACM.
Voorhees, E. M. (1994). Query expansion using lexical-semantic relations. In SIGIR 1994. ACM Press.
Xu, Y., Ding, F., & Wang, B. (2008). Entity-based query reformulation using Wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management - CIKM 2008 (p. 1441). NY, USA: ACM Press.
Zhao, L., & Callan, J. (2010). Term necessity prediction. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (pp. 259–268). ACM.
Zobel, J. (2004). Questioning query expansion: An examination of behaviour and parameters. In SIGIR. ACM Press.
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Audeh, B., Beaune, P., Beigbeder, M. (2017). SMERA: Semantic Mixed Approach for Web Query Expansion and Reformulation. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_9
DOI: https://doi.org/10.1007/978-3-319-45763-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45762-8
Online ISBN: 978-3-319-45763-5
eBook Packages: Engineering (R0)