The Role of Query Sessions in Extracting Instance Attributes from Web Search Queries

Paşca, Marius; Alfonseca, Enrique; Robledo-Arnuncio, Enrique; Martin-Brualla, Ricardo; Hall, Keith

doi:10.1007/978-3-642-12275-0_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Per-instance attributes are acquired using a weakly supervised extraction method that exploits anonymized Web-search query sessions, as an alternative to isolated, individual queries. Examples of such attributes are top speed for chevrolet corvette and population density for brazil. Inherent challenges of using sessions for attribute extraction, such as the large majority of within-session queries being unrelated to attributes, are overcome by using attributes globally extracted from isolated queries as an unsupervised filtering mechanism. In a head-to-head qualitative comparison, the ranked lists of attributes generated by merging attributes extracted from query sessions, on the one hand, and from isolated queries, on the other, are about 12% more accurate on average than the attributes extracted from isolated queries by a previous method.
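
The filtering and merging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `session_attributes` and `merge_ranked`, the exact-match filter, the frequency-based ranking, and the interleaving merge rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import zip_longest


def session_attributes(sessions, instance, global_attributes):
    """Rank candidate attributes of `instance` found in query sessions.

    Assumption: a session qualifies only if the instance occurs in it as a
    standalone query; every other query in that session is a candidate.
    """
    counts = Counter()
    for session in sessions:
        if instance not in session:
            continue
        for query in session:
            if query == instance:
                continue
            # Drop the instance name if the user repeated it in the query.
            candidate = query.replace(instance, "").strip()
            # Unsupervised filter: keep only phrases already extracted as
            # attributes from isolated queries.
            if candidate in global_attributes:
                counts[candidate] += 1
    # Most frequent candidates first.
    return [attr for attr, _ in counts.most_common()]


def merge_ranked(list_a, list_b):
    """Interleave two ranked lists, skipping duplicates (assumed merge rule)."""
    merged, seen = [], set()
    for pair in zip_longest(list_a, list_b):
        for attr in pair:
            if attr is not None and attr not in seen:
                seen.add(attr)
                merged.append(attr)
    return merged
```

In this sketch, a session such as `["chevrolet corvette", "top speed", "weather today"]` contributes `top speed` but not `weather today`, provided `top speed` is in the globally extracted attribute set. The filter is what keeps unrelated within-session queries out of the ranked list.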



Author information

Authors and Affiliations

Google Inc,
Marius Paşca, Enrique Alfonseca, Enrique Robledo-Arnuncio, Ricardo Martin-Brualla & Keith Hall

Editor information

Editors and Affiliations

Adaptive Information Cluster, Dublin City University, Dublin, 9, Ireland
Cathal Gurrin
The Open University, Walton Hall, MK7 6HF, Milton Keynes, UK
Yulan He
Microsoft Research Ltd, 7 JJ Thomson Avenue, CB3 0FB, Cambridge, UK
Gabriella Kazai
Department of Computer Science, University of Essex, Wivenhoe Park, CO4 3SQ, Colchester, UK
Udo Kruschwitz
The Open University, Walton Hall, Milton Keynes, UK
Suzanne Little
University of London, London, UK
Thomas Roelleke
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Department of Computing Science, University of Glasgow, 17 Lilybank Gardens, G12 8QQ, Glasgow, UK
Keith van Rijsbergen

About this paper

Cite this paper

Paşca, M., Alfonseca, E., Robledo-Arnuncio, E., Martin-Brualla, R., Hall, K. (2010). The Role of Query Sessions in Extracting Instance Attributes from Web Search Queries. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_9

DOI: https://doi.org/10.1007/978-3-642-12275-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science, Computer Science (R0)
