DOI: 10.1145/1367497.1367594

Exploring social annotations for information retrieval

Published: 21 April 2008

Abstract

Social annotation has gained increasing popularity in many Web-based applications, leading to an emerging research area in text analysis and information retrieval. This paper is concerned with developing probabilistic models and computational algorithms for social annotations. We propose a unified framework to combine the modeling of social annotations with the language modeling-based methods for information retrieval. The proposed approach consists of two steps: (1) discovering topics in the contents and annotations of documents while categorizing the users by domains; and (2) enhancing document and query language models by incorporating user domain interests as well as topical background models. In particular, we propose a new general generative model for social annotations, which is then simplified to a computationally tractable hierarchical Bayesian network. Then we apply smoothing techniques in a risk minimization framework to incorporate the topical information to language models. Experiments are carried out on a real-world annotation data set sampled from del.icio.us. Our results demonstrate significant improvements over the traditional approaches.
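As a rough illustration of step (2), the sketch below smooths a document language model with a topical model and a collection model, then ranks documents by query log-likelihood. It is not the paper's model: the abstract does not give the estimation formulas, so a plain linear interpolation (in the spirit of Jelinek-Mercer smoothing) is swapped in. The function names, the weights `lam` and `mu`, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model estimated from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_prob(word, doc_lm, topic_lm, coll_lm, lam=0.6, mu=0.3):
    """Interpolate document, topical, and collection models.

    lam and mu are hypothetical weights, not values from the paper.
    The 1e-9 floor is an assumption that keeps unseen words from
    producing log(0).
    """
    return (lam * doc_lm.get(word, 0.0)
            + mu * topic_lm.get(word, 0.0)
            + (1.0 - lam - mu) * coll_lm.get(word, 1e-9))


def query_log_likelihood(query, doc_lm, topic_lm, coll_lm, lam=0.6, mu=0.3):
    """Score a document by the log-likelihood of the query terms."""
    return sum(math.log(smoothed_prob(w, doc_lm, topic_lm, coll_lm, lam, mu))
               for w in query)


# Toy data (made up): document contents plus a topical model per document,
# standing in for topics discovered from content and annotations.
docs = {
    "d1": ["python", "tutorial", "code"],
    "d2": ["travel", "photos", "italy"],
}
topic_lms = {
    "d1": {"programming": 0.5, "python": 0.3, "code": 0.2},
    "d2": {"travel": 0.5, "italy": 0.3, "photos": 0.2},
}
coll_lm = mle([w for tokens in docs.values() for w in tokens])

query = ["programming", "python"]
scores = {
    name: query_log_likelihood(query, mle(tokens), topic_lms[name], coll_lm)
    for name, tokens in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setup the query term "programming" never occurs in the text of `d1`, yet `d1` still ranks first because its topical model assigns the term probability mass. That is the kind of effect the abstract attributes to incorporating topical information into the language models.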


Information & Contributors

Information

Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN:9781605580852
DOI:10.1145/1367497

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. folksonomy
  2. information retrieval
  3. language modeling
  4. social annotations

Qualifiers

  • Research-article

Conference

WWW '08

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2021) Tweet Contextualization Approach Using a Semantic Query Expansion. Procedia Computer Science 192:C (387-396). DOI: 10.1016/j.procs.2021.08.040. Online publication date: 1-Jan-2021
  • (2020) The Extended Dawid-Skene Model. Machine Learning and Knowledge Discovery in Databases (121-136). DOI: 10.1007/978-3-030-43823-4_11. Online publication date: 28-Mar-2020
  • (2019) Investigating Users' Tagging Behavior in Online Academic Community Based on Growth Model. Information Systems Frontiers 21:4 (761-772). DOI: 10.1007/s10796-018-9891-2. Online publication date: 1-Aug-2019
  • (2019) User-aware topic modeling of online reviews. Multimedia Systems 25:1 (59-69). DOI: 10.1007/s00530-017-0557-6. Online publication date: 1-Feb-2019
  • (2019) Integrating social annotations into topic models for personalized document retrieval. Soft Computing. DOI: 10.1007/s00500-019-03998-1. Online publication date: 24-Apr-2019
  • (2019) Exploiting Social Data to Enhance Web Search. Future Data and Security Engineering (593-607). DOI: 10.1007/978-3-030-35653-8_38. Online publication date: 20-Nov-2019
  • (2018) Social Search. Social Information Access (213-276). DOI: 10.1007/978-3-319-90092-6_7. Online publication date: 3-May-2018
  • (2017) Discovery and Dynamic Prediction of User's Interest Based on ARIMA. 2017 Portland International Conference on Management of Engineering and Technology (PICMET) (1-8). DOI: 10.23919/PICMET.2017.8125452. Online publication date: Jul-2017
  • (2017) A Topic Model Based on Poisson Decomposition. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (1489-1498). DOI: 10.1145/3132847.3132942. Online publication date: 6-Nov-2017
  • (2017) A method of optimizing LDA result purity based on semantic similarity. 2017 32nd Youth Academic Annual Conference of Chinese Association of Automation (YAC) (361-365). DOI: 10.1109/YAC.2017.7967434. Online publication date: May-2017