research-article

Exploring social annotations for information retrieval

Authors:

C. Lee Giles

WWW '08: Proceedings of the 17th international conference on World Wide Web

Pages 715 - 724

https://doi.org/10.1145/1367497.1367594

Published: 21 April 2008

Abstract

Social annotation has gained increasing popularity in many Web-based applications, leading to an emerging research area in text analysis and information retrieval. This paper is concerned with developing probabilistic models and computational algorithms for social annotations. We propose a unified framework to combine the modeling of social annotations with language-modeling-based methods for information retrieval. The proposed approach consists of two steps: (1) discovering topics in the contents and annotations of documents while categorizing the users by domains; and (2) enhancing document and query language models by incorporating user domain interests as well as topical background models. In particular, we propose a new general generative model for social annotations, which is then simplified to a computationally tractable hierarchical Bayesian network. Then we apply smoothing techniques in a risk minimization framework to incorporate the topical information into the language models. Experiments are carried out on a real-world annotation data set sampled from del.icio.us. Our results demonstrate significant improvements over the traditional approaches.
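The second step of the abstract, smoothing a document language model with topical information, can be illustrated with a minimal sketch. This is not the paper's model: it assumes a simple linear interpolation of three unigram models (document, topical, collection), and the topic distributions, the weights `lam` and `mu`, the function names, and the toy documents are all hypothetical values chosen for illustration. In practice the topic distributions would be estimated by a topic model rather than written by hand.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: p(w) = count(w) / number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def topical_prob(word, doc_topics, topic_words):
    """p(w | topics of d) = sum over topics z of p(z | d) * p(w | z)."""
    return sum(p_z * topic_words[z].get(word, 0.0) for z, p_z in doc_topics.items())


def smoothed_prob(word, doc_lm, doc_topics, topic_words, coll_lm, lam=0.6, mu=0.3):
    """Interpolate document, topical, and collection models.

    lam and mu are assumed weights (lam + mu <= 1); the remaining
    mass goes to the collection background model.
    """
    return (lam * doc_lm.get(word, 0.0)
            + mu * topical_prob(word, doc_topics, topic_words)
            + (1.0 - lam - mu) * coll_lm.get(word, 0.0))


def query_log_likelihood(query, doc_lm, doc_topics, topic_words, coll_lm):
    """Score a document by the log-probability of the query under its smoothed model."""
    score = 0.0
    for w in query:
        p = smoothed_prob(w, doc_lm, doc_topics, topic_words, coll_lm)
        score += math.log(p) if p > 0 else float("-inf")
    return score


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "python tutorial code programming".split(),
    "d2": "recipe pasta cooking dinner".split(),
}
# Hand-written topic-word and document-topic distributions (assumed, not learned).
topic_words = {
    "tech": {"python": 0.3, "code": 0.3, "programming": 0.2, "software": 0.2},
    "food": {"recipe": 0.3, "pasta": 0.2, "cooking": 0.3, "dinner": 0.2},
}
doc_topics = {
    "d1": {"tech": 0.9, "food": 0.1},
    "d2": {"tech": 0.1, "food": 0.9},
}
# Collection model; "software" is added so the background covers the query vocabulary.
coll_lm = mle([w for toks in docs.values() for w in toks] + ["software"])

query = ["software", "code"]
scores = {
    d: query_log_likelihood(query, mle(toks), doc_topics[d], topic_words, coll_lm)
    for d, toks in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The query term "software" appears in neither toy document, so an unsmoothed document model would assign it zero probability. With the topical component, the document whose topic mixture favors the "tech" topic still receives more probability mass for that term, which is the intuition behind topic-based smoothing.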

Index Terms

Exploring social annotations for information retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering
2. Mathematics of computing
  1. Probability and statistics

Information & Contributors

Information

Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web

April 2008

1326 pages

ISBN:9781605580852

DOI:10.1145/1367497

General Chairs:
Jinpeng Huai (Beihang University, China),
Robin Chen (AT&T Labs, USA),
Hsiao-Wuen Hon (Microsoft Research Asia, China),
Yunhao Liu (HK University of Science and Technology, Hong Kong)

Program Chairs:
Wei-Ying Ma (Microsoft Research Asia, China),
Andrew Tomkins (Yahoo! Research, USA),
Xiaodong Zhang (The Ohio State University, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW '08

Sponsor:

ACM

WWW '08: The 17th International World Wide Web Conference

April 21 - 25, 2008

Beijing, China

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
