Query-based unsupervised learning for improving social media search

Albishre, Khaled; Li, Yuefeng; Xu, Yue; Huang, Wei

doi:10.1007/s11280-019-00747-0

Published: 27 November 2019

Volume 23, pages 1791–1809, (2020)

World Wide Web

Abstract

Social media has become one of the essential information sources for internet users. Text is its primary form of information representation, yet finding relevant information is challenging because of the nature of that text (e.g., short length, sparseness). Acquiring high-quality search results from massive data such as social media requires a set of representative query terms, which are not always available. In this paper, we propose a novel query-based unsupervised learning model to represent the implicit relationships in short text from social media. The model compensates for the lack of word co-occurrences without requiring many parameters to be estimated or external evidence to be collected. To confirm its effectiveness, we compare the proposed model with state-of-the-art lexical, topic and temporal models on the large-scale TREC microblog 2011-2014 collections. The experimental results show that the proposed model significantly improves on the state-of-the-art lexical, topic and temporal models, with the maximum percentage increase reaching 33.97% in MAP and 21.38% in precision at the top 30 documents (P@30). The proposed model could improve social media search effectiveness in closely related retrieval tasks, such as question answering and timeline summarisation.
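The abstract reports its gains in terms of MAP and precision at the top 30 documents. As a reminder of what these two measures compute, the following is a minimal sketch in Python, assuming binary relevance judgements and a ranked list of document IDs per query. The function names and the toy data are hypothetical; they are not taken from the paper or from the TREC evaluation tooling.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even when the ranking is shorter than k, so missing
    positions count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears.

    Relevant documents that are never retrieved contribute zero, because the
    sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of the per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    scores = [
        average_precision(ranking, qrels.get(query, set()))
        for query, ranking in runs.items()
    ]
    return sum(scores) / len(scores)


# Toy example (hypothetical document IDs and judgements).
toy_runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6", "d7"]}
toy_qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}

if __name__ == "__main__":
    print("MAP:", mean_average_precision(toy_runs, toy_qrels))
    print("P@2 for q1:", precision_at_k(toy_runs["q1"], toy_qrels["q1"], 2))
```

For the reported setting, `precision_at_k` would be called with `k=30` on each query's ranking and the results averaged over queries.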

Notes

The proposed model is called query-based unsupervised short text mining (QUSTM).
http://github.com/lintool/twitter-tools
http://github.com/shuyo/ldig
The experiments were performed on a PC with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz and 16 GB of memory, running the Windows 7 operating system.

Acknowledgements

This paper was partially supported by Grant DP140103157 from the Australian Research Council (ARC).

Author information

Authors and Affiliations

School of EECS, Queensland University of Technology (QUT), Brisbane, Australia
Khaled Albishre, Yuefeng Li & Yue Xu
Umm Al-Qura University, Makkah, Saudi Arabia
Khaled Albishre
School of Economy and Management, Hubei University of Technology, Wuhan, 430064, Hubei, China
Wei Huang

Corresponding author

Correspondence to Khaled Albishre.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Computational Social Science as the Ultimate Web Intelligence

Guest Editors: Xiaohui Tao, Juan D. Velasquez, Jiming Liu, and Ning Zhong

About this article

Cite this article

Albishre, K., Li, Y., Xu, Y. et al. Query-based unsupervised learning for improving social media search. World Wide Web 23, 1791–1809 (2020). https://doi.org/10.1007/s11280-019-00747-0

Received: 02 March 2019
Revised: 31 July 2019
Accepted: 09 October 2019
Published: 27 November 2019
Issue Date: May 2020
DOI: https://doi.org/10.1007/s11280-019-00747-0
