Text Mining for News and Blogs Analysis

  • Living reference work entry
Encyclopedia of Machine Learning and Data Mining

Abstract

News and blogs are temporally indexed online texts that play a key role in today's information distribution and consumption. News articles communicate selected information on current events and are written by professional or citizen journalists; blogs are regularly updated Web publications that span a much wider range of topics, styles, and authors. Microblogs such as Twitter have become particularly important in recent years. This entry gives an overview of how text mining (for tasks such as description, classification, prediction, search, recommendation, or summarization) is applied to the textual parts of news and blogs in order to extract topics, events, opinions, sentiments, and other aspects of content. Textual analysis is often complemented by the analysis of further data, such as the social network of authors and readers. The specific data structures and language use of news and (micro)blogs call for preprocessing and analysis methods tailored to them, and the tasks often profit from an interactive approach in which the user plays an active role in sensemaking. These methods are deployed in a wide range of applications and services.
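The abstract makes two technical points that a short example can make concrete: microblog language calls for tailored preprocessing, and temporally indexed text streams are mined for events. The Python sketch below illustrates both under stated assumptions; it is not the method of this entry or of any cited system. `tokenize_microblog` keeps URLs, @mentions, #hashtags and a few emoticons as single tokens and shortens elongated words, and `burst_scores` ranks terms by the ratio of their add-one-smoothed relative frequency in a recent window to that in a background window. Both function names, the regular expressions, and the `min_count` threshold are hypothetical choices for illustration, and the score is much simpler than the automaton-based burst model of Kleinberg (2002).

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative token patterns for microblog text (assumptions, not a standard).
URL_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"[:;=]-?[()DPp]")
TOKEN_RE = re.compile(
    r"https?://\S+"        # URLs
    r"|@\w+"               # @mentions
    r"|#\w+"               # #hashtags
    r"|[:;=]-?[()DPp]"     # a few emoticons, e.g. :) ;-( =D
    r"|\w+(?:'\w+)?"       # words, optionally with one apostrophe
)
# Runs of three or more identical characters, as in "soooo".
ELONGATION_RE = re.compile(r"(.)\1{2,}")


def tokenize_microblog(text: str) -> list[str]:
    """Split one post into tokens, keeping microblog-specific units intact."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if URL_RE.fullmatch(tok) or EMOTICON_RE.fullmatch(tok):
            tokens.append(tok)  # URLs and emoticons are case-sensitive: keep as-is
        else:
            # Lowercase and squash elongations to two characters ("soooo" -> "soo").
            tokens.append(ELONGATION_RE.sub(r"\1\1", tok.lower()))
    return tokens


def burst_scores(recent_posts: list[str], background_posts: list[str],
                 min_count: int = 3, smoothing: float = 1.0) -> list[tuple[str, float]]:
    """Rank terms by smoothed relative frequency in the recent window
    divided by smoothed relative frequency in the background window."""
    recent = Counter(t for p in recent_posts for t in tokenize_microblog(p))
    background = Counter(t for p in background_posts for t in tokenize_microblog(p))
    vocab_size = len(set(recent) | set(background))
    n_recent = sum(recent.values())
    n_background = sum(background.values())
    scores = {}
    for term, count in recent.items():
        if count < min_count:  # ignore terms too rare to indicate a burst
            continue
        p_recent = (count + smoothing) / (n_recent + smoothing * vocab_size)
        p_background = (background[term] + smoothing) / (n_background + smoothing * vocab_size)
        scores[term] = p_recent / p_background
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(tokenize_microblog("#Earthquake soooo scary @bob :) http://t.co/AbC"))
    # ['#earthquake', 'soo', 'scary', '@bob', ':)', 'http://t.co/AbC']
    background = ["nice weather today", "going to work now",
                  "coffee first :)", "work work work"]
    recent = ["#earthquake in the city!!!", "Did you feel the #earthquake???",
              "#Earthquake soooo scary @bob"]
    print(burst_scores(recent, background))
```

In these toy windows, `#earthquake` occurs three times in the recent posts and never in the background, so it is the only term that passes `min_count` and it receives the top score. A ratio of smoothed frequencies was chosen for readability; a practical detector would probably also need stop-word handling, sliding windows, and some form of significance testing.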

Recommended Reading

  • Abel F, Gao Q, Houben G-J, Tao K (2011) Semantic enrichment of Twitter posts for user profile construction on the social web. In: Proceedings of ESWC (2), pp 375–389

  • Allan J (ed) (2002) Topic detection and tracking: event-based information organization. Kluwer Academic Publishers, Norwell

  • Allen ND, Templon JR, McNally PS, Birnbaum L, Hammond K (2010) StatsMonkey: a data-driven sports narrative writer. In: Proceedings of 2010 AAAI fall symposium series. AAAI Press. http://www.aaai.org/ocs/index.php/FSS/FSS10/paper/view/2305

  • Atkinson M, Van der Goot E (2009) Near real time information mining in multilingual news. In: Proceedings of the 18th international conference on World Wide Web (WWW’09). ACM, New York, pp 1153–1154

  • Berendt B, Last M, Subašić I, Verbeke M (2014) New formats and interfaces for multi-document news summarization and its evaluation. In: Fiori, pp 231–255

  • Berendt B, Trümper D (2009) Semantics-based analysis and navigation of heterogeneous text corpora: the Porpoise news and blogs engine. In: Ting I-H, Wu H-J (eds) Web mining applications in e-commerce and e-services. Springer, Berlin

  • Blei DM (2012) Probabilistic topic models. Commun ACM 55(4):77–84

  • Castillo C, Davison BD (2011) Adversarial web search. Found Trends Inf Retr 4(5):377–486. doi:10.1561/1500000021

  • Cheng X, Roth D (2013) Relational inference for Wikification. In: Proceedings of EMNLP 2013, pp 1787–1796

  • Diakopoulos N, De Choudhury M, Naaman M (2012) Finding and assessing social media information sources in the context of journalism. In: Proceedings of CHI 2012. ACM, pp 2451–2460

  • Eisenstein J (2017) Written dialect variation in online social media. In: Boberg C, Nerbonne J, Watt D (eds) The handbook of dialectology. Wiley-Blackwell, Hoboken. Preprint available at http://www.cc.gatech.edu/jeisenst/papers/dialectology-chapter.pdf

  • Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82–89

  • Fiori A (ed) (2014) Innovative document summarization techniques: revolutionizing knowledge understanding. IGI Global, Hershey

  • Gamon M, Basu S, Belenko D, Fisher D, Hurst M, König AC (2008) BLEWS: using blogs to provide context for news articles. In: Adar E, Hurst M, Finin T, Glance N, Nicolov N, Tseng B, Salvetti F (eds) Proceedings of the second international conference on weblogs and social media (ICWSM’08), Seattle/Menlo Park. http://www.aaai.org/Papers/ICWSM/2008/ICWSM08-015.pdf

  • Gangemi A, Presutti V, Reforgiato Recupero D (2014) Frame-based detection of opinion holders and topics: a model and a tool. IEEE Comput Intell Mag 9(1):20–30

  • Guille A, Hacid H, Favre C, Zighed DA (2013) Information diffusion in online social networks: a survey. SIGMOD Rec 42(2):17–28

  • Hale S, Gaffney D, Graham M (2012) Where in the world are you? Geolocation and language identification in Twitter. In: Proceedings of ICWSM’12, pp 518–521

  • Hayes C, Avesani P, Bojars U (2007) An analysis of bloggers, topics and tags for a blog recommender system. In: Berendt B, Hotho A, Mladenič D, Semeraro G (eds) From web to social web: discovering and deploying user and content profiles. LNAI 4737. Springer, Berlin

  • Kleinberg JM (2002) Bursty and hierarchical structure in streams. In: Proceedings of SIGKDD 2002, pp 91–101

  • Kolari P, Java A, Finin T, Oates T, Joshi A (2006) Detecting spam blogs: a machine learning approach. In: Proceedings of the 21st national conference on artificial intelligence. AAAI, Boston

  • Kuzey E, Vreeken J, Weikum G (2014) A fresh look on knowledge bases: distilling named events from news. In: Proceedings of CIKM 2014, pp 1689–1698

  • Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of WWW. ACM, pp 591–600

  • Leban G, Fortuna B, Brank J, Grobelnik M (2014) Event registry: learning about world events from news. In: Proceedings of WWW 2014 (companion volume), pp 107–110

  • Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Elder IV JF, Fogelman-Soulié F, Flach PA, Zaki MJ (eds) Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris/New York

  • Li C, Weng J, He Q, Yao Y, Datta A, Sun A, Lee B-S (2012) TwiNER: named entity recognition in targeted Twitter stream. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval (SIGIR'12). ACM, New York, pp 721–730. doi:10.1145/2348283.2348380

  • Mackie S, McCreadie R, Macdonald C, Ounis I (2014) Comparing algorithms for microblog summarisation. In: Proceedings of CLEF 2014, pp 153–159

  • McCreadie R, Macdonald C, Ounis I, Osborne M, Petrovic S (2013) Scalable distributed event detection for Twitter. In: Proceedings of BigData conference 2013, pp 543–549

  • McCreadie R, Soboroff I, Lin J, Macdonald C, Ounis I, McCullough D (2012) On building a reusable Twitter corpus. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval (SIGIR’12). ACM, New York, pp 1113–1114. doi:10.1145/2348283.2348495

  • Mei Q, Cai D, Zhang D, Zhai C (2008) Topic modeling with network regularization. In: Huai J, Chen R (eds) Proceedings of the 17th international conference on world wide web (WWW'08), Beijing/New York. doi:10.1007/978-0-387-30164-8_827

  • Mishne G (2007) Using blog properties to improve retrieval. In: Glance N, Nicolov N, Adar E, Hurst M, Liberman M, Salvetti F (eds) Proceedings of the international conference on weblogs and social media (ICWSM), Boulder. http://www.icwsm.org/papers/paper25.html

  • Morstatter F, Pfeffer J, Liu H, Carley KM (2013) Is the sample good enough? Comparing data from Twitter’s streaming API with Twitter’s firehose. In: Proceedings of ICWSM 2013. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/view/6071

  • Odijk D, Burscher B, Vliegenthart R, de Rijke M (2013) Automatic thematic content analysis: finding frames in news. In: Social informatics 2013. LNCS 8238. Springer, Berlin, pp 333–345

  • Pang B, Lee L (2007) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135

  • Pollak S, Coesemans R, Daelemans W, Lavrač N (2011) Detecting contrast patterns in newspaper articles by combining discourse analysis and text mining. Pragmatics 21(4):647–683

  • Pon RK, Cardenas AF, Buttler D, Critchlow T (2007) Tracking multiple topics for finding interesting articles. In: Berkhin P, Caruana R, Wu X (eds) Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, San Jose/New York

  • Potts C (2013) Introduction to sentiment analysis. (slide set). http://www.stanford.edu/class/cs224u/slides/2013/cs224u-slides-02-26.pdf. Retrieved 15 Feb 2015

  • Radinsky K, Horvitz E (2013) Mining the web to predict future events. In: Proceedings of WSDM 2013, pp 255–264

  • Radsch CC (2013) Digital dissidence & political change: cyberactivism and citizen journalism in Egypt. Doctoral Dissertation, School of International Service, American University. Available at SSRN: http://ssrn.com/abstract=2379913

  • Recasens M, Danescu-Niculescu-Mizil C, Jurafsky D (2013) Linguistic models for analyzing and detecting biased language. In: Proceedings of ACL

  • Ren Z, Liang S, Meij E, de Rijke M (2013) Personalized time-aware tweets summarization. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval (SIGIR’13). ACM, New York, pp 513–522. doi:10.1145/2484028.2484052

  • Ritter A, Clark S, Mausam, Etzioni O (2011) Named entity recognition in tweets: an experimental study. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP’11). Association for Computational Linguistics, Stroudsburg, pp 1524–1534

  • Shahaf D, Guestrin C (2010) Connecting the dots between news articles. In: Proceedings of SIGKDD 2010, pp 623–632

  • Sudhahar S, de Fazio G, Franzosi R, Cristianini N (2015) Network analysis of narrative content in large corpora. Nat Lang Eng 21(1):81–112

  • Štajner T, Rusu D, Dali L, Fortuna B, Mladenic D, Grobelnik M (2010) A service oriented framework for natural language text enrichment. Informatica (Ljublj.) 34(3):307–313

  • Subašić I, Berendt B (2013) Story graphs: tracking document set evolution using dynamic graphs. Intell Data Anal 17(1):125–147

  • Veale T, Hao Y (2010) Detecting ironic intent in creative comparisons. In: Coelho H, Studer R, Wooldridge M (eds) Proceedings of the 2010 conference on ECAI 2010: 19th European conference on artificial intelligence. IOS Press, Amsterdam, pp 765–770

  • Wang X, Wei F, Liu X, Zhou M, Zhang M (2011) Topic sentiment analysis in Twitter: a graph-based hashtag sentiment classification approach. In: Berendt B, de Vries A, Fan W, Macdonald C, Ounis I, Ruthven I (eds) Proceedings of the 20th ACM international conference on information and knowledge management (CIKM’11). ACM, New York, pp 1031–1040. doi:10.1145/2063576.2063726

  • Wilson R (2013) Trending on Twitter: a look at algorithms behind trending topics. Ignite social media blog. http://www.ignitesocialmedia.com/twitter-marketing/trending-on-twitter-a-look-at-algorithms-behind-trending-topics/. Retrieved 15 Feb 2015

  • Zafarani R, Abbasi MA, Liu H (2014) Social media mining: an introduction. Cambridge University Press, Cambridge

Author information

Correspondence to Bettina Berendt.

Copyright information

© 2016 Springer Science+Business Media New York

About this entry

Cite this entry

Berendt, B. (2016). Text Mining for News and Blogs Analysis. In: Sammut, C., Webb, G. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7502-7_833-1

  • DOI: https://doi.org/10.1007/978-1-4899-7502-7_833-1

  • Publisher Name: Springer, Boston, MA

  • Online ISBN: 978-1-4899-7502-7

  • eBook Packages: Living Reference Computer Sciences, Reference Module Computer Science and Engineering