Abstract
News and blogs are temporally indexed online texts that play a key role in today’s information distribution and consumption. News communicates selected information on current events and is written by professional or citizen journalists; blogs are regularly updated Web publications that span a much wider range of topics, styles, and authors. Microblogs such as Twitter have become particularly important in recent years. This entry gives an overview of how text mining (for tasks such as description, classification, prediction, search, recommendation, or summarization) is applied to the textual parts of news and blogs in order to extract topics, events, opinions, sentiments, and other aspects of content. Textual analysis is often complemented by the analysis of further data, such as the social network of authors and readers. Because news and (micro)blogs have distinctive data structures and language use, they require tailored preprocessing and analysis methods, and the tasks often profit from an interactive approach in which the user plays an active role in sensemaking. The methods are deployed in a wide range of applications and services.
Copyright information
© 2016 Springer Science+Business Media New York
About this entry
Cite this entry
Berendt, B. (2016). Text Mining for News and Blogs Analysis. In: Sammut, C., Webb, G. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7502-7_833-1
DOI: https://doi.org/10.1007/978-1-4899-7502-7_833-1
Publisher Name: Springer, Boston, MA
Online ISBN: 978-1-4899-7502-7
eBook Packages: Living Reference Computer Sciences; Reference Module Computer Science and Engineering