The 10 million follower fallacy: audience size does not prove domain-influence on Twitter

Cataldi, Mario; Aufaure, Marie-Aude

doi:10.1007/s10115-014-0773-8

Regular Paper
Published: 17 August 2014

Knowledge and Information Systems, Volume 44, pages 559–580 (2015)


Abstract

With the advent of social networks and micro-blogging systems, the way people communicate and spread information has changed substantially. People of different backgrounds, ages and education levels exchange information and opinions across many domains and topics, and can now interact directly with popular users and authoritative information sources that were usually unreachable before these environments existed. As a result, the mechanism of information propagation has changed deeply, and studying it is indispensable for understanding the evolution of information networks. To this end, we propose a novel model that makes it possible to examine the spread of information over a social network together with the change in user relationships with respect to the domain of discussion. Taking Twitter as a case study, we analyze the multiple paths that information follows over the network, with the goal of understanding how the dynamics of information contagion vary with the topic of discussion. We then provide a method for estimating the influence among users by evaluating the nature of their relationship with respect to the topic of discussion they share. Using a vast sample of the Twitter network, we present various experiments that illustrate our proposal and show the efficacy of the proposed approach in modeling this information spread.

Notes

http://www.twitter.com.
http://www.tumblr.com.
http://www.facebook.com.
http://plus.google.com/.
A popular-press article expressing the same high-level idea can be found at http://www.nytimes.com/external/readwriteweb/2010/03/19/19readwriteweb-the-million-follower-fallacy-audience-size-d-3203.html.
www.facebook.com/.
http://secondlife.com/.
www.informatik.uni-trier.de/~ley/db/.
Note that, as pointed out in the literature, the inverse document frequency factor cannot be usefully applied in our work. It diminishes the weight of terms that occur very frequently in the corpus and increases the weight of terms that occur rarely, which we believe is not a suitable weighting scheme in our case: terms that appear in most of the documents in the corpus are likely to be highly relevant for the domain. Note also that, to exclude common function words (such as conjunctions and articles), we removed stop-words with common techniques and considered only nouns in our computation.
An interesting observation is possible: the highest-ranked $n$-grams are mostly uni-grams and simply reflect the distribution of the letters of the alphabet in the language of the document. In other words, the most frequent $n$-grams mostly correlate with the language rather than with the topic. Because of this, the most frequent $n$-grams of the considered topic profiles turned out to be very similar (the profiles start to differ consistently only in the lower part of the ranked $n$-gram list), so we excluded uni-grams from our analysis.
The sampling rate of the Twitter account used is 10% of an average of 200 million tweets per day. More information is available at http://apiwiki.Twitter.com.
http://www.weibo.com.
http://www.facebook.com.
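
The two notes above on term weighting and $n$-gram ranking can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tiny stop-word list, the underscore padding (in the spirit of Cavnar and Trenkle's $n$-gram profiles) and the parameter defaults are all assumptions made for illustration, and the noun-only filtering mentioned in the note is omitted because it would need a part-of-speech tagger.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "is"}


def term_frequency_weights(documents):
    """Weight each term by its relative frequency over the corpus.

    No inverse document frequency factor is applied, so terms that
    occur in most documents keep a high weight.
    """
    counts = Counter()
    for doc in documents:
        for token in doc.lower().split():
            if token.isalpha() and token not in STOP_WORDS:
                counts[token] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def ngram_profile(text, min_n=2, max_n=5, top_k=300):
    """Rank character n-grams by frequency, most frequent first.

    min_n=2 excludes uni-grams, which mostly mirror the letter
    distribution of the language rather than the topic.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # assumed padding marks word boundaries
        for n in range(min_n, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top_k)]
```

With plain term frequency, a term shared by every document of a domain sample ends up with the largest weight, which is the behaviour the note argues for. Raising `min_n` trims the language-dependent head of the ranked list.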

References

Adar E, Adamic LA (2005) Tracking information epidemics in blogspace. In: IEEE/WIC/ACM international conference on web intelligence, WI’05. IEEE Computer Society, pp 207–214. doi:10.1109/WI.2005.151
Aral S, Muchnik L, Sundararajan A (2009) Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc Natl Acad Sci 106(51):21544–21549
Bakshy E, Karrer B, Adamic LA (2009) Social influence and the diffusion of user-created content. In: Proceedings of the 10th ACM conference on electronic commerce, EC’09. ACM, pp 325–334
Barabasi AL, Jeong H, Neda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Phys A: Stat Mech Appl 311(3–4):590–614
Castillo C, Mendoza M, Poblete B (2011) Information credibility on twitter. In: Proceedings of the 20th international conference on World wide web, WWW’11, pp 675–684. ACM, New York, NY, USA. doi:10.1145/1963405.1963500
Cataldi M, Di Caro L, Schifanella C (2010) Emerging topic detection on twitter based on temporal and social terms evaluation. In: MDMKDD’10, pp 4:1–4:10. ACM, New York, NY, USA
Cataldi M, Di Caro L, Schifanella C (2014) Personalized emerging topic detection based on a term aging model. ACM Trans Intell Syst Technol 5(1):27. doi:10.1145/2542182.2542189
Cataldi M, Mittal N, Aufaure MA (2013) Estimating domain-based user influence in social networks. In: Proceedings of the 28th symposium on applied computing, SAC 2013. ACM, New York, NY, USA
Cavnar WB, Trenkle JM (1994) N-gram-based text categorization. In: Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval, pp 161–175
Cha M, Benevenuto F, Ahn YY, Gummadi KP (2012) Delayed information cascades in flickr: measurement, analysis, and modeling. Comput Netw 56(3):1066–1076. doi:10.1016/j.comnet.2011.10.020
Cha M, Benevenuto F, Haddadi H, Gummadi PK (2012) The world of connections and information flow in twitter. IEEE Trans Syst Man Cybern Part A 42(4):991–998
Cha M, Haddadi H, Benevenuto F, Gummadi KP (2010) Measuring User Influence in Twitter: the million follower fallacy. In: Proceedings of the 4th international AAAI conference on weblogs and social media (ICWSM), The AAAI Press, Menlo Park, California, pp 10–17
Chubin DE (1976) The conceptualization of scientific specialties. Sociol Q 17(4):448–476
Crane D (1969) Social structure in a group of scientists: a test of the “invisible college” hypothesis. Am Sociol Rev 3:335–352
de Beaver D, Rosen R (1979) Studies in scientific collaboration. Scientometrics 1(2):133–149
Di Caro L, Cataldi M, Schifanella C (2012) The d-index: discovering dependences among scientific collaborators from their bibliographic data records. Int J Scientometr. pp 1–25. doi:10.1007/s11192-012-0762-1
Erceg V, Greenstein LJ, Tjandra SY, Parkoff SR, Gupta A, Kulic B, Julius AA, Bianchi R (2006) An empirically based path loss model for wireless channels in suburban environments. IEEE J Sel A Commun 17(7):1205–1211. doi:10.1109/49.778178
Favenza A, Cataldi M, Sapino ML, Messina A (2008) Topic development based refinement of audio-segmented television news. In: NLDB’08, Springer, Berlin, Heidelberg, pp 226–232
Friedman N (2000) Being bayesian about network structure. In: Machine learning, pp 201–210
Getoor L, Friedman N, Koller D, Taskar B (2002) Learning probabilistic models of link structure. J Mach Learn Res 3:679–707
Goldenberg J, Libai B, Muller E (2001) Talk of the network: a complex systems look at the underlying process of word-of-mouth. Mark Lett 12(3):211–223
Goyal A, Bonchi F, Lakshmanan LV (2010) Learning influence probabilities in social networks. In: Proceedings of the third ACM international conference on Web search and data mining, WSDM’10. ACM, New York, NY, USA, pp 241–250
Granovetter M (1978) Threshold models of collective behavior. Am J Sociol 83(6):1420–1443. doi:10.1086/226707
Gruhl D, Guha R, Liben-Nowell D, Tomkins A (2004) Information diffusion through blogspace. In: WWW’04, pp 491–501. ACM
Gruhl D, Liben-Nowell D, Guha R, Tomkins A (2004) Information diffusion through blogspace. SIGKDD Explor Newsl 6(2):43–52. doi:10.1145/1046456.1046462
Hou H, Kretschmer H, Liu Z (2008) The structure of scientific collaboration networks in scientometrics. Scientometrics 75(2):189–202
Joachims T (1997) A probabilistic analysis of the rocchio algorithm with TFIDF for text categorization. In: Proceedings of the fourteenth international conference on machine learning, ICML’97. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 143–151
Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the 10th European conference on machine learning, ECML ’98, Springer, London, UK, pp 137–142
Katz JS, Martin BR (1997) What is research collaboration? Res Policy 26:1–18
Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’03, ACM, New York, NY, USA, pp 137–146. doi:10.1145/956750.956769
Khanafiah D, Situngkir H (2004) Social balance theory: revisiting Heider’s balance theory for many agents. Technical report
Kumar R, Novak J, Raghavan P, Tomkins A (2004) Structure and evolution of blogspace. Commun ACM 47(12):35–39. doi:10.1145/1035134.1035162
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: WWW’10, pp 591–600. ACM
Leskovec J, Adamic LA, Huberman BA (2007) The dynamics of viral marketing. ACM Trans Web 1(1). doi:10.1145/1232722.1232727
Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: KDD ’06, pp 631–636. ACM. doi:10.1145/1150402.1150479
Liben-Nowell D, Kleinberg J (2003) The link prediction problem for social networks. In: CIKM ’03, pp 556–559. ACM
Melin G, Persson O (1996) Studying research collaboration using co-authorships. Scientometrics 36:363–377
Moon S, You J, Kwak H, Kim D, Jeong H (2010) Understanding topological mesoscale features in community mining. In: 2010 second international conference on communication systems and networks (COMSNETS), IEEE Press, Piscataway, NJ, USA, pp 1–10
Newman MEJ (2001) Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E 64(1):016131
Page L, Brin S, Motwani R, Winograd T (1998) The pagerank citation ranking: Bringing order to the web. In: WWW’98, pp 161–172
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco
Rocchio J (1971) Relevance feedback in information retrieval, pp 313–323
Romero DM, Galuba W, Asur S, Huberman BA (2011) Influence and passivity in social media. In: Proceedings of the 2011 European conference on machine learning and knowledge discovery in databases—Volume Part III, ECML PKDD’11. Springer, Berlin, Heidelberg, pp 18–33. http://dl.acm.org/citation.cfm?id=2034161.2034164
Salton G (1989) Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison-Wesley, Boston
Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24:513–523
Schifanella C, Caro LD, Cataldi M, Aufaure MA (2012) The d-index: a web environment for analyzing dependences among scientific collaborators. In: KDD, pp 1520–1523. ACM
Shapin S (1981) Laboratory life. The social construction of scientific facts. Med Hist 25(3):341–342
Suen CY (1979) n-gram Statistics for natural language understanding and text processing. IEEE Trans Pattern Anal Mach Intell 1(2):164–172. doi:10.1109/TPAMI.1979.4766902
Sun E, Rosenn I, Marlow C, Lento TM (2009) Gesundheit! modeling contagion through facebook news feed. In: Proceedings of International AAAI conference on weblogs and social media, pp 1–8
Weng J, Lim EP, Jiang J, He Q (2010) Twitterrank: finding topic-sensitive influential twitterers. In: Proceedings of the third ACM international conference on Web search and data mining, WSDM ’10, pp 261–270. ACM, New York, NY, USA. doi:10.1145/1718487.1718520
Wu S, Hofman JM, Mason WA, Watts DJ (2011) Who says what to whom on twitter. In: Proceedings of the 20th international conference on world wide web, WWW ’11, pp 705–714. ACM, New York, NY, USA. doi:10.1145/1963405.1963504
Yang Y (1999) An evaluation of statistical approaches to text categorization. J Inf Retr 1:67–88
Zhao Q, Mitra P, Chen B (2007) Temporal and information flow based event detection from social text streams. In: Proceedings of the 22nd national conference on artificial intelligence, vol 2, AAAI’07. AAAI Press, Menlo Park, California, pp 1501–1506
Zipf G (1949) Human behaviour and the principle of least-effort. Addison-Wesley, Cambridge

Author information

Authors and Affiliations

LIASD - Université Paris 8, Saint-Denis, France
Mario Cataldi
Mas Laboratory - Ecole Centrale Paris, Châtenay-Malabry, France
Marie-Aude Aufaure

Corresponding author

Correspondence to Mario Cataldi.

About this article

Cite this article

Cataldi, M., Aufaure, MA. The 10 million follower fallacy: audience size does not prove domain-influence on Twitter. Knowl Inf Syst 44, 559–580 (2015). https://doi.org/10.1007/s10115-014-0773-8

Received: 25 July 2013
Revised: 16 June 2014
Accepted: 27 July 2014
Published: 17 August 2014
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10115-014-0773-8
