Abstract
Social media networks play an increasingly prominent role in people’s daily lives. Community structure is one of the salient features of social media networks and has been exploited in practical applications such as recommendation systems and network marketing. With the rapid growth of social media and the surge in the volume of information, identifying communities in big data scenarios has become a challenge. Building on our previous work and the map equation (an information-theoretic equation for community mining), we develop a novel distributed community structure mining framework. In the framework, (1) we propose a new link information update method that aims to avoid data-writing operations and thereby speed up the process; (2) we use local information from the nodes and their neighbors, instead of PageRank, to calculate the probability distribution of the nodes; and (3) we drop the network partitioning step used in our previous work and attempt to run the map equation directly on MapReduce. Empirical results on real-world social media networks and artificial networks show that the new framework outperforms our previous work and some well-known algorithms, such as Radetal and FastGN, in accuracy, speed, and scalability.
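As a rough illustration of the objective mentioned above, the following sketch evaluates the two-level map equation of Rosvall and Bergstrom for an undirected, unweighted network and a given assignment of nodes to communities. It assumes that each node's visit probability is its degree divided by twice the number of links, which is one way of using only local information in place of PageRank; this may differ from the exact method of the paper. The function and variable names are hypothetical, and the sketch runs on a single machine rather than on MapReduce.

```python
from collections import defaultdict
from math import log2


def plogp(p):
    """Return p * log2(p), with the usual convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level map equation (description length in bits per step).

    edges: list of (u, v) pairs of an undirected, unweighted network.
    module_of: dict mapping each node to its community label.
    Assumption: visit probability = degree / (2 * number of links).
    """
    degree = defaultdict(int)
    cut = defaultdict(int)  # links leaving each module
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            cut[module_of[u]] += 1
            cut[module_of[v]] += 1

    two_m = 2 * len(edges)
    visit = {node: k / two_m for node, k in degree.items()}

    # Total visit probability of the nodes inside each module.
    stay = defaultdict(float)
    for node, p in visit.items():
        stay[module_of[node]] += p

    # Probability of exiting each module in one step.
    exit_p = {mod: cut[mod] / two_m for mod in stay}
    q = sum(exit_p.values())

    return (
        plogp(q)
        - 2 * sum(plogp(x) for x in exit_p.values())
        - sum(plogp(p) for p in visit.values())
        + sum(plogp(exit_p[mod] + stay[mod]) for mod in stay)
    )


# Toy network: two triangles joined by a single link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}

print(map_equation(toy_edges, two_modules))
print(map_equation(toy_edges, one_module))
```

On this toy network the two-community assignment yields the shorter description length, which is the signal a map-equation-based method uses to prefer it over a single community.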







Acknowledgments
The authors would like to express their sincere gratitude to Professor Philip S. Yu of the University of Illinois at Chicago and Mr. Zhang Yuchao of the Beijing Institute of System Engineering for their great assistance throughout the research process. This work was supported in part by the National High-Tech Research and Development Program of China (2012AA012600) and the National Natural Science Foundation of China (61202362, 61472433).
Cite this article
Jin, S., Lin, W., Yin, H. et al. Community structure mining in big data social media networks with MapReduce. Cluster Comput 18, 999–1010 (2015). https://doi.org/10.1007/s10586-015-0452-x
DOI: https://doi.org/10.1007/s10586-015-0452-x