Abstract
Heterogeneous information networks have drawn much attention in recent years because of their significant applications in areas such as text mining, e-commerce, social networks, and bioinformatics. Clustering different types of objects simultaneously, using not only the relations among objects of the same type but also the relations between objects of different types, can mutually improve the clustering quality of each type. In this paper, we propose a general model that considers homogeneous and heterogeneous relations simultaneously to describe the structure of heterogeneous information networks, and we devise a novel parameter-free approach to multi-type overlapping clustering. In this model, the different types of relations between different types of objects are represented by a group of matrices. In this way, we transform the multi-type clustering problem into an information compression problem. We then propose greedy search approaches that aim to describe the group of relational matrices with the fewest bits. Moreover, by discovering the discriminative clusters among different types of objects, we devise effective parameter-free strategies to discover either overlapping or non-overlapping structure among different types of clusters. Extensive experiments on real-world and synthetic data sets demonstrate that our methods are effective and efficient.
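To make the compression view concrete, the following is a minimal sketch, not the paper's coding scheme. It assumes binary relation matrices, a cross-associations-style cost (each block is encoded with its empirical entropy plus a simple per-block model cost of log2(cells + 1) bits), and a fixed number of clusters per object type. It handles only non-overlapping clusters. The names `block_cost`, `total_cost`, and `greedy_search` are hypothetical. Logarithms are base 2, following the paper's convention.

```python
import math
from typing import Dict, Tuple

import numpy as np


def entropy_bits(ones: int, total: int) -> float:
    """Bits to encode a binary block of `total` cells with `ones` ones,
    using an optimal code for its empirical density: total * H(p)."""
    if total == 0 or ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -total * (p * math.log2(p) + (1 - p) * math.log2(1 - p))


def block_cost(matrix: np.ndarray, row_labels: np.ndarray,
               col_labels: np.ndarray, k_rows: int, k_cols: int) -> float:
    """Cost of one relation matrix under a row/column partition.
    Assumed model cost: log2(cells + 1) bits per block to state its ones count."""
    cost = 0.0
    for r in range(k_rows):
        rows = row_labels == r
        for c in range(k_cols):
            cols = col_labels == c
            block = matrix[np.ix_(rows, cols)]  # boolean masks select the block
            cost += entropy_bits(int(block.sum()), block.size)
            cost += math.log2(block.size + 1)
    return cost


def total_cost(relations: Dict[Tuple[str, str], np.ndarray],
               labels: Dict[str, np.ndarray], k: Dict[str, int]) -> float:
    """Sum of costs over the group of relation matrices. A homogeneous
    relation (type_a == type_b) reuses the same labels for rows and columns."""
    return sum(block_cost(m, labels[ta], labels[tb], k[ta], k[tb])
               for (ta, tb), m in relations.items())


def greedy_search(relations, labels, k, max_sweeps=10):
    """Greedily move single objects to the cluster that lowers the total
    cost the most; stop when a full sweep makes no improving move."""
    labels = {t: l.copy() for t, l in labels.items()}
    best = total_cost(relations, labels, k)
    for _ in range(max_sweeps):
        improved = False
        for t in labels:
            for i in range(len(labels[t])):
                original = labels[t][i]
                best_label = original
                for c in range(k[t]):
                    if c == original:
                        continue
                    labels[t][i] = c
                    cost = total_cost(relations, labels, k)
                    if cost < best - 1e-9:
                        best, best_label = cost, c
                labels[t][i] = best_label  # keep the cheapest choice
                if best_label != original:
                    improved = True
        if not improved:
            break
    return labels, best


# Toy example: object types "A" and "B", one heterogeneous (A-B) and one
# homogeneous (A-A) relation, both with two clear blocks.
blocks = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
relations = {("A", "B"): blocks, ("A", "A"): blocks.copy()}
k = {"A": 2, "B": 2}
true_labels = {"A": np.array([0, 0, 1, 1]), "B": np.array([0, 0, 1, 1])}
start = {"A": np.array([0, 1, 0, 1]), "B": np.array([0, 1, 0, 1])}

initial = total_cost(relations, start, k)
found, final = greedy_search(relations, start, k)
print(f"initial: {initial:.2f} bits, after greedy search: {final:.2f} bits")
```

Because every relation matrix shares the labels of its object types, a move that compresses one relation better can be rejected if it makes another relation more expensive. This is one way to picture how homogeneous and heterogeneous relations constrain each other's clusterings in the compression setting.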

Notes
All logarithms in this paper are base 2.
Acknowledgments
Wangqun Lin and Bo Deng are supported by the National Natural Science Foundation of China through Grant 61271252. Philip S. Yu and Yuchen Zhao are supported by the NSF through Grant CNS-1115234, a Google Research Award, and the Pinnacle Lab at Singapore Management University.
About this article
Cite this article
Lin, W., Yu, P.S., Zhao, Y. et al. Multi-type clustering in heterogeneous information networks. Knowl Inf Syst 48, 143–178 (2016). https://doi.org/10.1007/s10115-015-0869-9
DOI: https://doi.org/10.1007/s10115-015-0869-9