COMPLEX NETWORKS 2017 Paper 190
1 Introduction
Many complex networks lend themselves to the use of graphs for analyzing and
modelling their structure. Usually, vertices of the graph stand for the nodes of the
network and the edges between vertices stand for (possible) interactions between
nodes of the network. This approach has proven useful for identifying non-trivial
properties of the structure of networks in very different contexts, ranging from
computer science (the Internet, peer-to-peer networks, the web), to biology (protein-
Raphael Tackx
Sorbonne Universités, CNRS, LIP6, UMR 7606, e-mail: raphael.tackx@lip6.fr
Fabien Tarissan
Universités Paris-Saclay, CNRS, ISP, École Normale Supérieure de Paris-Saclay, e-mail:
fabien.tarissan@ens-paris-saclay.fr
Jean-Loup Guillaume
University of La Rochelle, L3I e-mail: jean-loup.guillaume@univ-lr.fr
2 Raphael Tackx, Fabien Tarissan, and Jean-Loup Guillaume
2.1 Notations
Fig. 2 Example of the weighted ⊤-projection of B using common neighbors as similarity function.
For instance, if one is interested in the ⊤-projection, one can study how ⊤ nodes
connect according to their similarity measured by their links towards common ⊥
nodes. Formally, such a similarity is captured by a similarity function θ. This allows
us to formally define the weighted projected graph G_⊤ = (⊤, θ) where θ : ⊤ × ⊤ →
ℝ⁺. This graph thus indicates the strength of the relations between ⊤ nodes. The
⊤-projection of the bipartite graph in Figure 1 will therefore result in the graph
depicted in Figure 2.
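As a minimal sketch of the weighted ⊤-projection described above, the following Python snippet computes, for every pair of ⊤ nodes sharing at least one ⊥ neighbor, the number of ⊥ nodes they have in common. The edge-list input, the function name `top_projection` and the toy graph are illustrative assumptions; the toy graph is not the graph B of Figure 1.

```python
from collections import defaultdict
from itertools import combinations


def top_projection(edges):
    """Weighted projection onto the top nodes (hypothetical helper).

    `edges` is an iterable of (top, bottom) pairs. The result maps each
    unordered pair of top nodes, stored as a sorted tuple, to the number
    of bottom nodes they share. Pairs with no shared neighbor are omitted.
    """
    # For each bottom node, collect the top nodes linked to it.
    tops_of = defaultdict(set)
    for top, bottom in edges:
        tops_of[bottom].add(top)

    # Each bottom node contributes 1 to every pair of its top neighbors.
    theta = defaultdict(int)
    for tops in tops_of.values():
        for x, y in combinations(sorted(tops), 2):
            theta[(x, y)] += 1
    return dict(theta)


# Hypothetical toy bipartite graph: top nodes A, B, C; bottom nodes 1, 2, 3.
toy_edges = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 2), ("C", 3)]
projection = top_projection(toy_edges)
```

Iterating over ⊥ nodes rather than over all pairs of ⊤ nodes avoids visiting pairs with no common neighbor, at the cost of a quadratic term in the degree of each ⊥ node.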
Note that in the rest of the paper, we will use the standard common neighbor
function θ(x, y) = |N_⊤(x) ∩ N_⊤(y)|. This approach, however, easily extends to other
similarity functions such as the Jaccard index [14], resource allocation [15] or the
Adamic-Adar coefficient [16]².
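To illustrate how the similarity function can be swapped, here is a hedged sketch of the four functions named above, written with their usual textbook definitions. The shared signature `(nx, ny, degree)`, where `nx` and `ny` are the neighbor sets of two ⊤ nodes and `degree` maps each ⊥ node to its degree, is an assumption made for this example and does not come from the paper.

```python
import math


def common_neighbors(nx, ny, degree):
    # Number of bottom nodes shared by the two top nodes.
    return len(nx & ny)


def jaccard(nx, ny, degree):
    # Shared neighbors divided by the size of the union of neighborhoods.
    union = nx | ny
    return len(nx & ny) / len(union) if union else 0.0


def resource_allocation(nx, ny, degree):
    # Each shared neighbor z contributes 1 / degree(z).
    return sum(1.0 / degree[z] for z in nx & ny)


def adamic_adar(nx, ny, degree):
    # Each shared neighbor z contributes 1 / log(degree(z)).
    # A shared neighbor has degree >= 2, so the logarithm is positive.
    return sum(1.0 / math.log(degree[z]) for z in nx & ny)


# Hypothetical toy data: neighbor sets of top nodes, degrees of bottom nodes.
neighbors = {"A": {1, 2}, "B": {1, 2}, "C": {2, 3}}
degree = {1: 2, 2: 3, 3: 1}

scores = {
    f.__name__: f(neighbors["A"], neighbors["C"], degree)
    for f in (common_neighbors, jaccard, resource_allocation, adamic_adar)
}
```

All four functions are symmetric as written, so the resulting projection stays undirected.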
2.2 ComSim algorithm
2 Depending on the similarity function used, the projection might result in a directed weighted
graph if θ is not symmetric.
ComSim: A bipartite community detection algorithm using cycle and node’s similarity
return P and K
return P0 and R
In order to evaluate the relevance of ComSim, we will compare the detected
communities with the ones of the three baseline detection algorithms described below.
3 Evaluation of ComSim
This section is devoted to assessing the relevance of the proposed method. We start by
investigating how the different algorithms behave on two small networks equipped
with existing communities (Section 3.1) before showing how ComSim scales up
when dealing with large-scale networks (Section 3.2).
We first apply our algorithm to two networks which are small but are provided with
a notion of ground-truth communities that we use as a reference to compare the four
algorithms.
Fig. 3 Evaluation of the quality of the partitions detected by the algorithms on 20 newsgroups and
Southern Women.
In order to test the performance of our algorithm both in terms of efficiency and
quality, we rely here on a large dataset extracted from the Internet Movie Database
(IMDb). This dataset [27] presents a bipartite network composed of 118 258 actors
(⊥) who played in 122 131 movies (⊤) between 1980 and 2010³.
Table 1 presents the performance in terms of execution time and memory peak
for the four algorithms on the three datasets. This shows that Louvain remains the
most efficient algorithm in terms of both time and memory, being slower
only on the smallest dataset.
However, it should be highlighted here that the performance of Louvain shown
in Table 1 was recorded after the ⊤-projection. This means that part of the
computation load related to the θ function has been avoided, which is not the case
for the other algorithms. This mechanically favours the Louvain approach.
In that regard, it is worth noting that our algorithm presents good results. On
IMDb in particular, ComSim is only slightly slower than Louvain and three times
faster than Infomap.
The results above show that our algorithm can scale up to large networks, but
they provide no insight into the quality of the detected communities. In contrast to the
previous section where we had ground-truth knowledge of the good partitions, no
study conducted on the IMDb dataset proposes an objective and external partition
of the nodes. It is thus impossible to use here either NMI or F1-score to compare
the three remaining algorithms⁴.
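To make concrete why such a score requires a reference partition, here is a minimal sketch of NMI between a detected partition and a ground-truth one. It assumes non-overlapping communities and the common normalisation 2·I(A;B) / (H(A) + H(B)); the exact variant used in the paper is not stated in this section, so both choices are assumptions, as is the function name `nmi`.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalised mutual information of two flat partitions.

    `labels_a[i]` and `labels_b[i]` are the community labels of node i in
    the two partitions. Assumes non-overlapping communities.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("partitions must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions consist of a single community
    return 2 * mutual / (h_a + h_b)


# Same grouping under different label names, then two unrelated groupings.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

The second argument plays the role of the external partition; without it, as for IMDb, the score cannot be computed.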
In order to assess the quality of the proposed communities, we follow instead
the proposition made in [28] where the authors introduce two goodness functions in
an attempt to quantify how relevant a community is regarding two properties that
we adapted for the case of bipartite graphs: the Density (or Internal Density) and
Separability.
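For reference, [28] defines density as the fraction of possible internal edges that are present, and separability as the ratio of internal edges to edges leaving the community. The sketch below shows one plausible way to transpose both to a bipartite community; the choice of |S ∩ ⊤| · |S ∩ ⊥| as the number of possible internal edges is an assumption made for illustration and is not necessarily the adaptation used in this paper. The function name and the toy graph are hypothetical.

```python
def density_and_separability(edges, tops, bottoms):
    """Goodness scores of a bipartite community (illustrative sketch).

    `edges` is an iterable of (top, bottom) pairs; `tops` and `bottoms`
    are the sets of top and bottom nodes belonging to the community.
    """
    internal = 0  # edges with both endpoints inside the community
    boundary = 0  # edges with exactly one endpoint inside the community
    for top, bottom in edges:
        in_top, in_bottom = top in tops, bottom in bottoms
        if in_top and in_bottom:
            internal += 1
        elif in_top or in_bottom:
            boundary += 1

    # Assumed bipartite adaptation: only top-bottom pairs can be linked.
    possible = len(tops) * len(bottoms)
    density = internal / possible if possible else 0.0
    # A community with no boundary edge is perfectly separated.
    separability = internal / boundary if boundary else float("inf")
    return density, separability


# Hypothetical toy bipartite graph and community.
toy_edges = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 2), ("C", 3)]
density, separability = density_and_separability(toy_edges, {"A", "B"}, {1, 2})
```

In this toy example the community {A, B, 1, 2} contains all four possible internal edges and has a single edge towards the rest of the graph.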
3 For a homogeneous analysis, we removed all TV shows and documentaries and kept only the
first 7 actors listed in the cast.
4 Since LP BRIM does not scale up to the size of IMDb, we avoid mentioning this approach in the
Fig. 4 Scatter plot displaying the relation between properties of the communities and their size for
ComSim (top), Louvain (middle) and Infomap (bottom) on IMDb.
of the proposed approach, both in terms of efficiency and quality of the detected
communities.
4 Conclusions
Acknowledgements
References
1. Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks.
Nature, 393(6684):440–442, 1998.
2. Ramon Ferrer i Cancho and Richard V Solé. The small world of human language. Proceedings
of the Royal Society of London. Series B: Biological Sciences, 268(1482):2261–2265, 2001.
3. Mark EJ Newman, Duncan J Watts, and Steven H Strogatz. Random graph models of social
networks. Proceedings of the National Academy of Sciences of the United States of America,
99(Suppl 1):2566–2572, 2002.
4. Stefano Battiston and Michele Catanzaro. Statistical properties of corporate board and
director networks. The European Physical Journal B-Condensed Matter and Complex Systems,
38(2):345–352, 2004.
5. Fabrice Le Fessant, Sidath Handurukande, A-M Kermarrec, and Laurent Massoulié.
Clustering in peer-to-peer file sharing workloads. In Peer-to-Peer Systems III, pages 217–226.
Springer, 2005.
6. Christophe Prieur, Dominique Cardon, Jean-Samuel Beuscart, Nicolas Pissard, and
Pascal Pons. The stength of weak cooperation: A case study on flickr. arXiv preprint
arXiv:0802.2317, 2008.
7. Yong-Yeol Ahn, Sebastian E Ahnert, James P Bagrow, and Albert-László Barabási. Flavor
network and the principles of food pairing. Scientific reports, 1, 2011.
8. Santo Fortunato. Community detection in graphs. Physics reports, 486(3):75–174, 2010.
9. Mark EJ Newman, Steven H Strogatz, and Duncan J Watts. Random graphs with arbitrary
degree distributions. Physical Review E, 64, 2001.
10. Xin Liu and Tsuyoshi Murata. Community detection in large-scale bipartite networks. In
Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence
and Intelligent Agent Technology - Volume 01, WI-IAT ’09, pages 50–57, Washington, DC,
USA, 2009. IEEE Computer Society.
11. Daniel B Larremore, Aaron Clauset, and Abigail Z Jacobs. Efficiently inferring community
structure in bipartite networks. Physical Review E, 90(1):012805, 2014.
12. Arnau Prat-Pérez, David Dominguez-Sal, and Josep-Lluis Larriba-Pey. High quality, scalable
and parallel community detection for large real graphs. In Proceedings of the 23rd
International Conference on World Wide Web, pages 225–236. ACM, 2014.
13. Sune Lehmann, Martin Schwartz, and Lars Kai Hansen. Biclique communities. Physical
Review E, 78(1):016108, 2008.
14. Paul Jaccard. Le coefficient generique et le coefficient de communaute dans la flore marocaine.
Impr. Commerciale, 1926.
15. Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang. Predicting missing links via local information.
The European Physical Journal B-Condensed Matter and Complex Systems, 71(4):623–630,
2009.
16. Lada A Adamic and Eytan Adar. Friends and neighbors on the web. Social networks,
25(3):211–230, 2003.
17. Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast
unfolding of communities in large networks. Journal of statistical mechanics: theory and
experiment, 2008(10):P10008, 2008.
18. Mark EJ Newman. Modularity and community structure in networks. Proceedings of the
national academy of sciences, 103(23):8577–8582, 2006.
19. Martin Rosvall, Daniel Axelsson, and Carl T Bergstrom. The map equation. The European
Physical Journal-Special Topics, 178(1):13–23, 2009.
20. M. J. Barber. Modularity and community detection in bipartite networks. Physical Review E,
76(6):066102, December 2007.
21. Allison Davis, Burleigh B. Gardner, and Mary R. Gardner. Deep South; a Social
Anthropological Study of Caste and Class. The University of Chicago Press, Chicago, 1941.
22. Elna C Green. Southern strategies: Southern women and the woman suffrage question. Univ
of North Carolina Press, 1997.
23. Linton C Freeman. Finding social groups: A meta-analysis of the southern women data. na,
2003.
24. Ken Lang. Newsweeder: Learning to filter netnews. In Proceedings of the Twelfth
International Conference on Machine Learning, pages 331–339, 1995.
25. Andrea Lancichinetti, Santo Fortunato, and János Kertész. Detecting the overlapping and
hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015,
2009.
26. Jaewon Yang and Jure Leskovec. Overlapping community detection at scale: a nonnegative
matrix factorization approach. In Proceedings of the sixth ACM international conference on
Web search and data mining, pages 587–596. ACM, 2013.
27. Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and
Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual
Meeting of the Association for Computational Linguistics: Human Language Technologies,
pages 142–150, Portland, Oregon, USA, June 2011. Association for Computational
Linguistics.
28. Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on
ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.