Modularity and Community Detection in Bipartite Networks
Michael J. Barber*
Austrian Research Centers GmbH—ARC, Bereich Systems Research, Vienna, Austria
(Received 16 July 2007; revised manuscript received 13 September 2007; published 7 December 2007)
The modularity of a network quantifies the extent, relative to a null model network, to which vertices cluster
into community groups. We define a null model appropriate for bipartite networks, and use it to define a
bipartite modularity. The bipartite modularity is presented in terms of a modularity matrix B; some key
properties of the eigenspectrum of B are identified and used to describe an algorithm for identifying modules
in bipartite networks. The algorithm is based on the idea that the modules in the two parts of the network are
dependent, with each part mutually being used to induce the vertices for the other part into the modules. We
apply the algorithm to real-world network data, showing that the algorithm successfully identifies the modular
structure of bipartite networks.
[…] connected based on joint participation in many teams. The resulting modularity is thus focused on identifying modules in only one part of the network at a time. Interestingly, Guimerà et al. point out the possibilities of classifying both partite sets of the network simultaneously and of customizing spectral methods for bipartite networks, which is essentially the approach taken in the present work.

As of this writing, we are aware of no other attempts to define modularity for bipartite networks. However, bipartite networks, or “two mode networks,” have undergone several related studies in the sociology community using other methods (see, e.g., Refs. [21,22], and references cited therein).

The structure of the paper is as follows: in Sec. II we define a modularity matrix and measure for bipartite networks. We discuss using the bipartite modularity matrix to identify modules in Sec. III, and apply the algorithm therein devised to two real-world networks in Sec. IV. Finally, we conclude in Sec. V with an assessment of the present investigation and an outlook for future work.

II. BIPARTITE MODULARITY

In this section, we develop a modularity matrix for bipartite networks. Structurally and notationally, the development parallels the discussion of the modularity matrix by Newman [19].

Consider a network with n vertices and m edges defined by an adjacency matrix A. Each vertex i is assigned to a community group or module, denoted by $g_i$. The modularity Q for such an assignment reflects the extent, relative to a null model, to which edges are formed within modules instead of between modules. Formally, the modularity is defined as

$$ Q = \frac{1}{2m} \sum_{i,j} (A_{ij} - P_{ij})\,\delta(g_i, g_j), \tag{1} $$

where the $A_{ij}$ are the adjacency matrix elements and the $P_{ij}$ are probabilities in the null model that an edge exists between […]

[…] an index vector showing membership in module i; a value of 1 in position j of $s_i$ indicates that vertex j belongs to module i. Given that each vertex is assigned to exactly one module, each row of S has a single unit value and the index vectors are thus orthogonal.

Further, a modularity matrix B is defined with elements

$$ B_{ij} = A_{ij} - P_{ij}. \tag{2} $$

Using S and B, the modularity becomes

$$ Q = \frac{1}{2m} \operatorname{Tr} S^{T} B S. \tag{3} $$

The eigenspectrum of B has a fundamental relationship with the modular nature of the network, as Newman [19] has explored.

From Eqs. (1)–(3), it is apparent that the choice of null model has a profound impact on the modularity. Thus, for example, a Bernoulli random graph with constant $P_{ij} = p$ for all i and j is a poor representation of most real-world networks, and so would be an inappropriate choice of null model. Instead, the usual choice of null model [19] assigns edges at random with the expected degrees of model vertices constrained to match the degrees in the actual network.

In much the same fashion, bipartite networks have specific constraints that should be reflected in the null model. The vertices of a bipartite network can be partitioned into two disjoint sets such that no two vertices within the same set are adjacent. An equivalent, but more visual, definition is that the vertices in a bipartite graph can be assigned one of two colors, say red and blue, with no neighboring vertices bearing the same color. In the remainder of this section, we will define a null model with the above requirement that the expected degrees match the degrees in the real network, along with the additional constraint that each edge links a red vertex and a blue vertex.

Let p be the number of red vertices and q be the number of blue vertices; this implies n = p + q. Without loss of generality, assume that the vertices are indexed so that red vertices are labeled 1, 2, ..., p and the blue vertices are labeled p + 1, p + 2, ..., p + q. The adjacency matrix then has a block off-diagonal form of

$$ A = \begin{bmatrix} O_{p \times p} & \tilde{A}_{p \times q} \\ (\tilde{A}^{T})_{q \times p} & O_{q \times q} \end{bmatrix}, \tag{4} $$

where $O_{i \times j}$ is the all-zero matrix with i rows and j columns. Require the same block structure for P that is exhibited by A, giving

$$ P = \begin{bmatrix} O_{p \times p} & \tilde{P}_{p \times q} \\ (\tilde{P}^{T})_{q \times p} & O_{q \times q} \end{bmatrix}. \tag{5} $$

[…]

$$ B = \begin{bmatrix} O_{p \times p} & \tilde{B}_{p \times q} \\ (\tilde{B}^{T})_{q \times p} & O_{q \times q} \end{bmatrix}, \tag{6} $$

where $\tilde{B} = \tilde{A} - \tilde{P}$. The all-zero blocks on the diagonal are the potential modularity contributions from pairs of vertices of the same color being present in a module; all meaningful contributions, positive or negative, to the modularity thus are made by pairs of vertices with distinct colors. In contrast, with the usual null model based on unipartite networks [19], the corresponding blocks contain only negative elements (or zeros for isolated nodes of degree zero), always providing a modularity penalty for pairs of like-colored vertices in the same module.

Equation (1) can be rewritten as […]
MODULARITY AND COMMUNITY DETECTION IN … PHYSICAL REVIEW E 76, 066102 (2007)
$$ d_j = \sum_{i=1}^{p} \tilde{P}_{ij} = C \sum_{i=1}^{p} k_i d_j = (Cm)\, d_j, \tag{14} $$

so that C = 1/m and thus

$$ \tilde{P}_{ij} = \frac{k_i d_j}{m}. \tag{15} $$

The same result can be obtained from Eqs. (10) and (13) instead of Eqs. (11) and (12). With Eq. (15), we have fully defined the modularity Q for a bipartite network.

III. MODULE IDENTIFICATION

A. Spectral methods for module identification

Using the modularity defined in Sec. II, we can assess the quality of any partitioning of the vertices of a bipartite graph […]

Recall that the index vectors $s_1$ and $s_2$ take on values from {0, 1}. It is clear how to maximize the modularity in Eq. (17): when $x_i$, the ith element of x, is positive, assign vertex i to the first module by setting the ith entry of $s_1$ to one, and when $x_i$ is negative, assign vertex i to the second module by setting the ith entry of $s_2$ to one [30].

The use of multiple eigenvectors allows more than two modules to be considered (c > 2), with at most one module more than the number of positive eigenvalues of B [19]. Additional eigenvectors of B can also be used for SOM [19] in a vector partitioning algorithm adapted from spectral partitioning [31,32]. In the present work, we will not make use of this algorithm, nor of a recursive bipartitioning approach, instead developing an alternative technique that capitalizes on the bipartite nature of the networks.
B. Module identification in bipartite networks

In Sec. III A, we have seen how to identify community groups of networks by using the Newman vector to maximize Q. However, we made no use of the bipartite character of the networks. For a bipartite network, the eigenvalue equation $B x_i = \lambda_i x_i$ can be written as

$$ \begin{bmatrix} O & \tilde{B} \\ \tilde{B}^{T} & O \end{bmatrix} \begin{bmatrix} u_i \\ v_i \end{bmatrix} = \lambda_i \begin{bmatrix} u_i \\ v_i \end{bmatrix}, \tag{18} $$

where $u_i$ is a p × 1 vector and $v_i$ is a q × 1 vector. The left-hand side of Eq. (18) can be multiplied out, giving

$$ \begin{bmatrix} O & \tilde{B} \\ \tilde{B}^{T} & O \end{bmatrix} \begin{bmatrix} u_i \\ v_i \end{bmatrix} = \begin{bmatrix} \tilde{B} v_i \\ \tilde{B}^{T} u_i \end{bmatrix} = \lambda_i \begin{bmatrix} u_i \\ v_i \end{bmatrix}, \tag{19} $$

from which it follows that

$$ \begin{bmatrix} O & \tilde{B} \\ \tilde{B}^{T} & O \end{bmatrix} \begin{bmatrix} u_i \\ -v_i \end{bmatrix} = \begin{bmatrix} -\tilde{B} v_i \\ \tilde{B}^{T} u_i \end{bmatrix} = -\lambda_i \begin{bmatrix} u_i \\ -v_i \end{bmatrix}. \tag{20} $$

Hence, for any eigenvalue $\lambda_i$ of B, $-\lambda_i$ is also an eigenvalue of B.

Since only the eigenvectors corresponding to positive eigenvalues of B can give positive contributions to Q, we can focus on just the positive eigenvalues $\lambda_i = |\lambda_i| > 0$. In this case, $u_i$ and $v_i$ are, respectively, left and right singular vectors of $\tilde{B}$. If we shift our attention from the spectral decomposition of B to the singular value decomposition (SVD) of $\tilde{B}$, we therefore automatically exclude the eigenvectors of B that correspond to negative eigenvalues.

The appearance of the singular vectors of $\tilde{B}$ is not surprising. All the information about the linkage structure of the network is contained in $\tilde{B}$, and the singular value decomposition is the natural generalization of the spectral decomposition used for B to asymmetric matrices like $\tilde{B}$. What is more, the singular values and singular vectors of $\tilde{B}$ can sometimes provide more information than the eigenvalues and eigenvectors of B.

For example, the number of modules is at most one more than the number of positive eigenvalues of B. Since, for each vertex, the expected degree in the null model equals the actual degree in the network, the rows and columns of $\tilde{B}$ all sum to zero. The rank r of $\tilde{B}$, which equals the number of nonzero singular values of $\tilde{B}$, must then be less than both p and q. From this, we conclude that the number of communities is at most equal to the smaller of p and q.

To assign vertices to modules using $\tilde{B}$, we first partition the index matrix S so that

$$ S = \begin{bmatrix} R \\ T \end{bmatrix}. \tag{21} $$

The matrices R and T have dimensions p × c and q × c, respectively, indexing the red and blue vertices into c modules. Substituting the partitioned matrices into Eq. (3), we obtain

$$ Q = \frac{1}{m} \operatorname{Tr} R^{T} \tilde{B} T. \tag{22} $$

Our goal then becomes to assign network vertices to modules such that Eq. (22) is maximized.

One approach to optimizing the modularity as expressed in Eq. (22) is essentially the same as the Newman vector approach considered in Sec. III A. Without loss of generality, label the singular values such that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$. Approximate $\tilde{B}$ as

$$ \tilde{B} \approx \sigma_1 u_1 v_1^{T}. \tag{23} $$

Now, we bipartition the vertices with $R = [r_1 \mid r_2]$ and $T = [t_1 \mid t_2]$, so that

$$ Q \approx \frac{\sigma_1}{m} \left[ (r_1^{T} u_1)(v_1^{T} t_1) + (r_2^{T} u_1)(v_1^{T} t_2) \right]. \tag{24} $$

As with the Newman vector approach, Q is maximized by assigning the vertices to modules based on the signs of the corresponding component of $u_1$ or $v_1$, as appropriate. This maximizes the magnitude of the inner products in Eq. (24), with consistent assignment of both red and blue vertices to the same module based on the signs ensuring that positive contributions are made to the modularity.

C. Recursive identification of bipartite modules

In Secs. III A and III B, we have seen how the leading eigenvector of B and the leading singular vectors of $\tilde{B}$ can be used to bipartition network vertices. Extending these methods to use the full modularity matrices and to handle more than two modules is, in general, nontrivial. However, for the bipartite case at least, there is a relatively straightforward extension that leads to a useful algorithm.

First, we assume that the blue vertices are all assigned to modules through some mechanism. Maximizing the modularity then consists solely of assigning the red vertices to modules. This is a comparatively simple task. To see this, rewrite Eq. (22), giving

$$ Q = \frac{1}{m} \operatorname{Tr} R^{T} \tilde{B} T = \frac{1}{m} \operatorname{Tr} R^{T} \tilde{T}, \tag{25} $$

where we have aggregated the fixed terms into the matrix $\tilde{T} = \tilde{B} T$. We now write Eq. (25) in terms of explicit sums, so that

$$ Q = \frac{1}{m} \sum_{k=1}^{c} \sum_{i=1}^{p} R_{ik} \tilde{T}_{ik} = \frac{1}{m} \sum_{i=1}^{p} \left( \sum_{k=1}^{c} R_{ik} \tilde{T}_{ik} \right). \tag{26} $$

The inner sum in Eq. (26) is a sum across the rows of R. Since each row of R consists of a single 1 with all other elements being 0, the modularity is now simple to maximize: we just assign red vertex i to module k such that $\tilde{T}_{ik}$ is the maximum of the ith row of $\tilde{T}$ [33].

Conversely, if the red vertices are all assigned to modules, maximizing Q consists of assigning the blue vertices to modules.
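The two half-steps just described reduce to a matrix product followed by a row-wise argmax. The sketch below is a hedged illustration in Python with NumPy rather than the author's code. The function names are hypothetical, ties are broken by taking the first maximum (one possible arbitrary rule, cf. Ref. [33]), and the blue step assumes, by symmetry with the aggregate used for the red step, that the fixed terms are collected as the product of the transposed bipartite modularity block with R.

```python
import numpy as np

def assign_red(B_tilde, T):
    """Given the q x c blue index matrix T, pick the best module for every red vertex."""
    T_tilde = B_tilde @ T                      # p x c aggregate, as in Eq. (25)
    best = np.argmax(T_tilde, axis=1)          # first maximum wins on ties
    R = np.zeros_like(T_tilde)
    R[np.arange(R.shape[0]), best] = 1.0       # one unit entry per row
    return R

def assign_blue(B_tilde, R):
    """Mirror image of assign_red, assuming the aggregate B_tilde^T R."""
    R_tilde = B_tilde.T @ R                    # q x c aggregate
    best = np.argmax(R_tilde, axis=1)
    T = np.zeros_like(R_tilde)
    T[np.arange(T.shape[0]), best] = 1.0
    return T

def bipartite_Q(B_tilde, R, T, m):
    """Q = Tr(R^T B_tilde T) / m, Eq. (22)."""
    return np.trace(R.T @ B_tilde @ T) / m

# Toy network (hypothetical data): two disjoint complete bipartite blocks.
A_tilde = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]], dtype=float)
m = A_tilde.sum()
B_tilde = A_tilde - np.outer(A_tilde.sum(axis=1), A_tilde.sum(axis=0)) / m

T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # blue vertices fixed
R = assign_red(B_tilde, T)          # red vertices follow the blue assignment
T_new = assign_blue(B_tilde, R)     # blue vertices re-assigned given the red ones
Q = bipartite_Q(B_tilde, R, T_new, m)
```

Because each row of an index matrix holds a single 1, choosing the row maximum of the aggregate is exactly the per-vertex maximization of the inner sum in Eq. (26).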
With the fixed terms aggregated into $\tilde{R} = \tilde{B}^{T} R$, Eq. (22) becomes

$$ Q = \frac{1}{m} \sum_{j=1}^{q} \left( \sum_{k=1}^{c} T_{jk} \tilde{R}_{jk} \right). \tag{27} $$

As with the red vertices, we maximize Q by assigning the jth blue vertex to the module k such that $\tilde{R}_{jk}$ is the maximum of the jth row of $\tilde{R}$.

Taken together, these two maximization procedures define an algorithm that we call BRIM (bipartite, recursively induced modules). The BRIM algorithm is an iterative algorithm for maximizing Q, with the sets of red and blue vertices each recursively drawing the other into modular structures. For each iteration, Q is guaranteed never to decrease, as it is always possible at least to maintain the previous vertex partitioning and keep the modularity the same. Therefore, the BRIM algorithm will always find a partition at a maximum of Q. In general, the identified partition will correspond to a local maximum in Q, not the global maximum.

Note that the BRIM algorithm can work with the entire $\tilde{B}$ matrix, or a rank-restricted approximation calculated by omitting the smallest singular values. By using the full $\tilde{B}$ matrix, we automatically include all positive contributions to the modularity. As well, the algorithm can work with any assumed number of modules; however, no constraint exists to ensure that each module is occupied.

To test the efficacy of the BRIM algorithm, we apply it to a simple model network. The model consists of $N_{\mathrm{mod}}$ modules, each containing $N_{\mathrm{red}}$ red and $N_{\mathrm{blue}}$ blue vertices. An edge exists between a red vertex and a blue vertex with probability $p_{\mathrm{in}}$ if they are in the same module and with probability $p_{\mathrm{out}}$ if they are in different modules. No edges exist between vertices with the same color.

The qualitative behavior of the model depends on $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$. When $p_{\mathrm{in}} > p_{\mathrm{out}}$, there is a greater probability of vertices within a module being linked than vertices in different modules, matching our intuitive notion of modularity. With $p_{\mathrm{in}}$ sufficiently close to one and $p_{\mathrm{out}}$ small, the actual modular structure of a particular realization of the model should correspond to the assumed modular structure. As $p_{\mathrm{out}} \to p_{\mathrm{in}}$, the network becomes more uniform, with the assumed modular structure ultimately vanishing and all vertices belonging to a single module [34]. Lower values of $p_{\mathrm{in}}$ introduce additional substructure into the modules; the general behavior as $p_{\mathrm{out}}$ varies should be similar to the previous case, but with an overall reduced correspondence between the assumed modules and the actual modules in networks instantiated from the model.

Following Danon et al. [14], we make precise the above qualitative description in terms of the normalized mutual information $I_{\mathrm{norm}}$. Consider two schemes X and Y for dividing the n vertices into community groups, represented by two index matrices $S_X$ and $S_Y$ [35]. The two index matrices are used to calculate the so-called confusion matrix N, which takes the simple form […]

[…] community x in scheme X and to community y in scheme Y is proportional to the corresponding element $N_{xy}$ of the confusion matrix, so that

$$ P(X = x, Y = y) = \frac{1}{n} N_{xy}. \tag{29} $$

Using the probability as defined in Eq. (29), we can calculate the normalized mutual information as

$$ I_{\mathrm{norm}}(X, Y) = \frac{2 I(X, Y)}{H(X) + H(Y)}. \tag{30} $$

Equation (30) is expressed in terms of the usual mutual information I(X, Y) and entropies H(X) and H(Y) [36], defined as

$$ I(X, Y) = \sum_{x,y} P(X, Y) \log \frac{P(X, Y)}{P(X) P(Y)}, \tag{31} $$

$$ H(X) = -\sum_{x} P(X) \log P(X), \tag{32} $$

$$ H(Y) = -\sum_{y} P(Y) \log P(Y). \tag{33} $$

In Eqs. (30)–(33), we have made use of the common shorthand abbreviations P(X = x, Y = y) = P(X, Y), P(X = x) = P(X), and P(Y = y) = P(Y). The base of the logarithms in Eqs. (31)–(33) is arbitrary, as the computed measures only appear in the ratio in Eq. (30).

The normalized mutual information is a measure of the amount of information common to the two partitioning schemes. By taking one of the partitions to be the assumed modular structure of the network and one to be the structure found using the BRIM algorithm, we can thus explore the efficacy of the algorithm. When the found modules match the real ones, we have $I_{\mathrm{norm}} = 1$, and when the found modules are independent of the real ones, we have $I_{\mathrm{norm}} = 0$.

We now set $N_{\mathrm{mod}} = 5$, $N_{\mathrm{red}} = 12$, and $N_{\mathrm{blue}} = 8$, giving n = 100 vertices in the network. With various choices of $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$, we repeatedly instantiate the model network and determine the assignment of vertices to modules using the BRIM algorithm. The algorithm is initialized by assigning each of the blue vertices to a unique module. For each sample, we calculate $I_{\mathrm{norm}}$.

In Fig. 1, we show results of applying the BRIM algorithm to the model network. The points show the mean value of $I_{\mathrm{norm}}$, averaged over 100 instantiations of the network. The error bars show the standard error of the mean. The general behavior is as anticipated, lending confidence to the algorithm definition.

D. Determining the number of modules

The BRIM algorithm is silent on the issue of how many modules should be used. As noted in Sec. III B, the number […]
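For a fixed number of modules, the test described in Sec. III C can be sketched end to end: sample the model network, run the alternating red and blue updates, and score the result with the normalized mutual information. The Python code below is a rough illustration, not the code behind Fig. 1. It assumes NumPy, and the function names, the random seed, the values of p_in and p_out, the stopping tolerance, and the first-maximum tie-breaking are all choices made here for illustration. It also assumes the confusion matrix can be formed as the product of the transposed first index matrix with the second.

```python
import numpy as np

def planted_bipartite(n_mod, n_red, n_blue, p_in, p_out, rng):
    """Sample the red-by-blue adjacency block of the model network."""
    red = np.repeat(np.arange(n_mod), n_red)      # planted module of each red vertex
    blue = np.repeat(np.arange(n_mod), n_blue)    # planted module of each blue vertex
    prob = np.where(red[:, None] == blue[None, :], p_in, p_out)
    A_tilde = (rng.random(prob.shape) < prob).astype(float)
    return A_tilde, red, blue

def one_hot_argmax(scores):
    """Index matrix with a single 1 per row, at the first row maximum."""
    out = np.zeros_like(scores)
    out[np.arange(scores.shape[0]), np.argmax(scores, axis=1)] = 1.0
    return out

def brim(A_tilde, T, max_iter=100):
    """Alternate red and blue updates until Q stops increasing."""
    m = A_tilde.sum()
    B_tilde = A_tilde - np.outer(A_tilde.sum(axis=1), A_tilde.sum(axis=0)) / m
    history = []
    for _ in range(max_iter):
        R = one_hot_argmax(B_tilde @ T)           # red step, Eq. (26)
        T = one_hot_argmax(B_tilde.T @ R)         # blue step, Eq. (27)
        history.append(np.trace(R.T @ B_tilde @ T) / m)
        if len(history) > 1 and history[-1] <= history[-2] + 1e-12:
            break
    return R, T, history

def entropy(v):
    """Shannon entropy of a probability vector, skipping empty entries."""
    v = v[v > 0]
    return -np.sum(v * np.log(v))

def normalized_mutual_information(S_x, S_y):
    """I_norm of Eq. (30); undefined if both schemes use a single module."""
    N = S_x.T @ S_y                               # assumed form of the confusion matrix
    P = N / N.sum()                               # Eq. (29)
    Px, Py = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    I = np.sum(P[nz] * np.log(P[nz] / np.outer(Px, Py)[nz]))
    return 2.0 * I / (entropy(Px) + entropy(Py))

rng = np.random.default_rng(0)
A_tilde, red, blue = planted_bipartite(5, 12, 8, p_in=0.9, p_out=0.05, rng=rng)
T0 = np.eye(A_tilde.shape[1])                     # each blue vertex starts in its own module
R, T, history = brim(A_tilde, T0)
S_true = np.eye(5)[np.concatenate([red, blue])]   # planted index matrix, red rows first
S_found = np.vstack([R, T])                       # found index matrix, may have empty columns
print(history[-1], normalized_mutual_information(S_true, S_found))
```

Starting with one module per blue vertex fixes the number of columns at the number of blue vertices. Unoccupied modules simply appear as empty columns, which the entropy and mutual information sums skip.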
Modules      Q         I_norm
BRIM         0.34554   1
Spectral     0.32117   0.56897
Davis 1      0.31057   0.44657
Davis 2      0.31839   0.45126
Doreian      0.29390   0.60766
Unipartite   0.21866   0.28019
[…] introduced in this work has a similar resolution limit, with similar consequences.

One of the key themes in this paper has been that the bipartite structure of the network can be beneficially incorporated into its mathematical description and its computational treatment. This theme was realized in the BRIM algorithm, where the assignment of vertices to modules in one part of the network, when held fixed, provides a stable modularity landscape in which it is straightforward to partition the vertices of the other part into modules. We expect that the characteristics of other specialized classes of networks could be taken advantage of in an analogous fashion to define appropriate null model networks, modularity measures, and community detection algorithms.

The eigenvalues of the graph Laplacian are closely related to many important properties and invariants of the graph [40]. In contrast, relatively little is known about the spectra of modularity matrices, be they for unipartite or bipartite networks. We are optimistic that the eigenvalues of the modularity matrix usefully relate to important and interesting network properties.

ACKNOWLEDGMENTS

The author thanks Ludwig Streit, Philippe Blanchard, and Thomas Roediger-Schluga for useful comments and suggestions. This work has been supported in part by the European FP6-NEST-Adventure Programme, Contract No. 028875.
[1] C. Christensen and R. Albert, in Complex Networks Structure and Dynamics, special issue of Int. J. Bifurcation Chaos Appl. Sci. Eng. 17, 2201 (2007).
[2] M. E. J. Newman, in The New Palgrave Dictionary of Economics, edited by S. N. Durlauf and L. E. Blume (Palgrave Macmillan, Basingstoke, 2008), 2nd ed. (in press).
[3] M. E. J. Newman, SIAM Rev. 45, 167 (2003).
[4] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[5] L. Angelini, S. Boccaletti, D. Marinazzo, M. Pellicoro, and S. Stramaglia, Chaos 17, 023114 (2007).
[6] V. Gol’dshtein and G. A. Koganov, arXiv:physics/0607159 (unpublished).
[7] M. B. Hastings, Phys. Rev. E 74, 035102 (2006).
[8] M. E. J. Newman and E. A. Leicht, Proc. Natl. Acad. Sci. U.S.A. 104, 9564 (2007).
[9] J. Reichardt and S. Bornholdt, Phys. Rev. E 74, 016110 (2006).
[10] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek, Nature (London) 435, 814 (2005).
[11] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
[12] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[13] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
[14] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas, J. Stat. Mech.: Theory Exp. (2005), P09008.
[15] M. E. J. Newman, Eur. Phys. J. B 38, 321 (2004).
[16] G.-C. Rota, Am. Math. Monthly 71, 498 (1964).
[17] J. M. Pujol, J. Bejar, and J. Delgado, Phys. Rev. E 74, 016107 (2006).
[18] M. E. J. Newman, Phys. Rev. E 69, 066133 (2004).
[19] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[20] A. Davis, B. B. Gardner, and M. R. Gardner, Deep South (University of Chicago Press, Chicago, 1941).
[21] L. Freeman, in Dynamic Social Network Modeling and Analysis, edited by R. Breiger, K. Carley, and P. Pattison (The National Academies Press, Washington, D.C., 2003).
[22] P. Doreian, V. Batagelj, and A. Ferligoj, Soc. Networks 26, 29 (2004).
[23] M. E. J. Newman, Phys. Rev. E 64, 016131 (2001).
[24] M. E. J. Newman, Phys. Rev. E 64, 016132 (2001).
[25] M. J. Barber, A. Krueger, T. Krueger, and T. Roediger-Schluga, Phys. Rev. E 73, 036132 (2006).
[26] T. Roediger-Schluga and M. J. Barber, in Innovation Networks, special issue of IJFIP (in press).
[27] M. A. Porter, P. J. Mucha, M. E. J. Newman, and A. J. Friend, Physica A 386, 414 (2007).
[28] R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral, Phys. Rev. E 76, 036102 (2007).
[29] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[30] The assignment when $x_i = 0$ is arbitrary, and makes no contribution to the modularity.
[31] C. J. Alpert and S.-Z. Yao, in DAC ’95: Proceedings of the 32nd ACM/IEEE Conference on Design Automation (ACM Press, New York, 1995), pp. 195–200.
[32] C. J. Alpert, A. B. Kahng, and D. S. Yao, Discrete Appl. Math. 90, 3 (1999).
[33] An arbitrary rule is needed to break ties, for example, random assignment of the vertex to one of the modules that maximizes Q.
[34] More customary (see, e.g., Ref. [14]) is to fix the expected degree of the network vertices and vary the expected number of edges linking vertices in different modules, with $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$ calculated from the expectation values. However, for the bipartite network model under consideration, the base case, with edges only existing between vertices in the same module, will often be excluded using this approach.
[35] Analogous measures can be defined in a straightforward fashion using the portions of the index matrices that correspond to just the red or blue vertices.
[36] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications (Wiley-Interscience, New York, 1991).
[37] J. Scott and M. Hughes, The Anatomy of Scottish Capital: Scottish Companies and Scottish Capital, 1900–1979 (Croom Helm, London, 1980).
[38] V. Batagelj and A. Mrvar, Pajek Datasets available at http://vlado.fmf.uni-lj.si/pub/networks/data/
[39] S. Fortunato and M. Barthelemy, Proc. Natl. Acad. Sci. U.S.A. 104, 36 (2007).
[40] F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics (American Mathematical Society, Providence, RI, 1997).