Modularity and Community Detection in Bipartite Networks

This document presents a study on modularity and community detection in bipartite networks, introducing a bipartite modularity matrix to quantify the clustering of vertices into community groups. An algorithm is proposed for identifying modules based on the mutual dependence of the two parts of the bipartite network, and the effectiveness of this algorithm is demonstrated using real-world data. The research emphasizes the importance of a suitable null model for accurately assessing modularity in bipartite networks.


PHYSICAL REVIEW E 76, 066102 (2007)

Modularity and community detection in bipartite networks

Michael J. Barber*
Austrian Research Centers GmbH—ARC, Bereich Systems Research, Vienna, Austria
(Received 16 July 2007; revised manuscript received 13 September 2007; published 7 December 2007)
The modularity of a network quantifies the extent, relative to a null model network, to which vertices cluster
into community groups. We define a null model appropriate for bipartite networks, and use it to define a
bipartite modularity. The bipartite modularity is presented in terms of a modularity matrix B; some key
properties of the eigenspectrum of B are identified and used to describe an algorithm for identifying modules
in bipartite networks. The algorithm is based on the idea that the modules in the two parts of the network are
dependent, with each part mutually being used to induce the vertices for the other part into the modules. We
apply the algorithm to real-world network data, showing that the algorithm successfully identifies the modular
structure of bipartite networks.

DOI: 10.1103/PhysRevE.76.066102 PACS number(s): 89.75.Hc, 02.10.Ud

*michael.barber@arcs.ac.at

1539-3755/2007/76(6)/066102(9) ©2007 The American Physical Society

I. INTRODUCTION

Networks have attracted a burst of attention in the last decade (useful reviews include Refs. [1–4]), with applications to natural, social, and technological networks. Of great current interest is the identification of the modular structure of the network. Detecting modules, or communities, allows quantitative investigation of relevant subnetworks, which may have different properties from the aggregate properties of the network as a whole, e.g., modules in the World Wide Web are sets of topically related web pages.

Informally, a network module is a subgraph whose vertices are more likely to be connected to one another than to the vertices outside the subgraph. A variety of approaches [5–13] have been taken to explore this concept. See Refs. [14,15] for useful reviews.

In this work we focus on the measure called modularity, introduced by Newman and Girvan [11]. Modularity reflects the extent, relative to a null model network, to which edges are formed within modules instead of between modules. Using the modularity, we can assess the quality of any assignment of vertices to modules. Further, the module identification problem becomes a modularity optimization problem. However, exact maximization of the modularity is in general an intractable problem, because the number of ways to partition the set of vertices grows extremely rapidly [16]. In light of this, a number of effective algorithms have been introduced to find high modularity partitions of the vertices [17,18]. The modularity can also be defined in terms of a so-called modularity matrix, the eigenspectrum of which has a fundamental relationship with the modular nature of the network [19].

Given the explicit dependence of the modularity upon a null model, it is clear that the specific choice of a null model has a profound impact on the modularity. Surprisingly, only one null model has been so far explored at length: networks with edges randomly assigned such that the expected degrees of model-network vertices equal the actual degrees of corresponding real-network vertices [19]. Specific classes of networks have additional constraints that could be and, indeed, should be reflected in the null model.

A significant such class of networks is that of bipartite networks. The vertices of a bipartite network can be partitioned into two disjoint sets such that no two vertices within the same set are adjacent. There are thus two distinct kinds of vertices, providing a natural representation for many affiliation or interaction networks, with one kind of vertex representing actors and the other representing relations. Examples of actor-relation pairs include people attending events [20–22], court justices making decisions [22], scientists jointly publishing articles [23,24], organizations collaborating in projects [25,26], and legislators serving on committees [27]. Arguably, bipartite networks are the empirically standard case for social networks and other interaction networks, with unipartite networks appearing, often implicitly, as projections.

In the statistical physics community, the usual approach taken to identify modules in bipartite networks is to first construct a unipartite projection of one part of the network, and then identify modules in that projection using methods for unipartite networks. For example, in the scientist-publication network mentioned above, a network of scientists is created by linking scientists when they have jointly published. These unipartite projections can be illuminating, but intrinsically lose information; indeed, Guimerà et al. [28] demonstrate that analysis of an unweighted, unipartite projection can give unreliable or incorrect results.

The principal contribution in this work is a proposed definition of a modularity for bipartite networks. The approach taken is based on defining a bipartite modularity matrix $B$ as an extension of the recent work by Newman [19]. Some key properties of the eigenspectrum of $B$ are identified and used to specialize Newman's matrix-based algorithms to bipartite networks. An additional algorithm fundamentally based on the bipartite character of the networks is introduced; we call the algorithm BRIM, for bipartite, recursively induced modules.

In parallel, Guimerà et al. [28] have independently investigated modularity in bipartite networks. They proceed by first identifying the two parts of the network as actors and teams, and then formulating a bipartite modularity in which modules consist of groups of actors that are closely interconnected based on joint participation in many teams. The resulting modularity is thus focused on identifying modules in only one part of the network at a time. Interestingly, Guimerà et al. point out the possibilities of classifying both partite sets of the network simultaneously and of customizing spectral methods for bipartite networks, which is essentially the approach taken in the present work.

As of this writing, we are aware of no other attempts to define modularity for bipartite networks. However, bipartite networks, or “two mode networks,” have undergone several related studies in the sociology community using other methods (see, e.g., Refs. [21,22], and references cited therein).

The structure of the paper is as follows: in Sec. II we define a modularity matrix and measure for bipartite networks. We discuss using the bipartite modularity matrix to identify modules in Sec. III, and apply the algorithm therein devised to two real-world networks in Sec. IV. Finally, we conclude in Sec. V with an assessment of the present investigation and an outlook for future work.

II. BIPARTITE MODULARITY

In this section, we develop a modularity matrix for bipartite networks. Structurally and notationally, the development parallels the discussion of the modularity matrix by Newman [19].

Consider a network with $n$ vertices and $m$ edges defined by an adjacency matrix $A$. Each vertex $i$ is assigned to a community group or module, denoted by $g_i$. The modularity $Q$ for such an assignment reflects the extent, relative to a null model, to which edges are formed within modules instead of between modules. Formally, the modularity is defined as

$$
Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - P_{ij} \right) \delta(g_i, g_j), \tag{1}
$$

where the $A_{ij}$ are the adjacency matrix elements and the $P_{ij}$ are probabilities in the null model that an edge exists between vertices $i$ and $j$.

The modularity can be given an equivalent definition in matrix form. First, the $n$ community indices $g_i$ with values taken from $\{1, 2, \ldots, c\}$ are replaced by an $n \times c$ index matrix $S = [s_1 | s_2 | \cdots | s_c]$, where $c$ is the number of modules. All elements of $S$ take on either a 0 or 1 value, so that column $s_i$ is an index vector showing membership in module $i$; a value of 1 in position $j$ of $s_i$ indicates that vertex $j$ belongs to module $i$. Given that each vertex is assigned to exactly one module, each row of $S$ has a single unit value and the index vectors are thus orthogonal.

Further, a modularity matrix $B$ is defined with elements

$$
B_{ij} = A_{ij} - P_{ij}. \tag{2}
$$

Using $S$ and $B$, the modularity becomes

$$
Q = \frac{1}{2m} \operatorname{Tr} S^{\mathsf{T}} B S. \tag{3}
$$

The eigenspectrum of $B$ has a fundamental relationship with the modular nature of the network, as Newman [19] has explored.

From Eqs. (1)–(3), it is apparent that the choice of null model has a profound impact on the modularity. Thus, for example, a Bernoulli random graph with constant $P_{ij} = p$ for all $i$ and $j$ is a poor representation of most real-world networks, so would be an inappropriate choice of null model. Instead, the usual choice of null model [19] assigns edges at random with the expected degrees of model vertices constrained to match the degrees in the actual network.

In much the same fashion, bipartite networks have specific constraints that should be reflected in the null model. The vertices of a bipartite network can be partitioned into two disjoint sets such that no two vertices within the same set are adjacent. An equivalent, but more visual, definition is that the vertices in a bipartite graph can be assigned one of two colors, say red and blue, with no neighboring vertices bearing the same color. In the remainder of this section, we will define a null model with the above requirement that the expected degrees match the degrees in the real network, along with the additional constraint that each edge links a red vertex and a blue vertex.

Let $p$ be the number of red vertices and $q$ be the number of blue vertices; this implies $n = p + q$. Without loss of generality, assume that the vertices are indexed so that red vertices are labeled $1, 2, \ldots, p$ and the blue vertices are labeled $p + 1, p + 2, \ldots, p + q$. The adjacency matrix then has a block off-diagonal form of

$$
A = \begin{bmatrix} O_{p \times p} & \tilde{A}_{p \times q} \\ (\tilde{A}^{\mathsf{T}})_{q \times p} & O_{q \times q} \end{bmatrix}, \tag{4}
$$

where $O_{i \times j}$ is the all-zero matrix with $i$ rows and $j$ columns. Require the same block structure for $P$ that is exhibited by $A$, giving

$$
P = \begin{bmatrix} O_{p \times p} & \tilde{P}_{p \times q} \\ (\tilde{P}^{\mathsf{T}})_{q \times p} & O_{q \times q} \end{bmatrix}. \tag{5}
$$

This form for $P$ assigns zero likelihood to edges between vertices with the same color, precluding any such edges in the null model.

The modularity matrix $B$ in turn has a block off-diagonal form of

$$
B = \begin{bmatrix} O_{p \times p} & \tilde{B}_{p \times q} \\ (\tilde{B}^{\mathsf{T}})_{q \times p} & O_{q \times q} \end{bmatrix}, \tag{6}
$$

where $\tilde{B} = \tilde{A} - \tilde{P}$. The all-zero blocks on the diagonal are the potential modularity contributions from pairs of vertices of the same color being present in a module; all meaningful contributions, positive or negative, to the modularity thus are made by pairs of vertices with distinct colors. In contrast, with the usual null model based on unipartite networks [19], the corresponding blocks contain only negative elements (or zeros for isolated nodes of degree zero), always providing a modularity penalty for pairs of like-colored vertices in the same module.

Equation (1) can be rewritten as


$$
Q = \frac{1}{m} \sum_{i=1}^{p} \sum_{j=1}^{q} \tilde{B}_{ij}\, \delta(g_i, h_j), \tag{7}
$$

where $h_j = g_{j+p}$. Since $Q = 0$ when all vertices are in the same module, we can set all $g_i$ and $h_j$ equal, giving

$$
\sum_{i=1}^{p} \sum_{j=1}^{q} \left( \tilde{A}_{ij} - \tilde{P}_{ij} \right) = 0, \tag{8}
$$

so that

$$
\sum_{i=1}^{p} \sum_{j=1}^{q} \tilde{P}_{ij} = \sum_{i=1}^{p} \sum_{j=1}^{q} \tilde{A}_{ij} = m. \tag{9}
$$

Thus, the expected number of edges in the null model must equal the number of edges in the actual network.

The degrees of the red vertices are given by $\sum_{j=1}^{q} \tilde{A}_{ij} = k_i$, while those of the blue vertices are given by $\sum_{i=1}^{p} \tilde{A}_{ij} = d_j$. By constraining the expected degrees in the null model to match the actual degrees, as discussed above, we obtain

$$
\sum_{j=1}^{q} \tilde{P}_{ij} = k_i, \tag{10}
$$

$$
\sum_{i=1}^{p} \tilde{P}_{ij} = d_j. \tag{11}
$$

Since

$$
\sum_{i=1}^{p} k_i = m, \tag{12}
$$

$$
\sum_{j=1}^{q} d_j = m, \tag{13}
$$

Eqs. (10) and (11) ensure that Eq. (9) holds.

In the usual null model, the probability of an edge being present between two vertices is proportional to the product of the degrees of the vertices. For the bipartite case, this becomes $\tilde{P}_{ij} = C k_i d_j$ for some constant $C$. Combining this definition with Eqs. (11) and (12), we obtain

$$
d_j = \sum_{i=1}^{p} \tilde{P}_{ij} = C \sum_{i=1}^{p} k_i d_j = (Cm)\, d_j, \tag{14}
$$

so that $C = 1/m$ and thus

$$
\tilde{P}_{ij} = \frac{k_i d_j}{m}. \tag{15}
$$

The same result can be obtained from Eqs. (10) and (13) instead of Eqs. (11) and (12). With Eq. (15), we have fully defined the modularity $Q$ for a bipartite network.

III. MODULE IDENTIFICATION

A. Spectral methods for module identification

Using the modularity defined in Sec. II, we can assess the quality of any partitioning of the vertices of a bipartite graph into modules. A partitioning can be determined using any method. Two general approaches seem relevant. First, the modularity defined in Sec. II can be maximized using standard optimization algorithms such as genetic algorithms, greedy search methods [18], or extremal optimization [29]; this is generally straightforward and will not be discussed at length in this work. Second, the spectral properties of $B$ or other matrices associated with the graph can be analyzed to partition the vertices into modules.

For example, one standard partitioning approach is to assign the vertices to modules using spectral partitioning (SP). In spectral partitioning, the eigenvectors of the network Laplacian are used to minimize the number of edges running between groups. The SP approach has a significant drawback: the vertices are assigned to modules of predetermined size. This is problematic for the investigation of real-world networks, where the number and sizes of community groups are not generally known in advance.

An analogous approach based on the spectral properties of the modularity matrix $B$ has recently been proposed [19]. Since the modularity is conceptually closer to our understanding of network community structure, this spectral optimization of modularity (SOM) is better tailored for real-world networks.

An important special case in both spectral partitioning and spectral optimization of modularity is to assign the vertices to two groups based on a single eigenvector of the Laplacian (SP) or modularity (SOM) matrix. In the case of SP, we are interested in the eigenvector corresponding to the smallest positive eigenvalue; this is the Fiedler vector. For SOM, we are interested in the leading eigenvector $x$, corresponding to the largest positive eigenvalue $\lambda$ of $B$; we propose calling this the Newman vector. Using the Newman vector, we approximate $B$ as

$$
B \approx \lambda x x^{\mathsf{T}}. \tag{16}
$$

With just two modules, $S = [s_1 | s_2]$, so that the modularity in Eq. (3) becomes

$$
Q = \frac{\lambda}{2m} \left( \langle s_1, x \rangle^2 + \langle s_2, x \rangle^2 \right). \tag{17}
$$

Recall that the index vectors $s_1$ and $s_2$ take on values from $\{0, 1\}$. It is clear how to maximize the modularity in Eq. (17): when $x_i$, the $i$th element of $x$, is positive, assign vertex $i$ to the first module by setting the $i$th entry of $s_1$ to one, and when $x_i$ is negative, assign vertex $i$ to the second module by setting the $i$th entry of $s_2$ to one [30].

The use of multiple eigenvectors allows more than two modules to be considered ($c > 2$), with at most one module more than the number of positive eigenvalues of $B$ [19]. Additional eigenvectors of $B$ can also be used for SOM [19] in a vector partitioning algorithm adapted from spectral partitioning [31,32]. In the present work, we will not make use of this algorithm, nor of a recursive bipartitioning approach, instead developing an alternative technique that capitalizes on the bipartite nature of the networks.
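
The bipartite null model of Sec. II and the single-eigenvector bipartition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function names, the dense-matrix representation, and the toy network of two disconnected blocks are all assumptions made here. Zero components of the leading eigenvector, a case the text defers to Ref. [30], are simply placed in the first module.

```python
import numpy as np

def bipartite_modularity_matrix(A_tilde):
    """Return B~ = A~ - k d^T / m for a p x q biadjacency matrix (Eqs. 6 and 15)."""
    A_tilde = np.asarray(A_tilde, dtype=float)
    k = A_tilde.sum(axis=1)          # degrees k_i of the red vertices
    d = A_tilde.sum(axis=0)          # degrees d_j of the blue vertices
    m = A_tilde.sum()                # number of edges
    return A_tilde - np.outer(k, d) / m

def bipartite_modularity(A_tilde, g, h):
    """Evaluate Eq. (7): Q = (1/m) sum_ij B~_ij delta(g_i, h_j)."""
    B_tilde = bipartite_modularity_matrix(A_tilde)
    same_module = np.asarray(g)[:, None] == np.asarray(h)[None, :]
    return (B_tilde * same_module).sum() / np.sum(A_tilde)

def newman_vector_split(A_tilde):
    """Two-module split from the signs of the leading eigenvector of B."""
    B_tilde = bipartite_modularity_matrix(A_tilde)
    p, q = B_tilde.shape
    B = np.block([[np.zeros((p, p)), B_tilde],
                  [B_tilde.T, np.zeros((q, q))]])
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    x = eigvecs[:, -1]                     # leading eigenvector
    labels = np.where(x >= 0, 0, 1)        # assumption: zeros go to module 0
    return labels[:p], labels[p:]          # red labels g, blue labels h

# Toy network (hypothetical): two disconnected 2 x 2 complete bipartite blocks.
A_tilde = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]])
print(bipartite_modularity(A_tilde, [0, 0, 1, 1], [0, 0, 1, 1]))  # 0.5
```

Because every row and column of the null-model matrix in Eq. (15) reproduces the observed degrees, the rows and columns of `bipartite_modularity_matrix(A_tilde)` sum to zero, which gives a quick sanity check on any input.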


B. Module identification in bipartite networks

In Sec. III A, we have seen how to identify community groups of networks by using the Newman vector to maximize $Q$. However, we made no use of the bipartite character of the networks. For a bipartite network, the eigenvalue equation $B x_i = \lambda_i x_i$ can be written as

$$
\begin{bmatrix} O & \tilde{B} \\ \tilde{B}^{\mathsf{T}} & O \end{bmatrix} \begin{bmatrix} u_i \\ v_i \end{bmatrix} = \lambda_i \begin{bmatrix} u_i \\ v_i \end{bmatrix}, \tag{18}
$$

where $u_i$ is a $p \times 1$ vector and $v_i$ is a $q \times 1$ vector. The left-hand side of Eq. (18) can be multiplied out, giving

$$
\begin{bmatrix} O & \tilde{B} \\ \tilde{B}^{\mathsf{T}} & O \end{bmatrix} \begin{bmatrix} u_i \\ v_i \end{bmatrix} = \begin{bmatrix} \tilde{B} v_i \\ \tilde{B}^{\mathsf{T}} u_i \end{bmatrix} = \lambda_i \begin{bmatrix} u_i \\ v_i \end{bmatrix}, \tag{19}
$$

i.e., $\tilde{B} v_i = \lambda_i u_i$ and $\tilde{B}^{\mathsf{T}} u_i = \lambda_i v_i$.

Additionally, we can construct a vector from $u_i$ and $-v_i$, so that

$$
\begin{bmatrix} O & \tilde{B} \\ \tilde{B}^{\mathsf{T}} & O \end{bmatrix} \begin{bmatrix} u_i \\ -v_i \end{bmatrix} = \begin{bmatrix} -\tilde{B} v_i \\ \tilde{B}^{\mathsf{T}} u_i \end{bmatrix} = -\lambda_i \begin{bmatrix} u_i \\ -v_i \end{bmatrix}. \tag{20}
$$

Hence, for any eigenvalue $\lambda_i$ of $B$, $-\lambda_i$ is also an eigenvalue of $B$.

Since only the eigenvectors corresponding to positive eigenvalues of $B$ can give positive contributions to $Q$, we can focus on just the positive eigenvalues $\sigma_i = |\lambda_i| > 0$. In this case, $u_i$ and $v_i$ are, respectively, left and right singular vectors of $\tilde{B}$. If we shift our attention from the spectral decomposition of $B$ to the singular value decomposition (SVD) of $\tilde{B}$, we therefore automatically exclude the eigenvectors of $B$ that correspond to negative eigenvalues.

The appearance of the singular vectors of $\tilde{B}$ is not surprising. All the information about the linkage structure of the network is contained in $\tilde{B}$, and the singular value decomposition is the natural generalization of the spectral decomposition used for $B$ to asymmetric matrices like $\tilde{B}$. What is more, the singular values and singular vectors of $\tilde{B}$ can sometimes provide more information than the eigenvalues and eigenvectors of $B$.

For example, the number of modules is at most one more than the number of positive eigenvalues of $B$. Since, for each vertex, the expected degree in the null model equals the actual degree in the network, the rows and columns of $\tilde{B}$ all sum to zero. The rank $r$ of $\tilde{B}$, which equals the number of nonzero singular values of $\tilde{B}$, must then be less than both $p$ and $q$. From this, we conclude that the number of communities is at most equal to the smaller of $p$ and $q$.

To assign vertices to modules using $\tilde{B}$, we first partition the index matrix $S$ so that

$$
S = \begin{bmatrix} R \\ T \end{bmatrix}. \tag{21}
$$

The matrices $R$ and $T$ have dimensions $p \times c$ and $q \times c$, respectively, indexing the red and blue vertices into $c$ modules. Substituting the partitioned matrices into Eq. (3), we obtain

$$
Q = \frac{1}{m} \operatorname{Tr} R^{\mathsf{T}} \tilde{B} T. \tag{22}
$$

Our goal then becomes to assign network vertices to modules such that Eq. (22) is maximized.

One approach to optimizing the modularity as expressed in Eq. (22) is essentially the same as the Newman vector approach considered in Sec. III A. Without loss of generality, label the singular values such that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$. Approximate $\tilde{B}$ as

$$
\tilde{B} \approx \sigma_1 u_1 v_1^{\mathsf{T}}. \tag{23}
$$

Now, we bipartition the vertices with $R = [r_1 | r_2]$ and $T = [t_1 | t_2]$, so that

$$
Q = \frac{\sigma_1}{m} \left( \langle r_1, u_1 \rangle \langle t_1, v_1 \rangle + \langle r_2, u_1 \rangle \langle t_2, v_1 \rangle \right). \tag{24}
$$

As with the Newman vector approach, $Q$ is maximized by assigning the vertices to modules based on the signs of the corresponding component of $u_1$ or $v_1$, as appropriate. This maximizes the magnitude of the inner products in Eq. (24), with consistent assignment of both red and blue vertices to the same module based on the signs ensuring that positive contributions are made to the modularity.

C. Recursive identification of bipartite modules

In Secs. III A and III B, we have seen how the leading eigenvector of $B$ and the leading singular vectors of $\tilde{B}$ can be used to bipartition network vertices. Extending these methods to use the full modularity matrices and to handle more than two modules is, in general, nontrivial. However, for the bipartite case at least, there is a relatively straightforward extension that leads to a useful algorithm.

First, we assume that the blue vertices are all assigned to modules through some mechanism. Maximizing the modularity then consists solely of assigning the red vertices to modules. This is a comparatively simple task. To see this, rewrite Eq. (22), giving

$$
Q = \frac{1}{m} \operatorname{Tr} R^{\mathsf{T}} \tilde{B} T = \frac{1}{m} \operatorname{Tr} R^{\mathsf{T}} \tilde{T}, \tag{25}
$$

where we have aggregated the fixed terms into the matrix $\tilde{T} = \tilde{B} T$. We now write Eq. (25) in terms of explicit sums, so that

$$
Q = \frac{1}{m} \sum_{k=1}^{c} \sum_{i=1}^{p} R_{ik} \tilde{T}_{ik} = \frac{1}{m} \sum_{i=1}^{p} \left( \sum_{k=1}^{c} R_{ik} \tilde{T}_{ik} \right). \tag{26}
$$

The inner sum in Eq. (26) is a sum across the rows of $R$. Since each row of $R$ consists of a single 1 with all other elements being 0, the modularity is now simple to maximize: we just assign red vertex $i$ to module $k$ such that $\tilde{T}_{ik}$ is the maximum of the $i$th row of $\tilde{T}$ [33].

Conversely, if the red vertices are all assigned to modules, maximizing $Q$ consists of assigning the blue vertices to modules. Analogously to the previous case, we define $\tilde{R} = \tilde{B}^{\mathsf{T}} R$ and manipulate Eq. (22) into the form

$$
Q = \frac{1}{m} \sum_{j=1}^{q} \left( \sum_{k=1}^{c} T_{jk} \tilde{R}_{jk} \right). \tag{27}
$$

As with the red vertices, we maximize $Q$ by assigning the $j$th blue vertex to the module $k$ such that $\tilde{R}_{jk}$ is the maximum of the $j$th row of $\tilde{R}$.

Taken together, these two maximization procedures define an algorithm that we call BRIM (bipartite, recursively induced modules). The BRIM algorithm is an iterative algorithm for maximizing $Q$, with the sets of red and blue vertices each recursively drawing the other into modular structures. For each iteration, $Q$ is guaranteed never to decrease, as it is always possible at least to maintain the previous vertex partitioning and keep the modularity the same. Therefore, the BRIM algorithm will always find a partition at a maximum of $Q$. In general, the identified partition will correspond to a local maximum in $Q$, not the global maximum.

Note that the BRIM algorithm can work with the entire $\tilde{B}$ matrix, or a rank-restricted approximation calculated by omitting the smallest singular values. By using the full $\tilde{B}$ matrix, we automatically include all positive contributions to the modularity. As well, the algorithm can work with any assumed number of modules; however, no constraint exists to ensure that each module is occupied.

To test the efficacy of the BRIM algorithm, we apply it to a simple model network. The model consists of $N_{\mathrm{mod}}$ modules, each containing $N_{\mathrm{red}}$ red and $N_{\mathrm{blue}}$ blue vertices. An edge exists between a red vertex and a blue vertex with probability $p_{\mathrm{in}}$ if they are in the same module and with probability $p_{\mathrm{out}}$ if they are in different modules. No edges exist between vertices with the same color.

The qualitative behavior of the model depends on $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$. When $p_{\mathrm{in}} > p_{\mathrm{out}}$, there is a greater probability of vertices within a module being linked than vertices in different modules, matching our intuitive notion of modularity. With $p_{\mathrm{in}}$ sufficiently close to one and $p_{\mathrm{out}}$ small, the actual modular structure of a particular realization of the model should correspond to the assumed modular structure. As $p_{\mathrm{out}} \to p_{\mathrm{in}}$, the network becomes more uniform, with the assumed modular structure ultimately vanishing and all vertices belonging to a single module [34]. Lower values of $p_{\mathrm{in}}$ introduce additional substructure into the modules; the general behavior as $p_{\mathrm{out}}$ varies should be similar to the previous case, but with an overall reduced correspondence between the assumed modules and the actual modules in networks instantiated from the model.

Following Danon et al. [14], we make precise the above qualitative description in terms of the normalized mutual information $I_{\mathrm{norm}}$. Consider two schemes $X$ and $Y$ for dividing the $n$ vertices into community groups, represented by two index matrices $S_X$ and $S_Y$ [35]. The two index matrices are used to calculate the so-called confusion matrix $N$, which takes the simple form

$$
N = S_X^{\mathsf{T}} S_Y. \tag{28}
$$

The probability $P(X = x, Y = y)$ that a vertex is assigned to community $x$ in scheme $X$ and to community $y$ in scheme $Y$ is proportional to the corresponding element $N_{xy}$ of the confusion matrix, so that

$$
P(X = x, Y = y) = \frac{1}{n} N_{xy}. \tag{29}
$$

Using the probability as defined in Eq. (29), we can calculate the normalized mutual information as

$$
I_{\mathrm{norm}}(X, Y) = \frac{2 I(X, Y)}{H(X) + H(Y)}. \tag{30}
$$

Equation (30) is expressed in terms of the usual mutual information $I(X, Y)$ and entropies $H(X)$ and $H(Y)$ [36], defined as

$$
I(X, Y) = \sum_{x, y} P(X, Y) \log \frac{P(X, Y)}{P(X) P(Y)}, \tag{31}
$$

$$
H(X) = -\sum_{x} P(X) \log P(X), \tag{32}
$$

$$
H(Y) = -\sum_{y} P(Y) \log P(Y). \tag{33}
$$

In Eqs. (30)–(33), we have made use of the common shorthand abbreviations $P(X = x, Y = y) = P(X, Y)$, $P(X = x) = P(X)$, and $P(Y = y) = P(Y)$. The base of the logarithms in Eqs. (31)–(33) is arbitrary, as the computed measures only appear in the ratio in Eq. (30).

The normalized mutual information is a measure of the amount of information common to the two partitioning schemes. By taking one of the partitions to be the assumed modular structure of the network and one to be the structure found using the BRIM algorithm, we can thus explore the efficacy of the algorithm. When the found modules match the real ones, we have $I_{\mathrm{norm}} = 1$, and when the found modules are independent of the real ones, we have $I_{\mathrm{norm}} = 0$.

We now set $N_{\mathrm{mod}} = 5$, $N_{\mathrm{red}} = 12$, and $N_{\mathrm{blue}} = 8$, giving $n = 100$ vertices in the network. With various choices of $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$, we repeatedly instantiate the model network and determine the assignment of vertices to modules using the BRIM algorithm. The algorithm is initialized by assigning each of the blue vertices to a unique module. For each sample, we calculate $I_{\mathrm{norm}}$.

In Fig. 1, we show results of applying the BRIM algorithm to the model network. The points show the mean value of $I_{\mathrm{norm}}$, averaged over 100 instantiations of the network. The error bars show the standard error of the mean. The general behavior is as anticipated, lending confidence to the algorithm definition.

FIG. 1. Agreement between model network modules and modules found using the BRIM algorithm. Each point shows the mean normalized mutual information between the model network community groups and those identified using the algorithm, averaged over 100 realizations of the model network. Error bars show the standard error of the mean. [Plot: $I_{\mathrm{norm}}$ (0 to 1) versus $p_{\mathrm{out}}/p_{\mathrm{in}}$ (0 to 1), with curves for $p_{\mathrm{in}} = 0.9$ and $p_{\mathrm{in}} = 0.5$.]

D. Determining the number of modules

The BRIM algorithm is silent on the issue of how many modules should be used. As noted in Sec. III B, the number of modules $c$ is at most one more than the rank of $\tilde{B}$, which is a relatively weak constraint. One approach is thus to assign each vertex of the smaller of the red and blue vertex sets to unique modules, and allow the vertices to be grouped into an appropriate number of modules. For the BRIM algorithm, said approach is resource intensive, requiring the calculation of modularity contributions for what may be a grossly overestimated number of modules. Worse still, when the number of vertices is much greater than the number of modules, the BRIM algorithm may terminate at low-quality local maxima far from the true number of modules in the network (see Sec. IV B for an example of this).

Clearly, automatically selecting the correct number of allowed modules in such a case would be preferable. The allowed number of modules $c$ thus becomes an adaptable parameter for which a value is to be found that optimizes the modularity. This presents some difficulties in that there is no obvious relationship between the allowed number of modules and the modularity found by the BRIM algorithm. However, by assuming that the modularity depends on the allowed number of modules in a reasonably smooth fashion, we can use a simple bisection approach to identify an appropriate value for the number of allowed modules.

The search begins by requiring all vertices to belong to the same module, $c = 1$, giving $Q = 0$. We double the allowed number of modules $c$. Half of the vertices are randomly reassigned to the newly defined modules, and a new, locally optimal solution is found using the BRIM algorithm. This process continues, with $c$ being repeatedly doubled so long as $Q$ continues to increase. Each step in the $c$ search builds on the previous solution by partially reusing the assignment of vertices to modules.

Once $Q$ drops as $c$ increases, we have crossed a maximum in the modularity landscape. We therefore switch from extrapolating to larger numbers of modules to interpolating within the interval that includes the maximum. The interpolation is done using a simple bisection search in the allowed number of modules, trying new values for $c$ so as to continuously reduce the interval wherein the putative maximum in $Q$ lies. As with the initial extrapolation stage of the search, vertices are assigned from earlier solutions to the newly allowed modules for each value of $c$, and a new, locally optimal solution is found.

The search for $c$ terminates once the interval becomes sufficiently small. In this work, we take the interval to be 2, i.e., the $Q$ maximum at $c = c_{\max}$ is bracketed by inferior solutions at $c = c_{\max} - 1$ and $c = c_{\max} + 1$. This adaptive BRIM algorithm enables us to identify the appropriate number of modules $c_{\max}$ in a number of steps that scales logarithmically with the number of vertices in the network.

IV. RESULTS

In this section, we apply the BRIM algorithm to a network showing the interactions of women in the American Deep South at various social events [20] and to a network showing corporate interlocks in Scottish firms [37]. Both networks are conveniently available on the World Wide Web in Pajek format [38].

A. Southern women event participation

As an initial example, we consider the Southern women data set, collected by Davis et al. [20] in and around Natchez, Mississippi during the 1930s as part of an extensive study of class and race in the Deep South. This data set and networks derived from it have been much studied. Indeed, Freeman [21] has described it as “. . . a touchstone for comparing analytic methods in social network analysis.”

The Southern women data set describes the participation of 18 women in 14 social events. The women and social events constitute a bipartite network; an edge exists between a woman and a social event if the woman was in attendance at the event. The network is connected.

We identified network modular structure using the BRIM algorithm. The initial state is, in general, important. The dependence on the initial state is most visible in the quality of the stable solution, i.e., the algorithm can get “stuck” at a poor quality local maximum. We initialized the assignment of events to modules in $T$ using several strategies: (1) assigning all events to a single module, (2) assigning each event to its own module, and (3) randomly assigning events to modules.

For this network, all three strategies identify modular structures. The first strategy produces a good quality solution (four modules, $Q = 0.34554$). The second strategy also produces a solution that captures a great deal of the modular structure, but is somewhat coarser than the first (2 modules, $Q = 0.32117$). The third strategy, random initial assignment, sheds light on the quality of the first two. Because the network is small, a large number of trials can be run without difficulty; we ran 500 000 trials. The greatest modularity found equalled that found with all events initially in unique modules, $Q = 0.34554$, indicating that this best solution found is quite good.

In Fig. 2, we show the best assignment of vertices to modules determined using the BRIM algorithm with all


events initially in different modules. The shapes of the vertices show which ones belong to the same modules, with four modules in all. Open symbols with black labels portray vertices corresponding to the women, and filled symbols with white labels portray vertices corresponding to the events. The positions of the vertices are based on the singular vectors corresponding to the two largest singular values of B̃, with the right singular vectors giving the coordinates for the events and the left singular vectors giving the coordinates for the women. Several vertices have been shifted slightly to prevent overlapping vertex symbols while preserving the overall character of the network.

FIG. 2. Modules in the Southern women network. The women are represented as open symbols with black labels and the events are represented as filled symbols with white labels. The modules are indicated by the shape of the symbols. Vertices are positioned with coordinates based on the elements of the singular vectors corresponding to the two largest singular values of B̃; some vertices are repositioned slightly to eliminate overlaps. The vertex partition pictured has the highest modularity we have found for the Southern women network, Q = 0.34554.

The community groups found using the BRIM algorithm are comparable to those found in previous investigations of the Southern women data set (Ref. [21] provides a useful survey). Most such studies have focused on the women, leaving the groupings for the events unspecified; we can use the groupings of the women to assign the events to the best modules, as described in Sec. III C, and calculate modularity values for purposes of comparison. The community groups can be further compared using the normalized mutual information between the various groupings of the women and the best grouping found using the BRIM algorithm. Values of Q and I_norm are summarized in Table I and discussed in depth below.

TABLE I. Comparison of modules in the Southern women network. Where necessary, the modularity values Q are calculated from an optimistic assignment of the events to the best possible modules from a given assignment of the women to modules. Values of the normalized mutual information I_norm are calculated between the given divisions of the women and the best division found using the BRIM algorithm.

Modules      Q         I_norm
BRIM         0.34554   1
Spectral     0.32117   0.56897
Davis 1      0.31057   0.44657
Davis 2      0.31839   0.45126
Doreian      0.29390   0.60766
Unipartite   0.21866   0.28019

In the original investigation, Davis et al. [20] used general ethnographic knowledge of the community to assign the women to two groups. The groups consisted of women 1–9 and of women 9–18; woman 9 is a secondary member of both groups. To be consistent with the definitions in Sec. II, we must assign this individual to a specific group. The Q and I_norm values are seen from Table I to be similar for both assignments, with the case where woman 9 is grouped with women 10–18 labeled as “Davis 1” and the case where woman 9 is grouped with women 1–8 labeled as “Davis 2.” The latter division is the same as what Freeman [21] identified as the consensus from 21 different studies of the Southern women data set. The Q and I_norm values are reasonably similar to values found for two modules using either the BRIM algorithm or spectral bipartitioning as discussed in Sec. III B, which groups the women into sets {1–7, 9} and {8, 10–18} (identified in Table I with the label “spectral”).

Doreian et al. [22] considered the modular nature of both parts of the network, suggesting several divisions of the women and events. The division with the greatest modularity (given in their Table 4) is characterized in Table I with the label “Doreian.” Taking just their partitioning of the events into three groups (events 1–5, 6–9, and 10–14) and replacing their partitioning of the women using the approach from Sec. III C, the modularity can be increased from 0.29390 to 0.32950. This is similar to the best assignment of vertices to modules we described above, with modularity of 0.34554, wherein the additional structure produces a modest, but real, improvement in the modularity.

It is also of interest to compare the community groups obtained for the Southern women network using the bipartite network to those found using an unweighted projection network. Here, we focus on the projection consisting of the 18 women as vertices, with edges defined by mutual participation in events. The best division we found for the women, discussed above and shown in Fig. 2, actually has a negative value for the standard unipartite modularity; it is thus better to use only a single module containing all 18 women than the best module found for the bipartite network. Since the modules we identified from the bipartite network using the BRIM algorithm are similar to those found in numerous other studies, this highlights the difficulties that can arise using a unipartite projection.

Conversely, we can determine the bipartite modularity for community groups found using the unipartite projection. We first use the Newman vector to partition the women into two groups as described in Sec. III A, with women 2 and 4–7 in one group and all others in a second group. Next, we determine the best assignment of events to modules using the


approach from Sec. III C. Together, this gives the values shown in Table I for the label “Unipartite,” which reflect that some of the modular structure of the network has been captured but is generally inferior to the solutions found from the bipartite network. Further, the solution from the unweighted projection does not correspond to a maximum in the bipartite modularity; using the solution as the initial state for the BRIM algorithm, a solution is obtained with two modules identical to those found using spectral bipartitioning as described in Sec. III B.
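The two operations used in this comparison, evaluating the bipartite modularity of a given division and refining a division by alternately reassigning one part of the network while the other is held fixed, can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used for the results reported here. It assumes that the bipartite modularity matrix has elements B̃_ij = Ã_ij − k_i d_j / m, where Ã is the biadjacency matrix, k_i and d_j are the vertex degrees, and m is the number of edges. The function names are hypothetical, and ties are broken by taking the lowest module index, which is only one of many possible arbitrary rules.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def modularity_matrix(adj: Sequence[Sequence[int]]) -> Tuple[Matrix, int]:
    """Return (B~, m) with B~[i][j] = A~[i][j] - k_i * d_j / m (assumed null model)."""
    m = sum(sum(row) for row in adj)          # number of edges
    k = [sum(row) for row in adj]             # degrees of the row vertices
    d = [sum(col) for col in zip(*adj)]       # degrees of the column vertices
    b = [[adj[i][j] - k[i] * d[j] / m for j in range(len(d))]
         for i in range(len(k))]
    return b, m


def bipartite_modularity(adj, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Q = (1/m) * sum of B~[i][j] over pairs (i, j) placed in the same module."""
    b, m = modularity_matrix(adj)
    return sum(b[i][j]
               for i in range(len(rows))
               for j in range(len(cols))
               if rows[i] == cols[j]) / m


def best_assignment(b: Matrix, fixed: Sequence[int], c: int) -> List[int]:
    """Give each row of b the module with the largest summed B~ entries,
    holding the module labels `fixed` of the other part constant."""
    labels = []
    for row in b:
        score = [0.0] * c
        for value, g in zip(row, fixed):
            score[g] += value
        labels.append(max(range(c), key=score.__getitem__))  # ties: lowest index
    return labels


def brim(adj, cols: Sequence[int], c: int, max_iter: int = 100):
    """Alternate the two half-steps until Q stops increasing."""
    b, _ = modularity_matrix(adj)
    b_t = [list(col) for col in zip(*b)]      # transpose, for the column half-step
    cols = list(cols)
    q = float("-inf")
    for _ in range(max_iter):
        rows = best_assignment(b, cols, c)
        cols = best_assignment(b_t, rows, c)
        q_new = bipartite_modularity(adj, rows, cols)
        if q_new <= q + 1e-12:                # no further improvement
            break
        q = q_new
    return rows, cols, q_new


# Toy network: two disconnected 2x2 blocks; every column vertex starts alone.
toy = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
rows, cols, q = brim(toy, cols=[0, 1, 2, 3], c=4)
print(rows, cols, q)  # [0, 0, 2, 2] [0, 0, 2, 2] 0.5
```

A single call to best_assignment on the transposed matrix corresponds to choosing modules for one part of the network given a fixed grouping of the other part, while brim repeats both half-steps from whatever initial labels it is given, so the result depends on that initial state.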
B. Scotland corporate interlock

As a second example, we consider a data set on corporate interlocks in Scotland in the early twentieth century [37]. The data set characterizes 108 Scottish firms during 1904–5, detailing the corporate sector, capital, and board of directors for each firm. The data set includes only those board members who held multiple directorships, totaling 136 individuals.

[Plot: final number of modules versus modularity Q, with point sets for c = 5, 10, 15, 25, 40, 65, and 86.]

FIG. 3. Quality of solutions found in the Scotland corporate interlock network. The modularity Q depends on the allowed number of modules c. The points correspond to solutions found using the BRIM algorithm starting from a random initial assignment of vertices to modules. The values on the ordinate indicate the number of modules occupied by at least one vertex in the solution state found by the BRIM algorithm. All points are slightly dithered to better show regions with many similar or identical solutions. The lines show the course of an adaptive search for the correct number of modules to maximize the modularity, terminating at states with the modularity and number of modules shown by the crosses.

Here, we focus on the bipartite network of firms and directors, with edges existing between each firm and its board members. Unlike the Southern women network, the Scotland corporate interlock network is not connected. In the following, we consider only the largest component of the graph, containing 131 directors and 86 firms, and thus as many as
86 modules.

As with the Southern women network, assigning all directors to unique modules or to the same module results in a solution that captures some of the modular character of the network, with Q = 0.56634 and Q = 0.39873, respectively. However, in contrast to the Southern women network, these are rather poor solutions compared to what can be found starting from a random assignment of directors to modules (see Fig. 3).

Further, the best solutions are found by restricting the allowed number of modules c to less than the maximum. In principle, allowing the number of modules to take on any value leaves the BRIM algorithm to search the largest possible space, potentially finding the largest possible modularity value. In practice, the results are inferior to those obtained from a more restricted search. In Fig. 3, we show the results, in terms of the actual numbers of modules occupied and modularity values, for BRIM searches with the allowed number of modules restricted. This trades off the possibility of higher modularity values in the excluded region for improved searching in the remaining region. The trade-off is clearly a good one, as the best solutions are found with fewer than thirty modules.

In Fig. 3, we also show three runs of the adaptive BRIM algorithm described in Sec. III D. The lines show the progress of the number of modules and modularity value during the search. The number of modules c allowed for the BRIM search is typically close (within 10%) to the number of modules actually found, suggesting that the adaptive approach eliminates a wasteful search through vertex assignments with too many modules. The three traces all show typical behavior and lead to good solutions; two of the adaptive runs lead to better solutions, in terms of modularity, than any of the much larger number of trials using BRIM with a fixed c.

Based on the solutions shown in Fig. 3, the main component of the Scotland corporate interlock network has roughly 20 community groups, considerably fewer than the 131 directors or 86 firms. This analysis could serve as a starting point for an investigation of the community structures of the firms or directors. A more comprehensive analysis would take into account the available information on the corporate sectors and capital of the firms.

V. CONCLUSIONS

We have defined and explored a modularity appropriate for bipartite networks. The presented results extend and specialize the matrix-based approach recently reported by Newman [19] for unipartite networks. The bipartite structure of the network is reflected mathematically in the importance of an asymmetric submatrix B̃ of the full bipartite modularity matrix B, with a corresponding emphasis on the singular value decomposition of B̃ instead of the spectral decomposition of B. We made use of the properties of B̃ to define an algorithm, BRIM, for use in identifying network modules. By applying the algorithm to real-world networks, we demonstrated its effectiveness and identified some of its limitations.

The usual unipartite modularity has a limited resolution that depends on the number of edges in the network [39]. The main consequence of the resolution limit is that the modules in large networks may have hidden substructures that require deeper investigations to reveal. Although we have not shown it, we expect that the bipartite modularity


introduced in this work has a similar resolution limit, with similar consequences.

One of the key themes in this paper has been that the bipartite structure of the network can be beneficially incorporated into its mathematical description and its computational treatment. This theme was realized in the BRIM algorithm, where the assignment of vertices to modules in one part of the network, when held fixed, provides a stable modularity landscape in which it is straightforward to partition the vertices of the other part into modules. We expect that the characteristics of other specialized classes of networks could be taken advantage of in an analogous fashion to define appropriate null model networks, modularity measures, and community detection algorithms.

The eigenvalues of the graph Laplacian are closely related to many important properties and invariants of the graph [40]. In contrast, relatively little is known about the spectra of modularity matrices, be they for unipartite or bipartite networks. We are optimistic that the eigenvalues of the modularity matrix usefully relate to important and interesting network properties.

ACKNOWLEDGMENTS

The author thanks Ludwig Streit, Philippe Blanchard, and Thomas Roediger-Schluga for useful comments and suggestions. This work has been supported in part by the European FP6-NEST-Adventure Programme, Contract No. 028875.

[1] C. Christensen and R. Albert, in Complex Networks Structure and Dynamics, special issue of Int. J. Bifurcation Chaos Appl. Sci. Eng. 17, 2201 (2007).
[2] M. E. J. Newman, in The New Palgrave Dictionary of Economics, edited by S. N. Durlauf and L. E. Blume (Palgrave Macmillan, Basingstoke, 2008), 2nd ed. (in press).
[3] M. E. J. Newman, SIAM Rev. 45, 167 (2003).
[4] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[5] L. Angelini, S. Boccaletti, D. Marinazzo, M. Pellicoro, and S. Stramaglia, Chaos 17, 023114 (2007).
[6] V. Gol’dshtein and G. A. Koganov, arXiv:physics/0607159 (unpublished).
[7] M. B. Hastings, Phys. Rev. E 74, 035102 (2006).
[8] M. E. J. Newman and E. A. Leicht, Proc. Natl. Acad. Sci. U.S.A. 104, 9564 (2007).
[9] J. Reichardt and S. Bornholdt, Phys. Rev. E 74, 016110 (2006).
[10] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek, Nature (London) 435, 814 (2005).
[11] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
[12] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[13] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
[14] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas, J. Stat. Mech.: Theory Exp. (2005), P09008.
[15] M. E. J. Newman, Eur. Phys. J. B 38, 321 (2004).
[16] G.-C. Rota, Am. Math. Monthly 71, 498 (1964).
[17] J. M. Pujol, J. Bejar, and J. Delgado, Phys. Rev. E 74, 016107 (2006).
[18] M. E. J. Newman, Phys. Rev. E 69, 066133 (2004).
[19] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[20] A. Davis, B. B. Gardner, and M. R. Gardner, Deep South (University of Chicago Press, Chicago, 1941).
[21] L. Freeman, in Dynamic Social Network Modeling and Analysis, edited by R. Breiger, K. Carley, and P. Pattison (The National Academies Press, Washington, D.C., 2003).
[22] P. Doreian, V. Batagelj, and A. Ferligoj, Soc. Networks 26, 29 (2004).
[23] M. E. J. Newman, Phys. Rev. E 64, 016131 (2001).
[24] M. E. J. Newman, Phys. Rev. E 64, 016132 (2001).
[25] M. J. Barber, A. Krueger, T. Krueger, and T. Roediger-Schluga, Phys. Rev. E 73, 036132 (2006).
[26] T. Roediger-Schluga and M. J. Barber, in Innovation Networks, special issue of IJFIP (in press).
[27] M. A. Porter, P. J. Mucha, M. E. J. Newman, and A. J. Friend, Physica A 386, 414 (2007).
[28] R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral, Phys. Rev. E 76, 036102 (2007).
[29] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[30] The assignment when x_i = 0 is arbitrary, and makes no contribution to the modularity.
[31] C. J. Alpert and S.-Z. Yao, in DAC ’95: Proceedings of the 32nd ACM/IEEE Conference on Design Automation (ACM Press, New York, 1995), pp. 195–200.
[32] C. J. Alpert, A. B. Kahng, and D. S. Yao, Discrete Appl. Math. 90, 3 (1999).
[33] An arbitrary rule is needed to break ties, for example, random assignment of the vertex to one of the modules that maximizes Q.
[34] More customary (see, e.g., Ref. [14]) is to fix the expected degree of the network vertices and vary the expected number of edges linking vertices in different modules, with p_in and p_out calculated from the expectation values. However, for the bipartite network model under consideration, the base case, with edges only existing between vertices in the same module, will often be excluded using this approach.
[35] Analogous measures can be defined in a straightforward fashion using the portions of the index matrices that correspond to just the red or blue vertices.
[36] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications (Wiley-Interscience, New York, 1991).
[37] J. Scott and M. Hughes, The Anatomy of Scottish Capital: Scottish Companies and Scottish Capital, 1900–1979 (Croom Helm, London, 1980).
[38] V. Batagelj and A. Mrvar, Pajek Datasets available at http://vlado.fmf.uni-lj.si/pub/networks/data/
[39] S. Fortunato and M. Barthelemy, Proc. Natl. Acad. Sci. U.S.A. 104, 36 (2007).
[40] F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics (American Mathematical Society, Providence, RI, 1997).
