Stable Subgraph Isomorphism Search in Temporal Networks
Abstract—In this paper, we study a new problem of seeking stable subgraph isomorphisms for a query graph in a temporal graph.
To solve our problem, we first develop a pruning-based search algorithm using several new pruning tricks to prune the unpromising
matching results during the search procedure. To further improve the efficiency, we propose a novel index structure called BCCIndex,
based on an idea of bi-connected component decomposition of the query graph, which can efficiently support the stable subgraph
isomorphism search. Equipped with the BCCIndex, we present an efficient query processing algorithm based on a carefully designed
tree join technique. We conduct extensive experiments to evaluate our algorithms on four large real-life datasets, and the results
demonstrate the efficiency and effectiveness of our algorithms.
some implicit features of a network which may be useful for understanding the structure and function of a network.

The stable subgraph isomorphism search problem is NP-hard because when $\theta = 1$, it degenerates to the problem of finding subgraph isomorphisms in every snapshot of $\mathcal{G}$, which is NP-hard [7], [8]. To solve the stable subgraph isomorphism search problem, a straightforward solution is to use a traditional subgraph isomorphism query algorithm, such as the Ullmann algorithm [9], to compute all subgraph isomorphisms for a query graph in each snapshot, and then pick the stable subgraph isomorphisms among all snapshots. Clearly, such a solution is very costly, because subgraph isomorphism search is an NP-hard problem. To efficiently compute the stable subgraph isomorphisms in a temporal graph, we develop a new pruning-based search algorithm equipped with several pruning tricks based on the temporal stability constraint, which can significantly reduce the unpromising intermediate results during the search procedure. To further improve the efficiency, we propose a novel index structure, called BCCIndex, based on a bi-connected component (BCC) decomposition technique which can efficiently support stable subgraph isomorphism queries. Armed with the BCCIndex and a carefully-designed tree join technique, we develop an efficient query processing algorithm to find stable subgraph isomorphisms. To the best of our knowledge, our work is the first to apply the BCC indexing technique to solve the stable subgraph isomorphism search problem in temporal networks. In summary, we make the following contributions.

A Pruning-Based Algorithm. We propose a pruning-based stable subgraph isomorphism search algorithm PruneSearch integrated with several pruning techniques to avoid exploring the unpromising intermediate results during the search procedure. We also present a parallel version of PruneSearch to improve the scalability of the algorithm.

An Index-Based Algorithm. We devise an index structure, namely, BCCIndex, based on a BCC decomposition technique. Equipped with the BCCIndex, we propose an index-based solution, i.e., BCCIndexSearch, to efficiently find stable subgraph isomorphisms based on a newly-developed tree join technique. We also propose a parallel BCCIndex construction algorithm and a parallel BCCIndexSearch algorithm to further improve the scalability.

Extensive Experiments. We conduct comprehensive experiments to evaluate the efficiency of the proposed algorithms using four large real-world temporal graphs. The results show that 1) BCCIndexSearch is very efficient, running around 1-3 orders of magnitude faster than PruneSearch; 2) the BCCIndex can be constructed in a reasonable time for large temporal graphs and the size of BCCIndex is often not very large; 3) both the parallel PruneSearch and the parallel BCCIndexSearch can achieve very high speedup ratios. In addition, we also conduct a case study on a collaboration network DBLP. The results show that our solutions can indeed find meaningful and stable research teams in DBLP.

Remark. Note that given a graph $G = (V, E)$ and a query graph $Q = (V_q, E_q)$, there are two concepts of subgraph isomorphism in the studies of graph analysis. The first is defined as an injective function $M : V_q \to V$ such that $\forall (u_i, u_j) \in E_q$, $(M(u_i), M(u_j)) \in E$, which is often called subgraph monomorphism. The second is an injective function $M : V_q \to V$ satisfying that $\forall (u_i, u_j) \in E_q$, $(M(u_i), M(u_j)) \in E$ and $\forall (u_i, u_j) \notin E_q$, $(M(u_i), M(u_j)) \notin E$, which is also known as induced subgraph isomorphism. In this paper, we focus on the definition of the former, i.e., subgraph monomorphism, and also use "subgraph isomorphism" to represent "subgraph monomorphism" in the following.

Organization. We introduce some important notations and formulate our problem in Section 2. The pruning-based search framework is presented in Section 3. Section 4 presents the index structure and the index construction method. The index-based query processing algorithm is proposed in Section 5. Section 6 reports the experimental results. We survey the related work in Section 7 and conclude this work in Section 8.

2 PRELIMINARIES
Consider an undirected and unlabeled temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $n = |\mathcal{V}|$ vertices and $m = |\mathcal{E}|$ temporal edges. Each temporal edge $e \in \mathcal{E}$ is a triplet $(u, v, t)$, where $u, v$ are vertices in $\mathcal{V}$, and $t$ is the interaction time between $u$ and $v$. We assume that $t$ is an integer, because the timestamp is an integer in practice. The de-temporal graph of $\mathcal{G}$ is defined as $G = (V, E)$ by discarding all timestamps on the temporal edges and condensing the multiple edges between any two vertices into a single edge. Clearly, we have $V = \mathcal{V}$ and $E = \{(u, v) \mid (u, v, t) \in \mathcal{E}\}$. We denote the neighbors of a vertex $u$ by $N_u(G)$, i.e., $N_u(G) = \{v \in V \mid (u, v) \in E\}$, and the degree of $u$ by $deg_G(u) = |N_u(G)|$. Given a subset $S \subseteq V$, the subgraph of $G$ induced by $S$ is defined as $G_S = (V_S, E_S)$ where $V_S = S$ and $E_S = \{(u, v) \mid u, v \in S, (u, v) \in E\}$, and we denote this by $G_S \subseteq G$. We omit the symbol $G$ in the above notations when the context is clear.

Given a temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, we can extract a series of snapshots based on the timestamps. Considering an arithmetic time sequence $\{t_0, t_1, t_2, \ldots, t_T\}$ satisfying that $(t_i - t_{i-1})$ is a constant for each integer $i > 0$, the $i$-th snapshot of $\mathcal{G}$ is a de-temporal graph $G_i = (V, E_i)$ where $E_i$ is the set of edges extracted from $\mathcal{E}$ in the time interval $(t_{i-1}, t_i]$ and $V$ remains the same in general. Let $T$ be the number of snapshots of $\mathcal{G}$; we have $T \le m$. Denote by $G_T$ the set of all snapshots of $\mathcal{G}$ based on the time interval. In the experiments, we set $(t_i - t_{i-1})$ to a default value of 1 month/year, which means that every snapshot contains all the temporal edges in a sliding window of length 1 month/year. Fig. 1 illustrates a temporal graph $\mathcal{G}$ with 77 temporal edges and $T = 6$. The de-temporal graph of $\mathcal{G}$ is shown in Fig. 1b. Figs. 1c, 1d, 1e, 1f, 1g, and 1h are the six snapshots from $G_1$ to $G_6$ of $\mathcal{G}$, respectively.

Before introducing the definition of stable subgraph embedding, we give the concepts of subgraph isomorphism and subgraph isomorphism embedding as follows.

Definition 1 (Subgraph isomorphism). Given a query graph $q = (V_q, E_q)$ and a data graph $g = (V_g, E_g)$, $q$ is subgraph isomorphic to $g$ if and only if there exists an injective function $M : V_q \to V_g$ such that $\forall (u_i, u_j) \in E_q$, $(M(u_i), M(u_j)) \in E_g$. We call $g$ a subgraph isomorphism of $q$ and denote this by $q \simeq g$.

Definition 2 (Subgraph isomorphism embedding). Given a query graph $q = (V_q, E_q)$ and its subgraph isomorphism graph $g = (V_g, E_g)$, a subgraph isomorphism embedding is an
Authorized licensed use limited to: Universidad Federal de Pernambuco. Downloaded on December 09,2024 at 17:07:18 UTC from IEEE Xplore. Restrictions apply.
ZHANG ET AL.: STABLE SUBGRAPH ISOMORPHISM SEARCH IN TEMPORAL NETWORKS 6407
Then, there are $7 \times 6$ 2-stable subgraph embeddings of $q_2$ in $\mathcal{G}$ whose stable values are 2. Further, when $\theta > 2$, the result of stable subgraph embedding search is empty.

Clearly, for a query graph $q$, all subgraph isomorphism embeddings can be easily revealed by a subgraph isomorphism $g$ (i.e., $q \simeq g$). Thus, in the remainder of this paper, we use the term "isomorphism" to refer to "embedding" for simplicity when there is no ambiguity, and we may use embedding, match, and mapping interchangeably.

Remark. Note that our stable subgraph isomorphism search problem is different from the classic frequent subgraph mining problem. A frequent subgraph is a pattern that appears multiple times on a large graph [10], [11], [12], [13], [14] or in a set of graphs [15], [16], [17], [18]. In particular, a temporal motif is a frequent subgraph with the temporal information of edges. A stable subgraph isomorphism is an embedding of a given query graph that appears in multiple snapshots over time. The key difference between the frequent subgraph mining problem and our stable subgraph search problem is that the former does not require a query graph as an input, while the latter is based on a query graph. Due to this key difference, existing solutions for the problem of frequent subgraph mining cannot be directly applied to solve our problem.

Challenges. We first discuss the hardness of the stable subgraph isomorphism search problem. Consider a special case: $\theta = 1$. Clearly, the problem is equivalent to finding subgraph isomorphisms in each snapshot of $\mathcal{G}$, which is NP-hard. Thus, finding all isomorphisms of a query graph in at least $\theta$ snapshots is also NP-hard.

To solve the stable subgraph isomorphism search problem in temporal graphs, a straightforward solution is to compute the stable value for each temporal subgraph isomorphism in all snapshots and then pick the subgraphs whose stable values are no less than the stability threshold $\theta$ as the answers. Such an approach, however, is very costly for large temporal graphs. This is because the solution needs to explore all subgraph isomorphisms in all snapshots of $\mathcal{G}$, which is often intractable for large temporal graphs due to the NP-hardness of the problem. To improve the efficiency, a potential solution is to apply temporal information of the edges to prune the unpromising intermediate results that definitely cannot yield a stable subgraph isomorphism. The challenge is how to apply the temporal information to speed up the stable subgraph isomorphism search procedure.

In addition, searching subgraph isomorphisms is often not very efficient for large graphs, because it typically needs to perform a backtracking search procedure on the large data graph to identify all subgraph isomorphisms. A natural question is whether we can design an index-based solution to efficiently support the stable subgraph isomorphism query. Clearly, we cannot pre-compute all the stable subgraph isomorphisms for all possible small-sized subgraph queries (in practice, the size of the query subgraph is often smaller than 10). Thus, the challenge in answering this question is how to maintain some stable subgraph isomorphism results to efficiently support all possible subgraph queries.

To tackle the above challenges, we propose a pruning-based search algorithm which can efficiently prune the unpromising intermediate results using the temporal information, as well as an index-based algorithm with a bi-connected component decomposition technique which can efficiently support stable subgraph isomorphism search.

3 A PRUNING SEARCH ALGORITHM
This section proposes a pruning-based stable subgraph isomorphism search algorithm, called PruneSearch, to solve our problem. PruneSearch extends the classic Ullmann algorithm to handle temporal graphs and integrates several pruning techniques to prune unpromising candidate matches. Below, we first introduce the pruning rules, followed by the PruneSearch algorithm.

3.1 The Pruning Rules
Let $u_i \in V_q$ be a query vertex and $v_i \in V$ be a data vertex. $C(u_i)$ denotes the candidate set of $u_i$, which includes the data vertices that may form a mapping $u_i \to v_i$ in a stable subgraph isomorphism. Let $\mathcal{T}(v_i, v_j)$, also known as active time, be the collection of snapshots derived by the temporal edges in $\mathcal{G}$ whose end-vertices are $v_i$ and $v_j$, i.e., $\mathcal{T}(v_i, v_j) = \{t \mid (v_i, v_j) \in E_t, 1 \le t \le T\}$. Denote by $g = (V_g, E_g)$ an arbitrary stable subgraph isomorphism of $q$ in $\mathcal{G}$. Below, we give four observations based on which the search algorithm can prune the intermediate results that definitely cannot form a stable subgraph isomorphism.

Observation 1. For an edge $(v_i, v_j) \in G$, if $|\{G_t \mid (v_i, v_j) \in E_t, 1 \le t \le T\}| < \theta$ holds, then $(v_i, v_j)$ is not included in $g$, i.e., $(v_i, v_j) \notin E_g$.

Observation 2. Degree reduction: for a data vertex $v_i$ in $G$, if $|\{G_t \mid deg_{G_t}(v_i) \ge deg_q(u_i), 1 \le t \le T\}| < \theta$ holds, then $v_i \notin C(u_i)$.

Observation 3. Failed neighbors reduction: for a data vertex $v_i$ in $G$, let $N_{v_i} = N_{v_i}(G)$. We iteratively update the sets $T_{u_i}(v_i) \leftarrow \{t \mid |N_{v_i}(G_t) \cap N_{v_i}| \ge deg_q(u_i), t \in T\}$ and $N_{v_i} \leftarrow \{v_j \mid |\{t \mid v_j \in N_{v_i}(G_t), t \in T\} \cap T_{u_i}(v_i)| \ge \theta\}$ until $N_{v_i}$ does not change. If $|T_{u_i}(v_i)| \ge \theta$ holds, we say that $v_i$ is a candidate of $u_i$ and $T_{u_i}(v_i)$ is the active time of $v_i$.

Observation 4. For a data vertex $v_i \in C(u_i)$, $v_i$ should be removed from $C(u_i)$ if one of the following conditions is satisfied:
1) Neighbor restriction: $\exists u_j \in N_{u_i}(q)$, $N_{v_i}(G) \cap C(u_j) = \emptyset$.
2) Time restriction: $\forall v_j \in N_{v_i}(G) \cap C(u_j)$, $|T_{u_i}(v_i) \cap T_{u_j}(v_j) \cap \mathcal{T}(v_i, v_j)| < \theta$.

3.2 The Proposed Algorithm
Equipped with the above pruning rules, we propose PruneSearch to solve the stable subgraph isomorphism search problem. The main idea of PruneSearch is to find the results by expanding partial solutions or abandoning them when they definitely cannot form full answers. The algorithm can reduce the computation of unpromising intermediate matches during the backtracking procedure, thus improving the efficiency significantly.
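The edge-level rule of Observation 1 can be illustrated with a short sketch. It is not the authors' implementation: the tuple layout `(u, v, t)`, the parameters `t0` and `delta` (the start of the time sequence and the constant snapshot width), and both function names are assumptions made for this example.

```python
from collections import defaultdict


def active_times(temporal_edges, t0, delta):
    """Map each undirected edge to the set of snapshot indices it appears in.

    Assumes integer timestamps and that snapshot i covers (t_{i-1}, t_i],
    with t_i = t0 + i * delta.
    """
    active = defaultdict(set)
    for u, v, t in temporal_edges:
        if t <= t0:
            continue  # before the first interval (t0, t1]
        i = -((t0 - t) // delta)  # integer ceiling of (t - t0) / delta
        active[frozenset((u, v))].add(i)
    return active


def prune_unstable_edges(active, theta):
    """Keep only edges that are active in at least theta snapshots."""
    return {edge: ts for edge, ts in active.items() if len(ts) >= theta}


# Hypothetical toy input: edge {1, 2} is active in snapshots 1 and 3,
# edge {2, 3} only in snapshot 2.
edges = [(1, 2, 1), (1, 2, 2), (2, 1, 5), (2, 3, 4)]
stable = prune_unstable_edges(active_times(edges, t0=0, delta=2), theta=2)
# Only the edge {1, 2} survives for theta = 2.
```

Storing each edge as a `frozenset` makes `(u, v)` and `(v, u)` the same key, which matches the undirected setting of Section 2.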
Algorithm 1. PruneSearch
Input: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, a query $q = (V_q, E_q)$, an integer $\theta$.
Output: The stable subgraph isomorphism set $S$.
1 $S \leftarrow \emptyset$; $\hat{\mathcal{G}} = (\hat{\mathcal{V}}, \hat{\mathcal{E}}) \leftarrow \mathcal{G}$;
2 Construct the de-temporal graph $G = (V, E)$ of $\mathcal{G}$;
3 for $(v_i, v_j) \in E$ do
4     $\mathcal{T}(v_i, v_j) = \{t \mid (v_i, v_j) \in E_t, 1 \le t \le T\}$;
5     $w(v_i, v_j) = |\mathcal{T}(v_i, v_j)|$;
6     if $w(v_i, v_j) < \theta$ then Delete all edges $(v_i, v_j, t)$ from $\hat{\mathcal{G}}$;
7     if $deg_{\hat{\mathcal{G}}}(v_i) = 0$ then Delete $v_i$ from $\hat{\mathcal{G}}$;
8     if $deg_{\hat{\mathcal{G}}}(v_j) = 0$ then Delete $v_j$ from $\hat{\mathcal{G}}$;
9 for $u_i \in V_q$ do
10     $C(u_i) \leftarrow \emptyset$;
11     for $v_i \in \hat{\mathcal{V}}$ do
12         $T_{u_i}(v_i) \leftarrow \emptyset$; $N_{v_i} \leftarrow N_{v_i}(G)$; $\hat{N}_{v_i} \leftarrow \emptyset$;
13         $flag(v_i) \leftarrow true$;
14         while $N_{v_i} \ne \hat{N}_{v_i}$ do
15             $\hat{N}_{v_i} \leftarrow N_{v_i}$;
16             $T_{u_i}(v_i) \leftarrow \{t \mid |N_{v_i}(G_t) \cap N_{v_i}| \ge deg_q(u_i), t \in T\}$;
17             $N_{v_i} \leftarrow \{v_j \mid |\{t \mid v_j \in N_{v_i}(G_t), t \in T\} \cap T_{u_i}(v_i)| \ge \theta\}$;
18             if $|T_{u_i}(v_i)| \ge \theta$ then
19                 Insert $(v_i, T_{u_i}(v_i), flag(v_i))$ into $C(u_i)$;
20                 break;
21     if $|C(u_i)| = 0$ then return $S$;
22 $M \leftarrow \emptyset$; $T_c \leftarrow \{t \mid 1 \le t \le T\}$;
23 SubGraphSearch($q$, $\hat{\mathcal{G}}$, $M$, $T_c$);
24 return $S$;

restriction and time restriction (Observation 4); we call this process SecondFilter (lines 3-21). Specifically, for each $u_j \in N_{u_i}(q)$, a variable disjoint, initialized to true, is used to indicate that no $v_j \in N_{v_i}(G)$ can be a candidate of $u_j$. The procedure calculates tempt by considering both the time $T$ obtained in the last iteration and the active times of $v_i$, $v_j$, and $(v_i, v_j)$ (lines 10-11). If tempt $\ge \theta$ holds, $u_j$ can be mapped to $v_j$ under $u_i \to v_i$ and $M$; thus, SubGraphSearch sets disjoint to false and checks the next neighbor of $u_i$ (line 13). Otherwise, no data vertex is both a neighbor of $v_i$ and a candidate of $u_j$; thus, $v_i$ is not a candidate of $u_i$ and the procedure updates $C(u_i)$ based on the round of iteration (lines 14-18). During the SecondFilter, once any candidate set $C(u_i)$ is empty, the procedure terminates (line 21). After pruning candidates, SubGraphSearch picks an unmapped vertex $u_s$ with the smallest candidate set as the selected vertex, and then performs the next iteration for each candidate of $u_s$ (lines 23-30). Before executing the iteration, SubGraphSearch re-computes the active time $T_c$ and updates candidate sets by marking the indicator flag based on the new mapping of $u_s$ (lines 24-28). When the inner SubGraphSearch completes, the procedure needs to recover the candidates' status (line 30). Since SubGraphSearch iteratively maps vertices one by one from $q$ to $\hat{\mathcal{G}}$, it adds $M$ into the result set $S$ when $|M| = |V_q|$ holds, at which point a $\theta$-stable subgraph isomorphism of $q$ in $\mathcal{G}$ is discovered (line 1).
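The candidate computation for a single data vertex (the failed neighbors reduction of Observation 3, which Algorithm 1 applies per query vertex) can be sketched as follows. This is an illustrative reading of the rule, not the authors' code: the `snapshots` layout (snapshot index mapped to an adjacency dictionary), the function name, and the choice to return `None` for a rejected vertex are all assumptions, and the loop here always runs to the fixpoint described in Observation 3.

```python
def failed_neighbor_reduction(v, deg_q, snapshots, theta):
    """Return the active time of data vertex v for a query vertex of degree
    deg_q, or None if v cannot be a candidate.

    snapshots: dict mapping snapshot index t to an adjacency dict
               (vertex -> set of neighbours in G_t). Assumed layout.
    """
    # Neighbours of v in the de-temporal graph: union over all snapshots.
    all_nbrs = set()
    for adj in snapshots.values():
        all_nbrs |= adj.get(v, set())

    nbrs, prev, times = set(all_nbrs), None, set()
    while nbrs != prev:
        prev = set(nbrs)
        # Snapshots in which v still has at least deg_q surviving neighbours.
        times = {t for t, adj in snapshots.items()
                 if len(adj.get(v, set()) & nbrs) >= deg_q}
        # Neighbours adjacent to v in at least theta of those snapshots.
        nbrs = {w for w in all_nbrs
                if sum(1 for t in times
                       if w in snapshots[t].get(v, set())) >= theta}
    return times if len(times) >= theta else None


# Hypothetical toy input: vertex 0 sees {1, 2} in snapshots 1 and 2,
# and {1, 3} in snapshot 3.
snaps = {1: {0: {1, 2}}, 2: {0: {1, 2}}, 3: {0: {1, 3}}}
active = failed_neighbor_reduction(0, deg_q=2, snapshots=snaps, theta=2)
# Neighbour 3 is dropped first, which then removes snapshot 3 from the active time.
```

The loop terminates because each round can only shrink the neighbour set, which is also why the per-vertex computations are independent of one another.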
Remark. In PruneSearch, we can apply the symmetry-breaking trick described in [19] to handle query graphs with automorphisms, so that each stable subgraph isomorphism is searched only once. When an answer is found, the other matches with different mapping relationships can be easily revealed by the automorphisms of $q$.

3.3 The Parallel PruneSearch Algorithm
To further improve the scalability, we develop a parallel version of the pruning-based search algorithm, called PPruneSearch. Specifically, in lines 9-22 of Algorithm 1, calculating the candidates in FirstFilter can be performed independently, thus we can process the query vertices in parallel in this procedure. In addition, in lines 23-30 of Algorithm 2, when SubGraphSearch is first called, i.e., $M = \emptyset$, it picks the first query vertex $u_f$ and generates new mappings with candidates to perform the deeper iterations. For each candidate $v_i \in C(u_f)$, SubGraphSearch finds $\theta$-stable subgraph isomorphisms based on the mapping $u_f \to v_i$, thus we can process all $v_i$'s in parallel as all of them are independent. We will show in the experiments that the parallel algorithm PPruneSearch can achieve a very good speedup ratio on real-life graphs.

4 THE BCCIndex STRUCTURE
In this section, we propose an index structure, called BCCIndex, to efficiently support the stable subgraph isomorphism query. Below, we first introduce the BCCIndex structure, followed by the index construction algorithms.

4.1 The Proposed BCCIndex
Before introducing the index structure, we give the definition of Bi-Connected Component (BCC) of a graph [20], [21], [22]. Consider a graph $G$ and a subgraph $g$. We say that $g$ is a BCC if 1) the remaining graph is still connected after removing any one edge from $g$ and 2) no super-graph of $g$ in $G$ satisfies 1). Any graph can be decomposed into several BCCs and isolated vertices [20]. We refer to such a decomposition as BCCDecompose. For instance, by performing BCCDecompose on the graph in Fig. 3a, we can obtain four BCCs induced by the vertices colored red, blue, green, and gray, respectively; the vertices $u_7$ and $u_{11}$ colored black are isolated.

Based on BCCDecompose, we develop an index structure, called BCCIndex, which maintains all the stable subgraph embeddings for small-size BCCs. In particular, BCCIndex maintains a sorted list of the stable subgraph isomorphisms for all index-query graphs. Here, the index-query graphs are all BCC-graphs with size no larger than 5, as illustrated in Fig. 4. Clearly, there are 15 index-query graphs in total and we denote the collection of them as $G_{IQ}$. In many practical applications, the size of the query graph is often no larger than 10. By BCCDecompose, the query graph can be decomposed into very small subgraphs. Thus, it is sufficient to store all the isomorphism results of such small subgraphs in $G_{IQ}$ as an index and design an efficient index-based query algorithm to handle different query graphs. In addition, we focus mainly on the stable subgraph isomorphism search problem in temporal graphs. An isomorphism is not considered a stable isomorphism if it appears in only one snapshot. If we want to find an isomorphism that appears in at least one snapshot, the time constraint has no effect, and this problem degenerates to the problem of finding all subgraph isomorphisms in each snapshot. Therefore, we maintain the stable subgraph matches with stable values no less than 2 in our BCCIndex structure.

Fig. 4. The index-query graph set $G_{IQ}$.

More specifically, the BCCIndex structure, denoted by $EI$, contains 15 sorted lists corresponding to the BCC-graphs in Fig. 4. For each graph $IQ_i \in G_{IQ}$, we search stable subgraph isomorphisms based on the stability threshold $\theta = 2$. For a temporal isomorphism $m_{IQ_i}$, we maintain the snapshots $\mathcal{T}(m_{IQ_i})$ that it appears in and calculate $sv(m_{IQ_i})$. Then, a sorted list $EI(IQ_i)$ can be obtained by sorting all temporal isomorphisms in a non-increasing order of their stable values. We maintain the snapshots so that the query processing algorithm can easily extend partial solutions to complete stable subgraph isomorphisms. Note that if the stable value $sv(m_{IQ_i})$ equals 1, then $m_{IQ_i}$ is not added into $EI(IQ_i)$. The following example illustrates the BCCIndex structure.

Example 5. Consider a temporal graph $\mathcal{G}$ in Fig. 1. The index structure of $\mathcal{G}$ for $IQ_0$, $IQ_3$ and $IQ_{12}$ is shown in Fig. 5. Due to space limitations, we only illustrate one instance of the automorphisms for query graphs. Clearly, the BCCIndex structure $EI(IQ_i)$ maintains the stable subgraph isomorphisms whose stable values are no less than 2 and sorts them in a non-increasing order based on the stable values. For instance, the triangle induced by $(v_1, v_2, v_4)$ only appears in $G_1$, so it is not included in $EI(IQ_0)$. The match containing $v_6, v_7, v_8$ exists in
$G_2$, $G_3$, $G_4$, $G_5$ and $G_6$, and its stable value equals 5 which is the largest among all isomorphisms, thus it ranks first in $EI(IQ_0)$. Similarly, we can see that $EI(IQ_3)$ contains three matches induced by $\{v_7, v_9, v_{10}, v_{11}\}$, $\{v_2, v_3, v_5, v_6\}$, and $\{v_4, v_3, v_5, v_6\}$, respectively. It is easy to verify that their stable values are equal to 4, 3, and 2, as illustrated in Fig. 5b. The index $EI(IQ_{12})$ is shown in Fig. 5c which contains seven stable subgraph isomorphisms of $IQ_{12}$ in $\mathcal{G}$.

Fig. 5. The BCCIndex structure of $\mathcal{G}$ in Fig. 1.

Algorithm 3. BCCIndexBuild
Input: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, an index-query set $G_{IQ}$.
Output: The BCCIndex $EI$.
1 $EI \leftarrow \emptyset$;
2 for $IQ_i \in G_{IQ}$ do
3     $EI(IQ_i) \leftarrow \emptyset$;
4     $EI(IQ_i) \leftarrow$ PruneSearch($\mathcal{G}$, $IQ_i$, 2);
5     Sort all matches in $EI(IQ_i)$ in a non-increasing order based on the stable values;
6     $EI \leftarrow EI \cup EI(IQ_i)$;
7 return $EI$;
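The sorted-list layout built by Algorithm 3 (BCCIndexBuild) can be sketched for a single index-query graph as below. The sketch assumes the matches and their snapshot sets have already been computed (for example by PruneSearch with threshold 2) and that the stable value of a match is the number of snapshots it appears in, as Example 5 suggests. The function names and the `(mapping, snapshot_set)` pair layout are hypothetical.

```python
def build_sorted_list(matches):
    """Build one sorted list of the index from precomputed matches.

    matches: iterable of (mapping, snapshot_set) pairs for one index-query
             graph. Matches seen in fewer than 2 snapshots are not indexed.
    """
    kept = [(m, frozenset(ts)) for m, ts in matches if len(ts) >= 2]
    # Non-increasing order of stable value (number of snapshots).
    kept.sort(key=lambda entry: len(entry[1]), reverse=True)
    return kept


def lookup(sorted_list, theta):
    """Return all indexed matches whose stable value is at least theta."""
    result = []
    for mapping, ts in sorted_list:
        if len(ts) < theta:
            break  # the list is sorted, so no later entry can qualify
        result.append((mapping, ts))
    return result


# The first two triangles follow Example 5; the third is made up.
triangles = [
    (("v1", "v2", "v4"), {1}),
    (("v6", "v7", "v8"), {2, 3, 4, 5, 6}),
    (("a", "b", "c"), {1, 2}),
]
index = build_sorted_list(triangles)
hits = lookup(index, theta=3)  # only the (v6, v7, v8) triangle qualifies
```

Keeping the snapshot set next to each match is what lets a query intersect active times when partial solutions are joined, and the sort order allows a lookup to stop at the first entry below the threshold.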
$G_e$ are removed and no strategy is obtained, $G_e$ is recognized as a GeneralBCC.

Algorithm 4. BCCMatch
Input: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, a query $q = (V_q, E_q)$, a BCC $G_e = (V_e, E_e)$, the BCCIndex $EI$, an integer $\theta$.
Output: The stable subgraph isomorphism set $M(G_e)$ of $G_e$.
1 $M(G_e) \leftarrow \emptyset$;
2 if $\forall IQ_q \in G_{IQ}$, $\nexists\, IQ_q \simeq G_e$ then
3     Let $Q$ be a priority queue;
4     $Q \leftarrow \emptyset$; isdecom $\leftarrow$ false;
5     for $u_i \in V_e$ do $Q$.push($u_i$, $deg(u_i)$);
6     while $Q \ne \emptyset$ do
7         $(u_r, d_{min}) \leftarrow Q$.pop(); Let $\hat{G} = (\hat{V}, \hat{E})$ be a graph;
8         $\hat{V} \leftarrow V_e \setminus \{u_r\}$; $\hat{E} \leftarrow E_e \setminus \{(u_i, u_j) \mid u_i = u_r \text{ or } u_j = u_r\}$;
9         if $\nexists\, IQ_{\hat{q}} \in G_{IQ}$, $IQ_{\hat{q}} \simeq \hat{G}$ then continue;
10         else
11             isdecom $\leftarrow$ true; $C(u_r) \leftarrow V$;
12             for $\hat{m} \in EI(IQ_{\hat{q}})$ do
13                 if $sv(\hat{m}) \ge \theta$ then
14                     for $(u_r, u_i) \in E_e$ do
15                         $v_i \leftarrow \hat{m}$.find($u_i$);
16                         $C(u_r) \leftarrow C(u_r) \cap N_{v_i}(G)$;
17                     for $v_r \in C(u_r)$ do
18                         $\hat{T} \leftarrow T$;
19                         for $(u_r, u_i) \in E_e$ do
20                             $v_i \leftarrow \hat{m}$.find($u_i$);
21                             $\hat{T} \leftarrow T_{u_r}(v_r) \cap \mathcal{T}(v_i, v_r) \cap \mathcal{T}(\hat{m}_{IQ_{\hat{q}}})$;
22                         if $|\hat{T}| \ge \theta$ then
23                             $m \leftarrow \hat{m}$; $m$.insert($u_r$, $v_r$);
24                             $M(G_e) \leftarrow M(G_e) \cup m$;
25                 else break;
26             break;
27     if isdecom = false then $M(G_e) \leftarrow$ PruneSearch($\mathcal{G}$, $G_e$, $\theta$);
28 else
29     for $m \in EI(IQ_q)$ do
30         if $sv(m) \ge \theta$ then $M(G_e) \leftarrow M(G_e) \cup m$;
31         else break;
32 return $M(G_e)$;

The BCCMatch algorithm is depicted in Algorithm 4. Specifically, it first identifies whether $G_e$ is an IndexIsoBCC. If there is an index-query graph $IQ_q$ satisfying $IQ_q \simeq G_e$, BCCMatch outputs the solutions with stable values no less than $\theta$ in $EI(IQ_q)$ as the results (lines 29-31). Otherwise, BCCMatch checks whether $G_e$ is a StarIsoBCC. It pushes the vertices in $V_e$ into a priority queue $Q$ following a non-decreasing order based on their degrees, and then pops the first element in $Q$ at each loop to find a decomposition strategy (lines 3-27). A variable isdecom, initialized as false, is used to indicate whether $G_e$ is a StarIsoBCC (line 4). When a decomposition strategy is found, i.e., $G_e$ can be decomposed into an IndexIsoBCC $\hat{G}$ and a star pivoted on the removed vertex $u_r$, BCCMatch sets isdecom to true (line 11). We denote the index-query graph that is isomorphic to $\hat{G}$ as $IQ_{\hat{q}}$. Then, for each match with stable value no less than $\theta$ in $EI(IQ_{\hat{q}})$, BCCMatch extends it to obtain the complete solutions of $G_e$ (lines 12-25). If $Q$ is empty and isdecom still equals false, removing any vertex in $G_e$ cannot derive a decomposition, thus we recognize $G_e$ as a GeneralBCC. In this case, BCCMatch performs PruneSearch to search stable isomorphisms for $G_e$ on the basis of the stability threshold $\theta$ (line 27). Note that since $EI(IQ_q)$ and $EI(IQ_{\hat{q}})$ are sorted lists, BCCMatch terminates when the stable value of a solution is less than $\theta$, for both StarIsoBCC (line 25) and IndexIsoBCC (line 31). Finally, the set $M(G_e)$ stores the stable subgraph isomorphisms of $G_e$.

Example 6. Consider a temporal graph $\mathcal{G}$ in Fig. 1 and a query graph $q_3$ in Fig. 2c. Suppose that $\theta = 2$. We denote the BCC induced by the vertices colored blue as $G_e$. Obviously, $G_e$ is a StarIsoBCC because after removing $u_1$, the remaining graph is isomorphic to $IQ_{12}$. For each solution in $EI(IQ_{12})$ shown in Fig. 5c, we extend it by considering the star induced by the edges $(u_1, u_2)$, $(u_1, u_3)$ to obtain the matches of $G_e$. We can easily check that only the match induced by $\{v_7, v_{10}, v_8, v_9, v_{11}\}$ can be extended by adding the mapping $u_1 \to v_6$. With the automorphisms of $G_e$, the results of $G_e$ are $(v_6, v_7, v_8, v_9, v_{10}, v_{11})$ and $(v_6, v_7, v_8, v_{11}, v_{10}, v_9)$.

Remark. For a query graph $q$, the BCCMatch algorithm processes its decomposed BCCs according to their types. When the BCC is an IndexIsoBCC or a StarIsoBCC, BCCMatch finds the stable matches with the BCCIndex, which maintains stable results for index-query graphs with size no larger than 5. For a GeneralBCC, the BCCMatch algorithm needs to perform PruneSearch to search stable isomorphisms on the basis of the stability threshold $\theta$.

5.2 A Heuristic Join Order
As aforementioned, a graph $G$ can be decomposed into several BCCs and isolated vertices [20]. There is only one edge to link one BCC/isolated vertex to another BCC/isolated vertex in $G$. Thus, we can convert $G$ into a tree, called BCCTree, by treating each BCC or isolated vertex as a tree node and adding the edges between them. A tree node $n$ is associated with a set, denoted as $CN(n)$, which represents the corresponding vertices in $G$. If $n$ is an isolated vertex, we refer to $n = CN(n)$. For brevity, $G_n = (V_n, E_n)$ is used to represent the subgraph induced by $CN(n)$. We connect the tree nodes $n_i$ and $n_j$ if there is an edge between $u_i \in CN(n_i)$ and $u_j \in CN(n_j)$ in $G$, and we label the tree edge as $(u_i, u_j)$. In this way, the BCCTree of $G$ is created.

Example 7. Consider a graph $q$ in Fig. 3a. Clearly, there are four BCCs and two isolated vertices in $q$. Fig. 3b illustrates the information of tree nodes. The BCCTree of $q$ is shown in Fig. 3c, in which each tree node's color is consistent with the color of the vertices in $CN(n)$ in Fig. 3a. We connect $n_1$ and $n_2$ with the edge $(u_5, u_6)$ in BCCTree because vertex $u_5 \in CN(n_1)$ and vertex $u_6 \in CN(n_2)$ are linked in $q$.

Clearly, a join order can be derived by traversing BCCTree from any tree node for our BCCIndexSearch algorithm to merge the partial stable subgraph isomorphisms. However, the pruning performance of BCCIndexSearch with various join orders can be significantly different. Here we design a heuristic join order for BCCIndexSearch by constructing a JoinTree. Let $c_i$ be a child of a tree node $n$ and $dep(c_i)$ be the depth of descendants of $c_i$ in BCCTree. For two children $c_1, c_2$ of $n$, we define $c_1 \succ c_2$ if: 1) $|CN(c_1)| > |CN(c_2)|$; 2) $|CN(c_1)| = |CN(c_2)|$ and $dep(c_1) > dep(c_2)$. Following this order, the tree node with a larger size and
deeper descendant has a higher priority to join, which is intuitively reasonable. Therefore, we create a root node $n_r$ of the JoinTree with the highest rank based on the order. For each tree node in JoinTree, we then iteratively add its children into JoinTree according to the above order too. Finally, the breadth-first traversal sequence of JoinTree is our join order for merging the partial solutions in BCCIndexSearch. Following such a heuristic join order can form a larger subgraph of $q$, thus avoiding the invalid merging operation from small substructures.

Example 8. Consider a query graph $q$ in Fig. 3a and its BCCTree in Fig. 3c. Fig. 3d illustrates the JoinTree of $q$ obtained by our method. By performing breadth-first traversal on the JoinTree, we can obtain a join order: $n_1 \to n_2 \to n_5 \to n_4 \to n_3 \to n_6$. Following this order, when processing the children of node $n_2$, that is, when deciding on the next substructure to extend, BCCIndexSearch first merges $n_5$ as it has the largest size among all children. Clearly, such a join operation can derive a relatively large partial solution.

5.3 The Query Processing Algorithm
Here we present the query processing algorithm, namely, BCCIndexSearch. The main idea of BCCIndexSearch is to find partial solutions for BCCs of $q$ based on the BCCIndex and then join them to derive the results. The pseudo-code of BCCIndexSearch is outlined in Algorithm 5.

Like PruneSearch, the map structure $M$ in BCCIndexSearch maintains the mapping relationships for each stable subgraph isomorphism. Algorithm 5 works as follows. First, it performs BCCDecompose to calculate BCCs of $q$ and adds BCCs into $V_B$ and isolated vertices into $V_I$ (lines 1-2). Based on $V_B$ and $V_I$, BCCIndexSearch constructs the BCCTree $BT$ and the JoinTree $JT$ to obtain our heuristic join order $Q$ by breadth-first traversal of $JT$ (lines 3-4). Then, the algorithm pops the head element $n_r$ in $Q$ (the root of $JT$) as the initial substructure. If $n_r$ is isolated, it means that the query graph $q$ is a tree, and we search $\theta$-stable subgraph isomorphisms by PruneSearch (line 6). Otherwise, the BCCIndexSearch algorithm performs BCCMatch to find $\theta$-stable isomorphisms for BCCs based on our BCCIndex (lines 8-9). Since $n_r$ is the initial substructure ($n_r$ ranks first), BCCIndexSearch pops the head element $n_t$ in $Q$ as the next selected substructure and performs the TreeJoin procedure to extend each $\theta$-stable isomorphism of $n_r$ (lines 10-12). Finally, the result set $S$ containing the $\theta$-stable subgraph isomorphisms of $q$ is returned.

TreeJoin finds the $\theta$-stable subgraph isomorphisms of $q$ by joining tree nodes based on the order in $Q$. It extends the current match $M$ by adding the current tree node $n_c$'s solutions that satisfy both neighbor and time restrictions (Observation 4). For a match $m_c$ of tree node $n_c$, we recognize that $m_c$ is not an active candidate of $n_c$ if: 1) multiple query vertices are mapped to the same data vertex; 2) the number of snapshots that contain both $m_c$ and $M$ is less than $\theta$. For each $m_c$, TreeJoin identifies whether it is active. If not, the procedure terminates. Otherwise, it performs an extension depending on the type of $n_c$: isolated vertex extension (lines 18-24) and BCC extension (lines 25-30). The difference between these two

former chooses the neighbors of the matched vertex in $M$, and the latter selects candidates based on the solutions obtained by BCCMatch. When $M$ contains all mappings of query vertices in $q$, a $\theta$-stable subgraph isomorphism of $q$ is found, thus Algorithm 5 adds it into the result set $S$ (line 15).

Algorithm 5. BCCIndexSearch
Input: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, a query $q = (V_q, E_q)$, an integer $\theta$, the BCCIndex $EI$.
Output: The stable subgraph isomorphism set $S$.
1 $S \leftarrow \emptyset$; $V_B \leftarrow \emptyset$; $V_I \leftarrow \emptyset$;
2 $(V_B, V_I) \leftarrow$ BCCDecompose($q$);
3 Construct BCCTree $BT$ and JoinTree $JT$;
4 $Q \leftarrow$ the join order by traversing $JT$ with BFS;
5 $n_r \leftarrow Q$.pop(); $G_r = (V_r, E_r) \leftarrow G_{n_r}$;
6 if $E_r = \emptyset$ then $S \leftarrow$ PruneSearch($\mathcal{G}$, $q$, $\theta$);
7 else
8     for $G_{e_i} \in V_B$ do
9         $M(G_{e_i}) \leftarrow$ BCCMatch($\mathcal{G}$, $q$, $G_{e_i}$, $EI$, $\theta$);
10     for $m_r \in M(G_r)$ do
11         $T \leftarrow \mathcal{T}(m_r)$; $M \leftarrow m_r$;
12         $n_t \leftarrow Q$.pop(); TreeJoin($q$, $\mathcal{G}$, $M$, $T$, $n_t$);
13 return $S$;
14 Procedure TreeJoin($q$, $\mathcal{G}$, $M$, $T$, $n_c$)
15 if $|M| = |V_q|$ then $S \leftarrow S \cup M$;
16 else
17     $G_c = (V_c, E_c) \leftarrow G_{n_c}$; $V(M) \leftarrow \{v_i \mid (u_i, v_i) \in M\}$;
18     if $V_c = n_c$, $E_c = \emptyset$ then
19         $u_i \leftarrow u_i$, $(u_i, n_c) \in E_q$;
20         $v_i \leftarrow M$.find($u_i$);
21         for $v_j \in N_{v_i}(G)$ do
22             if $\{v_j\} \cap V(M) = \emptyset$ and $|T \cap \mathcal{T}(v_i, v_j)| \ge \theta$ then
23                 $T \leftarrow T \cap \mathcal{T}(v_i, v_j)$; $M$.insert($u_j$, $v_j$);
24                 $n_t \leftarrow Q$.pop(); TreeJoin($q$, $\mathcal{G}$, $M$, $T$, $n_t$);
25     else
26         for $m_c \in M(G_c)$ do
27             $V(m_c) \leftarrow \{v_i \mid (u_i, v_i) \in m_c\}$;
28             if $V(m_c) \cap V(M) = \emptyset$ and $|T \cap \mathcal{T}(m_c)| \ge \theta$ then
29                 $T \leftarrow T \cap \mathcal{T}(m_c)$; $M \leftarrow m_c$;
30                 $n_t \leftarrow Q$.pop(); TreeJoin($q$, $\mathcal{G}$, $M$, $T$, $n_t$);
31 end procedure

Remark. In the BCCIndexSearch algorithm, we also use the symmetry-breaking trick described in [19] to handle query graphs with automorphisms, so that each stable subgraph isomorphism is searched only once.

5.4 The Parallel Query Algorithm
To improve the scalability, we introduce a parallel version of the index-based search algorithm, called PBCCIndexSearch. Specifically, in line 8 of Algorithm 5, the calculation of $\theta$-stable subgraph isomorphisms for BCCs is independent, thus we can process BCCs in parallel. In addition, in lines 10-12 of Algorithm 5, BCCIndexSearch chooses the $n_r$ with the highest rank as the initial substructure and performs TreeJoin for each $\theta$-stable isomorphism of $n_r$. Similar to PPruneSearch, this procedure can also be processed in parallel. Our experiments show that such a simple parallel implementation can achieve a very good speedup ratio compared
extensions is the selection of candidate match mc . The to the sequential algorithm.
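The active-candidate test that TreeJoin applies in Section 5.3 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a match is a dictionary from query vertices to data vertices and that the snapshots supporting a match are kept as a set of timestamps. The names `is_active` and `join` are hypothetical and are not taken from the authors' C++ implementation.

```python
from typing import Dict, Optional, Set, Tuple

# Assumed representation: query vertex name -> data vertex id.
Match = Dict[str, int]


def is_active(M: Match, T: Set[int], m_c: Match, T_c: Set[int], theta: int) -> bool:
    """Return True if candidate m_c may extend the partial match M.

    Mirrors the two conditions stated in the text:
    1) no data vertex is used by both M and m_c;
    2) at least theta snapshots contain both M and m_c.
    """
    disjoint = set(M.values()).isdisjoint(m_c.values())
    return disjoint and len(T & T_c) >= theta


def join(M: Match, T: Set[int], m_c: Match, T_c: Set[int],
         theta: int) -> Optional[Tuple[Match, Set[int]]]:
    """Extend (M, T) with (m_c, T_c); return None if m_c is not active."""
    if not is_active(M, T, m_c, T_c, theta):
        return None
    merged = dict(M)          # copy so the caller's partial match is untouched
    merged.update(m_c)
    return merged, T & T_c    # common snapshots of the extended match
```

Returning a fresh dictionary and a fresh snapshot set, rather than updating `M` and `T` in place, keeps each extension independent of the others. This is one simple way to let the extensions of different root matches run in parallel, as Section 5.4 suggests.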
Authorized licensed use limited to: Universidad Federal de Pernambuco. Downloaded on December 09,2024 at 17:07:18 UTC from IEEE Xplore. Restrictions apply.
6414 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 6, JUNE 2023
TABLE 1
Datasets
6 EXPERIMENTS
In this section, we conduct extensive experiments to evaluate the efficiency and effectiveness of the proposed algorithms. For comparison, we implement an algorithm, called BaselineSearch, as a baseline. BaselineSearch uses a parallel Ullmann algorithm [9] to compute subgraph isomorphisms for a query graph in the de-temporal graph, and then picks the stable isomorphisms among them. We also implement another baseline algorithm, called BaseSnapSearch, which uses the VF2 algorithm [23] to compute subgraph isomorphisms for each snapshot and then selects the stable subgraph embeddings as the results. BaseSnapSearch also applies the pruning technique derived from Observation 1 before searching subgraph isomorphisms in each snapshot. For our pruning-based stable subgraph isomorphism search algorithm, we implement PruneSearch (Algorithm 1). We implement the BCCIndexBuild algorithm (Algorithm 3) to construct the BCCIndex, as well as the index-based stable subgraph isomorphism search algorithm BCCIndexSearch (Algorithm 5). All algorithms are implemented in C++. In addition, we also implement the parallel versions of PruneSearch, BCCIndexBuild and BCCIndexSearch using OpenMP, namely, PPruneSearch, PBCCIndexBuild and PBCCIndexSearch. All experiments are conducted on a PC with a 2.10GHz Intel(R) Xeon(R) Silver 4110 (8-core) CPU and 256GB memory running Red Hat 4.8.5-16. In all experiments, both the temporal graph and the BCCIndex are stored in the main memory. We set the time limit to 7 days.

Datasets. We use four different types of real-world temporal networks in the experiments. The detailed statistics of the datasets are summarized in Table 1. Chess is a temporal network in which each temporal edge represents two chess players playing a game at time $t$. Lkml is a temporal communication network of the Linux kernel mailing list. Enron is an email communication network between employees of Enron. DBLP is a temporal collaboration network of authors in dblp from 1940 to 2018. In Table 1, $d_{max}$ and $|T|$ denote the maximum number of temporal edges associated with a vertex and the number of snapshots, respectively. All these datasets are downloaded from http://konect.uni-koblenz.de/.

Parameters. There are two input parameters in our algorithms: the query graph $q$ and the stability threshold $\theta$. The input query graphs used in our experiments are summarized in Fig. 6, which includes eight different types of subgraphs with 5-8 vertices. Since the BCCIndex is constructed based on the indexed-query graphs, the BCCIndexSearch algorithm can calculate stable subgraph isomorphisms in linear time if the query graph is one of the indexed-query graphs. Therefore, we only use $q_1$ as an example where the query graph is an indexed-query graph; for other indexed-query graphs, the results are consistent. For the hard cases, in which the query graph is not an indexed-query graph, we pick seven query graphs (namely, $q_2, q_3, q_4, q_5, q_6, q_7, q_8$ in Fig. 6) to evaluate the proposed algorithms. $q_2$ is used to evaluate the BCCIndexSearch algorithm for handling StarIsoBCC, and $q_7$ and $q_8$ are designed for processing GeneralBCC. For the subgraph isomorphism problem, a slight difference between two query graphs may result in significant query performance changes. Thus, we select $q_4, q_5$ as the query graphs to exhibit an "edge growing" pattern. Since different datasets have various time scales, the stability threshold $\theta$ is set within different intervals. Specifically, for Chess, $\theta$ is selected from the interval $[2, 7]$ with a default value of 4. For Lkml and Enron, $\theta$ is chosen from the interval $[5, 15]$ with a default value of 9. For DBLP, $\theta$ is selected from the interval $[5, 10]$ with a default value of 7.

Fig. 6. The query graphs in the experiments.

6.1 Performance Studies
Comparison Among BaselineSearch, BaseSnapSearch and PruneSearch. We evaluate BaselineSearch, BaseSnapSearch and PruneSearch with varying parameters on different datasets. Fig. 8 depicts the runtime of BaselineSearch, BaseSnapSearch and PruneSearch with $q_1$ and $q_3$ on Chess. The results for other query graphs on Chess are consistent. From Fig. 8, we can see that the runtime of BaselineSearch increases very smoothly with increasing $\theta$, while the runtime of BaseSnapSearch and PruneSearch decreases as $\theta$ increases for each query graph. This is because BaselineSearch needs to compute all subgraph isomorphisms to select the stable results, while BaseSnapSearch and PruneSearch, equipped with pruning techniques, can significantly reduce the search space of stable subgraph isomorphisms. Moreover, the running time of PruneSearch is at least 3 orders of magnitude lower than that of BaselineSearch within all parameter settings, as expected. Compared with BaseSnapSearch, PruneSearch is faster for smaller $\theta$ (i.e., $\theta = 2, 3$), and for relatively larger $\theta$, the runtimes of the two algorithms are close. This is because the pruning effect of Observation 1 can significantly reduce the scale of the graph for a larger $\theta$. For example, for query graph $q_3$ on Chess, when $\theta$ equals 2, PruneSearch takes 0.562 seconds to output all stable subgraph isomorphisms, while BaselineSearch and BaseSnapSearch consume 103,953 seconds and 760 seconds. The runtime of PruneSearch is up to 5 orders of
magnitude and up to 2 orders of magnitude faster than that of BaselineSearch and BaseSnapSearch, respectively. These results confirm that the pruning-based algorithm PruneSearch is substantially faster than the BaselineSearch and BaseSnapSearch algorithms in real-life temporal graphs, which is consistent with our analysis in Section 2.

Fig. 8. Running time of BaselineSearch, BaseSnapSearch and PruneSearch on Chess.

The Pruning Effect of FirstFilter and SecondFilter. In this experiment, we evaluate the effect of FirstFilter and SecondFilter in PruneSearch, and we refer to them as FirF and SecF for brevity. We also employ two pruning tricks, i.e., degree reduction and neighbor reduction (also known as DegF and NbrF), in BaselineSearch for comparison. We evaluate these pruning techniques before performing the recursive search procedures. For each query vertex, the initial size of the candidate set is the number of vertices, which we denote as Init. Fig. 7 shows the size of the candidate set for vertex $u_2$ in $q_6$ with different pruning techniques on all the datasets. The results with varying $\theta$ for other query vertices and other query graphs are consistent. As can be seen from Fig. 7, both FirF and SecF prune far more of the vertices that are definitely not included in a stable subgraph isomorphism than DegF and NbrF do. For example, in the case of $\theta = 10$ on DBLP, the number of candidates of $u_2$ in $q_6$ after performing FirF is 874 vertices, while the size of the initial candidate set is 1,729,816. SecF can further reduce the size of $C(u_2)$ to 508. DegF and NbrF only reduce the size of the candidate set to 1,287,416 and 1,285,661, respectively. In addition, the pruning effect of SecF (NbrF) is not obviously superior to that of FirF (DegF). This is because SecF and NbrF work better in the case of extending partial matches, i.e., the recursion procedure of PruneSearch and BaselineSearch. These results suggest that our pruning techniques can significantly prune those vertices that are not included in a stable subgraph isomorphism. Again, these results confirm that PruneSearch is significantly better than BaselineSearch, which is consistent with our previous experiments.

Comparison Among BaseSnapSearch, PruneSearch and BCCIndexSearch. Fig. 9 shows the running time of the three algorithms with varying $\theta$ for $q_1$ to $q_6$ on different datasets. As expected, the runtime of BaseSnapSearch, PruneSearch and BCCIndexSearch decreases as $\theta$ increases for each query graph. In general, the three algorithms reach their maximum runtime at the smallest $\theta$. This is because for a smaller stability threshold $\theta$, there are large numbers of temporal subgraph isomorphisms with stable values no less than $\theta$ in the graph, thus increasing computational costs. Moreover, we can also see that the proposed PruneSearch and BCCIndexSearch algorithms work well, while the BaseSnapSearch algorithm exceeds the time limit within most parameter settings. The running time of PruneSearch is significantly lower than that of BaseSnapSearch for a small $\theta$ over all datasets. For a relatively large $\theta$, PruneSearch is also faster than BaseSnapSearch except for some subgraphs on DBLP (i.e., $\theta = 9$ or 10). This is because the pruning technique derived from Observation 1 can significantly reduce the scale of DBLP for a large $\theta$, which is indeed the case for temporal collaboration networks. The runtime of BCCIndexSearch is at least one order of magnitude and two orders of magnitude lower than that of PruneSearch and BaseSnapSearch, respectively, within almost all parameter settings. For example, for query graph $q_2$ on DBLP, when $\theta = 5$, BCCIndexSearch takes 88 seconds to output all the stable subgraph isomorphisms, while PruneSearch consumes 159,964 seconds and BaseSnapSearch cannot calculate the results within the time limit. In this case, BCCIndexSearch is at least three orders of magnitude faster than PruneSearch and BaseSnapSearch. We also evaluate PruneSearch and BCCIndexSearch with $q_7$ and $q_8$. The results on DBLP are depicted in Fig. 10, and similar results can also be found for other datasets. As expected, the runtimes of PruneSearch and BCCIndexSearch are relatively close. This is because $q_7$ and $q_8$ are GeneralBCC graphs, and BCCIndexSearch needs to perform PruneSearch to search all stable subgraph isomorphisms. These results demonstrate that the index-based solution BCCIndexSearch is substantially faster than the PruneSearch and BaseSnapSearch algorithms in real-life temporal graphs for query graphs with small-size BCCs.

Evaluation of Parallel Query Processing Algorithms. In this experiment, we evaluate the running time of PPruneSearch and PBCCIndexSearch while varying the number of threads $t \in \{1, 2, 4, 8, 12, 16\}$. Fig. 11 illustrates the results of two query graphs $q_3, q_4$ on Lkml and Enron. Similar results can also be observed on the other datasets and using the other query graphs. As can be seen from Fig. 11, the runtime of PBCCIndexSearch is significantly lower than that of PPruneSearch. For example, for query graph $q_4$ on Enron, when $t = 16$, PBCCIndexSearch takes 5.9 seconds to output all stable subgraph isomorphisms, while PPruneSearch consumes 3,254.596 seconds. PBCCIndexSearch is at least two orders of magnitude faster than PPruneSearch.
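The "orders of magnitude" statement for the Enron example can be checked directly from the two reported runtimes. The short sketch below, using a hypothetical helper `orders_of_magnitude`, computes the base-10 logarithm of the runtime ratio for $q_4$ on Enron at $t = 16$ (3,254.596 seconds for PPruneSearch versus 5.9 seconds for PBCCIndexSearch); it is only an arithmetic illustration and not part of the evaluated algorithms.

```python
import math


def orders_of_magnitude(slow_seconds: float, fast_seconds: float) -> float:
    """Base-10 logarithm of the ratio between two runtimes."""
    return math.log10(slow_seconds / fast_seconds)


# Runtimes reported in the text for q4 on Enron with t = 16 threads.
pprune_seconds = 3254.596
pbcc_seconds = 5.9

ratio = pprune_seconds / pbcc_seconds
gap = orders_of_magnitude(pprune_seconds, pbcc_seconds)
print(f"ratio = {ratio:.1f}, log10(ratio) = {gap:.2f}")
```

The ratio is roughly 550, so the gap lies between two and three orders of magnitude, in line with the "at least two orders of magnitude" statement.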
Moreover, we can see that both PPruneSearch and PBCCIndexSearch can achieve near-linear speedup ratios over these two datasets. For example, in Fig. 11d, when $t = 16$ and $q = q_4$, the speedup ratios of PPruneSearch and PBCCIndexSearch on Enron are roughly equal to 7.6 and 12, respectively. These results indicate that our parallel stable subgraph isomorphism search algorithms can achieve very high speedup ratios on real-life graphs.

Evaluation of the BCCIndex. In this experiment, we evaluate the performance of our index construction algorithm. Fig. 12a shows the size of the BCCIndex and the graph size. As can be seen, the BCCIndex sizes on all datasets are less than 2.5GB, which can be easily stored in the main memory of a modern computer. These results imply that the BCCIndex size is not very large on real-life temporal graphs. In addition, Fig. 12b reports the BCCIndex construction time using the sequential algorithm BCCIndexBuild. On Chess, Lkml, and Enron, our index construction algorithm BCCIndexBuild is very efficient, taking less than 3 hours to construct the BCCIndex. On DBLP, BCCIndexBuild is somewhat time-consuming, but it is still able to construct the BCCIndex within 7 days (less than 600,000 seconds). Once the BCCIndex is
ZHANG ET AL.: STABLE SUBGRAPH ISOMORPHISM SEARCH IN TEMPORAL NETWORKS 6417
TABLE 2
The Number of Different Types With Varying $\theta$ on HCWs
Fig. 16. Case study on DBLP with 5-clique query graph.
Fig. 17. The ratio of different types with varying $\theta$ on HCWs.
the unnecessary cartesian products by employing neighborhood equivalence class to merge similar vertices in a query graph $q$. Ren et al. [30] improved the efficiency of TurboISO based on a technique of compressing the data graph $G$. CFL-Match [28] made use of the spanning tree instead of the original query graph to postpone cartesian products. Han et al. proposed DAF [29] which employs the knowledge learned from past computations to reduce redundant computations. The performance of these subgraph matching algorithms was compared in several previous studies [31], [32]. Most of those improved algorithms mentioned above are tailored for static and labeled graphs. In this work, we investigate a new subgraph isomorphism problem in unlabeled temporal graphs and the algorithms mentioned above cannot be directly used for efficiently solving our problem.

Temporal Graph Analysis. Our work is also related to temporal graph analysis which has attracted much attention in recent years [17], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43]. For example, Bansal et al. [33] studied the problem of identifying keyword clusters in large collections of blog posts for specific temporal intervals. Li et al. [35] introduced a persistent community model and developed algorithms to efficiently solve this problem. Gurukar et al. [17] proposed an algorithm to identify the recurring subgraphs in a temporal graph. Recently, the subgraph isomorphism problem in temporal graphs has been studied. Redmond and Cunningham [39] introduced a time-respecting subgraph isomorphism problem which requires the edges of the query graph following a temporal order. Semertzidis et al. [40] studied the problem of mining durable subgraph patterns in a temporal graph. Franzke et al. [41] investigated the problem of pattern search in temporal graphs, which focuses on whether the subgraphs exist after $\Delta t$ time satisfying that the temporal order of edges is consistent with the temporal pattern. Thus, their model only searches isomorphisms within a fixed time interval, which cannot be used to measure the stability. Wang et al. [44] introduced a temporal stable community model based on the temporal similarity of edges and developed algorithms by extending the Louvain method to detect stable communities. Since the definition of our problem is different from those of the above problems, all the existing techniques cannot be directly applied for solving our problem. To the best of our knowledge, our work is the first to apply the BCC indexing technique to solve the stable subgraph isomorphism search problem in temporal networks.

8 CONCLUSION
In this paper, we study the problem of finding stable subgraph isomorphisms in temporal graphs. To solve the problem, we first develop a pruning-based search algorithm based on several non-trivial pruning techniques which can significantly reduce unpromising intermediate results during the search procedure. To further improve the efficiency, we propose a novel index structure, called BCCIndex, to support the stable subgraph isomorphism search in temporal graphs efficiently. We also develop an efficient query processing algorithm based on the BCCIndex and an efficient tree join technique. Finally, we conduct extensive experiments using four real-life datasets to evaluate the efficiency of the proposed algorithms. The results show that the index-based solution BCCIndexSearch is around 1-3 orders of magnitude faster than the PruneSearch algorithm. The results also show that our parallel implementations for both pruning-based search and index-based search algorithms can achieve very high speedup ratios. Finally, we conduct a case study in DBLP and the results demonstrate that our solution for stable subgraph isomorphism search can be useful to identify stable research groups in DBLP.

REFERENCES
[1] N. Pržulj, D. G. Corneil, and I. Jurisica, “Efficient estimation of graphlet frequency distributions in protein–protein interaction networks,” Bioinformatics, vol. 22, no. 8, pp. 974–980, 2006.
[2] T. A. Snijders, P. E. Pattison, G. L. Robins, and M. S. Handcock, “New specifications for exponential random graph models,” Sociol. Methodol., vol. 36, no. 1, pp. 99–153, 2006.
[3] X. Yan, P. S. Yu, and J. Han, “Graph indexing: A frequent structure-based approach,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2004, pp. 335–346.
[4] A.-L. Barabasi, “The origin of bursts and heavy tails in human dynamics,” Nature, vol. 435, no. 7039, pp. 207–211, 2005.
[5] P. Holme and J. Saramäki, “Temporal networks,” Phys. Rep., vol. 519, no. 3, pp. 97–125, 2012.
[6] X. Qiu et al., “Real-time constrained cycle detection in large dynamic graphs,” Proc. VLDB Endow., vol. 11, no. 12, pp. 1876–1888, 2018.
[7] J. Hartmanis, “Computers and intractability: A guide to the theory of NP-completeness (Michael R. Garey and David S. Johnson),” SIAM Rev., vol. 24, no. 1, p. 90, 1982.
[8] H.-N. Tran, J.-J. Kim, and B. He, “Fast subgraph matching on large graphs using graphics processors,” in Proc. Int. Conf. Database Syst. Adv. Appli., 2015, pp. 299–315.
[9] J. R. Ullmann, “An algorithm for subgraph isomorphism,” J. ACM, vol. 23, no. 1, pp. 31–42, 1976.
[10] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: Simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[11] R. Wan and H. Mamitsuka, “Discovering network motifs in protein interaction networks,” in Biological Data Mining in Protein Interaction Networks, Hershey, PA, USA: IGI Global, 2009, pp. 117–143.
[12] M. Koyutürk, A. Grama, and W. Szpankowski, “An efficient algorithm for detecting frequent subgraphs in biological networks,” Bioinformatics, vol. 20, no. suppl_1, pp. i200–i207, 2004.
[13] C. Jiang, F. Coenen, and M. Zito, “A survey of frequent subgraph mining algorithms,” Knowl. Eng. Rev., vol. 28, no. 1, pp. 75–105, 2013.
[14] P. Zhao and J. X. Yu, “Fast frequent free tree mining in graph databases,” World Wide Web, vol. 11, no. 1, pp. 71–92, 2008.
[15] R. Jin, S. McCallen, and E. Almaas, “Trend motif: A graph mining approach for analysis of dynamic complex networks,” in Proc. 17th IEEE Int. Conf. Data Mining, 2007, pp. 541–546.
[16] P. Gupta et al., “Real-time twitter recommendation: Online motif detection in large dynamic graphs,” Proc. VLDB Endow., vol. 7, no. 13, pp. 1379–1380, 2014.
[17] S. Gurukar, S. Ranu, and B. Ravindran, “COMMIT: A scalable approach to mining communication motifs from dynamic networks,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2015, pp. 475–489.
[18] R. Ahmed and G. Karypis, “Algorithms for mining the coevolving relational motifs in dynamic networks,” ACM Trans. Knowl. Discov. Data, vol. 10, no. 1, pp. 4:1–4:31, 2015.
[19] J. A. Grochow and M. Kellis, “Network motif discovery using subgraph enumeration and symmetry-breaking,” in Proc. Annu. Int. Conf. Res. Comput. Mol. Biol., 2007, pp. 92–106.
[20] A. Gibbons, Algorithmic Graph Theory, Cambridge, U.K.: Cambridge Univ. Press, 1985.
[21] T. Akiba, Y. Iwata, and Y. Yoshida, “Linear-time enumeration of maximal k-edge-connected subgraphs in large networks by random contraction,” in Proc. 22nd ACM Int. Conf. Inf. Knowl. Manage., 2013, pp. 909–918.
[22] L. Chang, J. X. Yu, L. Qin, X. Lin, C. Liu, and W. Liang, “Efficiently computing k-edge connected components via graph decomposition,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2013, pp. 205–216.
[23] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento, “A (sub)graph isomorphism algorithm for matching large graphs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 10, pp. 1367–1372, Sep. 2004.
[24] H. Shang, Y. Zhang, X. Lin, and J. X. Yu, “Taming verification hardness: An efficient algorithm for testing subgraph isomorphism,” Proc. VLDB Endow., vol. 1, no. 1, pp. 364–375, 2008.
[25] H. He and A. K. Singh, “Graphs-at-a-time: Query language and access methods for graph databases,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2008, pp. 405–418.
[26] P. Zhao and J. Han, “On graph query optimization in large networks,” Proc. VLDB Endow., vol. 3, no. 1/2, pp. 340–351, 2010.
[27] W.-S. Han, J. Lee, and J.-H. Lee, “Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2013, pp. 337–348.
[28] F. Bi, L. Chang, X. Lin, L. Qin, and W. Zhang, “Efficient subgraph matching by postponing cartesian products,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2016, pp. 1199–1214.
[29] M. Han, H. Kim, G. Gu, K. Park, and W.-S. Han, “Efficient subgraph matching: Harmonizing dynamic programming, adaptive matching order, and failing set together,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2019, pp. 1429–1446.
[30] X. Ren and J. Wang, “Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs,” Proc. VLDB Endow., vol. 8, no. 5, pp. 617–628, 2015.
[31] J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee, “An in-depth comparison of subgraph isomorphism algorithms in graph databases,” Proc. VLDB Endow., vol. 6, no. 2, pp. 133–144, 2012.
[32] S. Sun and Q. Luo, “In-memory subgraph matching: An in-depth study,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2020, pp. 1083–1098.
[33] N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa, “Seeking stable clusters in the blogosphere,” in Proc. 33rd Int. Conf. Very Large Data Bases, 2007, pp. 806–817.
[34] H. Wu, J. Cheng, S. Huang, Y. Ke, Y. Lu, and Y. Xu, “Path problems in temporal graphs,” Proc. VLDB Endow., vol. 7, no. 9, pp. 721–732, 2014.
[35] R.-H. Li, J. Su, L. Qin, J. X. Yu, and Q. Dai, “Persistent community search in temporal networks,” in Proc. IEEE 34th Int. Conf. Data Eng., 2018, pp. 797–808.
[36] Y. Yang, D. Yan, H. Wu, J. Cheng, S. Zhou, and J. C. Lui, “Diversified temporal subgraph pattern mining,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2016, pp. 1965–1974.
[37] Z. Yang, A. W.-C. Fu, and R. Liu, “Diversified top-k subgraph querying in a large graph,” in Proc. Int. Conf. Manage. Data, 2016, pp. 1167–1182.
[38] S. Ma, R. Hu, L. Wang, X. Lin, and J. Huai, “Fast computation of dense temporal subgraphs,” in Proc. IEEE 33rd Int. Conf. Data Eng., 2017, pp. 361–372.
[39] U. Redmond and P. Cunningham, “Subgraph isomorphism in temporal networks,” 2016, arXiv:1605.02174.
[40] K. Semertzidis and E. Pitoura, “Durable graph pattern queries on historical graphs,” in Proc. IEEE 33rd Int. Conf. Data Eng., 2016, pp. 541–552.
[41] M. Franzke, T. Emrich, A. Züfle, and M. Renz, “Pattern search in temporal social networks,” in Proc. 21st Int. Conf. Extending Database Technol., 2018, pp. 289–300.
[42] H. Qin, R.-H. Li, G. Wang, L. Qin, Y. Yuan, and Z. Zhang, “Mining bursting communities in temporal graphs,” 2019, arXiv:1911.02780.
[43] H. Qin, R. Li, G. Wang, L. Qin, Y. Cheng, and Y. Yuan, “Mining periodic cliques in temporal networks,” in Proc. IEEE 33rd Int. Conf. Data Eng., 2019, pp. 1130–1141.
[44] W. Wang and X. Li, “Temporal stable community in time-varying networks,” IEEE Trans. Netw. Sci. Eng., vol. 7, no. 3, pp. 1508–1520, Mar. 2019.

Rong-Hua Li received the PhD degree from the Chinese University of Hong Kong, in 2013. He is currently a professor with the Beijing Institute of Technology, Beijing, China. His research interests include graph data management and mining, social network analysis, graph computation systems, and graph-based machine learning.

Hongchao Qin received the BS degree in mathematics and the ME and PhD degrees in computer science from Northeastern University, China, in 2013, 2015 and 2020, respectively. He is currently a postdoc with the Beijing Institute of Technology, China. His current research interests include social network analysis and data-driven graph mining.

Guoren Wang received the BSc, MSc, and PhD degrees from the Department of Computer Science, Northeastern University, China, in 1988, 1991 and 1996, respectively. Currently, he is a professor with the Department of Computer Science, Beijing Institute of Technology, Beijing, China. His research interests include XML data management, query processing and optimization, bioinformatics, high dimensional indexing, parallel database systems, and cloud data management. He has published more than 100 research papers.

Zhiwei Zhang received the BS degree from the Renmin University of China, in 2010, and the PhD degree from the Chinese University of Hong Kong, in 2014. He is currently a professor with the Beijing Institute of Technology (BIT), Beijing, China. His research interests include federal learning, data pricing and transaction, distributed system, blockchain, and algorithm analysis.

Ye Yuan received the BS, MS, and PhD degrees in computer science from Northeastern University, in 2004, 2007, and 2011, respectively. He is currently a professor with the Department of Computer Science, Northeastern University, China. His research interests include graph databases, probabilistic databases, and social network analysis.