
Stable_Subgraph_Isomorphism_Search_in_Temporal_Networks

This paper addresses the problem of stable subgraph isomorphism search in temporal networks, proposing a pruning-based search algorithm and a novel index structure called BCCIndex to enhance efficiency. The authors conduct extensive experiments on real-life datasets, demonstrating that their algorithms significantly improve the speed and effectiveness of finding stable subgraph isomorphisms. The study highlights the importance of temporal stability in various applications, including collaboration networks and financial transaction analysis.


IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 6, JUNE 2023 6405

Stable Subgraph Isomorphism Search in Temporal Networks
Qi Zhang, Rong-Hua Li, Hongchao Qin, Guoren Wang, Zhiwei Zhang, and Ye Yuan

Abstract—In this paper, we study a new problem of seeking stable subgraph isomorphisms for a query graph in a temporal graph.
To solve our problem, we first develop a pruning-based search algorithm using several new pruning tricks to prune the unpromising
matching results during the search procedure. To further improve the efficiency, we propose a novel index structure called BCCIndex,
based on an idea of bi-connected component decomposition of the query graph, which can efficiently support the stable subgraph
isomorphism search. Equipped with the BCCIndex, we present an efficient query processing algorithm based on a carefully designed
tree join technique. We conduct extensive experiments to evaluate our algorithms on four large real-life datasets, and the results
demonstrate the efficiency and effectiveness of our algorithms.

Index Terms—Graph query, temporal graphs, subgraph isomorphism search

The authors are with the School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China. E-mail: {qizhangcs, yuan-ye}@bit.edu.cn, {lironghuabit, wanggrbit}@126.com, qhc.neu@gmail.com, cszwzhang@outlook.com.
Manuscript received 27 Apr. 2021; revised 3 May 2022; accepted 12 May 2022. Date of publication 19 May 2022; date of current version 1 May 2023.
This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0108503 and in part by NSFC under Grants 62072034 and U1809206.
(Corresponding author: Guoren Wang.)
Recommended for acceptance by T. Weninger.
Digital Object Identifier no. 10.1109/TKDE.2022.3175800
1041-4347 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Universidad Federal de Pernambuco. Downloaded on December 09,2024 at 17:07:18 UTC from IEEE Xplore. Restrictions apply.

1 INTRODUCTION

Subgraph isomorphism (or subgraph matching) search is a fundamental problem in graph analysis. Given a data graph $G$ and a query graph $q$, subgraph isomorphism search is the problem of finding all subgraphs in $G$ that are isomorphic to $q$. The problem arises in a wide range of network-analysis applications, including mining motif substructures in biological networks [1], analyzing the evolution of social networks [2], and finding syntheses of target structures in chemistry networks [3].

In applications such as the analysis of collaboration networks, communication networks, financial transaction networks, and online social networks, edges are often associated with temporal information. For example, in a collaboration network of scientific papers, each co-authorship relation contains two authors and the time when they co-authored a paper. In an email communication network, each email consists of a sender, a receiver, and the time when the email was sent. In a financial transaction network, each transaction includes a sender, a receiver, and the time when the transaction was completed. In an online social network, each instant message may include two users and the time when the message was sent. Such networks are typically modeled as temporal graphs [4], [5]. In a temporal graph, each edge is represented as a triplet $(u, v, t)$, where $u$ and $v$ are the end nodes of the edge and $t$ denotes the interaction time between $u$ and $v$. Consider a time sequence $\{t_0, \ldots, t_i, \ldots, t_T\}$ and suppose that $t_i - t_{i-1}$ is a constant. Then, we refer to a graph as a snapshot if its temporal edges appear in the time interval $(t_{i-1}, t_i]$.

Although subgraph isomorphism search techniques have been widely used in many graph analysis applications, most previous studies on subgraph isomorphism queries are tailored to traditional static, labeled graphs. These studies ignore temporal information and therefore cannot be applied to temporal graphs. In this paper, we focus on the keyword "temporal" and study a new problem of seeking stable subgraph isomorphisms in unlabeled temporal graphs. Our goal is to find all subgraph isomorphisms of a given query graph that are stable over time. More specifically, given a query graph $q$ and a stability threshold $\theta$, the stable subgraph isomorphism search problem is to identify all subgraphs that are isomorphic to $q$ in no fewer than $\theta$ snapshots.

The stable subgraph isomorphism search problem has many temporal graph analysis applications. For example, in a collaboration network, a stable $k$-clique subgraph indicates that the $k$ authors have co-authored papers many times, which suggests a long-term collaboration among them. A stable star-like structure may reveal a stable cooperative team in multi-discipline areas that has co-authored many papers over time. Finding these stable structures may help identify a team of experts to conduct a particular research project. In an email communication network between staff in a company, a stable star structure may reveal a staff member's implicit leadership. A leader may often send tasks to the other staff, who may frequently report their work back to the leader. In a financial transaction network, a stable small-circle structure may represent a kind of financial fraud behavior [6]. Finding such stable small-circle isomorphisms in a temporal financial transaction network can therefore help detect financial fraud. In addition, searching stable subgraph isomorphisms in temporal graphs can find stable communities and reveal
some implicit features of a network, which may be useful for understanding the structure and function of the network.

The stable subgraph isomorphism search problem is NP-hard. When $\theta = 1$, it degenerates to the problem of finding subgraph isomorphisms in every snapshot of $\mathcal{G}$, which is NP-hard [7], [8]. A straightforward solution is to run a traditional subgraph isomorphism query algorithm, such as the Ullmann algorithm [9], on each snapshot to compute all subgraph isomorphisms of the query graph, and then pick the stable ones across all snapshots. Such a solution is very costly, because subgraph isomorphism search is NP-hard and this approach must repeat it for every snapshot.

To compute stable subgraph isomorphisms in a temporal graph efficiently, we develop a new pruning-based search algorithm. It is equipped with several pruning tricks based on the temporal stability constraint, which significantly reduce the unpromising intermediate results during the search procedure. To further improve efficiency, we propose a novel index structure, called BCCIndex, based on a bi-connected component (BCC) decomposition technique that efficiently supports stable subgraph isomorphism queries. Armed with the BCCIndex and a carefully designed tree join technique, we develop an efficient query processing algorithm to find stable subgraph isomorphisms. To the best of our knowledge, our work is the first to apply the BCC indexing technique to the stable subgraph isomorphism search problem in temporal networks. In summary, we make the following contributions.

A Pruning-Based Algorithm. We propose a pruning-based stable subgraph isomorphism search algorithm, PruneSearch. It integrates several pruning techniques that avoid exploring unpromising intermediate results during the search procedure. We also present a parallel version of PruneSearch to improve the scalability of the algorithm.

An Index-Based Algorithm. We devise an index structure, namely BCCIndex, based on a BCC decomposition technique. Equipped with the BCCIndex, we propose an index-based solution, BCCIndexSearch, which efficiently finds stable subgraph isomorphisms using a newly developed tree join technique. We also propose a parallel BCCIndex construction algorithm and a parallel BCCIndexSearch algorithm to further improve scalability.

Extensive Experiments. We conduct comprehensive experiments to evaluate the efficiency of the proposed algorithms on four large real-world temporal graphs. The results show the following:
1) BCCIndexSearch is very efficient, running around 1-3 orders of magnitude faster than PruneSearch;
2) the BCCIndex can be constructed in a reasonable time for large temporal graphs, and its size is often not very large;
3) both the parallel PruneSearch and the parallel BCCIndexSearch achieve very high speedup ratios.
In addition, we conduct a case study on the collaboration network DBLP. The results show that our solutions can indeed find meaningful and stable research teams in DBLP.

Remark. Given a graph $G = (V, E)$ and a query graph $q = (V_q, E_q)$, graph analysis studies use two concepts of subgraph isomorphism.
The first is an injective function $M: V_q \to V$ such that $\forall (u_i, u_j) \in E_q$, $(M(u_i), M(u_j)) \in E$. This is often called subgraph monomorphism.
The second is an injective function $M: V_q \to V$ satisfying $\forall (u_i, u_j) \in E_q$, $(M(u_i), M(u_j)) \in E$ and $\forall (u_i, u_j) \notin E_q$, $(M(u_i), M(u_j)) \notin E$. This is also known as induced subgraph isomorphism.
In this paper, we focus on the former definition, i.e., subgraph monomorphism. In the following, we use "subgraph isomorphism" to mean "subgraph monomorphism".

Organization. We introduce important notations and formulate our problem in Section 2. The pruning-based search framework is presented in Section 3. Section 4 presents the index structure and the index construction method. The index-based query processing algorithm is proposed in Section 5. Section 6 reports the experimental results. We survey related work in Section 7 and conclude this work in Section 8.

2 PRELIMINARIES

Consider an undirected, unlabeled temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $n = |\mathcal{V}|$ vertices and $m = |\mathcal{E}|$ temporal edges. Each temporal edge $e \in \mathcal{E}$ is a triplet $(u, v, t)$, where $u, v$ are vertices in $\mathcal{V}$ and $t$ is the interaction time between $u$ and $v$. We assume that $t$ is an integer, because timestamps are integers in practice.

The de-temporal graph of $\mathcal{G}$ is defined as $G = (V, E)$. It is obtained by discarding all timestamps on the temporal edges and condensing the multiple edges between any two vertices into a single edge. Clearly, $V = \mathcal{V}$ and $E = \{(u, v) \mid (u, v, t) \in \mathcal{E}\}$. We denote the neighbors of a vertex $u$ by $N_u(G) = \{v \in V \mid (u, v) \in E\}$ and the degree of $u$ by $\deg_G(u) = |N_u(G)|$. Given a subset $S \subseteq V$, the subgraph of $G$ induced by $S$ is defined as $G_S = (V_S, E_S)$, where $V_S = S$ and $E_S = \{(u, v) \mid u, v \in S, (u, v) \in E\}$; we write $G_S \subseteq G$. We omit the symbol $G$ in the above notations when the context is clear.

Given a temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, we can extract a series of snapshots based on the timestamps. Consider an arithmetic time sequence $\{t_0, t_1, t_2, \ldots, t_T\}$ such that $t_i - t_{i-1}$ is a constant for each integer $i > 0$. The $i$-th snapshot of $\mathcal{G}$ is the de-temporal graph $G_i = (V, E_i)$, where $E_i$ is the set of edges extracted from $\mathcal{E}$ in the time interval $(t_{i-1}, t_i]$; $V$ generally remains the same. Let $T$ be the number of snapshots of $\mathcal{G}$, so that $T \le m$. Denote by $\mathbb{G}_T$ the set of all snapshots of $\mathcal{G}$. In the experiments, we set $t_i - t_{i-1}$ to a default value of 1 month or 1 year. This means that every snapshot contains all the temporal edges in a sliding window of length 1 month or 1 year.

Fig. 1 illustrates a temporal graph $\mathcal{G}$ with 77 temporal edges and $T = 6$. The de-temporal graph of $\mathcal{G}$ is shown in Fig. 1b. Figs. 1c, 1d, 1e, 1f, 1g, and 1h show the six snapshots $G_1$ to $G_6$ of $\mathcal{G}$, respectively.

Before introducing the definition of a stable subgraph embedding, we give the concepts of subgraph isomorphism and subgraph isomorphism embedding as follows.

Definition 1 (Subgraph isomorphism). Given a query graph $q = (V_q, E_q)$ and a data graph $g = (V_g, E_g)$, $q$ is subgraph isomorphic to $g$ if and only if there exists an injective function $M: V_q \to V_g$ such that $\forall (u_i, u_j) \in E_q$, $(M(u_i), M(u_j)) \in E_g$. We call $g$ a subgraph isomorphism of $q$ and write $q \simeq g$.

Definition 2 (Subgraph isomorphism embedding). Given a query graph $q = (V_q, E_q)$ and its subgraph isomorphism graph $g = (V_g, E_g)$, a subgraph isomorphism embedding is an
injective mapping $M: V_q \to V_g$. We use $g_M$ to denote the subgraph isomorphism graph specified by the mapping $M$.

Fig. 1. Basic concepts of a temporal graph $\mathcal{G}$.

Fig. 2. Examples of query graphs.

Example 1. Consider the graph $G$ in Fig. 1b and the query graph $q_1$ in Fig. 2a. In $G$, the 4-clique $C$ induced by the vertex set $V_g = \{v_2, v_3, v_4, v_5\}$ is a subgraph isomorphism of $q_1$. The injective mapping $M(u_1 \to v_2, u_2 \to v_3, u_3 \to v_4, u_4 \to v_5)$ is a subgraph isomorphism embedding, and so is the mapping $M'(u_1 \to v_5, u_2 \to v_4, u_3 \to v_3, u_4 \to v_2)$. Every injective mapping of the four query vertices onto the vertices of the clique preserves all query edges. Hence there are $4! = 24$ subgraph isomorphism embeddings in the 4-clique $C$.

Below, we introduce the concepts of temporal subgraph embedding and stable value, which are essential to define a stable subgraph embedding.

Definition 3 (Temporal subgraph embedding). Given a temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a query graph $q = (V_q, E_q)$, consider any snapshot $G_i = (V_i, E_i) \in \mathbb{G}_T$. If an injective function $M: V_q \to V_i$ satisfies $\forall (u_i, u_j) \in E_q$, $(M(u_i), M(u_j)) \in E_i$, we say that $M$ is a temporal subgraph embedding of $q$. We use $G_i(q)$ to denote the collection of subgraph isomorphism graphs given by all temporal subgraph embeddings in a snapshot $G_i$, i.e., $G_i(q) = \{g_M \mid q \simeq g_M \subseteq G_i\}$.

Example 2. Consider the temporal graph $\mathcal{G}$ in Fig. 1a and the query graph $q_1$ in Fig. 2a. In the snapshot $G_2$ of $\mathcal{G}$, the injective mapping $M(u_1 \to v_2, u_2 \to v_3, u_3 \to v_4, u_4 \to v_5)$ is a temporal subgraph embedding. Moreover, in the snapshot $G_3$, the following mappings are also temporal subgraph embeddings: $M_1(u_1 \to v_2, u_2 \to v_3, u_3 \to v_5, u_4 \to v_6)$, $M_2(u_1 \to v_3, u_2 \to v_4, u_3 \to v_5, u_4 \to v_6)$, and $M_3(u_1 \to v_7, u_2 \to v_9, u_3 \to v_{10}, u_4 \to v_{11})$.

Definition 4 (Stable value). Given a temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, a query graph $q = (V_q, E_q)$, and a temporal subgraph embedding $M$ of $q$, the stable value of $M$ is defined as the number of snapshots in which $M$ appears, i.e., $sv(M) = |\{G_t \mid q \simeq g_M \subseteq G_t, 1 \le t \le T\}|$.

The stable value measures the degree of stability of a temporal subgraph embedding. A larger stable value $sv$ means that the vertices in $g_M$ maintain their connections in $sv$ snapshots, which indicates a long-term stable structure. With the stable value, a stable subgraph embedding is defined as follows.

Definition 5 (Stable subgraph embedding). Given a temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, a query graph $q = (V_q, E_q)$, and an integer stability threshold $\theta$, a mapping $M$ is a $\theta$-stable subgraph embedding if it is a temporal subgraph embedding of $q$ in at least $\theta$ snapshots, i.e., $sv(M) \ge \theta$.

Example 3. Consider the temporal graph $\mathcal{G}$ in Fig. 1 and the query graph $q_1$ in Fig. 2a, and suppose that the stability threshold $\theta$ equals 3. Among all six snapshots of $\mathcal{G}$, the temporal subgraph embedding $M_1(u_1 \to v_2, u_2 \to v_3, u_3 \to v_4, u_4 \to v_5)$ is contained only in $G_2$, so $sv(M_1) = 1$. In contrast, the mapping $M_2(u_1 \to v_7, u_2 \to v_9, u_3 \to v_{10}, u_4 \to v_{11})$ appears in four snapshots, namely $G_3$, $G_4$, $G_5$, and $G_6$, so $sv(M_2) = 4$. With $\theta = 3$, $M_2$ is a 3-stable subgraph embedding of $q_1$. $M_1$ is not, because $sv(M_1) = 1 < 3$.

Based on the above definitions, we formulate the problem of seeking stable subgraph embeddings in temporal networks as follows.

Problem Formulation. Given a temporal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, a query graph $q = (V_q, E_q)$, and an integer $\theta$, our goal is to find all $\theta$-stable subgraph embeddings of $q$ in $\mathcal{G}$.

The following example illustrates the definition of our problem.

Example 4. Reconsider the temporal graph $\mathcal{G}$ in Fig. 1 and the query graph $q_1$ in Fig. 2a. There are six subgraph isomorphisms of $q_1$ across all snapshots of $\mathcal{G}$. They are induced by $V_1 = \{v_2, v_3, v_4, v_5\}$, $V_2 = \{v_2, v_3, v_4, v_6\}$, $V_3 = \{v_2, v_3, v_5, v_6\}$, $V_4 = \{v_2, v_4, v_5, v_6\}$, $V_5 = \{v_3, v_4, v_5, v_6\}$, and $V_6 = \{v_7, v_9, v_{10}, v_{11}\}$, respectively. Each subgraph isomorphism generates 24 temporal subgraph embeddings with the same stable value. The stable values of the temporal subgraph embeddings corresponding to the six subgraph isomorphisms are 1, 1, 3, 1, 2, and 4, respectively. If the stability threshold is $\theta = 3$, the answers to the stable subgraph embedding search problem are the mappings generated by the subgraphs induced by $V_3$ and $V_6$. When $\theta = 5$, there is no 5-stable subgraph embedding, because none of the $6 \times 24$ temporal subgraph embeddings satisfies $sv(\cdot) \ge 5$. Now consider $q_2$ in Fig. 2b as the query graph.
Then there are $7 \times 6$ 2-stable subgraph embeddings of $q_2$ in $\mathcal{G}$, and their stable values are 2. Further, when $\theta > 2$, the result of the stable subgraph embedding search is empty.

Clearly, for a query graph $q$, all subgraph isomorphism embeddings can easily be derived from a subgraph isomorphism $g$ (i.e., $q \simeq g$). Thus, in the remainder of this paper, we use the term "isomorphism" to refer to "embedding" for simplicity when there is no ambiguity. We also use embedding, match, and mapping interchangeably.

Remark. Our stable subgraph isomorphism search problem is different from the classic frequent subgraph mining problem. A frequent subgraph is a pattern that appears multiple times in a large graph [10], [11], [12], [13], [14] or in a set of graphs [15], [16], [17], [18]. In particular, a temporal motif is a frequent subgraph with the temporal information of edges. A stable subgraph isomorphism, by contrast, is an embedding of a given query graph that appears in multiple snapshots over time. The key difference is that frequent subgraph mining does not take a query graph as input, whereas our stable subgraph search problem is based on one. Because of this difference, existing solutions for frequent subgraph mining cannot be directly applied to our problem.

Challenges. We first discuss the hardness of the stable subgraph isomorphism search problem. Consider the special case $\theta = 1$. The problem is then equivalent to finding subgraph isomorphisms in each snapshot of $\mathcal{G}$, which is NP-hard. Thus, finding all isomorphisms of a query graph in at least $\theta$ snapshots is also NP-hard.

A straightforward solution to the stable subgraph isomorphism search problem in temporal graphs has two steps. It computes the stable value of each temporal subgraph isomorphism over all snapshots, and then returns the subgraphs whose stable values are no less than the stability threshold $\theta$. Such an approach, however, is very costly for large temporal graphs. It needs to explore all subgraph isomorphisms in all snapshots of $\mathcal{G}$, which is often intractable because the problem is NP-hard. A more efficient alternative is to use the temporal information of the edges to prune unpromising intermediate results that definitely cannot yield a stable subgraph isomorphism. The challenge is how to apply the temporal information to speed up the stable subgraph isomorphism search procedure.

In addition, searching for subgraph isomorphisms is often slow on large graphs, because it typically requires a backtracking search over the large data graph to identify all subgraph isomorphisms. A natural question is whether we can design an index-based solution to efficiently support stable subgraph isomorphism queries. Clearly, we cannot precompute the stable subgraph isomorphisms for all possible small query graphs (in practice, the size of the query graph is often smaller than 10). Thus, the challenge is how to maintain a subset of stable subgraph isomorphism results that can efficiently support all possible subgraph queries.

To tackle the above challenges, we propose two algorithms. The first is a pruning-based search algorithm, which efficiently prunes unpromising intermediate results using temporal information. The second is an index-based algorithm with a bi-connected component decomposition technique, which efficiently supports stable subgraph isomorphism search.

3 A PRUNING SEARCH ALGORITHM

This section proposes a pruning-based stable subgraph isomorphism search algorithm, called PruneSearch, to solve our problem. PruneSearch extends the classic Ullmann algorithm to handle temporal graphs and integrates several pruning techniques to prune unpromising candidate matches. Below, we first introduce the pruning rules, followed by the PruneSearch algorithm.

3.1 The Pruning Rules

Let $u_i \in V_q$ be a query vertex and $v_i \in V$ be a data vertex. $C(u_i)$ denotes the candidate set of $u_i$, which includes the data vertices that may form a mapping $u_i \to v_i$ in a stable subgraph isomorphism. Let $\mathcal{T}(v_i, v_j)$, also known as the active time, be the set of snapshots that contain a temporal edge of $\mathcal{G}$ with end vertices $v_i$ and $v_j$, i.e., $\mathcal{T}(v_i, v_j) = \{t \mid (v_i, v_j) \in E_t, 1 \le t \le T\}$. Denote by $g = (V_g, E_g)$ an arbitrary stable subgraph isomorphism of $q$ in $\mathcal{G}$. Below, we give four observations. The search algorithm uses them to prune intermediate results that definitely cannot form a stable subgraph isomorphism.

Observation 1. For an edge $(v_i, v_j) \in E$, if $|\{G_t \mid (v_i, v_j) \in E_t, 1 \le t \le T\}| < \theta$ holds, then $(v_i, v_j)$ is not included in $g$, i.e., $(v_i, v_j) \notin E_g$.

Observation 2. Degree reduction: for a data vertex $v_i$ in $G$, if $|\{G_t \mid \deg_{G_t}(v_i) \ge \deg_q(u_i), 1 \le t \le T\}| < \theta$ holds, then $v_i \notin C(u_i)$.

Observation 3. Failed neighbors reduction: for a data vertex $v_i$ in $G$, let $N_{v_i} = N_{v_i}(G)$. We iteratively update the following two sets until $N_{v_i}$ no longer changes:
$\mathcal{T}_{u_i}(v_i) \leftarrow \{t \mid |N_{v_i}(G_t) \cap N_{v_i}| \ge \deg_q(u_i), 1 \le t \le T\}$
$N_{v_i} \leftarrow \{v_j \mid |\{t \mid v_j \in N_{v_i}(G_t), 1 \le t \le T\} \cap \mathcal{T}_{u_i}(v_i)| \ge \theta\}$
If $|\mathcal{T}_{u_i}(v_i)| \ge \theta$ holds, we say that $v_i$ is a candidate of $u_i$ and that $\mathcal{T}_{u_i}(v_i)$ is the active time of $v_i$.

Observation 4. A data vertex $v_i \in C(u_i)$ should be removed from $C(u_i)$ if one of the following conditions is satisfied:
1) Neighbor restriction: $\exists u_j \in N_{u_i}(q)$ such that $N_{v_i}(G) \cap C(u_j) = \emptyset$.
2) Time restriction: $\exists u_j \in N_{u_i}(q)$ such that $\forall v_j \in N_{v_i}(G) \cap C(u_j)$, $|\mathcal{T}_{u_i}(v_i) \cap \mathcal{T}_{u_j}(v_j) \cap \mathcal{T}(v_i, v_j)| < \theta$.

3.2 The Proposed Algorithm

Equipped with the above pruning rules, we propose PruneSearch to solve the stable subgraph isomorphism search problem. The main idea of PruneSearch is to find the results by expanding partial solutions, or abandoning them when they definitely cannot form full answers. The algorithm avoids computing unpromising intermediate matches during the backtracking procedure, which improves efficiency significantly.
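
The snapshot model, the stable value of Definition 4, and the candidate filters of Observations 1-3 can be illustrated with a small Python sketch. It is not the authors' implementation, and it makes several assumptions. The temporal graph is given as a list of (u, v, t) triplets whose timestamps have already been mapped to snapshot indices 1..T. The query is an adjacency dict. The function names snapshots, stable_value, and first_filter are hypothetical. As in Algorithm 1, the filters run on the graph already reduced by Observation 1. As a result, the degree test of Observation 2 is simply the first iteration of the Observation 3 loop rather than a separate pass.

```python
from collections import defaultdict


def snapshots(temporal_edges):
    """E_t for each snapshot index t: the undirected edges seen in that snapshot."""
    snaps = defaultdict(set)
    for u, v, t in temporal_edges:
        if u != v:
            snaps[t].add(frozenset((u, v)))
    return snaps


def stable_value(mapping, query_edges, snaps):
    """sv(M) of Definition 4: snapshots in which every mapped query edge exists."""
    images = [frozenset((mapping[a], mapping[b])) for a, b in query_edges]
    return sum(1 for edges_t in snaps.values()
               if all(e in edges_t for e in images))


def first_filter(temporal_edges, query_adj, theta):
    """Observations 1-3 (sketch): map each query vertex u to {data vertex v: T_u(v)}."""
    # T(v_i, v_j): the snapshots in which each undirected edge is active.
    active = defaultdict(set)
    for u, v, t in temporal_edges:
        if u != v:
            active[frozenset((u, v))].add(t)
    # Observation 1: an edge active in fewer than theta snapshots is dropped.
    nbrs_at = defaultdict(lambda: defaultdict(set))  # nbrs_at[v][t] = N_v(G_t)
    for e, ts in active.items():
        if len(ts) < theta:
            continue
        a, b = tuple(e)
        for t in ts:
            nbrs_at[a][t].add(b)
            nbrs_at[b][t].add(a)
    cand = {}
    for u, u_nbrs in query_adj.items():
        need = len(u_nbrs)  # deg_q(u)
        cand[u] = {}
        for v, per_t in nbrs_at.items():
            N = set().union(*per_t.values())  # N_v over the reduced graph
            while True:
                # T_u(v): snapshots where v keeps >= deg_q(u) neighbours in N.
                # With N = all neighbours this is the degree test of Observation 2.
                T_uv = {t for t, nb in per_t.items() if len(nb & N) >= need}
                # Observation 3: keep neighbours seen in >= theta of those snapshots.
                new_N = {w for w in N
                         if sum(1 for t in T_uv if w in per_t[t]) >= theta}
                if new_N == N:  # N only shrinks, so this loop terminates
                    break
                N = new_N
            if len(T_uv) >= theta:
                cand[u][v] = T_uv
    return cand
```

Consider a triangle query with θ = 2. A vertex whose only stable edge is a pendant edge ends up with an empty active time and is dropped. Every vertex of a triangle that is present in three snapshots keeps the active time {1, 2, 3}.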

Algorithm 1. PruneSearch
Input: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, a query $q = (V_q, E_q)$, an integer $\theta$.
Output: The stable subgraph isomorphism set $S$.
1  $S \leftarrow \emptyset$; $\hat{\mathcal{G}} = (\hat{V}, \hat{\mathcal{E}}) \leftarrow \mathcal{G}$;
2  Construct the de-temporal graph $G = (V, E)$ of $\mathcal{G}$;
3  for $(v_i, v_j) \in E$ do
4      $\mathcal{T}(v_i, v_j) = \{t \mid (v_i, v_j) \in E_t, 1 \le t \le T\}$;
5      $w_{(v_i, v_j)} = |\mathcal{T}(v_i, v_j)|$;
6      if $w_{(v_i, v_j)} < \theta$ then Delete all edges $(v_i, v_j, t)$ from $\hat{\mathcal{G}}$;
7      if $\deg_{\hat{\mathcal{G}}}(v_i) = 0$ then Delete $v_i$ from $\hat{\mathcal{G}}$;
8      if $\deg_{\hat{\mathcal{G}}}(v_j) = 0$ then Delete $v_j$ from $\hat{\mathcal{G}}$;
9  for $u_i \in V_q$ do
10     $C(u_i) \leftarrow \emptyset$;
11     for $v_i \in \hat{V}$ do
12         $\mathcal{T}_{u_i}(v_i) \leftarrow \emptyset$; $N_{v_i} \leftarrow N_{v_i}(G)$; $\hat{N}_{v_i} \leftarrow \emptyset$;
13         $flag(v_i) \leftarrow$ true;
14         while $N_{v_i} \ne \hat{N}_{v_i}$ do
15             $\hat{N}_{v_i} \leftarrow N_{v_i}$;
16             $\mathcal{T}_{u_i}(v_i) \leftarrow \{t \mid |N_{v_i}(G_t) \cap N_{v_i}| \ge \deg_q(u_i), 1 \le t \le T\}$;
17             $N_{v_i} \leftarrow \{v_j \mid |\{t \mid v_j \in N_{v_i}(G_t), 1 \le t \le T\} \cap \mathcal{T}_{u_i}(v_i)| \ge \theta\}$;
18         if $|\mathcal{T}_{u_i}(v_i)| \ge \theta$ then
19             Insert $(v_i, \mathcal{T}_{u_i}(v_i), flag(v_i))$ into $C(u_i)$;
20             break;
21     if $|C(u_i)| = 0$ then return $S$;
22 $M \leftarrow \emptyset$; $\mathcal{T}_c \leftarrow \{t \mid 1 \le t \le T\}$;
23 SubGraphSearch$(q, \hat{\mathcal{G}}, M, \mathcal{T}_c)$;
24 return $S$;

The pseudo-code of PruneSearch is shown in Algorithm 1. It first constructs the de-temporal graph $G$. It then processes $\hat{\mathcal{G}}$ by removing the edges that appear in fewer than $\theta$ snapshots, as well as the resulting isolated vertices (lines 2-8). By Observation 1, such edges and vertices cannot be contained in a stable subgraph isomorphism with threshold $\theta$.

The algorithm then calculates candidate sets for the query vertices of $q$ by applying degree reduction (Observation 2) and failed neighbors reduction (Observation 3); we call this process FirstFilter (lines 9-21). In FirstFilter, PruneSearch determines whether $v_i$ is a candidate of $u_i$ by iteratively updating the sets $\mathcal{T}_{u_i}(v_i)$ and $N_{v_i}$ (lines 14-17). Here, a set $\hat{N}_{v_i}$ is used to check whether $N_{v_i}$ has stopped changing. When the loop ends, $v_i$ is a candidate of $u_i$ if $|\mathcal{T}_{u_i}(v_i)| \ge \theta$ holds. In that case, PruneSearch adds $v_i$ to $C(u_i)$ together with the active time $\mathcal{T}_{u_i}(v_i)$ and a variable $flag(v_i) =$ true (lines 18-20). The variable $flag(v_i)$ indicates whether the candidate $v_i$ is valid: if $v_i$ can be mapped to the query vertex $u_i$, $flag(v_i)$ is set to true in $C(u_i)$; otherwise, it is false. If the initial $C(u_i)$ is empty, the query vertex $u_i$ cannot be mapped to any data vertex in $\mathcal{G}$, so the algorithm terminates (line 21).

Otherwise, PruneSearch invokes the SubGraphSearch procedure (line 23). SubGraphSearch iteratively maps vertices one by one from $q$ to $\hat{\mathcal{G}}$ using the candidate sets, looking for the stable subgraph isomorphisms with respect to $\theta$. During the backtracking procedure, the search is based on partial matches, so some candidates in $C(u_i)$ will fail. SubGraphSearch updates their flag status accordingly (line 18, lines 25-28, and line 30 in Algorithm 2). Finally, PruneSearch returns $S$ as the answer.

The workflow of SubGraphSearch is outlined in Algorithm 2. $M$ maintains the mapping information $u_i \to v_i$. The procedure prunes candidates for query vertices by checking whether they satisfy both the neighbor restriction and the time restriction (Observation 4); we call this process SecondFilter (lines 3-21).

Specifically, for each $u_j \in N_{u_i}(q)$, a variable disjoint, initialized to true, indicates that no $v_j \in N_{v_i}(\hat{\mathcal{G}})$ can be a candidate of $u_j$. The procedure calculates $temp_t$ by combining the time set $\mathcal{T}$ obtained in the previous iteration with the active times of $v_i$, $v_j$, and $(v_i, v_j)$ (lines 10-11). If $|temp_t| \ge \theta$ holds, $u_j$ can be mapped to $v_j$ under $u_i \to v_i$ and $M$. SubGraphSearch then sets disjoint to false and checks the next neighbor of $u_i$ (line 13). Otherwise, no data vertex is both a neighbor of $v_i$ and a valid candidate of $u_j$, so $v_i$ is not a candidate of $u_i$. The procedure then updates $C(u_i)$ according to the current round of iteration (lines 14-18): in the first round ($M = \emptyset$) it deletes $v_i$ from $C(u_i)$, and otherwise it sets $flag(v_i)$ to false. During SecondFilter, the procedure returns as soon as any candidate set $C(u_i)$ has no valid candidates (line 21).

After pruning candidates, SubGraphSearch selects the unmapped vertex $u_s$ with the fewest valid candidates and then recurses on each candidate of $u_s$ (lines 22-30). Before each recursive call, SubGraphSearch recomputes the active time $\mathcal{T}_c$ and updates the candidate sets by setting the flag indicators according to the new mapping of $u_s$ (lines 24-28). When the recursive SubGraphSearch call completes, the procedure restores the candidates' status (line 30). Since SubGraphSearch maps vertices one by one from $q$ to $\hat{\mathcal{G}}$, it adds $M$ to the result set $S$ once $|M| = |V_q|$ holds. At that point, a $\theta$-stable subgraph isomorphism of $q$ in $\mathcal{G}$ has been discovered (line 1).

Algorithm 2. SubGraphSearch$(q, \hat{\mathcal{G}}, M, \mathcal{T})$
1  if $|M| = |V_q|$ then $S \leftarrow S \cup \{M\}$;
2  else
3      for $u_i \in V_q$ do
4          $cnt_{u_i} \leftarrow 0$;
5          for $(v_i, \mathcal{T}_{u_i}(v_i), flag(v_i)) \in C(u_i)$ and $flag(v_i) =$ true do
6              for $u_j \in N_{u_i}(q)$ do
7                  disjoint $\leftarrow$ true;
8                  for $v_j \in N_{v_i}(\hat{\mathcal{G}})$ do
9                      if $(v_j, \mathcal{T}_{u_j}(v_j), flag(v_j)) \in C(u_j)$ and $flag(v_j) =$ true then
10                         $temp_t \leftarrow \mathcal{T}_{u_i}(v_i) \cap \mathcal{T}_{u_j}(v_j) \cap \mathcal{T}(v_i, v_j)$;
11                         $temp_t \leftarrow temp_t \cap \mathcal{T}$;
12                         if $|temp_t| \ge \theta$ then
13                             disjoint $\leftarrow$ false; break;
14                 if disjoint = true then
15                     if $M = \emptyset$ then
16                         Delete $(v_i, \mathcal{T}_{u_i}(v_i), flag(v_i))$ from $C(u_i)$;
17                     else
18                         $(v_i, \mathcal{T}_{u_i}(v_i), flag(v_i)) \leftarrow (v_i, \mathcal{T}_{u_i}(v_i), \text{false})$;
19         for $(v_i, \mathcal{T}_{u_i}(v_i), flag(v_i)) \in C(u_i)$ do
20             if $flag(v_i) =$ true then $cnt_{u_i} \leftarrow cnt_{u_i} + 1$;
21         if $cnt_{u_i} = 0$ then return;
22     $U = \{u_i \mid u_i \in M\}$; $u_s = \arg\min_{u_i \in V_q \setminus U} cnt_{u_i}$;
23     for $(v_i, \mathcal{T}_{u_s}(v_i), flag(v_i)) \in C(u_s)$ and $flag(v_i) =$ true do
24         $M$.insert$(u_s, v_i)$; $\mathcal{T}_c = \mathcal{T} \cap \mathcal{T}_{u_s}(v_i)$;
25         for $u_i \in V_q \setminus \{u_s\}$ do
26             $(v_i, \mathcal{T}_{u_i}(v_i), flag(v_i)) \leftarrow (v_i, \mathcal{T}_{u_i}(v_i), \text{false})$;
27         for $v_g \in V \setminus \{v_i\}$ do
28             $(v_g, \mathcal{T}_{u_s}(v_g), flag(v_g)) \leftarrow (v_g, \mathcal{T}_{u_s}(v_g), \text{false})$;
29         SubGraphSearch$(q, \hat{\mathcal{G}}, M, \mathcal{T}_c)$;
30         Perform the inverse operation of lines 25-28 for all $C(u_i)$s;
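
The following compact Python sketch traces the same kind of backtracking search as Algorithms 1 and 2, end to end, under simplifying assumptions. It is a hedged illustration, not the paper's PruneSearch, and it differs in three ways:
1) It applies only Observations 1 and 2 before searching, omitting the failed neighbors reduction.
2) It replaces the flag bookkeeping of Algorithm 2 by recomputing each unmapped query vertex's valid candidates at every step.
3) It enforces the time restriction of Observation 4 by intersecting the running time set with the active times of the already-mapped query edges.
Like line 22 of Algorithm 2, it picks the unmapped query vertex with the fewest valid candidates, and it abandons a branch as soon as some query vertex has none. The names prune_search, valid_options, and extend are hypothetical. The input is assumed to be a list of (u, v, t) triplets whose timestamps are snapshot indices.

```python
from collections import defaultdict


def prune_search(temporal_edges, query_adj, theta):
    """Sketch: every theta-stable embedding of the query as {query vertex: data vertex}."""
    # T(v_i, v_j) per undirected edge; Observation 1 keeps edges active >= theta times.
    active = defaultdict(set)
    for u, v, t in temporal_edges:
        if u != v:
            active[frozenset((u, v))].add(t)
    active = {e: ts for e, ts in active.items() if len(ts) >= theta}

    # Per-snapshot degrees in the reduced graph.
    deg_at = defaultdict(lambda: defaultdict(int))
    for e, ts in active.items():
        for x in e:
            for t in ts:
                deg_at[x][t] += 1

    # Observation 2 (degree reduction): candidate v of u with active time T_u(v).
    cand = {}
    for u, nbrs in query_adj.items():
        cand[u] = {}
        for v, per_t in deg_at.items():
            ts = {t for t, d in per_t.items() if d >= len(nbrs)}
            if len(ts) >= theta:
                cand[u][v] = ts

    results = []

    def valid_options(u, mapping, used, times):
        """Candidates of u consistent with the partial mapping (time restriction)."""
        options = []
        for v, tv in cand[u].items():
            if v in used:
                continue  # keep the mapping injective
            t_new = times & tv
            for w in query_adj[u]:
                if w in mapping:  # query edge (u, w) must be active at those times
                    t_new &= active.get(frozenset((v, mapping[w])), set())
            if len(t_new) >= theta:
                options.append((v, t_new))
        return options

    def extend(mapping, times):
        if len(mapping) == len(query_adj):
            results.append(dict(mapping))
            return
        used = set(mapping.values())
        best = None
        for u in query_adj:
            if u in mapping:
                continue
            options = valid_options(u, mapping, used, times)
            if not options:
                return  # an unmapped query vertex has no valid candidate: backtrack
            if best is None or len(options) < len(best[1]):
                best = (u, options)
        u, options = best
        for v, t_new in options:
            mapping[u] = v
            extend(mapping, t_new)
            del mapping[u]

    extend({}, set().union(*active.values()))
    return results
```

Recomputing candidates avoids the undo step of line 30, at the cost of repeated work at each search node. Because the sketch does not break symmetry, a triangle that is stable in at least θ snapshots is reported as six embeddings, one per automorphism. The symmetry-breaking trick described in the Remark on PruneSearch would report it only once.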
Authorized licensed use limited to: Universidad Federal de Pernambuco. Downloaded on December 09,2024 at 17:07:18 UTC from IEEE Xplore. Restrictions apply.
6410 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 6, JUNE 2023

Fig. 3. Illustration of BCCDecompose.

Remark. In PruneSearch, we can apply the symmetry-breaking trick described in [19] to handle query graphs with automorphisms, so that each stable subgraph isomorphism is searched only once. Once an answer is found, the other matches that differ only in their mapping relationships can be easily recovered from the automorphisms of q.

3.3 The Parallel PruneSearch Algorithm
To further improve scalability, we develop a parallel version of the pruning-based search algorithm, called PPruneSearch. Specifically, in lines 9-22 of Algorithm 1, the candidates computed by FirstFilter can be obtained independently for each query vertex, so we process the query vertices in parallel in this procedure. In addition, in lines 23-30 of Algorithm 2, when SubGraphSearch is first called, i.e., M = ∅, it picks the first query vertex u_f and generates new mappings with its candidates to perform the deeper iterations. For each candidate v_i ∈ C(u_f), SubGraphSearch finds θ-stable subgraph isomorphisms based on the mapping u_f → v_i; since these searches are independent of one another, we can process all v_i's in parallel. In the experiments, we show that the parallel algorithm PPruneSearch achieves a very good speedup ratio on real-life graphs.

4 THE BCCIndex STRUCTURE
In this section, we propose an index structure, called BCCIndex, to efficiently support stable subgraph isomorphism queries. Below, we first introduce the BCCIndex structure, followed by the index construction algorithms.

4.1 The Proposed BCCIndex
Before introducing the index structure, we give the definition of the Bi-Connected Component (BCC) of a graph [20], [21], [22]. Given a graph G and a subgraph g of G, we say that g is a BCC if 1) g remains connected after removing any single edge from it, and 2) no supergraph of g in G satisfies 1). Any graph can be decomposed into several BCCs and isolated vertices [20]. We refer to such a decomposition as BCCDecompose. For instance, by performing BCCDecompose on the graph in Fig. 3a, we obtain four BCCs induced by the vertices colored red, blue, green, and gray, respectively; the vertices u7 and u11, colored black, are isolated.
Based on BCCDecompose, we develop an index structure, called BCCIndex, which maintains all the stable subgraph embeddings for small-size BCCs. In particular, BCCIndex maintains a sorted list of the stable subgraph isomorphisms for every index-query graph. Here, the index-query graphs are all BCC-graphs with at most 5 vertices, as illustrated in Fig. 4. There are 15 index-query graphs in total (one with three vertices, three with four vertices, and eleven with five vertices), and we denote their collection as G_IQ. In many practical applications, the query graph has no more than 10 vertices. By BCCDecompose, such a query graph can be decomposed into very small subgraphs. Thus, it suffices to store all the isomorphism results of the small subgraphs in G_IQ as an index and to design an efficient index-based query algorithm that handles different query graphs. In addition, we focus mainly on the stable subgraph isomorphism search problem in temporal graphs. An isomorphism that appears in only one snapshot is not considered stable. If we only required an isomorphism to appear in at least one snapshot, the time constraint would become vacuous, and the problem would degenerate to finding all subgraph isomorphisms in each snapshot. Therefore, our BCCIndex structure maintains only the stable subgraph matches with stable values no less than 2.

Fig. 4. The index-query graph set G_IQ.

More specifically, the BCCIndex structure, denoted by EI, contains 15 sorted lists corresponding to the BCC-graphs in Fig. 4. For each graph IQ_i ∈ G_IQ, we search stable subgraph isomorphisms with the stability threshold θ = 2. For a temporal isomorphism m_{IQ_i}, we maintain the set of snapshots T(m_{IQ_i}) in which it appears and calculate its stable value sv(m_{IQ_i}). Then, a sorted list EI(IQ_i) is obtained by sorting all temporal isomorphisms in non-increasing order of their stable values. We maintain the snapshots so that our query processing algorithm can easily extend partial solutions into complete stable subgraph isomorphisms. Note that if the stable value sv(m_{IQ_i}) equals 1, then m_{IQ_i} is not added into EI(IQ_i). The following example illustrates the BCCIndex structure.

Example 5. Consider the temporal graph G in Fig. 1. The index structure of G for IQ0, IQ3, and IQ12 is shown in Fig. 5. Due to space limitations, we only illustrate one instance among the automorphic matches of each query graph. The BCCIndex structure EI(IQ_i) maintains the stable subgraph isomorphisms whose stable values are no less than 2 and sorts them in non-increasing order of their stable values. For instance, the triangle induced by {v1, v2, v4} only appears in G1, so it is not included in EI(IQ0). The match containing v6, v7, v8 exists in
G2, G3, G4, G5, and G6, and its stable value equals 5, which is the largest among all isomorphisms; thus it ranks first in EI(IQ0). Similarly, EI(IQ3) contains three matches induced by {v7, v9, v10, v11}, {v2, v3, v5, v6}, and {v4, v3, v5, v6}, respectively. It is easy to verify that their stable values are 4, 3, and 2, as illustrated in Fig. 5b. The index EI(IQ12), shown in Fig. 5c, contains seven stable subgraph isomorphisms of IQ12 in G.

Fig. 5. The BCCIndex structure of G in Fig. 1.

4.2 The BCCIndex Construction
A Sequential Implementation. We present the BCCIndexBuild algorithm to construct the BCCIndex structure EI. The pseudo-code of BCCIndexBuild is shown in Algorithm 3. For each index-query graph IQ_i in G_IQ, BCCIndexBuild performs PruneSearch to search stable subgraph isomorphisms with the stability threshold θ = 2. When PruneSearch finds a stable subgraph isomorphism M, it pushes M, together with the snapshots in which it appears, into the sorted list EI(IQ_i); the matches in EI(IQ_i) are then sorted in non-increasing order of their stable values.
A Parallel Implementation. To improve scalability, we discuss parallel methods for index construction. Specifically, in lines 2-6 of Algorithm 3, seeking all stable subgraph isomorphisms for each index-query graph IQ_i can be performed independently by applying PruneSearch; thus we can process these index-query graphs in parallel. In addition, in line 4 of Algorithm 3, for each IQ_i ∈ G_IQ, we can perform PPruneSearch (instead of PruneSearch) to compute the stable subgraph isomorphisms for the first query vertex's candidates in parallel.

4.3 Discussions
To the best of our knowledge, the proposed BCCIndex is a novel technique for solving the stable subgraph isomorphism search problem in temporal graphs. Furthermore, we are the first to apply bi-connected component decomposition, i.e., BCCDecompose, to the subgraph isomorphism search problem. We do not use BCCIndex for traditional static subgraph isomorphism search, for the following reasons. First, for static labeled graphs, the number of types of labeled bi-connected components grows exponentially with the number of labels, because each vertex can be associated with different labels. It is intractable to compute and store all the subgraph isomorphism results for all labeled bi-connected components. Second, for unlabeled static graphs, although the number of bi-connected component types may not be very large, the corresponding subgraph isomorphism results can be exponentially large and typically cannot be stored as an off-line index. However, for temporal graphs, once the temporal constraint is added, the results of temporal subgraph isomorphism are often not very large and can generally be stored as an index on modern computers (as confirmed in our experiments). Therefore, we can apply the BCCIndex technique to solve the stable subgraph isomorphism search problem on temporal graphs.

Algorithm 3. BCCIndexBuild
Input: G = (V, E), an index-query set G_IQ.
Output: The BCCIndex EI.
1 EI ← ∅;
2 for IQ_i ∈ G_IQ do
3     EI(IQ_i) ← ∅;
4     EI(IQ_i) ← PruneSearch(G, IQ_i, 2);
5     Sort all matches in EI(IQ_i) in non-increasing order of their stable values;
6     EI ← EI ∪ EI(IQ_i);
7 return EI;

5 THE INDEX-BASED SEARCH ALGORITHM
We propose an index-based query processing algorithm, called BCCIndexSearch, to search stable subgraph isomorphisms for a query graph q. The main idea is to decompose q into BCCs and isolated vertices, and then join the partial solutions to obtain the results. Below, we first introduce an algorithm, namely BCCMatch, that finds the stable subgraph isomorphisms for a BCC based on the BCCIndex, followed by a heuristic join order and the BCCIndexSearch algorithm that solves our problem.

5.1 The BCCMatch Algorithm
After decomposing the query graph q, there may exist some BCCs with more than 5 vertices, which are not contained in our BCCIndex. Hence, we propose an algorithm, called BCCMatch, to handle this case. Specifically, in BCCMatch, all BCCs are categorized into three types as follows. 1) IndexIsoBCC: a BCC that is isomorphic to some index-query graph in G_IQ; 2) StarIsoBCC: a BCC that can be decomposed into an IndexIsoBCC and a star; 3) GeneralBCC: the remaining BCCs that satisfy neither 1) nor 2). For instance, the BCC illustrated in Fig. 2a is isomorphic to IQ3; thus it is an IndexIsoBCC. In Fig. 2c, the BCC colored blue is not isomorphic to any index-query graph, but we can decompose it into an IndexIsoBCC induced by {u2, u3, u4, u5, u6} and a star pivoted at u1; thus it is a StarIsoBCC.
We propose an efficient algorithm to identify whether a BCC is a StarIsoBCC as follows. Given a BCC graph G_e = (V_e, E_e), we sort the vertices in V_e in non-decreasing order of their degrees and then remove vertices following this ordering to decompose G_e. After removing a vertex u and the edges incident to u, we check whether the remaining graph is an IndexIsoBCC. If so, a decomposition strategy is found, and thus G_e is a StarIsoBCC. Otherwise, we put u back, remove the next vertex in the ordering, and repeat the above check. When all vertices in
G_e have been tried and no strategy is obtained, G_e is recognized as a GeneralBCC.

Algorithm 4. BCCMatch
Input: G = (V, E), a query q = (V_q, E_q), a BCC G_e = (V_e, E_e), the BCCIndex EI, an integer θ.
Output: The stable subgraph isomorphism set M(G_e) of G_e.
1 M(G_e) ← ∅;
2 if ∀IQ_q ∈ G_IQ, IQ_q ≇ G_e then
3     Let Q be a priority queue;
4     Q ← ∅; isdecom ← false;
5     for u_i ∈ V_e do Q.push(u_i, deg(u_i));
6     while Q ≠ ∅ do
7         (u_r, d_min) ← Q.pop(); Let Ĝ = (V̂, Ê) be a graph;
8         V̂ ← V_e \ {u_r}; Ê ← E_e \ {(u_i, u_j) | u_i = u_r or u_j = u_r};
9         if ∄IQ_q̂ ∈ G_IQ, IQ_q̂ ≅ Ĝ then continue;
10        else
11            isdecom ← true; C(u_r) ← V;
12            for m̂ ∈ EI(IQ_q̂) do
13                if sv(m̂) ≥ θ then
14                    for (u_r, u_i) ∈ E_e do
15                        v_i ← m̂.find(u_i);
16                        C(u_r) ← C(u_r) ∩ N_{v_i}(G);
17                    for v_r ∈ C(u_r) do
18                        T̂ ← T;
19                        for (u_r, u_i) ∈ E_e do
20                            v_i ← m̂.find(u_i);
21                            T̂ ← T_{u_r}(v_r) ∩ T(v_i, v_r) ∩ T(m̂_{IQ_q̂});
22                        if |T̂| ≥ θ then
23                            m ← m̂; m.insert(u_r, v_r);
24                            M(G_e) ← M(G_e) ∪ m;
25                else break;
26            break;
27    if isdecom = false then M(G_e) ← PruneSearch(G, G_e, θ);
28 else
29    for m ∈ EI(IQ_q) do
30        if sv(m) ≥ θ then M(G_e) ← M(G_e) ∪ m;
31        else break;
32 return M(G_e);

The BCCMatch algorithm is depicted in Algorithm 4. Specifically, it first identifies whether G_e is an IndexIsoBCC. If there is an index-query graph IQ_q satisfying IQ_q ≅ G_e, BCCMatch outputs the solutions in EI(IQ_q) with stable values no less than θ as the results (lines 29-31). Otherwise, BCCMatch checks whether G_e is a StarIsoBCC. It pushes the vertices in V_e into a priority queue Q in non-decreasing order of their degrees, and then pops the first element of Q in each loop to find a decomposition strategy (lines 3-27). A variable isdecom, initialized as false, indicates whether G_e is a StarIsoBCC (line 4). When a decomposition strategy is found, i.e., G_e can be decomposed into an IndexIsoBCC Ĝ and a star pivoted on the removed vertex u_r, BCCMatch sets isdecom to true (line 11). We denote by IQ_q̂ the index-query graph that is isomorphic to Ĝ. Then, for each match in EI(IQ_q̂) with stable value no less than θ, BCCMatch extends it to obtain the complete solutions of G_e (lines 12-25). If Q is empty and isdecom still equals false, then removing any single vertex of G_e cannot yield a decomposition; thus we recognize G_e as a GeneralBCC. In this case, BCCMatch performs PruneSearch to search stable isomorphisms for G_e on the basis of the stability threshold θ (line 27). Since EI(IQ_q) and EI(IQ_q̂) are sorted lists, BCCMatch stops scanning a list as soon as the stable value of a solution drops below θ, both for IndexIsoBCC (line 31) and for StarIsoBCC (line 25). Finally, the set M(G_e) stores the stable subgraph isomorphisms of G_e.

Example 6. Consider the temporal graph G in Fig. 1 and the query graph q3 in Fig. 2c. Suppose that θ = 2. We denote the BCC induced by the vertices colored blue as G_e. G_e is a StarIsoBCC because after removing u1, the remaining graph is isomorphic to IQ12. For each solution in EI(IQ12) shown in Fig. 5c, we extend it by considering the star induced by the edges (u1, u2) and (u1, u3) to obtain the matches of G_e. We can easily check that only the match induced by {v7, v10, v8, v9, v11} can be extended, by adding the mapping u1 → v6. With the automorphisms of G_e, the results of G_e are (v6, v7, v8, v9, v10, v11) and (v6, v7, v8, v11, v10, v9).

Remark. For a query graph q, the BCCMatch algorithm processes its decomposed BCCs according to their types. When a BCC is an IndexIsoBCC or a StarIsoBCC, BCCMatch finds the stable matches with the BCCIndex, which maintains stable results for index-query graphs with at most 5 vertices. For a GeneralBCC, the BCCMatch algorithm needs to perform PruneSearch to search stable isomorphisms on the basis of the stability threshold θ.

5.2 A Heuristic Join Order
As mentioned above, a graph G can be decomposed into several BCCs and isolated vertices [20]. Only one edge links one BCC/isolated vertex to another BCC/isolated vertex in G. Thus, we can convert G into a tree, called BCCTree, by treating each BCC or isolated vertex as a tree node and adding the edges between them. A tree node n is associated with a set, denoted as CN(n), which represents the corresponding vertices in G. If n is an isolated vertex, we write n = CN(n). For brevity, G_n = (V_n, E_n) denotes the subgraph induced by CN(n). We connect the tree nodes n_i and n_j if there is an edge between u_i ∈ CN(n_i) and u_j ∈ CN(n_j) in G, and we label this tree edge as (u_i, u_j). In this way, the BCCTree of G is created.

Example 7. Consider the graph q in Fig. 3a. There are four BCCs and two isolated vertices in q. Fig. 3b illustrates the information of the tree nodes. The BCCTree of q is shown in Fig. 3c, in which the color of each tree node is consistent with the color of the vertices in CN(n) in Fig. 3a. We connect n1 and n2 with the edge (u5, u6) in BCCTree because vertex u5 ∈ CN(n1) and vertex u6 ∈ CN(n2) are linked in q.

Clearly, a join order for BCCIndexSearch to merge the partial stable subgraph isomorphisms can be derived by traversing BCCTree from any tree node. However, the pruning performance of BCCIndexSearch can differ significantly across join orders. Here we design a heuristic join order for BCCIndexSearch by constructing a JoinTree. Let c_i be a child of a tree node n and dep(c_i) be the depth of the descendants of c_i in BCCTree. For two children c1, c2 of n, we define c1 ≻ c2 if: 1) |CN(c1)| > |CN(c2)|; or 2) |CN(c1)| = |CN(c2)| and dep(c1) > dep(c2). Following this order, the tree node with a larger size and
deeper descendants has a higher priority to join, which is intuitively reasonable. Therefore, we create the root node n_r of the JoinTree as the tree node with the highest rank under this order. For each tree node in JoinTree, we then iteratively add its children into JoinTree according to the same order. Finally, the breadth-first traversal sequence of JoinTree is our join order for merging the partial solutions in BCCIndexSearch. Following such a heuristic join order forms larger subgraphs of q early, thus avoiding invalid merging operations from small substructures.

Example 8. Consider the query graph q in Fig. 3a and its BCCTree in Fig. 3c. Fig. 3d illustrates the JoinTree of q obtained by our method. By performing breadth-first traversal on the JoinTree, we obtain the join order n1 → n2 → n5 → n4 → n3 → n6. Following this order, when processing the children of node n2, that is, when deciding on the next substructure to extend, BCCIndexSearch first merges n5, as it has the largest size among all children. Such a join operation derives a relatively large partial solution.

5.3 The Query Processing Algorithm
Here we present the query processing algorithm, namely BCCIndexSearch. The main idea of BCCIndexSearch is to find partial solutions for the BCCs of q based on the BCCIndex and then join them to derive the results. The pseudo-code of BCCIndexSearch is outlined in Algorithm 5.
Like PruneSearch, BCCIndexSearch uses the map structure M to maintain the mapping relationships of each stable subgraph isomorphism. Algorithm 5 works as follows. First, it performs BCCDecompose to calculate the BCCs of q, adding the BCCs into V_B and the isolated vertices into V_I (lines 1-2). Based on V_B and V_I, BCCIndexSearch constructs the BCCTree BT and the JoinTree JT, and obtains our heuristic join order Q by breadth-first traversal of JT (lines 3-4). Then, the algorithm pops the head element n_r of Q (the root of JT) as the initial substructure. If n_r is an isolated vertex, the query graph q is a tree, and we search θ-stable subgraph isomorphisms by PruneSearch (line 6). Otherwise, BCCIndexSearch performs BCCMatch to find θ-stable isomorphisms for the BCCs based on our BCCIndex (lines 8-9). Since n_r is the initial substructure (n_r ranks first), BCCIndexSearch pops the head element n_t of Q as the next selected substructure and performs the TreeJoin procedure to extend each θ-stable isomorphism of n_r (lines 10-12). Finally, the result set S containing the θ-stable subgraph isomorphisms of q is returned.
TreeJoin finds the θ-stable subgraph isomorphisms of q by joining tree nodes in the order given by Q. It extends the current match M by adding those solutions of the current tree node n_c that satisfy both the neighbor and the time restrictions (Observation 4). For a match m_c of tree node n_c, we say that m_c is not an active candidate of n_c if: 1) multiple query vertices are mapped to the same data vertex; or 2) the number of snapshots that contain both m_c and M is less than θ. For each m_c, TreeJoin identifies whether it is active. If not, m_c is skipped. Otherwise, TreeJoin performs the extension according to the type of n_c: isolated vertex extension (lines 18-24) or BCC extension (lines 25-30). The two extensions differ in how the candidate match m_c is selected: the former chooses the neighbors of the matched vertex in M, and the latter selects candidates from the solutions obtained by BCCMatch. When M contains the mappings of all query vertices in q, a θ-stable subgraph isomorphism of q is found, and Algorithm 5 adds it into the result set S (line 15).

Algorithm 5. BCCIndexSearch
Input: G = (V, E), a query q = (V_q, E_q), an integer θ, the BCCIndex EI.
Output: The stable subgraph isomorphism set S.
1 S ← ∅; V_B ← ∅; V_I ← ∅;
2 (V_B, V_I) ← BCCDecompose(q);
3 Construct BCCTree BT and JoinTree JT;
4 Q ← the join order by traversing JT with BFS;
5 n_r ← Q.pop(); G_r = (V_r, E_r) ← G_{n_r};
6 if E_r = ∅ then S ← PruneSearch(G, q, θ);
7 else
8     for G_{e_i} ∈ V_B do
9         M(G_{e_i}) ← BCCMatch(G, q, G_{e_i}, EI, θ);
10    for m_r ∈ M(G_r) do
11        T ← T(m_r); M ← m_r;
12        n_t ← Q.pop(); TreeJoin(q, G, M, T, n_t);
13 return S;
14 Procedure TreeJoin(q, G, M, T, n_c)
15 if |M| = |V_q| then S ← S ∪ M;
16 else
17     G_c = (V_c, E_c) ← G_{n_c}; V(M) ← {v_i | (u_i, v_i) ∈ M};
18     if V_c = n_c, E_c = ∅ then
19         u_i ← u_i such that (u_i, n_c) ∈ E_q;
20         v_i ← M.find(u_i);
21         for v_j ∈ N_{v_i}(G) do
22             if {v_j} ∩ V(M) = ∅ and |T ∩ T(v_i, v_j)| ≥ θ then
23                 T ← T ∩ T(v_i, v_j); M.insert(u_j, v_j);
24                 n_t ← Q.pop(); TreeJoin(q, G, M, T, n_t);
25     else
26         for m_c ∈ M(G_c) do
27             V(m_c) ← {v_i | (u_i, v_i) ∈ m_c};
28             if V(m_c) ∩ V(M) = ∅ and |T ∩ T(m_c)| ≥ θ then
29                 T ← T ∩ T(m_c); M ← M ∪ m_c;
30                 n_t ← Q.pop(); TreeJoin(q, G, M, T, n_t);
31 end procedure

Remark. In the BCCIndexSearch algorithm, we also use the symmetry-breaking trick described in [19] to handle query graphs with automorphisms, so that each stable subgraph isomorphism is searched only once.

5.4 The Parallel Query Algorithm
To improve scalability, we introduce a parallel version of the index-based search algorithm, called PBCCIndexSearch. Specifically, in line 8 of Algorithm 5, the computations of θ-stable subgraph isomorphisms for different BCCs are independent, so we can process the BCCs in parallel. In addition, in lines 10-12 of Algorithm 5, BCCIndexSearch chooses the n_r with the highest rank as the initial substructure and performs TreeJoin for each θ-stable isomorphism of n_r. As in PPruneSearch, this procedure can also be processed in parallel. Our experiments show that such a simple parallel implementation achieves a very good speedup ratio compared to the sequential algorithm.
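Lines 2-4 of Algorithm 5 (BCCDecompose, the BCCTree, and the breadth-first join order over the JoinTree) can be sketched in Python as follows. Following the definition in Section 4.1, which removes edges rather than vertices, a "BCC" is computed here as a 2-edge-connected component: bridges are found with a standard low-link depth-first search, the remaining edges define the components, singleton components play the role of isolated vertices, and each bridge becomes a BCCTree edge. The names bcc_decompose and join_order are hypothetical, the query is assumed to be a connected simple graph given as an edge list, and choosing the root by (|CN|, height when rooted there) is our reading of "the highest rank" rather than a detail confirmed by the text.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Dict, FrozenSet, List, Optional, Set, Tuple

Tree = DefaultDict[int, Set[int]]


def bcc_decompose(edges: List[Tuple[int, int]]) -> Tuple[List[FrozenSet[int]], Tree]:
    """Split a graph into 2-edge-connected pieces; return tree nodes and BCCTree adjacency."""
    adj: DefaultDict[int, Set[int]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Low-link DFS: (u, w) is a bridge iff no edge from w's subtree reaches u or above.
    disc: Dict[int, int] = {}
    low: Dict[int, int] = {}
    bridges: Set[FrozenSet[int]] = set()

    def dfs(u: int, parent: Optional[int]) -> None:
        disc[u] = low[u] = len(disc)
        for w in adj[u]:
            if w not in disc:
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:
                    bridges.add(frozenset((u, w)))
            elif w != parent:
                low[u] = min(low[u], disc[w])

    for s in list(adj):
        if s not in disc:
            dfs(s, None)

    # Connected components of the graph without its bridges are the tree nodes CN(n).
    comp: Dict[int, int] = {}
    nodes: List[FrozenSet[int]] = []
    for s in adj:
        if s in comp:
            continue
        comp[s] = len(nodes)
        members, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp and frozenset((u, w)) not in bridges:
                    comp[w] = len(nodes)
                    members.add(w)
                    stack.append(w)
        nodes.append(frozenset(members))

    # Each bridge links two tree nodes of the BCCTree.
    tree: Tree = defaultdict(set)
    for bridge in bridges:
        a, b = tuple(bridge)
        tree[comp[a]].add(comp[b])
        tree[comp[b]].add(comp[a])
    return nodes, tree


def join_order(nodes: List[FrozenSet[int]], tree: Tree) -> List[FrozenSet[int]]:
    """BFS over the BCCTree, visiting children by (|CN(c)|, dep(c)) in descending order."""
    def height(n: int, parent: Optional[int]) -> int:
        return max((1 + height(c, n) for c in tree[n] if c != parent), default=0)

    def rank(n: int, parent: Optional[int]) -> Tuple[int, int]:
        return (len(nodes[n]), height(n, parent))

    root = max(range(len(nodes)), key=lambda n: rank(n, None))
    order: List[FrozenSet[int]] = []
    seen, queue = {root}, deque([root])
    while queue:
        n = queue.popleft()
        order.append(nodes[n])
        children = sorted((c for c in tree[n] if c not in seen),
                          key=lambda c: rank(c, n), reverse=True)
        for c in children:
            seen.add(c)
            queue.append(c)
    return order
```

With this definition, a triangle joined by a bridge to a 4-cycle that carries one pendant vertex splits into three tree nodes, and the join order starts from the 4-cycle, then the triangle, then the pendant vertex; a bowtie (two triangles sharing a vertex), by contrast, stays in one piece because it has no bridge.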
TABLE 1
Datasets

Dataset   n           |E|         |E|          dmax     |T|   Time scale
Chess     7,301       55,899      63,689       263      100   month
Lkml      26,885      159,996     327,077      14,117   97    month
Enron     86,978      297,456     501,116      4,229    60    month
DBLP      1,729,816   8,546,306   12,007,380   5,980    78    year

6 EXPERIMENTS
In this section, we conduct extensive experiments to evaluate the efficiency and effectiveness of the proposed algorithms. For comparison, we implement an algorithm, called BaselineSearch, as a baseline. BaselineSearch uses a parallel Ullmann algorithm [9] to compute subgraph isomorphisms for a query graph in the de-temporal graph, and then picks the stable isomorphisms among them. We also implement another baseline algorithm, called BaseSnapSearch, which uses the VF2 algorithm [23] to compute subgraph isomorphisms for each snapshot and then selects the stable subgraph embeddings as the results. BaseSnapSearch also applies the pruning technique derived from Observation 1 before searching subgraph isomorphisms in each snapshot. For our pruning-based stable subgraph isomorphism search algorithm, we implement PruneSearch (Algorithm 1). We implement the BCCIndexBuild algorithm (Algorithm 3) to construct the BCCIndex, as well as the index-based stable subgraph isomorphism search algorithm BCCIndexSearch (Algorithm 5). All algorithms are implemented in C++. In addition, we implement the parallel versions of PruneSearch, BCCIndexBuild, and BCCIndexSearch using OpenMP, namely PPruneSearch, PBCCIndexBuild, and PBCCIndexSearch. All experiments are conducted on a PC with a 2.10GHz Intel(R) Xeon(R) Silver 4110 (8-core) CPU and 256GB memory running Red Hat 4.8.5-16. In all experiments, both the temporal graph and the BCCIndex are stored in main memory. We set the time limit to 7 days.
Datasets. We use four different types of real-world temporal networks in the experiments. The detailed statistics of the datasets are summarized in Table 1. Chess is a temporal network in which each temporal edge represents two chess players playing a game at time t. Lkml is a temporal communication network of the Linux kernel mailing list. Enron is an email communication network between employees of Enron. DBLP is a temporal collaboration network of authors in dblp from 1940 to 2018. In Table 1, dmax and |T| denote the maximum number of temporal edges associated with a vertex and the number of snapshots, respectively. All these datasets are downloaded from http://konect.uni-koblenz.de/.
Parameters. There are two input parameters in our algorithms: the query graph q and the stability threshold θ. The query graphs used in our experiments are summarized in Fig. 6, which includes eight different types of subgraphs with 5-8 vertices. Since BCCIndex is constructed based on the index-query graphs, the BCCIndexSearch algorithm can calculate stable subgraph isomorphisms in linear time if the query graph is one of the index-query graphs. Therefore, we only use q1 as an example of a query graph that is an index-query graph; for other index-query graphs, the results are consistent. For the hard cases, in which the query graph is not an index-query graph, we pick seven query graphs (namely, q2, q3, q4, q5, q6, q7, q8 in Fig. 6) to evaluate the proposed algorithms. q2 is used to evaluate how the BCCIndexSearch algorithm handles StarIsoBCC, and q7 and q8 are designed for processing GeneralBCC. For the subgraph isomorphism problem, a slight difference between two query graphs may result in significant changes in query performance. Thus, we select q4 and q5 as query graphs that exhibit an "edge growing" pattern. Since different datasets have different time scales, the stability threshold θ is chosen from a different range for each dataset. Specifically, for Chess, θ is selected from the interval [2, 7] with a default value of 4. For Lkml and Enron, θ is chosen from the interval [5, 15] with a default value of 9. For DBLP, θ is selected from the interval [5, 10] with a default value of 7.

Fig. 6. The query graphs in the experiments.

6.1 Performance Studies
Comparison Among BaselineSearch, BaseSnapSearch and PruneSearch. We evaluate BaselineSearch, BaseSnapSearch, and PruneSearch with varying parameters on different datasets. Fig. 8 depicts the runtime of BaselineSearch, BaseSnapSearch, and PruneSearch with q1 and q3 on Chess. The results for other query graphs on Chess are consistent. From Fig. 8, we can see that the runtime of BaselineSearch increases only slightly with increasing θ, while the runtime of BaseSnapSearch and PruneSearch decreases as θ increases for each query graph. This is because BaselineSearch needs to compute all subgraph isomorphisms before selecting the stable results, while BaseSnapSearch and PruneSearch, equipped with pruning techniques, can significantly reduce the search space of stable subgraph isomorphisms. Moreover, as expected, the running time of PruneSearch is at least 3 orders of magnitude lower than that of BaselineSearch under all parameter settings. Compared with BaseSnapSearch, PruneSearch is faster for smaller θ (i.e., θ = 2, 3), and for relatively larger θ, the runtimes of the two algorithms are close. This is because the pruning effect of Observation 1 can significantly reduce the scale of the graph for a larger θ. For example, for query graph q3 on Chess, when θ equals 2, PruneSearch takes 0.562 seconds to output all stable subgraph isomorphisms, while BaselineSearch and BaseSnapSearch consume 103,953 seconds and 760 seconds, respectively. The runtime of PruneSearch is up to 5 orders of
Fig. 7. The pruning effect of FirstFilter and SecondFilter on different datasets.

magnitude and up to 2 orders of magnitude faster than that of BaselineSearch and BaseSnapSearch, respectively. These results confirm that the pruning-based algorithm PruneSearch is substantially faster than the BaselineSearch and BaseSnapSearch algorithms on real-life temporal graphs, which is consistent with our analysis in Section 2.
The Pruning Effect of FirstFilter and SecondFilter. In this experiment, we evaluate the effect of FirstFilter and SecondFilter in PruneSearch, which we refer to as FirF and SecF for brevity. For comparison, we also employ two pruning tricks in BaselineSearch, i.e., degree reduction and neighbor reduction (DegF and NbrF). We evaluate these pruning techniques before performing the recursive search procedures. For each query vertex, the initial size of the candidate set is the number of vertices, which we denote as Init. Fig. 7 shows the size of the candidate set of vertex u2 in q6 under different pruning techniques on all the datasets. The results with varying θ are consistent for other query vertices and other query graphs. As can be seen from Fig. 7, both FirF and SecF prune significantly more of the vertices that are definitely not included in a stable subgraph isomorphism than DegF and NbrF do. For example, in the case of θ = 10 on DBLP, the number of candidates of u2 in q6 after performing FirF is 874 vertices, while the initial candidate set contains 1,729,816 vertices. SecF further reduces the size of C(u2) to 508. In contrast, DegF and NbrF only reduce the size of the candidate set to 1,287,416 and 1,285,661, respectively. In addition, the pruning effect of SecF (NbrF) is not obviously superior to that of FirF (DegF). This is because SecF and NbrF work better when extending partial matches, i.e., in the recursion procedures of PruneSearch and BaselineSearch. These results suggest that our pruning techniques can significantly prune those vertices that are not included in a stable subgraph isomorphism. Again, these results confirm that PruneSearch is significantly better than BaselineSearch, which is consistent with our previous experiments.
Comparison Among BaseSnapSearch, PruneSearch and BCCIndexSearch. Fig. 9 shows the running time of the three algorithms with varying θ for q1-q6 on different datasets. As expected, the runtime of BaseSnapSearch, PruneSearch, and BCCIndexSearch decreases as θ increases for each query graph. In general, the three algorithms reach their maximum runtime at the smallest θ. This is because for a smaller stability threshold θ, there are large numbers of temporal subgraph isomorphisms with stable values no less than θ in the graph, which increases the computational cost. Moreover, the proposed PruneSearch and BCCIndexSearch algorithms work well, while the BaseSnapSearch algorithm exceeds the time limit in most parameter settings. The running time of PruneSearch is significantly lower than that of BaseSnapSearch for a small θ on all datasets. For a relatively large θ, PruneSearch is also faster than BaseSnapSearch, except for some subgraphs on DBLP (i.e., θ = 9 or 10). This is because the pruning technique derived from Observation 1 can significantly reduce the scale of DBLP for a large θ, which is indeed the case for temporal collaboration networks. The runtime of BCCIndexSearch is at least one order of magnitude and two orders of magnitude lower than that of PruneSearch and BaseSnapSearch, respectively, in almost all parameter settings. For example, for query graph q2 on DBLP, when θ = 5, BCCIndexSearch takes 88 seconds to output all the stable subgraph isomorphisms, while PruneSearch consumes 159,964 seconds and BaseSnapSearch cannot compute the results within the time limit. In this case, BCCIndexSearch is at least three orders of magnitude faster than both PruneSearch and BaseSnapSearch. We also evaluate PruneSearch and BCCIndexSearch with q7 and q8. The results on DBLP are depicted in Fig. 10, and similar results can be found for the other datasets. As expected, the runtimes of PruneSearch and BCCIndexSearch are relatively close. This is because q7 and q8 are GeneralBCC graphs, for which BCCIndexSearch needs to perform PruneSearch to search all stable subgraph isomorphisms. These results demonstrate that the index-based solution BCCIndexSearch is substantially faster than the PruneSearch and BaseSnapSearch algorithms on real-life temporal graphs for query graphs with small-size BCCs.
Evaluation of Parallel Query Processing Algorithms. In this experiment, we evaluate the running time of PPruneSearch and PBCCIndexSearch while varying the number of threads t ∈ {1, 2, 4, 8, 12, 16}. Fig. 11 illustrates the results of two query graphs, q3 and q4, on Lkml and Enron. Similar results can also be observed on the other datasets and with the other query graphs. As can be seen from Fig. 11, the runtime of PBCCIndexSearch is significantly lower than that of PPruneSearch. For example, for query graph q4 on Enron, when t = 16, PBCCIndexSearch takes 5.9 seconds to output all stable subgraph isomorphisms, while PPruneSearch consumes 3,254.596 seconds. The running time of PBCCIndexSearch is at least two orders of magnitude lower than that of PPruneSearch.

Fig. 8. Running time of BaselineSearch, BaseSnapSearch and PruneSearch on Chess.
Authorized licensed use limited to: Universidad Federal de Pernambuco. Downloaded on December 09,2024 at 17:07:18 UTC from IEEE Xplore. Restrictions apply.
6416 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 6, JUNE 2023

Fig. 9. Running time of PruneSearch and BCCIndexSearch on different datasets.
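The comparison above depends on how a query graph breaks into BCCs: BCCIndexSearch is much faster for query graphs whose BCCs are small, whereas for the GeneralBCC queries q7 and q8 it has to fall back on PruneSearch. Assuming BCC stands for biconnected component, the following Python sketch shows one standard way to decompose a small query graph into its blocks (Hopcroft and Tarjan's depth-first search with an edge stack) and read off their sizes. It is an illustration under that assumption, not the authors' BCCIndex code: the helper names `biconnected_components` and `block_sizes` are hypothetical, and because this section does not define the GeneralBCC criterion, the sketch reports block sizes without classifying the queries.

```python
# Sketch only: assumes "BCC" means biconnected component. This is NOT the
# paper's BCCIndex code; biconnected_components/block_sizes are hypothetical helpers.
from collections import defaultdict


def biconnected_components(edges):
    """Split an undirected simple graph into its biconnected components (blocks).

    Hopcroft-Tarjan depth-first search with an edge stack. Each block is a
    frozenset of vertices; a bridge forms a two-vertex block. The DFS is
    recursive, which is fine for small query graphs.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    disc, low = {}, {}  # DFS discovery time and low-link value per vertex
    edge_stack, blocks = [], []
    clock = 0

    def dfs(u, parent):
        nonlocal clock
        disc[u] = low[u] = clock
        clock += 1
        for w in adj[u]:
            if w not in disc:  # tree edge
                edge_stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= disc[u]:  # nothing below w reaches above u: close a block
                    block = set()
                    while True:
                        a, b = edge_stack.pop()
                        block.update((a, b))
                        if (a, b) == (u, w):
                            break
                    blocks.append(frozenset(block))
            elif w != parent and disc[w] < disc[u]:  # back edge to an ancestor
                edge_stack.append((u, w))
                low[u] = min(low[u], disc[w])

    for v in list(adj):
        if v not in disc:
            dfs(v, None)
    return blocks


def block_sizes(edges):
    """Sorted vertex counts of the blocks; the last entry is the largest BCC."""
    return sorted(len(b) for b in biconnected_components(edges))


if __name__ == "__main__":
    clique5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
    double_star = [("a", "b"), ("a", "a1"), ("a", "a2"), ("b", "b1"), ("b", "b2")]
    print(block_sizes(clique5))      # [5]
    print(block_sizes(double_star))  # [2, 2, 2, 2, 2]
```

Under this reading, the 5-clique used in the DBLP case study is a single five-vertex block, while a tree-shaped query such as the double star splits into five two-vertex blocks, one per edge.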

Both PPruneSearch and PBCCIndexSearch also achieve near-linear speedup ratios on these two datasets. For example, in Fig. 11d, with t = 16 and q = q4, the speedup ratios of PPruneSearch and PBCCIndexSearch on Enron are roughly 7.6 and 12, respectively. These results indicate that our parallel stable subgraph isomorphism search algorithms can achieve high speedup ratios on real-life graphs.

Fig. 10. Running time of PruneSearch and BCCIndexSearch on DBLP.

Fig. 11. Running time of PPruneSearch and PBCCIndexSearch on different datasets.

Evaluation of the BCCIndex. In this experiment, we evaluate the performance of our index construction algorithm. Fig. 12a shows the size of the BCCIndex and the size of the graph. As can be seen, the BCCIndex is smaller than 2.5 GB on all datasets, so it can easily be stored in the main memory of a modern computer. These results imply that the BCCIndex is not very large on real-life temporal graphs. In addition, Fig. 12b reports the BCCIndex construction time of the sequential algorithm BCCIndexBuild. On Chess, Lkml, and Enron, BCCIndexBuild is very efficient, taking less than 3 hours to construct the BCCIndex. On DBLP, BCCIndexBuild is more time-consuming, but it still constructs the BCCIndex within 7 days (less than 600,000 seconds). Once the BCCIndex is established, it can be reused to handle different query graphs in practical applications. These results suggest that the proposed index-based solution can work on large real-life temporal graphs.

Fig. 12. Evaluation of the BCCIndex.

Evaluation of Parallel Index Construction. Here we evaluate the speedup ratio of our parallel index construction algorithm PBCCIndexBuild. To this end, we vary the number of threads t from 1 to 16 and record the runtime of PBCCIndexBuild to compute the speedup ratio on each dataset. Fig. 13 reports the results on all datasets. As expected, PBCCIndexBuild achieves near-linear speedup ratios on all datasets. Moreover, on the largest dataset, DBLP, the runtime of PBCCIndexBuild is about 14 times shorter than that of the sequential algorithm BCCIndexBuild. These results indicate that our parallel index construction algorithm is very efficient on real-life temporal graphs.

Fig. 13. Speedup ratio of PBCCIndexBuild on various datasets.

6.2 Case Study

In this experiment, we conduct case studies on DBLP and HCWs to evaluate the effectiveness of the proposed algorithms. As mentioned earlier, DBLP is a temporal collaboration network of authors in dblp from 1940 to 2018. HCWs is a temporal network of contacts between patients and healthcare workers in a hospital, downloaded from http://www.sociopatterns.org/datasets.

Case Study on DBLP. In DBLP, we search the stable subgraph isomorphisms of a 5-clique to identify the stable teams with the highest stable values. To this end, we set the stability threshold θ to 2 to find all teams in which the authors have co-authored many papers for more than 2 years, and then sort them in non-increasing order of their stable values. The runtimes of PruneSearch and BCCIndexSearch are 111,298 seconds and 274 seconds, respectively, which is consistent with the previous experiments. Fig. 14 shows the top-2 teams with the highest stable values. As shown in Fig. 14a, all five authors are interested in bioinformatics and work at the Jackson Laboratory. This team consists of long-term collaborators who have co-authored papers from 2000 to 2017, indicating that our algorithm can find stable relationships in real-world applications. In Fig. 14b, the authors are professors at the AtlantTIC research center of the Universidade de Vigo. Furthermore, they are key members of GSSI (Grupo de Servicios Para la Sociedad de la Información), coordinated by Jose J. Pazos-Arias. These authors are interested in the semantic web, recommender systems, crowdsourcing, and crowd computing, and they co-authored many papers from 2004 to 2006, 2008 to 2011, and 2013 to 2016. Thus, this team also consists of long-term collaborators, which indicates that our algorithm can indeed find stable communities in real-world applications.

Fig. 14. Case study on DBLP with 5-clique query graph.

We also search the stable subgraph isomorphisms in DBLP using a “double star” structure as the query graph, to reveal stable multi-disciplinary cooperation. Fig. 15 shows two results as examples. As shown in Fig. 15a, Jin and Guo act as “bridges”. According to their homepages, Jin is mainly interested in parallel and distributed architecture, data query processing,
computational complexity. Guo's research interests include big data, edge AI, mobile computing, and distributed systems, and, as expected, the other authors in Fig. 15a work in a variety of areas. For instance, Liao focuses on memory computing, runtime systems, and graph computing. Zou's research lies in data security, software vulnerability detection, and network attack and defense. Lu and Li mainly focus on wireless networking, cloud computing, pervasive computing, and big data. Zeng is interested in network function virtualization, software-defined networking, and edge computing. These members maintain stable multi-disciplinary cooperation, having co-authored papers from 2009 to 2015 and in 2017. Analogously, the authors in Fig. 15b co-authored papers from 2004 to 2011, and their research covers multiple areas. For example, Jennings and his neighbors are mainly interested in machine learning, autonomous agents, and multi-agent systems. Wooldridge's research lies at the intersection of logic, computational complexity, and game theory. McBurney focuses on AI and computational finance, including distributed ledgers, blockchain, and smart contracts. Van der Hoek is interested in logics for agent systems, data mining, and the semantic Web. Dunne's research areas are the complexity of dialogue and argumentation, coalitional games, and contract and resource allocation mechanisms. Therefore, this team consists of long-term collaborators from multiple areas. The results indicate that our algorithm can indeed find stable communities with multi-disciplinary cooperation in real-world applications.

Fig. 15. Case study on DBLP with double-star query graph.

In addition, we extract two temporal subgraphs from DBLP, namely DB and DM, which contain the authors who have published at least one paper in the areas of databases and data mining, respectively. In both DB and DM, we search the stable subgraph isomorphisms of a 5-clique. Figs. 16a and 16b show the top-1 teams with the highest stable values in DB and DM, respectively. From Fig. 16a, we can observe that the top-1 team is a stable group formed by five famous database researchers. Similarly, as shown in Fig. 16b, the other authors in the top-1 group identified in DM are stable collaborators of Professor Jiawei Han who have co-authored many papers in the past few years. These results further confirm that our algorithm can find stable communities in real-world applications.

Fig. 16. Case study on DBLP with 5-clique query graph.

Case Study on HCWs. In HCWs, there are four identity types, i.e., nurse (Nur), medic (Med), administrator (Adm), and patient (Pat), and each individual is associated with an identity label. We search the stable subgraph isomorphisms of a triangle with varying stability threshold θ and observe the types of the individuals in the results. Table 2 and Fig. 17 report, respectively, the number and the proportion of vertices of each type in all stable triangles. As can be seen, the number of patients involved in stable triangles decreases as θ increases, while nurses make up the majority of the stable communities. Moreover, for large θ, the proportion of medics in stable communities decreases, because medics often do not need to maintain such stable relationships with other people. In addition, administrator engagement in stable communities is less affected by θ, which is consistent with our intuition. These results further confirm that our stable subgraph isomorphism technique can find stable communities and reveal some implicit features of a network.

TABLE 2
The Number of Different Types With Varying θ on HCWs

θ     Total   MED   NUR   PAT   ADM
5       520    76   279    47    18
10      162    33   105    16     8
15       84    14    57     8     5
20       54     6    40     4     4
25       45     0    38     3     4
30       33     0    28     2     3

Fig. 17. The ratio of different types with varying θ on HCWs.

7 RELATED WORK

Subgraph Isomorphism. Our work is closely related to subgraph isomorphism on static graphs, where the goal is to find all embeddings of a query graph q in a data graph G. Subgraph isomorphism is a classic NP-hard problem [7], [8]. To solve it, Ullmann [9] proposed a backtracking algorithm that iteratively maps vertices from q to G following a fixed order of query vertices. Many algorithms have been proposed to improve on the efficiency of the classic Ullmann algorithm, including VF2 [23], QuickSI [24], GraphQL [25], SPath [26], TurboISO [27], CFL-Match [28], and DAF [29]. Specifically, VF2 [23] generated the matching order by selecting a vertex connected to one of the already selected vertices rather than a randomly selected vertex. QuickSI [24] created a matching order based on an infrequent-labels-first strategy. GraphQL [25] and SPath [26] focused on reducing the candidates of query vertices by exploiting infrequent paths. TurboISO [27] further reduced
the unnecessary Cartesian products by employing neighborhood equivalence classes to merge similar vertices in a query graph q. Ren and Wang [30] improved the efficiency of TurboISO by compressing the data graph G. CFL-Match [28] used a spanning tree instead of the original query graph to postpone Cartesian products. Han et al. proposed DAF [29], which uses knowledge learned from past computations to reduce redundant computation. The performance of these subgraph matching algorithms has been compared in several previous studies [31], [32]. Most of the improved algorithms mentioned above are tailored to static, labeled graphs. In this work, we investigate a new subgraph isomorphism problem in unlabeled temporal graphs, for which these algorithms cannot be directly used.

Temporal Graph Analysis. Our work is also related to temporal graph analysis, which has attracted much attention in recent years [17], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43]. For example, Bansal et al. [33] studied the problem of identifying keyword clusters in large collections of blog posts over specific temporal intervals. Li et al. [35] introduced a persistent community model and developed algorithms to find such communities efficiently. Gurukar et al. [17] proposed an algorithm to identify recurring subgraphs in a temporal graph. Recently, the subgraph isomorphism problem in temporal graphs has also been studied. Redmond and Cunningham [39] introduced a time-respecting subgraph isomorphism problem, which requires the edges of the query graph to follow a temporal order. Semertzidis and Pitoura [40] studied the problem of mining durable subgraph patterns in a temporal graph. Franzke et al. [41] investigated pattern search in temporal graphs, which asks whether subgraphs exist within a time span Δt such that the temporal order of their edges is consistent with the temporal pattern. Their model therefore only searches for isomorphisms within a fixed time interval and cannot be used to measure stability. Wang and Li [44] introduced a temporal stable community model based on the temporal similarity of edges and developed algorithms that extend the Louvain method to detect stable communities. Since our problem definition differs from those of the above problems, none of the existing techniques can be directly applied to solve it. To the best of our knowledge, our work is the first to apply the BCC indexing technique to the stable subgraph isomorphism search problem in temporal networks.

8 CONCLUSION

In this paper, we study the problem of finding stable subgraph isomorphisms in temporal graphs. To solve the problem, we first develop a pruning-based search algorithm built on several non-trivial pruning techniques that significantly reduce unpromising intermediate results during the search. To further improve efficiency, we propose a novel index structure, called BCCIndex, to support efficient stable subgraph isomorphism search in temporal graphs. We also develop an efficient query processing algorithm based on the BCCIndex and an efficient tree join technique. Finally, we conduct extensive experiments on four real-life datasets to evaluate the efficiency of the proposed algorithms. The results show that the index-based solution BCCIndexSearch is around one to three orders of magnitude faster than the PruneSearch algorithm. The results also show that our parallel implementations of both the pruning-based and the index-based search algorithms achieve very high speedup ratios. In addition, we conduct a case study in DBLP, and the results demonstrate that our solution for stable subgraph isomorphism search can be useful for identifying stable research groups in DBLP.

REFERENCES
[1] N. Przulj, D. G. Corneil, and I. Jurisica, “Efficient estimation of graphlet frequency distributions in protein–protein interaction networks,” Bioinformatics, vol. 22, no. 8, pp. 974–980, 2006.
[2] T. A. Snijders, P. E. Pattison, G. L. Robins, and M. S. Handcock, “New specifications for exponential random graph models,” Sociol. Methodol., vol. 36, no. 1, pp. 99–153, 2006.
[3] X. Yan, P. S. Yu, and J. Han, “Graph indexing: A frequent structure-based approach,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2004, pp. 335–346.
[4] A.-L. Barabasi, “The origin of bursts and heavy tails in human dynamics,” Nature, vol. 435, no. 7039, pp. 207–211, 2005.
[5] P. Holme and J. Saramäki, “Temporal networks,” Phys. Rep., vol. 519, no. 3, pp. 97–125, 2012.
[6] X. Qiu et al., “Real-time constrained cycle detection in large dynamic graphs,” VLDB, vol. 11, no. 12, pp. 1876–1888, 2018.
[7] J. Hartmanis, “Computers and intractability: A guide to the theory of NP-completeness (Michael R. Garey and David S. Johnson),” SIAM Rev., vol. 24, no. 1, p. 90, 1982.
[8] H.-N. Tran, J.-J. Kim, and B. He, “Fast subgraph matching on large graphs using graphics processors,” in Proc. Int. Conf. Database Syst. Adv. Appl., 2015, pp. 299–315.
[9] J. R. Ullmann, “An algorithm for subgraph isomorphism,” J. ACM, vol. 23, no. 1, pp. 31–42, 1976.
[10] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: Simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[11] R. Wan and H. Mamitsuka, “Discovering network motifs in protein interaction networks,” in Biological Data Mining in Protein Interaction Networks, Hershey, PA, USA: IGI Global, 2009, pp. 117–143.
[12] M. Koyutürk, A. Grama, and W. Szpankowski, “An efficient algorithm for detecting frequent subgraphs in biological networks,” Bioinformatics, vol. 20, no. suppl_1, pp. i200–i207, 2004.
[13] C. Jiang, F. Coenen, and M. Zito, “A survey of frequent subgraph mining algorithms,” Knowl. Eng. Rev., vol. 28, no. 1, pp. 75–105, 2013.
[14] P. Zhao and J. X. Yu, “Fast frequent free tree mining in graph databases,” World Wide Web, vol. 11, no. 1, pp. 71–92, 2008.
[15] R. Jin, S. McCallen, and E. Almaas, “Trend motif: A graph mining approach for analysis of dynamic complex networks,” in Proc. 17th IEEE Int. Conf. Data Mining, 2007, pp. 541–546.
[16] P. Gupta et al., “Real-time Twitter recommendation: Online motif detection in large dynamic graphs,” Proc. VLDB Endow., vol. 7, no. 13, pp. 1379–1380, 2014.
[17] S. Gurukar, S. Ranu, and B. Ravindran, “COMMIT: A scalable approach to mining communication motifs from dynamic networks,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2015, pp. 475–489.
[18] R. Ahmed and G. Karypis, “Algorithms for mining the coevolving relational motifs in dynamic networks,” ACM Trans. Knowl. Discov. Data, vol. 10, no. 1, pp. 4:1–4:31, 2015.
[19] J. A. Grochow and M. Kellis, “Network motif discovery using subgraph enumeration and symmetry-breaking,” in Proc. Annu. Int. Conf. Res. Comput. Mol. Biol., 2007, pp. 92–106.
[20] A. Gibbons, Algorithmic Graph Theory, Cambridge, U.K.: Cambridge Univ. Press, 1985.
[21] T. Akiba, Y. Iwata, and Y. Yoshida, “Linear-time enumeration of maximal k-edge-connected subgraphs in large networks by random contraction,” in Proc. 22nd ACM Int. Conf. Inf. Knowl. Manage., 2013, pp. 909–918.
[22] L. Chang, J. X. Yu, L. Qin, X. Lin, C. Liu, and W. Liang, “Efficiently computing k-edge connected components via graph decomposition,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2013, pp. 205–216.
[23] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento, “A (sub)graph isomorphism algorithm for matching large graphs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 10, pp. 1367–1372, Sep. 2004.
[24] H. Shang, Y. Zhang, X. Lin, and J. X. Yu, “Taming verification hardness: An efficient algorithm for testing subgraph isomorphism,” PVLDB, vol. 1, no. 1, pp. 364–375, 2008.
[25] H. He and A. K. Singh, “Graphs-at-a-time: Query language and access methods for graph databases,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2008, pp. 405–418.
[26] P. Zhao and J. Han, “On graph query optimization in large networks,” PVLDB, vol. 3, no. 1/2, pp. 340–351, 2010.
[27] W.-S. Han, J. Lee, and J.-H. Lee, “TurboISO: Towards ultrafast and robust subgraph isomorphism search in large graph databases,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2013, pp. 337–348.
[28] F. Bi, L. Chang, X. Lin, L. Qin, and W. Zhang, “Efficient subgraph matching by postponing cartesian products,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2016, pp. 1199–1214.
[29] M. Han, H. Kim, G. Gu, K. Park, and W.-S. Han, “Efficient subgraph matching: Harmonizing dynamic programming, adaptive matching order, and failing set together,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2019, pp. 1429–1446.
[30] X. Ren and J. Wang, “Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs,” PVLDB, vol. 8, no. 5, pp. 617–628, 2015.
[31] J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee, “An in-depth comparison of subgraph isomorphism algorithms in graph databases,” VLDB, vol. 6, no. 2, pp. 133–144, 2012.
[32] S. Sun and Q. Luo, “In-memory subgraph matching: An in-depth study,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2020, pp. 1083–1098.
[33] N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa, “Seeking stable clusters in the blogosphere,” in Proc. 33rd Int. Conf. Very Large Data Bases, 2007, pp. 806–817.
[34] H. Wu, J. Cheng, S. Huang, Y. Ke, Y. Lu, and Y. Xu, “Path problems in temporal graphs,” PVLDB, vol. 7, no. 9, pp. 721–732, 2014.
[35] R.-H. Li, J. Su, L. Qin, J. X. Yu, and Q. Dai, “Persistent community search in temporal networks,” in Proc. IEEE 34th Int. Conf. Data Eng., 2018, pp. 797–808.
[36] Y. Yang, D. Yan, H. Wu, J. Cheng, S. Zhou, and J. C. Lui, “Diversified temporal subgraph pattern mining,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2016, pp. 1965–1974.
[37] Z. Yang, A. W.-C. Fu, and R. Liu, “Diversified top-k subgraph querying in a large graph,” in Proc. Int. Conf. Manage. Data, 2016, pp. 1167–1182.
[38] S. Ma, R. Hu, L. Wang, X. Lin, and J. Huai, “Fast computation of dense temporal subgraphs,” in Proc. IEEE 33rd Int. Conf. Data Eng., 2017, pp. 361–372.
[39] U. Redmond and P. Cunningham, “Subgraph isomorphism in temporal networks,” 2016, arXiv:1605.02174.
[40] K. Semertzidis and E. Pitoura, “Durable graph pattern queries on historical graphs,” in Proc. IEEE 33rd Int. Conf. Data Eng., 2016, pp. 541–552.
[41] M. Franzke, T. Emrich, A. Züfle, and M. Renz, “Pattern search in temporal social networks,” in Proc. 21st Int. Conf. Extending Database Technol., 2018, pp. 289–300.
[42] H. Qin, R.-H. Li, G. Wang, L. Qin, Y. Yuan, and Z. Zhang, “Mining bursting communities in temporal graphs,” 2019, arXiv:1911.02780.
[43] H. Qin, R. Li, G. Wang, L. Qin, Y. Cheng, and Y. Yuan, “Mining periodic cliques in temporal networks,” in Proc. IEEE 33rd Int. Conf. Data Eng., 2019, pp. 1130–1141.
[44] W. Wang and X. Li, “Temporal stable community in time-varying networks,” IEEE Trans. Netw. Sci. Eng., vol. 7, no. 3, pp. 1508–1520, Mar. 2019.

Qi Zhang is currently working toward the PhD degree with the Beijing Institute of Technology, China. Her current research interests include social network analysis and data-driven graph mining.

Rong-Hua Li received the PhD degree from the Chinese University of Hong Kong, in 2013. He is currently a professor with the Beijing Institute of Technology, Beijing, China. His research interests include graph data management and mining, social network analysis, graph computation systems, and graph-based machine learning.

Hongchao Qin received the BS degree in mathematics and the ME and PhD degrees in computer science from Northeastern University, China, in 2013, 2015 and 2020, respectively. He is currently a postdoc with the Beijing Institute of Technology, China. His current research interests include social network analysis and data-driven graph mining.

Guoren Wang received the BSc, MSc, and PhD degrees from the Department of Computer Science, Northeastern University, China, in 1988, 1991 and 1996, respectively. Currently, he is a professor with the Department of Computer Science, Beijing Institute of Technology, Beijing, China. His research interests include XML data management, query processing and optimization, bioinformatics, high dimensional indexing, parallel database systems, and cloud data management. He has published more than 100 research papers.

Zhiwei Zhang received the BS degree from the Renmin University of China, in 2010, and the PhD degree from the Chinese University of Hong Kong, in 2014. He is currently a professor with the Beijing Institute of Technology (BIT), Beijing, China. His research interests include federated learning, data pricing and transaction, distributed systems, blockchain, and algorithm analysis.

Ye Yuan received the BS, MS, and PhD degrees in computer science from Northeastern University, in 2004, 2007, and 2011, respectively. He is currently a professor with the Department of Computer Science, Northeastern University, China. His research interests include graph databases, probabilistic databases, and social network analysis.
