A (Sub)graph Isomorphism Identification Theorem
by
Mengjiao Guo
Supervisors: Prof. Jing He,
Prof. Chi Huang Chi
of similarity or dissimilarity between the two graphs, inexact graph matching algorithms provide a gradual matching result. Graph distance, also called approximate graph isomorphism or error-tolerant graph matching, is a measure of similarity (or dissimilarity) between two graphs. Defining a graph distance function is an elusive question that remains a pressing problem in a wide range of fields. In general, graphs are enriched with node and edge attributes, namely heterogeneous graphs, and our focus is on encapsulating node and edge identities in a graph. Graphs preserve both structural and semantic features, so analysing the row sums and eigenspectra of the vertex and edge adjacency matrices captures the affinity interactions within graphs both locally and globally. We therefore estimate the geometric and semantic dissimilarities (distances) between graphs extensively and systematically. Several illustrative chemical compound cases are implemented in practical scenarios so that the theory can be easily understood. Our method defines a synoptic, quantitative distance (or dissimilarity) measure for approximate matching; it offers good expressiveness and a suitable polynomial computational complexity, which paves the way for graph analysis.
In this work, our (sub)graph/approximate isomorphism-based method adopts a three-stage framework for learning and refining structural correspondences. First, an adequate representation, the graph adjacency matrix, captures the connectivity of the graph structure. Second, we employ the permutation theorem to evaluate the row sums of the vertex and edge adjacency matrices of the two graphs. Last, our proposed scheme deploys the well-founded equinumerosity theorem to verify the relationship between the two graphs. The (sub)graph and approximate graph matching models provide a solid theoretical foundation for practical applications and effective querying. In this thesis, we close this gap by studying the problem via the permutation and equinumerosity theorems; the results demonstrate the effectiveness of our method, which runs in polynomial time.
Acknowledgments
I am very thankful to my principal supervisor, Prof. Jing He, who led and encouraged me during my studies. I particularly want to express my gratitude to Prof. Sheng Wen and Prof. Chi Huang Chi, who both offered me inspiration and ideas.
I would like to express my appreciation to the Faculty of Science, Engineering
and Technology, Swinburne University of Technology. Without their support, all this
would not have been possible.
Special thanks to Matthew Mitchell and Caslon Chua for their willingness to offer me a tutoring job, and to Kathy Wallace, Cate O’Dwyer and Kostas Kondelias for their kindness and patience in discussing my career and helping me solve the puzzles in my life.
I am also taking this opportunity to thank my friends in Melbourne: Afzaal Hassan and Qinyuan Li for sharing their teaching knowledge and invaluable advice, which really helped me polish my teaching skills and cheered me up; Limeng Zhang for sharing lunches and dinners, necessary distractions and happiness; and Micheal Baron, Rob Hand, Sue Hand, Rowan Forster, Michael Cutter and Darren Cronshaw for guiding me in exploring the local culture and customs.
I would also like to thank all my colleagues in the Algorithms and Chips group, Hui Zheng, Junfeng Wu and Peng Zhang, for sharing their knowledge and research experience, which really helped me polish my skills.
I would like to thank my beloved parents for encouraging me to pursue a Ph.D. degree overseas; over these past years, they have provided me with both physical and emotional support.
Lastly, I appreciate every person I met during my Ph.D. study and my life: thank you for your love, support and encouragement, which enabled me to overcome countless challenges and made my time in Melbourne unforgettable.
Declaration
I, Mengjiao Guo, declare that this thesis titled, “A (Sub)graph Isomorphism Identi-
fication Theorem” and the work presented in it are my own. I confirm that:
• This work was done wholly or mainly while in candidature for a research degree
at this University.
• Where any part of this thesis has previously been submitted for a degree or
any other qualification at this University or any other institution, this has been
clearly stated.
• Where I have consulted the published work of others, this is always clearly
attributed.
• Where I have quoted from the work of others, the source is always given. With
the exception of such quotations, this thesis is entirely my own work.
• Where the thesis is based on work done by myself jointly with others, I have
made clear exactly what was done by others and what I have contributed myself.
Publications
tion, Concurrency and Computation: Practice and Experience, Vol. 32, no. 7
(Apr 2020), article no. e5599.
• Y. Wu, J. He, Y. Ji, G. Huang, H. Yao, P. Zhang, W. Xu, M. Guo and Y. Li,
Enhanced classification models for iris dataset, Proceedings of the 7th Inter-
national Conference on Information Technology and Quantitative Management
(ITQM), 2019 (Best Paper).
Contents
1 Introduction 1
1.1 Research Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Research Concerns and Contributions . . . . . . . . . . . . . . . . . . 4
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Related work 9
2.1 Basics of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 A Brief Overview of Graph Matching Problems . . . . . . . . . . . . 12
2.3 Existing Exact Graph and the Subgraph Pattern Querying Algorithms 15
2.3.1 Pure Tree Search Algorithms . . . . . . . . . . . . . . . . . . 16
2.3.2 Index-based Algorithms . . . . . . . . . . . . . . . . . . . . . 18
2.3.3 Constraint Programming . . . . . . . . . . . . . . . . . . . . . 20
2.3.4 Algebraic Graph Theory Techniques . . . . . . . . . . . . . . 21
2.3.5 Miscellaneous Methods and Techniques . . . . . . . . . . . . . 21
2.4 Existing Inexact Graph and the Subgraph Pattern Querying Algorithms 23
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 Triple Tuple Method . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.2 Undirected Vertex and Edge Adjacency Matrix Representation 29
3.2.3 Undirected Multigraph Vertex and Edge Adjacency Matrix Rep-
resentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.4 Directed Graph Vertex and Edge Adjacency Matrix Represen-
tation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Permutation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.1 Calculating Arrays of Row Sum based on the Vertex and Edge
Adjacency Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.2 Mathematical Approval for Permutation Theorem . . . . . . . 42
3.4 Equinumerosity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4.1 Maximal Linearly Independent Subsets . . . . . . . . . . . . . 51
3.4.2 Mathematical Approval for Equinumerosity Theorem . . . . . 53
3.5 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.6 Performance Studies and Applications . . . . . . . . . . . . . . . . . 59
3.6.1 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . 59
3.6.2 Case Study 1 - Undirected Graph Isomorphism . . . . . . . . 63
3.6.3 Case Study 2 - Directed Graph Isomorphism . . . . . . . . . . 72
3.6.4 Case Study 3 - Multigraph Isomorphism . . . . . . . . . . . . 79
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5 Quantitative Graph Distance Measurement 99
5.1 Permutation Topological Distance . . . . . . . . . . . . . . . . . . . . 100
5.2 Equinumerosity Topological Distance . . . . . . . . . . . . . . . . . . 103
5.2.1 Equinumerous topological distance of singular values . . . . . 103
5.2.2 Equinumerous topological distance of left and right singular
matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3 Graphs with Labeled Nodes . . . . . . . . . . . . . . . . . . . . . . . 108
5.4 Performance Studies and applications . . . . . . . . . . . . . . . . . . 109
5.4.1 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . 110
5.4.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6 Epilogue 125
List of Figures
1-1 Various types of graphs: (a) undirected and unlabeled, (b) undirected
with labeled nodes (different colors refer to different labels), (c) undi-
rected with labeled nodes and edges (d) directed and unlabeled, (e)
directed with labeled nodes, (f) directed with labeled nodes and edges,
(g) undirected, unlabeled multigraph, (h) undirected with labeled nodes
multigraph and (i) undirected with labeled nodes and edges multigraph. 5
1-2 Graph (a) is an induced subgraph of (c), and graph (b) is a partial
subgraph of (c). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3-14 Directed Graph Isomorphism Case. . . . . . . . . . . . . . . . . . . . 72
3-15 Multigraph isomorphism matching case . . . . . . . . . . . . . . . . . 79
List of Tables
3.19 Vertex Adjacency Matrix of g1 . (step 4 marked as G1 ) . . . . . . . . 65
3.20 Vertex Adjacency Matrix of g2 . (step 6 marked as G2 ) . . . . . . . . 65
3.21 Edge adjacency matrix of g. (step 5 marked as G1 ) . . . . . . . . . . 65
3.22 Edge Adjacency Matrix of g1 . (step 7 marked as G2 ) . . . . . . . . . 65
3.23 Computation results of vertex adjacency matrix for graph g1 (step 8) 65
3.24 Computation results of vertex adjacency matrix for graph g2 (step 9) 66
3.25 Computation results of edge adjacency matrix for graph g1 (step 10) . 66
3.26 Computation results of edge adjacency matrix for graph g2 (step 11) . 66
3.27 Left Singular Matrix Uv1 for g1 . (step 18) . . . . . . . . . . . . . . . . 67
3.28 Singular Matrix Σv1 for g1 . (step 19) . . . . . . . . . . . . . . . . . . 67
3.29 Right Singular Matrix VvT1 for g1 . (step 20) . . . . . . . . . . . . . . . 67
3.30 Left Singular Matrix Uv2 for g2 . (step 21) . . . . . . . . . . . . . . . . 68
3.31 Singular Matrix Σv2 for g2 . (step 22) . . . . . . . . . . . . . . . . . . 68
3.32 Right Singular Matrix VvT2 for g2 . (step 23) . . . . . . . . . . . . . . . 68
3.33 Maximally linearly independent system of left singular vector for V1 . 69
3.34 Maximally linearly independent system of left singular vector for V2 . 69
3.35 Maximally linearly independent system of right singular vector for V1 . 69
3.36 Maximally linearly independent system of right singular vector for V2 . 69
3.37 Left Singular Matrix Ue1 for E1 . (step 24) . . . . . . . . . . . . . . . 69
3.38 Singular Matrix Σe1 for E1 . (step 25) . . . . . . . . . . . . . . . . . . 70
3.39 Right Singular Matrix VeT1 for E1 . (step 26) . . . . . . . . . . . . . . . 70
3.40 Left Singular Matrix Ue2 for E2 . (step 27) . . . . . . . . . . . . . . . 70
3.41 Singular Matrix Σe2 for E2 . (step 28) . . . . . . . . . . . . . . . . . . 70
3.42 Right Singular Matrix VeT2 for E2 . (step 29) . . . . . . . . . . . . . . . 71
3.43 Maximally linearly independent system of left singular vector for E1 . 71
3.44 Maximally linearly independent system of left singular vector for E2 . 71
3.45 Maximally linearly independent system of right singular vector for E1 . 72
3.46 Maximally linearly independent system of right singular vector for E2 72
3.47 Triple tuple of graph g3 . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.48 Triple tuple of graph g4 . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.49 Vertex adjacency matrix of g3 . (step 4 marked as G1 ) . . . . . . . . . 73
3.50 Vertex adjacency matrix of g4 . (step 6 marked as G2 ) . . . . . . . . . 74
3.51 Edge adjacency matrix of g3 . (step 5 marked as G1 ) . . . . . . . . . . 74
3.52 Edge adjacency matrix of g4 . (step 7 marked as G2 ) . . . . . . . . . . 75
3.53 Computation results of binary vertex adjacency matrix for graph g3
(step 8) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.54 Computation results of binary vertex adjacency matrix for graph g4
(step 9) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.55 Computation results of binary edge adjacency matrix for graph g3 (step
10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.56 Computation results of binary edge adjacency matrix for graph g4 (step
11) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.57 Triple tuple of graph g5 . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.58 Triple tuple of graph g2 . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.59 Vertex Adjacency Matrices for induced Matching. . . . . . . . . . . . 80
3.60 c. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.61 C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.62 Vertex Adjacency Matrices for induced Matching. . . . . . . . . . . . 80
3.63 c. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.64 C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.65 Computation results of vertex adjacency matrix for graph g1 (step 8) 80
3.66 Computation results of vertex adjacency matrix for graph g2 (step 9) 80
4.8 step 1: enumerate vertices from graph C . . . . . . . . . . . . . . . . 92
4.9 step 2a: enumerate edges from graph C . . . . . . . . . . . . . . . . . 93
4.10 step 2b: enumerate edges from graph C . . . . . . . . . . . . . . . . . 93
4.11 Triple tuple of graph c. . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.12 Triple tuple of graph C . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.13 Vertex adjacency matrices for query graph c. . . . . . . . . . . . . . . 95
4.14 Vertex adjacency matrices for data graph C . . . . . . . . . . . . . . 95
4.15 Edge adjacency matrix for query graph c . . . . . . . . . . . . . . . . 96
4.16 Edge adjacency matrix for data graph C . . . . . . . . . . . . . . . . 96
4.17 Triple tuple of graph c. . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.18 Triple tuple of graph C . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.19 c. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.20 Vertex adjacency matrix of c′ . . . . . . . . . . . . . . . . . . . . . . 97
4.21 Edge adjacency matrix of c′ . . . . . . . . . . . . . . . . . . . . . . . 97
4.22 Edge adjacency matrix of C . . . . . . . . . . . . . . . . . . . . . . . 97
4.23 Computation results of vertex adjacency matrix for graph g1 (step 8) 97
4.24 Computation results of vertex adjacency matrix for graph g1 (step 8) 97
5.13 Vertex adjacency matrix of Oxetane (O1CCC1) . . . . . . . . . . . . 118
5.14 Vertex adjacency matrix of Thietane (S1CCC1) . . . . . . . . . . . . 118
5.15 Edge adjacency matrix of Oxetane (O1CCC1) . . . . . . . . . . . . . 118
5.16 Edge adjacency matrix of Thietane (S1CCC1) . . . . . . . . . . . . . 119
5.17 Triple tuple of Propanal (CCC=O) . . . . . . . . . . . . . . . . . . . 120
5.18 Triple tuple of Methoxyethene (COC=C) . . . . . . . . . . . . . . . . 120
5.19 Vertex adjacency matrix of Propanal (CCC=O) . . . . . . . . . . . . 121
5.20 Vertex adjacency matrix of Methoxyethene (COC=C) . . . . . . . . . 121
5.21 Edge adjacency matrix of Propanal (CCC=O) . . . . . . . . . . . . . 121
5.22 Edge adjacency matrix of Methoxyethene (COC=C) . . . . . . . . . . 121
5.23 Triple tuple of Chloroethane (CCCl) . . . . . . . . . . . . . . . . . . 122
5.24 Triple tuple of Propanol (CCCO) . . . . . . . . . . . . . . . . . . . . 122
5.25 Vertex adjacency matrix of Chloroethane (CCCl) . . . . . . . . . . . 123
5.26 Vertex adjacency matrix of Propanol (CCCO) . . . . . . . . . . . . . 123
5.27 Edge adjacency matrix of Chloroethane (CCCl) . . . . . . . . . . . . 123
5.28 Edge adjacency matrix of Propanol (CCCO) . . . . . . . . . . . . . . 123
Chapter 1
Introduction
The interrelations and influences among different objects can intuitively form vari-
ous networks, such as disease propagation networks [1], transportation systems [2],
social networks [3], biological networks [4], interlinked documents with citations [5],
recommendation systems [6] and criminal investigations [7]. These networks can all
be cast in graphical form, a potent modelling tool that naturally represents associations: nodes represent anything that has a unique identity, and edges express the relationships among those things. Therefore, graphs can model and express nearly every system that involves connections between things. Some traditional data mining and management algorithms, such as frequent pattern mining and classification, have been recast in the graph setting because the semantic expression of a graph is clearer and more flexible. Thus, graphs are conducive to summarising and displaying, in a concise and orderly manner, amounts of information that would otherwise be too numerous or complicated to describe adequately.
With the continuous development of theoretical computer science, the importance
of graph data processing has become increasingly prominent. Graph matching is the
process of evaluating the similarity of graphs, which includes two approaches: exact
and inexact graph matching. The former aims at finding a perfect correspondence between the two graphs to be matched, while in certain cases it is common practice to measure the dissimilarity of two graphs instead. In other words, querying a graph can
be processed by searching for subgraphs using the relation of inclusion or by graph
similarity (checking for structural similarity).
Note that, to prevent ambiguity, the following pairs of terms are used interchangeably in this thesis: ‘node’ and ‘vertex’, ‘edge’ and ‘link’, ‘graph’ and ‘candidate’. In addition, because the vertex and edge adjacency matrices are symmetric, we use ‘row’ instead of ‘row/column’ for simplicity.
1.1 Research Scope
This thesis considers the relative problems, and we highlight them in the following
list:
• Nine basic types of graphs are studied for graph matching: (1) undirected and unlabeled; (2) undirected with labelled nodes; (3) undirected with labelled nodes and edges; (4) directed and unlabeled; (5) directed with labelled nodes; (6) directed with labelled nodes and edges; (7) undirected, unlabeled multigraph; (8) undirected multigraph with labelled nodes; and (9) undirected multigraph with labelled nodes and edges. The different kinds of graphs are shown in Figure 1-1.
The aim of this research can be stated as understanding the structure of graphs, modelling the complex interactions between entities, and designing algorithms for estimating the similarity between two graphs (structural and labelled graphs). Given a query graph Q and a graph database D = {g1, g2, . . . , gn}, graph search returns a query answer set DQ = {G | C(Q, G) = 1, G ∈ D}, where C can be a function testing graph isomorphism (full structure search), subgraph isomorphism (substructure search), approximate matching (full structure similarity search), or approximate subgraph matching (substructure similarity search); for the similarity searches, the function value ranges from 0 to 1. Detailed illustrations are given in the following chapters.
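The following sketch illustrates this query framework; the predicate name is_isomorphic and the graph objects are placeholders, not the thesis's implementation.

# Illustrative sketch of the query answer set D_Q = {G | C(Q, G) = 1, G in D}.
# The predicate supplied as C is a placeholder; swapping it changes the search type.

def graph_search(query, database, C):
    """Return all data graphs G in `database` for which C(query, G) holds."""
    return [G for G in database if C(query, G)]

# Example: exact full-structure search with a user-supplied isomorphism test.
# answer_set = graph_search(Q, D, C=is_isomorphic)
# Substructure or similarity search only swaps the predicate C.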
Working with graphs is very challenging; the main challenges are as follows:
• Querying graph data is one of the most challenging operations in graph theory. Graph query problems are associated with subgraph isomorphism search in almost all large-scale graph database applications. It is hard to decide the isomorphism relation between any two graphs, since a brute-force method must consider approximately n! node mappings between two graphs that each have n vertices, which leads to high computational complexity. Subgraph matching in graphs is even harder, because candidate subgraphs of the same scale as the query graph must be enumerated in the data graph, which is a significant obstruction to practical usage. Moreover, the existing subgraph isomorphism methods offer very unsatisfactory query performance, so a solid data retrieval protocol is in demand. Consequently, to carry out this task elegantly and promptly, several pivotal issues need to be addressed: i) how to model the query and data graphs; ii) how to manage and store the data; and iii) how to map the data for efficient query processing.
• Graphs arise in vast practical fields. Due to the greater expressive power
Figure 1-1: Various types of graphs: (a) undirected and unlabeled, (b) undirected
with labeled nodes (different colors refer to different labels), (c) undirected with
labeled nodes and edges (d) directed and unlabeled, (e) directed with labeled nodes,
(f) directed with labeled nodes and edges, (g) undirected, unlabeled multigraph, (h)
undirected with labeled nodes multigraph and (i) undirected with labeled nodes and
edges multigraph.
Figure 1-2: Graph (a) is an induced subgraph of (c), and graph (b) is a partial
subgraph of (c).
associated with graphs, a corresponding cost arises. This cost concerns the complexity of data representation. Nodes and edges can be accessed and processed in arbitrary order, and the structural nature of graph data makes the intermediate representation and the interpretability of the mining results much more challenging. The structural storage and presentation of graphs come at a price because of their flexible and clear expressive power. A simplified representation of graphs is essential for the subsequent filtering and verification processes.
• In practice, graphs carry not only geometric (structural) information but also attribute information. The design of labelled structures for graphs is a considerable challenge. Structural graphs can be modelled as attributed graphs in which vertices are associated with specific properties and edges represent relations between two vertices. Advocating a more realistic subgraph model on attributed graphs, one that thinks beyond simple structure-based graph models, is highly necessary.
Our new methods are built upon the foundation of mathematics between theory
and practice. We summarize our principal contributions or innovations as follows:
• Our proposed algorithm can handle not only multiple types of graphs but also graphs of different sizes;
• Useful graph invariants of the vertex and edge adjacency matrices are introduced to reveal the inherent nature of graphs;
• We develop an efficient exact graph algorithm to find the query graph within structural and labelled graphs. A novel approach to measuring similarity between geometric graphs is also introduced: the distance between two geometric graphs is defined, comprising the permutation and equinumerosity distance measurements;
1.3 Thesis Organization
The remaining chapters of this thesis are structured as follows:
• Chapter 2 provides all the theoretical foundations about graphs, and exact and
inexact graph matching problems required to understand the contents of the
following chapters. It also summarizes the literature review of the most recent
and well-known work conducted in the field of graph and subgraph problems
(graph distance) thoroughly. The pros and cons of those works are pointed out.
• Chapter 4 introduces the induced and partial subgraph algorithms, which deal with subgraph matching problems under different constraints. We also speed up subgraph isomorphism verification using the permutation and equinumerosity theorems. The algorithms’ performance is thoroughly analysed in theory and examined through experimentation.
Chapter 2
Related work
Structural data have played a prominent role in various domain technologies, such as
social networks, criminal tracking, web link analysis, epidemic spreading, biological
structures, and cheminformatics. Graphs are natural representations of relations be-
tween entities in complex systems. Therefore, effectively managing graph-structured
data is of great significance in different domains.
In this chapter, we present a detailed literature review of the works related to exact and inexact graph matching problems. Section 2.1 presents a brief overview of the basic concepts of graphs. Section 2.2 discusses the different types of graph isomorphism problems, including undirected graphs, directed graphs and subgraphs. Then, in Sections 2.3 and 2.4, different types of exact and inexact graph matching algorithms and techniques are discussed in detail. Finally, Section 2.5 provides a summary of the existing literature on graph isomorphism problems and the research concerns of this thesis.
2.1 Basics of Graphs
Any graph can be seen as a finite collection of vertices (or nodes) connected through a collection of edges, each of which connects a pair of nodes. A graph is generally represented as G and consists of two types of elements, vertices and edges, where V denotes the vertex set (equivalently, the node set) and E denotes the edge set (the connection or join set). Each edge has two endpoints u and v, is written as {u, v}, and both endpoints belong to the vertex set. A graph is termed a connected graph if, for every pair of vertices u and v, there is a path from u to v.
An edge (or link) of a graph is one of the connections between the vertices (or nodes) of the graph. Edges can be undirected, directed, looped, multiple, unlabeled or labelled.
A vertex (or node) of a graph is one of the points joined by the edges (or links) of the graph. Vertices can be unlabeled or labelled.
Undirected Graph An undirected graph has unordered pairs of vertices. In other words, all the vertices are connected by bidirectional arcs, or equivalently all edges are undirected (without arrows). When an edge is represented as (u, v), it is possible to traverse from u to v or from v to u, as there is no specific direction.
Directed Graph A directed graph or digraph G = (V, E) consists of a vertex set
V and an edge set of ordered pairs E of elements in the vertex set. The connected
vertexes have specific directions. The edges of the graph represent a specific direction
from one vertex to another. If there is a directed edge from u to v, the first element u is the initial or start node, and the second element v is the terminal or end node. The direction is from u to v; therefore, we cannot traverse from v to u.
Multigraph It is possible to have a single edge or multiple (two or more) edges that join the same pair of vertices; the latter are called parallel or multiple edges (and they point in the same direction if the graph is directed). An edge connecting a vertex to itself is called a loop. A graph with neither self-loops nor multiple edges is called a simple graph.
Labeled/Weighted Graph In a weighted graph, the total weight of a path is the sum of the edge weights along the path. Depending on the graph, both the length and the total weight of a path may be of interest. In an unweighted graph, the concept of “total weight” does not exist, since individual weights do not exist either. Note that the path with the least weight and the path with the fewest edges are not always the same.
Degree The degree of a node v, denoted by d(v), is the number of links that connect it with other nodes. For directed graphs, the in-degree of a node is the number of edges entering it, and the out-degree of a node is the number of edges coming out of it.
Incidence In graph theory, a vertex is incident to an edge if the vertex is one of the endpoints of that edge; two edges are incident to each other if they share a common vertex.
Adjacency If two vertices are connected by an edge in a graph, we say the vertices are adjacent. The set of vertices adjacent to v is called the neighbourhood of v, denoted N (v).
Data Graph The data graph is a large graph and is the pre-saved data to be queried. Given a particular context, we sometimes use ‘data’ to refer to the data graph for short.
Query Graph In contrast, the query graph is much smaller than the data graph. We also use ‘query’ or ‘query pattern’ interchangeably with query graph. The general idea is to identify an explicit occurrence of the query graph within the data graph.
Subgraph and Supergraph If there exists an embedding of Q = (VS , ES ) in
G = (V, E), then Q is a subgraph of G, denoted by Q ⊆ G, and G is said to be a
supergraph of Q. Vertex set VS is a subset of the vertex set V , that is VS ⊆ V , and
whose edge set ES is a subset of the edge set E, that is ES ⊆ E.
Induced Subgraph An induced subgraph can be constructed by deleting vertices (and with them all the incident edges), but no further edges. If additional edges are deleted, the subgraph is not induced. That is, an induced subgraph keeps all the original edges between the chosen set of vertices.
Partial Subgraph In contrast to an induced subgraph, a partial (non-induced) subgraph may have fewer edges between the chosen vertices.
The induced and partial subgraphs are shown in Figure 1-2.
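A minimal sketch of the two notions, assuming an undirected graph stored as a vertex set plus a set of two-element frozenset edges (a representation chosen only for illustration):

# Minimal sketch (not the thesis code): induced vs. partial subgraphs for an
# undirected graph stored as a vertex set and a set of frozenset edges.

def induced_subgraph(vertices, edges, keep):
    """Keep the chosen vertices and *every* original edge between them."""
    keep = set(keep)
    kept_edges = {e for e in edges if e <= keep}
    return keep, kept_edges

def is_partial_subgraph(sub_v, sub_e, vertices, edges):
    """A partial (non-induced) subgraph may drop some edges between kept vertices."""
    return sub_v <= vertices and sub_e <= edges

V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3}), frozenset({3, 4})}
print(induced_subgraph(V, E, {1, 2, 3}))   # keeps all three edges among 1, 2, 3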
Graph Representation Graphs form a complex and expressive data type. We
need methods for representing graphs in databases and manipulating and querying
them. In algebraic graph theory, a matrix can be used to encode some relationships
between entities and obtain the structural properties of a graph.
Adjacency Matrix The adjacency matrix is a dominant method for storing graph data in memory and for optimising graph algorithms. For a graph G = (V, E), the adjacency matrix is a |V| × |V| (0,1)-matrix with zeros on its diagonal that describes the finite graph. The connections are represented via the adjacency matrix A, where Aij ≠ 0 denotes (vi, vj) ∈ E, while Aij = 0 denotes (vi, vj) ∉ E. The degree of node vi is d(vi). If the edges between nodes are directed, the in-degree and out-degree are denoted as d+ and d−, respectively. The numbers of vertices and edges of a network are |V| = n and |E| = m, respectively. In this thesis, we assume a network is unweighted and undirected unless specified explicitly.
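A small sketch of this representation, assuming 0-based vertex indices and an unweighted, undirected graph:

import numpy as np

# Sketch (assumed representation): build the n x n adjacency matrix A of an
# undirected, unweighted graph from an edge list, then read off node degrees.
def adjacency_matrix(n, edge_list):
    A = np.zeros((n, n), dtype=int)
    for u, v in edge_list:          # vertices indexed 0..n-1
        A[u, v] = 1
        A[v, u] = 1                 # symmetric for an undirected graph
    return A

A = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3), (1, 2)])
degrees = A.sum(axis=1)             # d(v_i) = row sum of A
print(A)
print(degrees)                      # [3 2 2 1]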
sponding pair of nodes in another graph. The graph isomorphism problem is one of the few problems in computational complexity theory that belong to NP but are not known either to be solvable in polynomial time or to be NP-complete. The subgraph isomorphism problem is a generalization of the graph isomorphism problem: subgraph isomorphism is a subproblem of subgraph matching, which finds all subgraph isomorphisms from a query graph to a data graph, and it is NP-complete [8].
Because of the stringent conditions imposed for matching, exact graph matching is normally inapplicable in real-world applications. Exact matching can only identify whether two graphs are exactly alike; it does not explore the similarity space between dissimilar graphs. For example, fault-tolerant mapping is an essential research topic in the graph domain because graph data may undergo modifications. Therefore, it is necessary to compute approximate matches between two non-isomorphic graphs. This problem is also referred to as inexact graph matching. Most variants of the graph matching problem are well known to be NP-hard.
Inexact graph matching accounts for errors, noise and distortions during the matching process, and it is able to accommodate these differences between graphs. Compared with exact matching, the structural constraint is relaxed to some extent so as to quantify the closeness of non-identical graphs.
A large number of well-studied works pay close attention to both exact and approximate subgraph queries [9], [10], [11]. Next, the different kinds of subgraph query methods, which can be bracketed into three classes [12], are presented in detail: (1) subgraph/supergraph containment query; (2) graph pattern matching; and (3) graph similarity search.
identify the graphs that are subgraph-isomorphic to the query graph, whereas the supergraph containment query algorithm returns the graphs that are contained as subgraphs of the query graph. The matching subgraphs/supergraphs of the query graph are returned from the database. The subgraph containment search aims to determine, for each graph candidate, whether a given substructure exists in the data graph. Subgraph containment search requires indexing the data graphs, after which filtering and verification procedures are launched. The filtering process narrows down the search scope to decrease the computing overhead. Each candidate in the list is then compared with the query graph in the verification phase, and the candidates corresponding to the query are saved. Several works address the subgraph containment search problem (such as gIndex [51], Tree+delta [59], FG-index [28], gCode [62] and others).
3. In the third type of query, given a graph database with a number of graphs as input, the graph similarity search tries to obtain the set of graphs that are isomorphic (or similar) to the given query graph in the graph database. Graph similarity search needs the aid of a graph proximity function to measure the similarity of graphs. Current works mainly rely on a definition of a similarity measure between two graphs; unqualified candidate graphs can then be excluded by a filtering mechanism, after which the costly graph search operations are performed to verify the remaining candidates. Several different ways of modelling and computing similarity between graphs have been proposed, such as graph edit distances, maximum common subgraphs, edge/feature misses, graph alignment and graph kernels.
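As a toy illustration of the filter-then-verify pattern described above, the sketch below uses a simple Jaccard similarity over labelled edge sets as a stand-in proximity function; it is not one of the cited measures (edit distance, kernels, etc.), and the graphs and threshold are hypothetical.

# Toy proximity function: Jaccard similarity of labelled edge sets, used only to
# illustrate the filtering step that precedes costly verification.

def edge_jaccard(edges_a, edges_b):
    edges_a, edges_b = set(edges_a), set(edges_b)
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)

query = {("C", "C"), ("C", "O")}
database = [{("C", "C"), ("C", "O")}, {("C", "N"), ("N", "O")}]
# Candidates below a similarity threshold are filtered out before verification.
candidates = [g for g in database if edge_jaccard(query, g) >= 0.5]
print(candidates)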
these algorithms/techniques. We have identified five main paradigms.
cases to a significant improvement over Ullmann’s algorithms, as shown in [15] in
1999.
VF2 The same authors as [14] proposed an improved version of the VF algorithm, named the VF2 algorithm [16], in 2001. It reduces the memory requirement from O(n2) to O(n) with respect to the number of nodes in the graphs, thus making the algorithm work well even with large graphs. Unlike VF, the VF2 algorithm defines an order in which query vertices are selected. VF2 adopts a depth-first strategy with backtracking and contains two main phases: search and refinement. The first phase is basically the same as in Ullmann’s algorithm; the main difference lies in the refinement phase. The algorithm deals with the first vertex initially, then selects a vertex connected with the already matched query nodes, searches for a subgraph match, and backtracks if none is found. The real innovation of VF2 is that it introduces feasibility rules to prune the search in advance.
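To make the tree-search idea concrete, the sketch below shows a generic depth-first, backtracking (sub)graph matcher; it only illustrates the extend-and-backtrack loop shared by this family, deliberately omits VF2's feasibility rules and ordering heuristics, and is not the published VF2 implementation.

# Generic depth-first backtracking matcher in the spirit of the tree-search
# family discussed above; it keeps only the basic extend-and-backtrack loop.

def subgraph_match(q_adj, g_adj):
    """q_adj, g_adj: adjacency sets {vertex: set(neighbours)}. Returns one mapping or None."""
    q_nodes = list(q_adj)

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return dict(mapping)
        u = q_nodes[len(mapping)]                  # next query vertex to map
        for v in g_adj:
            if v in mapping.values():
                continue
            # every already-mapped neighbour of u must map to a neighbour of v
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                result = extend(mapping)
                if result:
                    return result
                del mapping[u]                     # backtrack
        return None

    return extend({})

q = {0: {1}, 1: {0, 2}, 2: {1}}                    # a path on three vertices
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}   # a triangle plus a pendant vertex
print(subgraph_match(q, g))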
The RI algorithm [17] is similar to VF2. It has linear space complexity and quadratic time complexity in the average case. RI can be used for both graph isomorphism and subgraph isomorphism. RI applies a static pattern preordering procedure before the search, which guarantees that the next node visited involves more constraints with the already matched nodes, and it considers extra constraints to reduce the search space as early as possible.
The VF2+ algorithm [18] is a modification of the VF2 algorithm; it optimises the search paths by providing the prior probability of a candidate node and a search sequence, instead of the depth-first search strategy that the VF2 algorithm employs.
The VF2++ algorithm [19] also determines the matching order of a search sequence with cutting rules rather than the depth-first search applied in the VF2 algorithm. The search sequence is improved based on breadth-first search so as to find the most unfruitful branches, and pruning rules are provided to cut searches that go astray. That means the algorithm tends to recognise searches going astray and applies pruning rules at the highest levels, so there are fewer fruitless searches as the investigation goes deeper.
RI-DS RI-DS [20] is a variant of RI for dense graphs. The difference between them is that RI-DS precomputes a compatibility map between pattern and target vertices, which can reduce the total running time dramatically.
VF3 [21] is a variant of the VF2+ algorithm that is specially designed for larger and denser graphs. It keeps its structure based on a state-space representation and uses depth-first search with backtracking and several heuristic rules to prune the search space. VF3 markedly reduces the number of explored nodes needed to obtain candidate nodes, and the time spent on each state, by computing a coverage tree during the preprocessing of the data graph.
Graph indexing-based algorithms mainly adopt two critical steps, a node-pruning strategy and matching-order selection, and leverage auxiliary indices to accelerate the exploration. First, every possible index path from a database graph, up to a maximum length, is enumerated and indexed to compute a graph index, that is, a vector or a tree of features representative of the structural and semantic information of a graph; then, all the indices of the subgraphs containing the same index are searched inside the data graph. The size of the index path set can increase drastically with the size of the graph database.
QuickSI QuickSI [22] proposes a search sequence that tries to access nodes having infrequent node labels and infrequent adjacent edge labels as early as possible. Specifically, instead of using label frequency information from the query graph as in VF2, QuickSI pre-processes the data graphs to compute the frequencies of node labels and the frequencies of triples (source node label, edge label, target node label). Using the calculated edge label frequencies, QuickSI assigns a weight to each query edge and obtains a minimum spanning tree using a modified Prim algorithm; it then creates a sequence from the order in which the vertices are inserted into the spanning tree. QuickSI is designed for handling small graphs, and it has no filtering phase or auxiliary indices for matching on a single data graph.
GraphQL GraphQL [23] mainly performs two techniques for filtering: neighbourhood signatures and pseudo subgraph isomorphism. The neighbourhood signatures of the data graph vertices prune the initial candidate set, while the pseudo-isomorphism technique narrows the search space globally. In particular, GraphQL uses a simple greedy method, performing a bipartite matching between the query graph and the candidates in the data graph until the specific requirement is achieved. These strategies are costly but effective. The algorithm is designed to work with large graphs.
GADDI [24] indexes a data graph based on a neighbourhood discriminating substructure distance between node pairs in the data graph. Then, a two-way-pruning dynamic subgraph matching procedure is launched. It mainly deals with small and medium-sized graphs.
The SPath [25] algorithm uses a path-based indexing technique as the pattern of comparison in the data graph and neighbourhood signatures to minimise the search space. It converts the query graph into a set of shortest paths in order to query. It relies on a disk-based indexing technique, so it is suited to handling large graphs.
TurboIso [26] proposes the new concept of the neighbourhood equivalence class. It provides a method of merging similar nodes in the query graph, transforms the query graph into a spanning tree, and then filters out unnecessary intermediate results with a path-filtering method.
BoostIso [27] is an optimisation of TurboIso, and its proposed preprocessing techniques are applicable to all subgraph isomorphism algorithms. It first merges the nodes with similar attributes in the data graph and then further reduces the infeasible candidate nodes with the path-filtering method. However, TurboIso and BoostIso are inapplicable when the query graph and the data graph have dissimilar nodes. In addition, the running time of the path-filtering method increases exponentially with the growth of path length.
CFL-Match [28] presents the core-forest-leaf decomposition of the query graph and a compact path-based index aiming to postpone Cartesian products. The query graph is decomposed into three substructures, core, forest and leaf, before the BFS tree construction, and subgraph matching is executed on each of these substructures. The method contains three procedures: it constructs a spanning tree of the query graph; it then calculates all non-tree edges in the minimal connected subgraph; and lastly, it iteratively rules out the nodes of degree 1 and counts the degree of each node.
CECI [29] is similar to TurboIso, and the intersection-based method included in this work makes it perform better than the edge-verification method CFL-Match for the most part. This algorithm does not work well with large graphs.
DAF [30] proposes several novel pruning techniques. It also adopts the intersection-based method to find the candidates. However, it is difficult to apply to large graphs because of its inherently sequential nature.
LAD is quadratic in the average case, while its time complexity lies between n2 and n4, where n is the size of the query graph.
Improved Ullmann In 2011, Ullmann [34] proposed a new bit-vector algorithm for binary constraint satisfaction and used it to compute subgraph isomorphism; the method depends primarily on search and partially on domain editing. It is an updated version of the work in [13].
Algebraic methods applied to graph problems are known as algebraic graph theory.
The structure and properties of a graph G can be revealed by studying the matri-
ces associated with G, such as the adjacency matrix, the incidence matrix, and the
Laplacian.
Nauty [35] is based on group theory. The work constructs an efficient way of computing the generators of the automorphism group of a graph. The canonical labelling of the input graphs can be derived from the automorphism group, and the method determines whether two graphs are isomorphic by comparing the adjacency matrices of the canonical forms of the corresponding graphs. The equality verification can find an optimal assignment solution in O(n2) time, but the computation of the canonical labelling cannot be solved in subexponential time in the worst case. In most cases, this algorithm achieves the desired performance.
Traces The authors of Nauty [36] improved the Nauty program in 2014 and introduced a novel technique: a global vertex order for the whole graph and a local order for each patch, computed in a preprocessing step. The experimental results indicated that it is more efficient than most other existing graph isomorphism tools.
Messmer and Bunke [37], in 1998, proposed a new technique that uses a preprocessing step to convert a graph into a decision tree, after which matching can be solved in polynomial time. However, the time of the preprocessing step and the space for the decision tree construction increase with the number of nodes in the graphs. Messmer and Bunke [38], in 2000, proposed a technique to find a subgraph equal to the query graph in an extensive database of preprocessed graphs. This method follows the rule of the subgraph containment query. The dataset may contain multiple subgraphs matching the query graph, but only one of the final results is reported, which saves execution time. In [39], in 2011, the same authors described an extension of the above approaches that reduces the storage space and indexing time dramatically.
STwig [40] achieves higher efficiency without using graph-structure indices, relying instead on parallel technology. The method decomposes the query graph into two-level trees with a sophisticated algorithm and adopts exploration and join mechanisms, which reduce both the number of two-way join operations and the size of the join parameters. However, time-consuming join operations and unevenly distributed graph data make the space and time complexity of STwig relatively high.
Many algorithms, such as VF2 [16], VF2+ [18], VF2++ [19], VF3 [21], QuickSI [22], GADDI [24], GraphQL [23] and SPath [25], have been presented in recent years to enhance the performance of the Ullmann algorithm [13]. The algorithms mentioned above exploit different join orders to search for potential objects, and they use pruning rules and auxiliary information to exclude false-positive candidates and speed up the search. They lack a well-defined filtering scheme to find optimal solutions and scalable validation rules to decrease the computational complexity as the data scale grows, and none of these algorithms is designed to handle all types of graphs of all sizes. In addition, no polynomial-runtime algorithm is known for the graph isomorphism problem, although algorithms have been devised for special structures, such as trees [42], bounded-valence graphs [43], ordered graphs [44], graphs with unique node labels [45] and planar graphs [46]. Therefore, new strategies are needed to meet the challenges that have recently arisen.
elegant in its formulation, easy to use and computationally efficient, it is subject to certain restrictions: 1) the input graphs should be of the same size; 2) it is susceptible to weight errors for weighted graphs; 3) the eigenvalues of the adjacency matrix of each graph have to be simple and isolated enough from each other; and 4) the rows of the matrix of the corresponding absolute eigenvectors cannot be very similar to each other. [51] improves Umeyama’s work to match two graphs with different numbers of nodes by choosing the largest k eigenvalues as the projection space. In 2004, Caelli and Kosinov [52] presented an algorithm using eigen-subspace projections and vertex clustering techniques to handle graph matching, which is used for inexact many-to-many graph mappings instead of one-to-one matching. In 2005, Shokoufandeh et al. [53] provided an effective mechanism for indexing hierarchical image structures in large databases of directed acyclic graphs by using spectral characterisation.
The main problem of the spectral methods in these existing works is that they are extremely sensitive to the structure of the matrix. Moreover, most of these methods are structurally oriented, which limits them to unlabeled graphs.
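As a minimal illustration of the spectral idea (not Umeyama's algorithm itself), the sketch below compares the sorted eigenvalue spectra of two adjacency matrices; equal spectra are a necessary but not sufficient condition for isomorphism.

import numpy as np

# Minimal illustration of the spectral idea: compare the sorted adjacency
# eigenvalue spectra of two graphs. Equal spectra are necessary, not sufficient,
# for isomorphism (cospectral non-isomorphic graphs exist).

def spectra_match(A1, A2, tol=1e-8):
    if A1.shape != A2.shape:
        return False
    w1 = np.sort(np.linalg.eigvalsh(A1))    # symmetric adjacency matrices
    w2 = np.sort(np.linalg.eigvalsh(A2))
    return bool(np.allclose(w1, w2, atol=tol))

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # triangle
B = A[[1, 2, 0]][:, [1, 2, 0]]                    # relabelled triangle
print(spectra_match(A, B))                        # True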
2.5 Summary
This chapter discussed the available algorithms for exact and approximate graph
matching problems. We identified the strategy for different frameworks and the ad-
vantages and limitations of existing algorithms.
The above works have contributed substantially to the development of subgraph matching. To date, no polynomial-time algorithm is known for the graph isomorphism problem, except for some special classes of graphs.
Chapter 3
This chapter introduces two theorems for graph isomorphism verification. It focuses on undirected graph, undirected multigraph and directed graph isomorphism verification. The isomorphism verification algorithm comprises the permutation theorem and the equinumerosity theorem. So far, however, no specific polynomial-bound algorithm for graph isomorphism has been recognised, owing to the inherent difficulties of this particularly elusive and challenging problem. The problem has aroused theoretical interest because of its affiliation with the concept of NP-completeness, and many graph theorists have committed significant time to it because of its tricky nature; nevertheless, algorithms on the subject have been scarce and progress has been slight. We propose a simple polynomial-time algorithm for directed graph isomorphism. The theoretical analysis and experimental results show that the proposed protocol is effective and efficient.
This chapter is organized as follows. Section 3.1 describes undirected and directed
graph isomorphism problems. The graph representation method is discussed in Sec-
tion 3.2. Then, we elaborate on the permutation theorem in Section 3.3 and the
equinumerosity theorem in Section 3.4. In Section 3.5, we also discuss some extended
contents of our proposed algorithm. After that, we provide algorithm complexity and
3 case studies in Section 3.6. Finally, we summarise this chapter in Section 3.7.
3.1 Graph Matching
In the graph domain, the characteristics of different types of graphs are not uniform. We now present the definitions of graph isomorphism in this section.
Definition 1. An isomorphism of two graphs G1 and G2 is a bijective map f from the vertices of G1 to the vertices of G2 that preserves the “edge structure”, in the sense that there is an edge from vertex u to vertex v in G1 if and only if there is an edge from f(u) to f(v) in G2 [54].
f : V (G1 ) → V (G2 )
where V (G1 ) is the vertex set of G1 and V (G2 ) is the vertex set of G2 [54].
Definitions 1 and 2 are equivalent. Both definitions indicate that any two vertices u and v of G1 are adjacent in G1 if and only if f(u) and f(v) are adjacent in G2. This kind of bijection is commonly described as an “edge-preserving bijection”, in accordance with the general notion of an isomorphism as a structure-preserving bijection [54]. If an isomorphism exists between two graphs, then the graphs are called isomorphic, denoted G1 ≃ G2.
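To make Definition 1 concrete, the brute-force sketch below tries every one of the n! vertex bijections and tests whether it preserves edges; it is shown only to illustrate the definition (and why exhaustive matching is infeasible for large n), not as a practical algorithm.

from itertools import permutations

# Direct illustration of Definition 1: try every vertex bijection and test
# whether it preserves edges. Exponential in n; for illustration only.

def isomorphic(edges1, edges2, n):
    E1 = {frozenset(e) for e in edges1}
    E2 = {frozenset(e) for e in edges2}
    if len(E1) != len(E2):
        return False
    for perm in permutations(range(n)):       # candidate bijection f(i) = perm[i]
        if all(frozenset({perm[u], perm[v]}) in E2 for u, v in E1):
            return True
    return False

print(isomorphic([(0, 1), (1, 2)], [(2, 0), (0, 1)], 3))   # True: both are paths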
Definition 3. An isomorphism of two directed graphs G1 and G2 is a bijective map from the vertices of G1 to the vertices of G2 that preserves the “directed edge structure”, in the sense that there is a directed edge from vertex u to vertex v in G1 if and only if there is a directed edge from f(u) to f(v) in G2 [55] and [54]. An isomorphism of directed graphs G1 and G2 is a bijection between the vertex sets of G1 and G2, f : V (G1 ) → V (G2 ), where V (G1 ) is the vertex set of G1 and V (G2 ) is the vertex set of G2 [55] and [54].
The definition indicates that any two vertices u and v of G1 are adjacent in G1 if and only if f(u) and f(v) are adjacent in G2. This kind of bijection is commonly described as an “edge-preserving bijection”, in accordance with the general notion of an isomorphism as a structure-preserving bijection [55] and [54]. If an isomorphism exists between two directed graphs, then the directed graphs are called isomorphic, denoted G1 ≈ G2 [55] and [54].
Multigraph isomorphism has opened a wide area of extensive research due to its well-known NP-complete (nondeterministic polynomial-complete) nature [54]. In exact graph matching, there must exist a bijective mapping between the vertices, and between the edges, of the two graphs; thus, each pair of isomorphic graphs shares a common structure. A multigraph may contain both directed and undirected edges. Multigraphs are more generic than simple graphs: simple graphs are not rich in multi-edge information, while a multigraph permits multiple edges/relations between a pair of vertices. Many real-world datasets can be modelled as a network in which a set of nodes are interconnected through multiple relations, so the crucial difference lies in capturing the multi-edge information.
Multigraph isomorphism based on edge structure: two multigraphs G1 and G2 are isomorphic if there exists a bijective mapping f between their vertices that corresponds the vertices of G1 to the vertices of G2 and preserves the “edge structure”, in the sense that there is an edge between vertex u and vertex v in G1 if and only if there is an edge between f(u) and f(v) in G2 [55].
Figure 3-1: Multigraph example
Triple tuple: consider a simple graph where the number of nodes is n and the number of edges is m. We employ a triple tuple for each edge, e = (j, vs, vt), where j = 1, 2, . . . , m. For an undirected graph, vs and vt are the two endpoints of edge ej; for a directed graph, vs is the starting node and vt is the ending node of edge ej. The general format of the triple tuple for a graph is shown in Tables 3.1 and 3.2.
The matrix generation methods for undirected and directed graphs are different; we elaborate on the process in Sections 3.2.2 and 3.2.4.
3.2.2 Undirected Vertex and Edge Adjacency Matrix Rep-
resentation
A graph is endowed with distinct positive integers as subscripts of its vertices v1, v2, . . . , vn, where n is the number of vertices of the graph [54]; we call v1, v2, . . . , vn the vertex labels. Likewise, the edges are endowed with unique labels taken from the positive integers, e1, e2, . . . , em, where m is the number of edges of the graph; these natural numbers are used only to uniquely identify the edges, and we call e1, e2, . . . , em the edge labels. Both the vertex and the edge labelling methods can represent an undirected graph uniquely.
One triple tuple is produced to represent a finite graph such as Exp1 in Figure 3-2, as shown in Table 3.3.
Vertex Adjacency Matrix Representation Method
The vertex adjacency matrix is a Boolean square matrix representing a finite graph. Its elements (valued 0 and 1) denote whether pairs of vertices are connected in graph Exp1 (Figure 3-2). For example, in graph Exp1, v2 is adjacent to v1 and v3; hence, in the vertex adjacency matrix, (v2, v1) = 1 and (v2, v3) = 1. The corresponding vertex adjacency matrix for graph Exp1 is shown below.
v1 v2 v3 v4
v1 0 1 1 1
v2 1 0 1 0
v3 1 1 0 0
v4 1 0 0 0
We also create the edge adjacency matrix, a Boolean square matrix representing a finite graph, whose elements denote whether pairs of edges are connected with each other in graph Exp1 (Figure 3-2). For example, in graph Exp1, e2 is adjacent to e1 and e3; hence, in the edge adjacency matrix, (e1, e2) = 1 and (e2, e3) = 1. The corresponding edge adjacency matrix for graph Exp1 is shown below.
e1 e2 e3 e4
e1 0 1 1 1
e2 1 0 1 0
e3 1 1 0 1
e4 1 0 1 0
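A small sketch of this construction, assuming the triple-tuple format (edge label j, endpoint vs, endpoint vt) with 1-based labels; the concrete edge labelling used below is an assumption for illustration, chosen to be consistent with the Exp1 matrices shown above.

import numpy as np

# Build the vertex and edge adjacency matrices of an undirected graph from its
# triple tuples (edge label j, endpoint vs, endpoint vt), 1-based labels.

def vertex_adjacency(n, triples):
    V = np.zeros((n, n), dtype=int)
    for _, vs, vt in triples:
        V[vs - 1, vt - 1] = 1
        V[vt - 1, vs - 1] = 1
    return V

def edge_adjacency(triples):
    m = len(triples)
    E = np.zeros((m, m), dtype=int)
    for j, vs1, vt1 in triples:
        for k, vs2, vt2 in triples:
            if j != k and {vs1, vt1} & {vs2, vt2}:   # edges sharing an endpoint are adjacent
                E[j - 1, k - 1] = 1
    return E

# Triple tuples of a 4-vertex example (edge labelling assumed for illustration).
triples = [(1, 1, 2), (2, 2, 3), (3, 1, 3), (4, 1, 4)]
print(vertex_adjacency(4, triples))
print(edge_adjacency(triples))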
3.2.3 Undirected Multigraph Vertex and Edge Adjacency
Matrix Representation
Graphs with parallel edges and/or loops are called multigraphs. A multigraph, in contrast to a simple graph, may contain multiple edges and several loops. For an undirected graph, if more than one edge is associated with the same pair of vertices, these edges are called parallel edges, as with edges 4 and 9 of graph Exp2 in Figure 3-3. If an edge has the same node as both its starting and ending node, that is, an edge that connects a vertex to itself, the edge is called a loop, as with edge 3 of graph Exp2 in Figure 3-3. (In this situation, the corresponding diagonal entry of the vertex adjacency matrix of the graph is a positive nonzero integer.) The proposed algorithm also works for partially non-simple graphs (for example, when only one vertex carries a loop).
One triple tuple is produced to represent a finite multigraph such as Exp2, as shown in Table 3.6.
Table 3.6: Triple tuple for Exp2
v1 v2 v3 v4 v5
v1 0 1 0 1 2
v2 1 0 1 1 1
v3 0 1 1 1 0
v4 1 1 1 0 0
v5 2 1 0 0 0
The edge adjacency matrix is an m × m Boolean square matrix that represents a finite multigraph, where m is the number of edges of the graph; the natural numbers e1, e2, . . . , em are used only to identify the edges uniquely. The elements (valued 0 or 1) denote whether pairs of edges are connected in the multigraph. For example, in graph Exp2, e4 is adjacent to e2, e3, e5 and e6; hence, in the edge adjacency matrix, (e2, e4) = 1, (e3, e4) = 1, (e4, e5) = 1 and (e4, e6) = 1. The edge adjacency matrix for graph Exp2 is shown below.
e1 e2 e3 e4 e5 e6 e7 e8 e9
e1 0 1 0 0 1 1 1 1 0
e2 1 0 1 0 1 0 0 0 1
e3 0 1 1 1 0 0 0 0 0
e4 0 1 1 0 1 1 0 0 0
e5 1 1 0 1 0 1 0 0 1
e6 1 0 0 1 1 0 1 1 0
e7 1 0 0 0 0 1 0 1 1
e8 1 0 0 0 0 1 1 0 1
e9 1 1 0 0 1 0 0 1 0
For a directed graph, the vertices are endowed with unique labels taken from the positive integers for the subscript together with ‘+’ or ‘-’ for the superscript: v1+, v1−, v2+, v2−, . . . , vn+, vn−, where n is the number of vertices of the graph; the subscripts uniquely identify the vertices, and ‘+’ and ‘-’ indicate the in-link or out-link side of each vertex. We call v1+, v1−, v2+, v2−, . . . , vn+, vn− the vertex labels. Similarly, the edges are endowed with unique labels taken from the positive integers for the subscript together with ‘+’ or ‘-’ for the superscript: e1+, e1−, e2+, e2−, . . . , em+, em−, where m is the number of edges of the graph; the subscripts uniquely identify the edges, and ‘+’ and ‘-’ indicate the in-link or out-link side of each edge. We call e1+, e1−, e2+, e2−, . . . , em+, em− the edge labels. Both the vertex and the edge labelling methods can represent a directed graph uniquely.
One triple tuple is produced to represent a finite graph such as Exp3 in Figure 3-4, as shown in Table 3.9.
33
Figure 3-4: Exp3
The binary vertex adjacency matrix is a square matrix used to represent a finite
directed/weighted graph. The matrix elements indicate whether pairs of vertices
are adjacent or not in the graph. Each vertex is represented by two vertices with
superscripts + and - in the binary vertex adjacency matrix; the superscript +
represents the in-link and - represents the out-link. We create binary vi+ (in-link)
and vi− (out-link) to represent one vertex v, where i = 1, 2, . . . , n. The superscript +
indicates that there is an in-link to the vertex (a link going into the vertex), and the
superscript - indicates that there is an out-link from the vertex (a link going out). In
this case, the matrix will be a 2n × 2n matrix. For example, in the directed graph Exp3
shown in Figure 3-4, v1− (out-link of vertex 1) is adjacent with v2+ (in-link of
vertex 2), and v3+ (in-link of vertex 3) is adjacent with v2− (out-link of vertex 2). Then, in
the binary vertex adjacency matrix, a(v1− , v2+ ) = 1 and a(v3+ , v2− ) = 1. The corresponding
binary vertex adjacency matrix for directed graph Exp3 is shown in Table 3.10.
Table 3.10: Binary vertex adjacency matrix of graph Exp3 .
      v1+ v1- v2+ v2- v3+ v3- v4+ v4-
v1+    0   0   0   0   0   1   0   0
v1-    0   0   1   0   0   0   1   0
v2+    0   1   0   0   0   0   0   0
v2-    0   0   0   0   1   0   0   0
v3+    0   0   0   1   0   0   0   0
v3-    1   0   0   0   0   0   0   0
v4+    0   1   0   0   0   0   0   0
v4-    0   0   0   0   0   0   0   0
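The following Python sketch builds the 2n × 2n binary vertex adjacency matrix described above: each vertex i contributes an in-link index vi+ and an out-link index vi−, and a directed edge u → w sets the symmetric pair (u−, w+) and (w+, u−). The interleaved index layout is an assumption made for illustration; the edge list is read off Table 3.10.

import numpy as np

def binary_vertex_matrix(n, directed_edges):
    """2n x 2n binary vertex adjacency matrix.
    Index 2*i   -> v_{i+1}^+ (in-link of vertex i+1)
    Index 2*i+1 -> v_{i+1}^- (out-link of vertex i+1)
    A directed edge u -> w sets a[u^-, w^+] = a[w^+, u^-] = 1."""
    A = np.zeros((2 * n, 2 * n), dtype=int)
    for u, w in directed_edges:             # vertices numbered 1..n
        out_u = 2 * (u - 1) + 1             # u^-
        in_w = 2 * (w - 1)                  # w^+
        A[out_u, in_w] = A[in_w, out_u] = 1
    return A

# Edge list consistent with Table 3.10: 1->2, 1->4, 2->3, 3->1
print(binary_vertex_matrix(4, [(1, 2), (1, 4), (2, 3), (3, 1)]))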
The edges are labelled 1, 2, . . . , m, where m is the number of edges. The maximum
number of edges in a simple directed graph is n ∗ (n − 1). The superscript + indicates
that the edge is an in-link to the vertex (a link going into the vertex), and the
superscript - indicates that the edge is an out-link from the vertex. If two edges are
connected by a vertex, the two edges are adjacent. For example, in graph Exp3 shown
in Figure 3-4, e2+ is adjacent with e1− , and e4+ is adjacent with e3− . Then, in the
binary edge adjacency matrix, a(e1− , e2+ ) = 1 and a(e3− , e4+ ) = 1. The matrix will be
a 2m × 2m matrix. The corresponding binary edge adjacency matrix for directed graph
Exp3 is shown in Table 3.11.
Table 3.11: Binary edge adjacency matrix of graph Exp3 .
      e1+ e1- e2+ e2- e3+ e3- e4+ e4-
e1+    0   0   0   0   0   1   0   0
e1-    0   0   1   0   0   0   0   0
e2+    0   1   0   0   0   0   0   0
e2-    0   0   0   0   1   0   0   0
e3+    0   0   0   1   0   0   0   0
e3-    1   0   0   0   0   0   1   0
e4+    0   0   0   0   0   1   0   0
e4-    0   0   0   0   0   0   0   0
Isomorphic graphs have adjacency matrices with identical algorithmic information
content, as proven in [56]. The eigenvalues and eigenvectors of the adjacency matrix
reveal properties of the matrix structure: the eigenvalues measure the distortion
induced by the transformation, and the eigenvectors describe the orientation of that
distortion, among other information.
The basic concept of permutation is rearranging the components (vertices and edges)
of a structured object (graph). Given two graphs G1 and G2 , we calculate the row
sums of the vertex adjacency matrices V1 and V2 and of the edge adjacency matrices
E1 and E2 , producing n-element arrays V_v and V_v′ and m-element arrays E_e and E_e′ .
We then compute and check whether Σ_{i=1}^{n} V_v^i = Σ_{i=1}^{n} V_{v′}^i ,
Σ_{i=1}^{n} (V_v^i)^2 = Σ_{i=1}^{n} (V_{v′}^i)^2 , . . . ,
Σ_{i=1}^{n} (V_v^i)^n = Σ_{i=1}^{n} (V_{v′}^i)^n for the rows, and whether
Σ_{j=1}^{m} E_e^j = Σ_{j=1}^{m} E_{e′}^j ,
Σ_{j=1}^{m} (E_e^j)^2 = Σ_{j=1}^{m} (E_{e′}^j)^2 , . . . ,
Σ_{j=1}^{m} (E_e^j)^m = Σ_{j=1}^{m} (E_{e′}^j)^m for the rows. Here n and m
represent the dimensions of the corresponding vertex and edge adjacency matrices.
Namely, the permutation theorem checks whether the two row-sum arrays of the vertex
adjacency matrices and the two row-sum arrays of the edge adjacency matrices are
respectively bijective; if and only if one array is a permutation of the other,
the corresponding two graphs are isomorphic.
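A minimal Python sketch of this row-sum test is given below (it is not the thesis implementation, only an illustration of the stated power-sum comparison): the row-sum arrays of two adjacency matrices are compared through their power sums up to the array length.

import numpy as np

def power_sums_equal(A, B):
    """Permutation-theorem style check on two adjacency matrices:
    compare sum(r^p) of the row-sum arrays for p = 1 .. len(array)."""
    ra = [int(x) for x in np.asarray(A).sum(axis=1)]
    rb = [int(x) for x in np.asarray(B).sum(axis=1)]
    if len(ra) != len(rb):
        return False
    k = len(ra)
    return all(sum(x ** p for x in ra) == sum(x ** p for x in rb)
               for p in range(1, k + 1))

Applied to both the vertex and the edge adjacency matrices of two graphs, a False result rules out isomorphism, while a True result sends the pair on to the equinumerosity check.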
The basic algorithmic theory of the permutation theorem is shown in Figures 3-5 and 3-6.
The computational procedure of the permutation theorem is shown in Figure 3-7, and
the pseudo-code is shown in Algorithm 1.
The process of calculation is shown in Figure 3-7. If matrices A and B do not satisfy
the permutation theorem, there is no need to proceed to step 2. If matrices A and B
satisfy the permutation theorem while matrices C and D do not, then candidate 1 and
candidate 2 do not satisfy the permutation theorem. If matrices A and B and matrices
C and D all satisfy the permutation theorem, then candidate 1 and candidate 2 satisfy
the permutation theorem. Note that n represents the number of vertices of the data
graph and m represents the number of edges.
Figure 3-5: Row sums of the vertex adjacency matrix and of its powers (1st, 2nd, . . . , nth).
Figure 3-6: Row sums of the edge adjacency matrix and of its powers (1st, 2nd, . . . , mth).
For an undirected graph, the row-sum array of the vertex adjacency matrix contains n
values and the row-sum array of the edge adjacency matrix contains m values, while for
a directed graph they contain 2n and 2m values, respectively. The computational scale
is therefore based on the size of the corresponding matrix.
The number of links per node constitutes a vital characteristic of a graph, which is a
foundation for the permutation theorem.
For an undirected graph, the array of row sums of the vertex adjacency matrix reflects
each vertex's degree: the sum of a row equals the degree of the corresponding vertex.
The same applies to the edge adjacency matrix.
For a directed graph, the array of row sums of the binary vertex adjacency matrix
reflects each vertex's degree for both the in-link (sum of the odd rows) and the
out-link (sum of the even rows): the sum of an odd row equals the in-link degree, and
the sum of an even row equals the out-link degree. The same applies to the binary edge
adjacency matrix.
Table 3.12 and 3.13 are the general formats for the row sum of an undirected
graph.
Table 3.12: General format to calculate row sum of vertex adjacency matrix based
on triple tuple
Tables 3.14 and 3.15 are the general formats for the row sums of a directed graph.
Figure 3-7: Permutation theorem for vertex and edge adjacency matrix.
Table 3.13: General format to calculate row sum of edge adjacency matrix based on
triple tuple
Table 3.14: General format to calculate row sum of binary vertex adjacency matrix
based on triple tuple
Table 3.15: General format to calculate row sum of binary edge adjacency matrix
based on triple tuple
3.3.2 Mathematical Proof of the Permutation Theorem
From the binary vertex and edge adjacency matrices, we can observe that either the
binary vertex adjacency representation or the edge adjacency representation can
uniquely represent a graph if we treat each edge as a vertex, as shown in Figure 3-3,
and create the new edge representation. As we can see, the edge adjacency labelling
method shown in Figure 3-8 is rather complicated, so we usually do not use it but use
the binary edge adjacency matrix instead. The following theorems can only prove the
"if and only if" conditions when there is only one exchange among the arrays in the
vertex adjacency matrix. If there are duplicate arrays, it is impossible to identify
whether there is only one exchange in the permutation at this stage. Therefore, we use
the duality of vertex and edge adjacency to guarantee our proposed algorithm.
Figure 3-8: Duality of vertex and edge for Exp2 as treating edge as vertex
Dual Equivalence of Permutation and Bijection
The graph isomorphism needs to guarantee two bijections for both the vertex set and
edge set. The bijection is equivalent to the permutation of a set. We produce two
matrices as a binary vertex adjacency matrix (2n ∗ 2n) and a binary edge adjacency
matrix (2m ∗ 2m), where n is the number of vertices and m is the number of edges.
The row sums of both matrices are computed, giving four arrays for the two graphs
based on the vertex/edge adjacency matrices. By comparing the corresponding sum
arrays in the permutation theorem, we can conclude whether the two graphs are
permutable or not [55], [54]. Regarding the comparison of the two arrays, a
permutation refers to the act of arranging all the members of a set into some sequence
or order; a permutation of a set is defined as a bijection from the set to itself. If
we can claim that one set A is a permutation of another set B, then A and B are
bijective [55], [54].
Proof. For the "if" (sufficient) condition: because of the equivalence between
permutation and bijection, if the two sets of edges and vertices are bijective, the
two graphs are bijective. The duality of edge and vertex guarantees the corresponding
relationship between edges and vertices. The "only if" (necessary) condition of the
theorem is not complex, because two isomorphic directed graphs always have the two
permutations of arrays in both the vertex and edge adjacency matrices, as follows. For
a simple graph, the diagonal is all zeros and the vertex adjacency matrix is
symmetric. The proof for the edge adjacency matrix is similar. If the graph is
weighted, simply replace 0 and 1 with the weights; the weight must be a positive
integer, and 0 means there is no link.
A permutation of a set ∆ is defined as a bijection from ∆ to itself [54]. There is a
function from ∆ to ∆ for which every element occurs exactly once as an image value.
This corresponds to a rearrangement of the elements of ∆ in which each element δ is
replaced by the corresponding f (δ).
The theorem indicates that any two vertices u and v of G1 are adjacent in G1 if and
only if f (u) and f (v) are adjacent in G2 . This kind of bijection is commonly
described as an "edge-preserving bijection", in accordance with the general notion of
isomorphism being a structure-preserving bijection [55], [54]. If an isomorphism
exists between two graphs, the graphs are called isomorphic, denoted G1 ≈ G2 . A
mixed graph contains both directed and undirected edges; undirected edges are treated
as two directed edges in this thesis.
Therefore, the proposed theorem could be stated as follows:
Assertion 1. Two graphs are isomorphic if and only if the array of the sum of rows
in both vertex and edge adjacency matrices are bijective.
Assertion 2. Given two arrays of natural numbers Γ = (γ1 , γ2 , . . . , γk ) and
Γ′ = (γ1′ , γ2′ , . . . , γk′ ), Γ and Γ′ are bijective and equivalent if and only if
Σ_{i=1}^{k} γi = Σ_{i=1}^{k} γi′ , Σ_{i=1}^{k} γi^2 = Σ_{i=1}^{k} (γi′)^2 , . . . ,
Σ_{i=1}^{k} γi^k = Σ_{i=1}^{k} (γi′)^k . The two arrays are then bijective and
equivalent, where k, γk and γk′ are all integers greater than or equal to 1.
If every γ1 , γ2 , γ1′ , γ2′ is a positive integer larger than 1, then according to the
fundamental theorem of arithmetic, each of γ1 , γ2 , γ1′ , γ2′ either is a prime number
itself or can be represented as a product of prime numbers; moreover, this
representation is unique up to the order of the factors. Let K = γ1 γ2 = γ1′ γ2′ (the
equality of the products follows from the equalities of the sums and of the sums of
squares). Then K = p1 p2 p3 . . . pl pl+1 . . . pr = q1 q2 . . . ql′ ql′+1 . . . qs ,
where γ1 = p1 p2 . . . pl , γ2 = pl+1 pl+2 . . . pr , γ1′ = q1 q2 . . . ql′ ,
γ2′ = ql′+1 ql′+2 . . . qs , and the pi and qi are prime. Assume that m (m ≠ 0) and n
are integers. We say that n can be divided by m if n is a multiple of m, namely, if
there exists an integer o such that n = m ∗ o. If m divides n, this is written m|n.
The order of the factors will not affect the results. We have p1 |K, so
p1 |q1 q2 . . . ql′ ql′+1 ql′+2 . . . qs ; p1 divides at least one of the qi ,
then if we rearrange qi , we could have p1 |q1 . Because q1 is prime, factors are 1 or
q1 , and then we have p1 = q1 . Now remove it from both sides of the equation.
p2 p3 . . . pl pl+1 . . . pr = q2 q3 . . . ql′ ql′ +1 . . . qs .
Continue this process until all of pi and qi are removed. If all of pi is removed, the
left side of the equality is 1, so there is no left qi . Similarly, if all of qi are removed,
the right side of the equality is 1. The number of pi is equal to qi . Then we have
proved, K = p1 p2 . . . pl pl+1 . . . pr = q1 q2 . . . ql′ ql′+1 . . . qs , all of the pi
and qi are prime, r = s, l = l′ , and, rearranging the qi , we have p1 = q1 , p2 = q2 ,
. . . , pl = ql′ , . . . , pr = qs , thus γ1 = γ1′ and γ2 = γ2′ . Because γ1 and γ2 are
commutative and γ1′ and γ2′ are commutative, we could also have γ1 = γ2′ and γ2 = γ1′ .
Then the set of γ1 and γ2 is a permutation of the set of γ1′ and γ2′ .
Next, we will prove the uniqueness of this condition. That is, there exists a
quaternary quadratic system of equations (3.1):
γ1 + γ2 = γ1′ + γ2′ ,   γ1^2 + γ2^2 = (γ1′)^2 + (γ2′)^2      (3.1)
This is a system of two equations involving the four variables γ1 , γ2 , γ1′ , γ2′ ,
where all variables are natural numbers. A solution to this system of integer
equations is an assignment of values to the variables such that all the equations are
simultaneously
satisfied. Two solutions to the system above are:
solution set A:  γ1 = γ1′ ,  γ2 = γ2′      (3.2)
solution set B:  γ1 = γ2′ ,  γ2 = γ1′      (3.3)
When k = 4, if and only if γ1 + γ2 + γ3 + γ4 = γ1′ + γ2′ + γ3′ + γ4′ ,
γ1^2 + γ2^2 + γ3^2 + γ4^2 = (γ1′)^2 + (γ2′)^2 + (γ3′)^2 + (γ4′)^2 , . . . ,
γ1^4 + γ2^4 + γ3^4 + γ4^4 = (γ1′)^4 + (γ2′)^4 + (γ3′)^4 + (γ4′)^4 , the set of
γ1 , γ2 , γ3 , γ4 is a permutation of the set of γ1′ , γ2′ , γ3′ , γ4′ .
In general, when k = K, if and only if all the corresponding power sums up to the Kth
power are equal, the set of γ1 , γ2 , . . . , γK is a permutation of the set of
γ1′ , γ2′ , . . . , γK′ .
Mathematical Proof: Let P er(K) be the statement of the permutation theorem for K
elements. We give a proof by induction on K.
Base case. The statement holds for k = 1 and k = 2: P er(1) is easily seen to be true,
and P er(2) is true by the above-mentioned proof for k = 2.
Inductive step. We show, for any K − 1 ≥ 2, that if P er(K − 1) holds then P er(K)
also holds. Assume the induction hypothesis that P er(K − 1) is true (for some
unspecified value of K − 1 ≥ 2), that is: if and only if
Σ_{i=1}^{K−1} γi = Σ_{i=1}^{K−1} γi′ , Σ_{i=1}^{K−1} γi^2 = Σ_{i=1}^{K−1} (γi′)^2 ,
. . . , Σ_{i=1}^{K−1} γi^{K−1} = Σ_{i=1}^{K−1} (γi′)^{K−1} , the set of
γ1 , γ2 , . . . , γK−1 is a permutation of the set of γ1′ , γ2′ , . . . , γ′K−1 .
Using the induction hypothesis, P er(K − 1) can be written as: if and only if these
power sums are equal, there must exist a permutation matrix P such that
[γ1 γ2 . . . γK−2 γK−1 ] P = [γ1′ γ2′ . . . γ′K−2 γ′K−1 ]. Then
[γ1 γ2 . . . γK−1 γK ] [[P, 0], [0, 1]] = [[γ1 γ2 . . . γK−1 ] P   γK ]
= [[γ1′ γ2′ . . . γ′K−1 ]   γK ],
where [[P, 0], [0, 1]] is a square binary matrix that has exactly one entry of 1 in
each row and each column and 0s elsewhere. Because
Σ_{i=1}^{K−1} γi = Σ_{i=1}^{K−1} γi′ and Σ_{i=1}^{K} γi = Σ_{i=1}^{K} γi′ , we have
γK = γ′K ; thus [γ1 γ2 . . . γK−1 γK ] [[P, 0], [0, 1]] = [γ1′ γ2′ . . . γ′K−1 γ′K ].
Then the set of γ1 , γ2 , . . . , γK is a permutation of the set of
γ1′ , γ2′ , . . . , γ′K . Therefore P er(K) is true, that is: if and only if
Σ_{i=1}^{K} γi = Σ_{i=1}^{K} γi′ , Σ_{i=1}^{K} γi^2 = Σ_{i=1}^{K} (γi′)^2 , . . . ,
Σ_{i=1}^{K} γi^K = Σ_{i=1}^{K} (γi′)^K , the set of γ1 , γ2 , . . . , γK is a
permutation of the set of γ1′ , γ2′ , . . . , γ′K . This shows that P er(K) indeed
holds. Since both the base case and the inductive step have been performed, by
mathematical induction the statement P er(K) holds for all natural numbers K.
Example 2. To check whether one array is a permutation of another, take array
Γ = (2, 3, 3, 2, 2, 3, 3, 2) and array Γ′ = (2, 3, 2, 3, 2, 3, 2, 3). We calculate the
sums Σ Γ = 2 + 3 + 3 + 2 + 2 + 3 + 3 + 2 = 20 and
Σ Γ′ = 2 + 3 + 2 + 3 + 2 + 3 + 2 + 3 = 20, and the sums of the squared elements
Σ Γ^2 = 2^2 + 3^2 + 3^2 + 2^2 + 2^2 + 3^2 + 3^2 + 2^2 = 52 and
Σ Γ′^2 = 2^2 + 3^2 + 2^2 + 3^2 + 2^2 + 3^2 + 2^2 + 3^2 = 52. Continuing in the same
way up to the 8th power, all of the power sums agree, so Γ′ is a permutation of Γ.
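Example 2 can also be spot-checked in a few lines of Python, using the same power-sum comparison sketched earlier; the arrays are those given in the example.

gamma   = [2, 3, 3, 2, 2, 3, 3, 2]
gamma_p = [2, 3, 2, 3, 2, 3, 2, 3]

# Compare sum(x^p) for p = 1 .. 8; all agree, so one array is a permutation of the other.
same = all(sum(x ** p for x in gamma) == sum(x ** p for x in gamma_p)
           for p in range(1, len(gamma) + 1))
print(same)  # True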
The equinumerosity condition can be stated as follows: the vertex or edge adjacency
matrices of graphs G1 and G2 have equinumerous eigenvalues, and there exist elementary
row interchange matrices R_{n×n} and R′_{n×n} satisfying: for every eigenvalue of
multiplicity P in G1 and G2 , with corresponding left singular vector sets U1P and
U2P , there exists a square matrix M_{P×P} such that U1P M = R U2P ; likewise, for the
corresponding right singular vector sets V1P and V2P , there exists a square matrix
M′_{P×P} such that V1P M′ = R′ V2P . The equinumerosity theorem executes the singular
value decomposition of the vertex and edge adjacency matrices and determines whether
the eigenvalue sequences are equinumerous. If not, the two graphs are not isomorphic.
If the eigenvalue sequences are equinumerous, it then produces the maximally linearly
independent systems of the left and right singular vectors for the P -multiple
eigenvalues. If these are not equinumerous, the two graphs are not isomorphic; if they
are, the two graphs are isomorphic.
The basic algorithmic theory of the equinumerosity theorem is shown in Figures 3-9 and
3-10, and the pseudo-code is shown in Algorithm 2.
Algorithm 2 Equinumerosity Theorem Verification
Input: Vertex adjacency matrices A and B, edge adjacency matrices C and D of two
candidates.
Output: If the two candidates are isomorphic
1: A = Uv1 Σv1 Vv1^T
2: B = Uv2 Σv2 Vv2^T
3: C = Ue1 Σe1 Ve1^T
4: D = Ue2 Σe2 Ve2^T
5: if map(Σv1 ) = map(Σv2 ) & map(Σe1 ) = map(Σe2 ) then
6: if map(Uv1 ) = map(Uv2 ) & map(Vv1 ) = map(Vv2 ) then
7: if map(Ue1 ) = map(Ue2 ) & map(Ve1 ) = map(Ve2 ) then
8: return True
9: else
10: return False
11: end if
12: else
13: return False
14: end if
15: else
16: return False
17: end if
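A hedged Python sketch of the first gate of Algorithm 2 is shown below, using NumPy's SVD: the singular-value sequences of the two matrices are compared as sorted sequences up to a numerical tolerance. The map(·) operation on the singular vectors (the equinumerosity-degree mapping of lines 6-7) is not reproduced here, so this covers only lines 1-5 of the pseudo-code.

import numpy as np

def singular_values_equinumerous(A, B, tol=1e-8):
    """Compare the sorted singular-value sequences of A and B.
    Corresponds to the map(Sigma) comparison in Algorithm 2; the
    singular-vector checks of lines 6-7 are omitted in this sketch."""
    sa = np.sort(np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False))
    sb = np.sort(np.linalg.svd(np.asarray(B, dtype=float), compute_uv=False))
    return sa.shape == sb.shape and np.allclose(sa, sb, atol=tol)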
We say that a set of vectors is a maximal linearly independent set if the addition of
any vector of the vector space to it results in a set that is no longer linearly
independent. For the other implication, note that any vector of a vector space can be
expressed as a linear combination of the basis vectors, and conclude that adding any
vector to the set of basis vectors makes it linearly dependent.
Equivalently, a set B is a basis if its elements are linearly independent and each
element of W is a linear combination of elements of B. Namely, a basis is a set of
linearly independent vectors that span the space, as shown in Figure 3-11. It is
evident that orthogonal basis vectors all point in different directions.
In the theory of vector spaces, linear dependence simply means that some vectors can
be expressed as linear combinations of the others.
(a) b1 (b) b2
Figure 3-11: The geometric explanation for the base. The same vector can be repre-
sented in two different bases (green and red arrows)
A maximal linearly independent vector group s is itself linearly independent and
contains the largest possible number of vectors in the linear space; it is not unique.
Given a set of vectors, we can compute the number of independent vectors by
calculating the rank of the set and finding a maximal linearly independent subset.
3. Identify the columns of B that contain the leading 1s (the pivots).
4. The columns of A that correspond to the columns identified in step (3) form a
maximal linearly independent set of our original set of vectors.
For example, a1 and a2 form a maximally linearly independent system, with
a3 = ½ a1 + a2 and a4 = a1 + a2 .
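A small Python sketch of this selection is given below: instead of an explicit row reduction, it greedily keeps each vector that increases the rank of the kept collection. The vectors a1-a4 are hypothetical 2-dimensional stand-ins chosen to satisfy the relations of the example (a3 = ½a1 + a2, a4 = a1 + a2).

import numpy as np

def maximal_independent_subset(vectors):
    """Return indices of a maximal linearly independent subset,
    keeping a vector only if it raises the rank of the kept set."""
    kept, idx = [], []
    for i, v in enumerate(vectors):
        candidate = np.array(kept + [v], dtype=float)
        if np.linalg.matrix_rank(candidate) > len(kept):
            kept.append(v)
            idx.append(i)
    return idx

a1 = [1, 0]; a2 = [0, 1]
a3 = [0.5, 1]          # 1/2 * a1 + a2
a4 = [1, 1]            # a1 + a2
print(maximal_independent_subset([a1, a2, a3, a4]))  # [0, 1]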
A non-negative real number σ is a singular value of M only if there exist a unit
vector u and a unit vector v such that
M v = σu and M^T u = σv,      (3.5)
where u and v are the left and right singular vectors of σ, respectively.
The elements on the diagonal of the matrix Σ are equal to the singular values of M,
and the columns of U and V are the left and right singular vectors, respectively.
Without loss of generality, the columns of U and the rows of V^T are the vectors
defined and used in this thesis; they are called the left and right singular vectors.
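The defining relations in (3.5) can be verified numerically; the short sketch below checks M v = σu and M^T u = σv for every singular triple of a random matrix (the matrix itself is arbitrary and only serves the check).

import numpy as np

M = np.random.default_rng(0).normal(size=(4, 3))
U, s, Vt = np.linalg.svd(M, full_matrices=False)

for k in range(len(s)):
    u, sigma, v = U[:, k], s[k], Vt[k, :]
    assert np.allclose(M @ v, sigma * u)      # M v = sigma u
    assert np.allclose(M.T @ u, sigma * v)    # M^T u = sigma v
print("SVD relations verified for all singular values")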
one 1, and the other is 0. The current format of the original vector set is called the
maximally independent vector system.
Lemma 1. For a general matrix, there are at most k linearly independent eigenvectors
corresponding to a k-multiple eigenvalue. For a specific eigenvalue λ, suppose there
are m linearly independent eigenvectors; the following proves that m ≤ k. Using
Schmidt's method, we obtain m orthogonal eigenvectors ϵ1 , ϵ2 , . . . , ϵm with
M ϵ1 = λϵ1 , . . . , M ϵm = λϵm . Extend them to an orthogonal basis
ϵ1 , ϵ2 , . . . , ϵm , θm+1 , . . . , θn , and let
L = [ϵ1 , ϵ2 , . . . , ϵm , θm+1 , . . . , θn ]. Then L′ M L = [[λIm , M0 ], [0, M′ ]]
(it can be seen from the form of this matrix that the eigenvalue λ has dimension at
least m). Since L′ M L and M have the same eigenvalues, the multiplicity k of the
eigenvalue λ of M satisfies m ≤ k.
Thus Lemma 1 is proved. By the fundamental theorem of algebra, the total algebraic
multiplicity of the roots of an nth-degree equation is n, so the sum of the numbers of
linearly independent eigenvectors over all eigenvalues is at most n, and (Property 3)
shows that a real symmetric matrix has n independent eigenvectors. That is, in the
above inequality the equal sign holds, and it holds exactly when there are exactly k
linearly independent eigenvectors corresponding to each k-multiple eigenvalue.
Combining Definition 4, 5, 6 and Property 2, it is proved that the rank of
the maximally linearly independent subset Un∗P /VP ∗n of left/right singular vector
corresponding to the P -multiple eigenvalues is P.
Any two graphs satisfying the permutation theorem as above must have
M_{n×n} E_{n×1} = M′_{n×n} E_{n×1} ; then M x = λx, M′ y = λ′ y, y^T M′ = y^T λ′ ,
thus y^T λ x = y^T λ′ x.
Property 3. Two graphs are isomorphic if the eigenvalue sequences of the vertex
adjacency matrix and edge adjacency matrix are equinumerous.
Property 4. The two graphs satisfying the permutation theorem must have equinu-
merous eigenvalues.
Theorem 4. (Equinumerosity Theorem) Two graphs are isomorphic if and only if the
eigenvalues of the two graphs' vertex and edge adjacency matrices are equinumerous,
and the maximally linearly independent vector systems of the left and right singular
vectors corresponding to the P -multiple eigenvalues are equinumerous.
Lemma 3. Let V_{P×n} and V′_{P×n} be the right singular vector sets for the
P -multiple eigenvalues of graphs G1 and G2 , respectively. For an elementary row
interchange matrix S, if there exists Q_{P×P} with SV = V′Q, then Q is invertible.
Proof: Rank(S^T V) = Rank(V) = Rank(V′) = r ≤ Rank(Q); according to matrix theory,
r = Rank(V′) = Rank(S^T V Q) ≤ min(Rank(S^T V), Rank(Q)), so Rank(Q) ≥ r. Because
r ≥ Rank(Q), Rank(Q) = r, and therefore Q is invertible.
Proof : Both G1 and G2 are similar to the diagonal matrix Σ (the element at the
diagonal is the eigenvalue sequence). Suppose there are t different eigenvalues. Then,
M′ = (U′^{λ1} , . . . , U′^{λt }) Σ′ (U′^{λ1} , . . . , U′^{λt })^{−1}
   = H^T (V′^{λ1} E1^λ , . . . , V′^{λt } Et^λ ) Σ′ (V′^{λ1} E1^λ , . . . , V′^{λt } Et^λ )^{−1} (H^T)^{−1}      (3.7)
M = (U^{λ1} E1^λ , . . . , U^{λt } Et^λ ) Σ (U^{λ1} E1^λ , . . . , U^{λt } Et^λ )^{−1}      (3.8)
Substituting (3.8) into (3.7), and using H^T = H^{−1} : if the left singular vector
set also applies, then M′ = H M H^T . Under the same principle, for the edge adjacency
matrices, N′ = H N H^T , and the two graphs are isomorphic.
Theorems 1 and 4 lay the foundations for the algorithm in this thesis.
Theorem 5. There exists a row interchange matrix E such that a_{n×P} and a′_{n×P}
satisfy a = E a′ if and only if they are equinumerous. Similarly, there exists a
column interchange matrix E such that a_{n×P} and a′_{n×P} satisfy a = E a′ if and
only if they are equinumerous.
Proof: The proof of necessity is obvious; for sufficiency, from (Definition 8), both
vector sets (a and a′ ) correspond to the equinumerous sequence η. Then
η = ∏_{l=1}^{P} Q_l a = Q a and η = ∏_{l=1}^{P} Q′_l a′ = Q′ a′ , so Q a = Q′ a′ ,
where Q_l and Q′_l (1 ≤ l ≤ P) are both elementary interchange matrices.
3.5 Extensions
The proposed algorithm also works for weighted graphs. In the graph representation, we
use 0 or 1 to indicate adjacency. If we have a weighted graph, we change the 0 or 1
entries to different weights 1, 2, . . . , k, where k is a natural number; 0 still
indicates the absence of a link. If the original weights are not natural numbers, we
need to standardise and round them. Both the vertex and binary edge adjacency matrices
of the weighted graph are still symmetric. Because the method only works with natural
numbers, standardisation and rounding are required to manage the weights.
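A small sketch of this weight handling is given below; the particular scaling (linear rescaling onto 1..k before rounding) is an assumption made for illustration, since the thesis only requires that the resulting weights be natural numbers with 0 reserved for non-links.

import numpy as np

def naturalise_weights(W, k=10):
    """Map positive real edge weights onto natural numbers 1..k;
    zero entries (non-links) stay zero."""
    W = np.asarray(W, dtype=float)
    out = np.zeros_like(W, dtype=int)
    mask = W > 0
    if mask.any():
        spread = max(W[mask].max() - W[mask].min(), 1e-12)
        scaled = 1 + (k - 1) * (W[mask] - W[mask].min()) / spread
        out[mask] = np.rint(scaled).astype(int)
    return out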
Creation of an isomorphic group for a certain graph
There is a simple method to create all of the isomorphic graphs for a certain graph
G1 . In the n × n vertex adjacency matrix, we select any two vertices r1 and r2 and
swap both the corresponding rows and the corresponding columns. The new vertex
adjacency matrix corresponds to another graph G2 , and we can prove that G1 ≈ G2 . We
can also prove that, in this process, both the vertex adjacency matrix and the edge
adjacency matrix of one graph are permutations of the other's if and only if G1 ≈ G2 .
For the binary vertex adjacency matrix, if we swap any two rows and the two
corresponding columns, the newly formed graph is isomorphic to the original one
(referring to Theorem 1). We pick any two numbers from 1 to n to swap, say 2 and
n − 1, for the binary vertex adjacency matrix.
The calculation procedures are the same as the examples above, so we do not
specify them here.
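A short Python sketch of this relabelling is given below (an illustration only): swapping rows r1 and r2 and the corresponding columns of a vertex adjacency matrix yields the adjacency matrix of an isomorphic graph. The matrix used is the Exp1 vertex adjacency matrix shown earlier; the chosen indices are arbitrary.

import numpy as np

def swap_vertices(V, r1, r2):
    """Relabel two vertices: swap rows r1, r2 and columns r1, r2."""
    P = np.asarray(V).copy()
    P[[r1, r2], :] = P[[r2, r1], :]   # swap rows
    P[:, [r1, r2]] = P[:, [r2, r1]]   # swap columns
    return P

V = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])          # vertex adjacency matrix of Exp1
print(swap_vertices(V, 1, 3))          # isomorphic relabelling of v2 and v4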
In this section, we present the algorithm execution process and analyze the compu-
tational complexity of the proposed isomorphism algorithm. The graph isomorphism
algorithm is based on the permutation theorem and equinumerosity theorem [54],
which is shown in the following pseudocode:
1. Generate the triple tuple for two undirected (or directed) graphs G1 and G2 .
Figure 3-12: Flow Chart of Graph Isomorphism Matching.
2. Check whether the number of vertices and the number of edges of G1 and G2 are
equal. If either the number of vertices or the number of edges is not equal, output
that they are not isomorphic. If yes, then proceed to the next step. If the number of
vertices of the matrix is n, then the space complexity for a graph is n^2 .
- Calculate the sum of rows based on the rows (even/odd rows) of V1 and V2 and
produce the n-element arrays V_v and V_v′ . Compute and check whether
Σ_{i=1}^{n} V_v^i = Σ_{i=1}^{n} V_{v′}^i , Σ_{i=1}^{n} (V_v^i)^2 = Σ_{i=1}^{n} (V_{v′}^i)^2 ,
. . . , Σ_{i=1}^{n} (V_v^i)^n = Σ_{i=1}^{n} (V_{v′}^i)^n for the rows (for a directed
graph, Σ_{i=1}^{2n} V_v^i = Σ_{i=1}^{2n} V_{v′}^i , . . . ,
Σ_{i=1}^{2n} (V_v^i)^{2n} = Σ_{i=1}^{2n} (V_{v′}^i)^{2n} over the even and odd rows).
That corresponds to the node degree (or indegree and outdegree) arrays. Check whether
one array is a permutation of the other by the permutation theorem. If so, go to the
next step; if not, output that the undirected (or directed) graphs G1 and G2 are not
isomorphic.
- Calculate the sum of rows based on the rows (even/odd rows) of E1 and E2 and produce
the m-element arrays E_e and E_e′ . Compute and check whether
Σ_{j=1}^{m} E_e^j = Σ_{j=1}^{m} E_{e′}^j , Σ_{j=1}^{m} (E_e^j)^2 = Σ_{j=1}^{m} (E_{e′}^j)^2 ,
. . . , Σ_{j=1}^{m} (E_e^j)^m = Σ_{j=1}^{m} (E_{e′}^j)^m for the rows (for a directed
graph, the corresponding sums run to 2m over the even and odd rows). That corresponds
to the edge degree (or indegree and outdegree) arrays. Check whether one array is a
permutation of the other by the permutation theorem. If so, go to the next step; if
not, the undirected (or directed) graphs G1 and G2 are not isomorphic.
3. Perform the singular value decomposition of the vertex and edge adjacency matrices;
the verification process is shown in Algorithm 2. The temporal complexity of
calculating the eigenvalues of the (binary) vertex adjacency matrix and edge adjacency
matrix is O(n^3).
- Determine whether the singular value sequences of the two matrices are
equinumerous. If so, go to the following step; otherwise, the undirected (or directed)
graphs G1 and G2 are not isomorphic. The temporal complexity is O(n).
- Verify that the maximally linearly independent systems of the left and right
singular vectors of the two matrices are equinumerous. If so, proceed to the next
stage; otherwise, the results indicate that graphs G1 and G2 are not isomorphic. The
temporal complexity is O(n).
Temporal and spatial complexity are commonly used characteristics for examining a
graph or subgraph isomorphism algorithm. The comparison between the existing benchmark
algorithms and our proposed graph isomorphism algorithm is shown below:
Table 3.16: The comparison with the existing benchmark algorithms and our proposed
algorithm.
Table 3.16 summarizes the time and space complexity of our algorithm compared with
that of Ullmann's and VF2's algorithms, as can be deduced from [13] and [14], in the
best and worst cases. In this thesis, the time complexity of the best case is O(n^2)
and of the worst case is O(n^6), including the storage. The space complexity for a
graph is just O(3 ∗ m) in triple tuple format, and in the worst case O(n^2). Note that
n represents the number of vertices of the pattern graph and m the number of edges;
the maximum number of edges is n(n − 1)/2 in an undirected graph, and there are twice
as many in a directed graph, that is n(n − 1), as each node can at most have edges to
every other node.
(a) g1 (b) g2
Two triple tuples are produced to represent the undirected graphs g1 and g2 as below
in Table 3.17 and 3.18:
Table 3.17: Triple tuple of graph g1 . Table 3.18: Triple tuple of graph g2
Vertex adjacency matrices of graph g1 and g2 are shown in Table 3.19 and 3.20.
Edge adjacency matrices of graph g1 and g2 are shown in Table 3.21 and 3.22.
Table 3.19: Vertex Adjacency Matrix of g1 . (step 4 marked as G1 )
     v1 v2 v3 v4 v5
v1    0  1  1  1  1
v2    1  0  0  1  0
v3    1  0  0  0  1
v4    1  1  0  0  1
v5    1  0  1  1  0

Table 3.20: Vertex Adjacency Matrix of g2 . (step 6 marked as G2 )
      v1′ v2′ v3′ v4′ v5′
v1′    0   1   0   1   1
v2′    1   0   1   1   0
v3′    0   1   0   1   0
v4′    1   1   1   0   1
v5′    1   0   0   1   0

Table 3.21: Edge Adjacency Matrix of g1 . (step 5 marked as G1 )
     e1 e2 e3 e4 e5 e6 e7
e1    0  1  1  1  0  0  0
e2    1  0  1  0  1  1  0
e3    1  1  0  1  1  1  0
e4    1  0  1  0  1  0  1
e5    0  1  1  1  0  1  1
e6    0  1  1  0  1  0  1
e7    0  0  0  1  1  1  0

Table 3.22: Edge Adjacency Matrix of g2 . (step 7 marked as G2 )
      e1′ e2′ e3′ e4′ e5′ e6′ e7′
e1′    0   1   0   0   1   1   1
e2′    1   0   1   0   1   0   0
e3′    0   1   0   1   1   1   0
e4′    0   0   1   0   1   1   1
e5′    1   1   1   1   0   1   0
e6′    1   0   1   1   1   0   1
e7′    1   0   0   1   0   1   0
Following the permutation theorem, the row sums of the vertex adjacency matrices are
shown in Tables 3.23 and 3.24.
Table 3.23: Computation results of vertex adjacency matrix for graph g1 (step 8)
Sum of row 1st power 2nd power 3rd power 4th power 5th power
v1 4 16 64 256 1024
v2 2 4 8 16 32
v3 2 4 8 16 32
v4 3 9 27 81 243
v5 3 9 27 81 243
Sum 14 42 134 450 1574
According to the permutation theorem [28], [29], [37], [38], the row sums of the
vertex adjacency matrices of g1 and g2 are summarised in the last line of each table;
the two arrays have the same values, so one is a permutation of the other. Therefore,
the two vertex adjacency matrices are permutated.
Table 3.24: Computation results of vertex adjacency matrix for graph g2 (step 9)
Sum of row 1st power 2nd power 3rd power 4th power 5th power
′
v1 3 9 27 81 243
′
v2 3 9 27 81 243
′
v3 2 4 8 16 32
′
v4 4 16 64 256 1024
′
v5 2 4 8 16 32
Sum 14 42 134 450 1574
Following the permutation theorem, the row sums of the edge adjacency matrices are
shown in Tables 3.25 and 3.26.
Table 3.25: Computation results of edge adjacency matrix for graph g1 (step 10)
Sum of row 1st power 2nd power 3rd power 4th power 5th power 6th power 7th power
e1 3 9 27 81 243 729 2187
e2 4 16 64 256 1024 4096 16384
e3 5 25 125 625 3125 15625 78125
e4 4 16 64 256 1024 4096 16384
e5 5 25 125 625 3125 15625 78125
e6 4 16 64 256 1024 4096 16384
e7 3 9 27 81 243 729 2187
Sum 28 116 496 2180 9808 41306 209776
Table 3.26: Computation results of edge adjacency matrix for graph g2 (step 11)
Sum of row 1st power 2nd power 3rd power 4th power 5th power 6th power 7th power
′
e1 4 16 64 256 1024 4096 16384
′
e2 3 9 27 81 243 729 2187
′
e3 4 16 64 256 1024 4096 16384
′
e4 4 16 64 256 1024 4096 16384
′
e5 5 25 125 625 3125 15625 78125
′
e6 5 25 125 625 3125 15625 78125
′
e7 3 9 27 81 243 729 2187
Sum 28 116 496 2180 9808 41306 209776
According to the permutation theorem [28], [29], [37], [38], the row sums of the edge
adjacency matrices of g1 and g2 are summarised in the last line of each table; the two
arrays have the same values, so one is a permutation of the other. Therefore, the two
edge adjacency matrices are permutated.
Equinumerosity check for V1 and V2
In this part, we apply the singular value decomposition (SVD) to the vertex adjacency
matrices V1 and V2 . Firstly, check whether the two corresponding eigenvalue sequences
are equinumerous. Then, check whether the maximally independent vector sets of the
corresponding P -multiple eigenvalues of the left and right singular vectors are
equinumerous.
2.935432 0 0 0 0
0 1.618034 0 0 0
0 0 1.472834 0 0
0 0 0 0.618034 0
0 0 0 0 0.462598
In Table 3.28 and 3.31, Σv1 and Σv2 are checked by equinumerosity theorem for
g1 and g2 . The eigenvalues are equinumerous. There are no equal items in the
eigenvalue sets and two arrays could be changed to (0, 0, 0, 0, 0), and (0, 0, 0,
0, 0) in terms of the equinumerosity degree [54].
Table 3.30: Left Singular Matrix Uv2 for g2 . (step 21)
2.935432 0 0 0 0
0 1.618034 0 0 0
0 0 1.472834 0 0
0 0 0 0.618034 0
0 0 0 0 0.462598
In Table 3.27 and 3.30, Uv1 and Uv2 are checked by equinumerosity theorem
for the left singular vector for g1 and g2 . The maximally linearly independent
system could be changed to (51, 52, 53, 54, 55) and (51, 52, 53, 54, 55) in terms
of the equinumerosity degree. Uv1 and Uv2 are equinumerous.
3. check right singular vector by equinumerosity theorem for V1 and V2 . (step 32)
In Table 3.27 and 3.30, VvT1 and VvT2 are checked by equinumerosity theorem for
the right singular vector for g1 and g2 . The maximally linearly independent
system could be changed to (51, 52, 53, 54, 55) and (51, 52, 53, 54, 55) in terms
of the equinumerosity degree. VvT1 and VvT2 are equinumerous.
Table 3.33: Maximally linearly indepen- Table 3.34: Maximally linearly indepen-
dent system of left singular vector for V1 . dent system of left singular vector for V2 .
1 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 1
Table 3.35: Maximally linearly indepen- Table 3.36: Maximally linearly indepen-
dent system of right singular vector for V1 . dent system of right singular vector for V2 .
1 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 1
In this part, we apply the singular value decomposition (SVD) to the edge adjacency
matrices E1 and E2 . Firstly, check whether the two corresponding eigenvalue sequences
are equinumerous. Then, check whether the maximally independent vector sets of the
corresponding P -multiple eigenvalues of the left and right singular vectors are
equinumerous.
In Table 3.38 and 4.24, Σe1 and Σe2 are checked by equinumerosity theorem for
g1 and g2 . There are two groups of equal items in the eigenvalue sets as 1=1,
and 2=2. And the two arrays could be changed to (0, 21, 22, 23, 24, 0, 0) and
Table 3.38: Singular Matrix Σe1 for E1 . (step 25)
4.124885 0 0 0 0 0 0
0 2 0 0 0 0 0
0 0 2 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 0 0.761557 0
0 0 0 0 0 0 0.636672
4.124885 0 0 0 0 0 0
0 2 0 0 0 0 0
0 0 2 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 0 0.761557 0
0 0 0 0 0 0 0.636672
Table 3.42: Right Singular Matrix VeT2 for E2 . (step 29)
(0, 21, 22, 23, 24, 0, 0), in terms of the equinumerosity degree. Σe1 and Σe2 are
equinumerous.
In Table 3.37 and 3.40, Ue1 and Ue2 are checked by equinumerosity theorem
for the left singular vector for E1 and E2 . The maximally linearly independent
system could be changed to (51, 52, 53, 54, 55) and (51, 52, 53, 54, 55) in terms
of the equinumerosity degree. Ue1 and Ue2 (Tables 3.37 and 3.40) are equinumerous.
Table 3.43: Maximally linearly indepen- Table 3.44: Maximally linearly indepen-
dent system of left singular vector for E1 dent system of left singular vector for E2
1 0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 0 0 0 0 1
3. check right singular vector by equinumerosity theorem for E1 and E2 . (step 35)
In Table 3.39 and 3.42, VeT1 and VeT2 are checked by equinumerosity theorem for
the right singular vector for E1 and E2 . The maximally linearly independent
system could be changed to (51, 52, 53, 54, 55), and (51, 52, 53, 54, 55) in terms
of the equinumerosity degree [37], [38]. VeT1 and VeT2 are equinumerous.
Table 3.45: Maximally linearly indepen- Table 3.46: Maximally linearly indepen-
dent system of right singular vector for E1 . dent system of right singular vector for E2
1 0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 0 0 0 0 1
(a) g3 (b) g4
According to the permutation theorem, and due to space limitations, we only present
the array of row sums in the last line: [10, 10, 20, 28, 44, 76, 140, 268, 524, 1036,
2060, 4108, 8204, 16396, 32780, 65548, 131084, 262156]. The row sums of the binary
vertex adjacency matrices are the same, so we can conclude that the binary vertex
Table 3.47: Triple tuple of graph g3 . Table 3.48: Triple tuple of graph g4
Table 3.49: Vertex adjacency matrix of g3 .
      v1+ v1- v2+ v2- v3+ v3- v4+ v4- v5+ v5- v6+ v6- v7+ v7- v8+ v8-
v1+    0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
v1-    0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
v2+    0   1   0   0   0   0   0   0   0   0   0   1   0   0   0   0
v2-    0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
v3+    0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
v3-    1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
v4+    0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   1
v4-    0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
v5+    0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0
v5-    0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
v6+    0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0
v6-    0   0   1   0   0   0   0   0   0   0   0   0   0   0   1   0
v7+    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
v7-    0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
v8+    0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0
v8-    0   0   0   0   0   0   1   0   0   0   0   0   1   0   0   0
Table 3.50: Vertex adjacency matrix of g4 . (step 6 marked as G2 )
       v1′+ v1′- v2′+ v2′- v3′+ v3′- v4′+ v4′- v5′+ v5′- v6′+ v6′- v7′+ v7′- v8′+ v8′-
v1′+    0    0    0    1    0    0    0    0    0    0    0    0    0    1    0    0
v1′-    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0
v2′+    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
v2′-    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
v3′+    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
v3′-    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
v4′+    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0
v4′-    0    0    1    0    0    0    0    0    0    0    1    0    0    0    0    0
v5′+    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0
v5′-    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
v6′+    0    0    0    0    0    0    0    1    0    1    0    0    0    0    0    0
v6′-    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
v7′+    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1
v7′-    1    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0
v8′+    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
v8′-    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0
Table 3.52: Edge adjacency matrix of g4 . (step 7 marked as G2 )
        e1′+ e1′- e2′+ e2′- e3′+ e3′- e4′+ e4′- e5′+ e5′- e6′+ e6′- e7′+ e7′- e8′+ e8′- e9′+ e9′- e10′+ e10′-
e1′+     0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0     1
e1′-     0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0     0
e2′+     0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0     0
e2′-     0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    1    0    0     0
e3′+     0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0     0
e3′-     0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    1    0    0    0     0
e4′+     0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    1    0     0
e4′-     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1     0
e5′+     0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    1    0    0    0     0
e5′-     0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0     0
e6′+     0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0     0
e6′-     0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0     0
e7′+     0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0     0
e7′-     0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    1    0     0
e8′+     1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0     1
e8′-     0    0    0    0    0    1    0    0    1    0    0    0    0    0    0    0    0    0    0     0
e9′+     0    0    0    1    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0     0
e9′-     0    0    0    0    0    0    1    0    0    0    0    0    0    1    0    0    0    0    0     0
e10′+    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0     0
e10′-    1    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0     0
Table 3.53: Computation results of binary vertex adjacency matrix for graph g3 (step
8)
      Sum of odd row  Sum of even row  Sum of row  2nd power  ...  16th power
v1+         1               1              1           1      ...       1
v1-         1               1              1           1      ...       1
v2+         1               1              1           1      ...       1
v2-         2               2              4           .      ...   65536
v3+         1               1              1           1      ...       1
v3-         1               1              1           1      ...       1
v4+         1               1              1           1      ...       1
v4-         2               2              4           .      ...   65536
v5+         1               1              1           1      ...       1
v5-         1               1              1           1      ...       1
v6+         2               2              4           .      ...   65536
v6-         1               1              1           1      ...       1
v7+         1               1              1           1      ...       1
v7-         1               1              1           1      ...       1
v8+         2               2              4           .      ...   65536
v8-         1               1              1           1      ...       1
Sum        10              10             20          28      ...  262156
Table 3.54: Computation results of binary vertex adjacency matrix for graph g4 (step
9)
       Sum of odd row  Sum of even row  Sum of row  2nd power  ...  16th power
v1′+         2               2              4           .      ...   65536
v1′-         1               1              1           1      ...       1
v2′+         1               1              1           1      ...       1
v2′-         1               1              1           1      ...       1
v3′+         1               1              1           1      ...       1
v3′-         1               1              1           1      ...       1
v4′+         1               1              1           1      ...       1
v4′-         2               2              4           .      ...   65536
v5′+         1               1              1           1      ...       1
v5′-         1               1              1           1      ...       1
v6′+         2               2              4           .      ...   65536
v6′-         1               1              1           1      ...       1
v7′+         1               1              1           1      ...       1
v7′-         2               2              4           .      ...   65536
v8′+         1               1              1           1      ...       1
v8′-         1               1              1           1      ...       1
Sum         10              10             20          28      ...  262156
adjacency matrices for graphs g3 and g4 are permutable. So we have to continue by
carrying out the calculation on the edge adjacency matrices.
Table 3.55: Computation results of binary edge adjacency matrix for graph g3 (step
10)
       Sum of odd row  Sum of even row  Sum of row  2nd power  ...  20th power
e1+          2               2              4           .      ...    1048576
e1-          1               1              1           1      ...          1
e2+          1               1              1           1      ...          1
e2-          1               1              1           1      ...          1
e3+          2               2              4           .      ...    1048576
e3-          1               1              1           1      ...          1
e4+          1               1              1           1      ...          1
e4-          1               1              1           1      ...          1
e5+          1               1              1           1      ...          1
e5-          2               2              4           .      ...    1048576
e6+          2               2              4           .      ...    1048576
e6-          2               2              4           .      ...    1048576
e7+          3               3              9           .      ... 3486784401
e7-          1               1              1           1      ...          1
e8+          2               2              4           .      ...    1048576
e8-          2               2              4           .      ...    1048576
e9+          2               2              4           .      ...    1048576
e9-          2               2              4           .      ...    1048576
e10+         1               1              1           1      ...          1
e10-         2               2              4           .      ...    1048576
Sum         17              15             32          58      ... 3497270170
Table 3.56: Computation results of binary edge adjacency matrix for graph g4 (step
11)
        Sum of odd row  Sum of even row  Sum of row  2nd power  ...  20th power
e1′+          2               2              4           .      ...   1048576
e1′-          1               1              1           1      ...         1
e2′+          1               1              1           1      ...         1
e2′-          2               2              4           .      ...   1048576
e3′+          1               1              1           1      ...         1
e3′-          2               2              4           .      ...   1048576
e4′+          2               2              4           .      ...   1048576
e4′-          1               1              1           1      ...         1
e5′+          2               2              4           .      ...   1048576
e5′-          1               1              1           1      ...         1
e6′+          2               2              4           .      ...   1048576
e6′-          1               1              1           1      ...         1
e7′+          1               1              1           1      ...         1
e7′-          2               2              4           .      ...   1048576
e8′+          2               2              4           .      ...   1048576
e8′-          2               2              4           .      ...   1048576
e9′+          2               2              4           .      ...   1048576
e9′-          2               2              4           .      ...   1048576
e10′+         1               1              1           1      ...         1
e10′-         2               2              4           .      ...   1048576
Sum          16              16             32          56      ...  12582920
3.6.4 Case Study 3 - Multigraph Isomorphism
(a) g5 (b) g6
To handle the loop and parallel edges in the multigraph, we follow the graph
representation rules in Section 3.2.3 of this Chapter.
Table 3.57: Triple tuple of graph g5 . Table 3.58: Triple tuple of graph g6
According to the permutation theorem, we can read the final results from the last line
of Tables 3.65 and 3.66: [11, 33, 104, 369] for g5 and [11, 35, 119, 419] for g6 .
They are not the same, so graphs g5 and g6 are not isomorphic, and there is no need
Table 3.59: Vertex Adjacency Matrices for induced Matching.
Table 3.65: Computation results of vertex adjacency matrix for graph g5 (step 8)
Sum of row 1st power 2nd power 3rd power 4th power
v1 4 16 64 256
v2 3 9 27 81
v3 2 4 8 16
v4 2 4 8 16
Sum 11 33 104 369
Table 3.66: Computation results of vertex adjacency matrix for graph g6 (step 9)
Sum of row 1st power 2nd power 3rd power 4th power
′
v1 4 16 64 256
′
v2 3 9 27 81
′
v3 3 9 27 81
′
v4 1 1 1 1
Sum 11 35 119 419
to verify the permutation theorem on edge adjacency matrices and equinumerosity
theorem on the vertex and edge matrices further.
3.7 Summary
In this chapter, we focused on the undirected and directed graph isomorphism problem.
An efficient mechanism is proposed to support and guide related downstream
applications. Unlike many other approaches in which backtracking is commonly used,
this procedure maps vertices by composing rows and columns of the distance matrix to
obtain the initial partitioning of the vertices of the directed graphs, which reduces
the size of the search tree. The proposed algorithm guarantees returning the best
matching candidates at the stage of permutation theorem verification. We ran
comprehensive and representative illustrations to explain the efficiency of our
isomorphism algorithm. The temporal and spatial complexity analysis demonstrates that
the proposed algorithm significantly outperforms the state-of-the-art algorithms.
Chapter 4
This chapter is organised as follows. In Section 4.1, we introduce the concepts
of subgraph isomorphism and partial and induced subgraphs. Complexity analysis is
discussed in Section 4.2.1. Then, experiments are set up in Section 4.2.3 and 4.2.2.
The work of this chapter is summarised in Section 4.3.
The subgraph isomorphism problem is known to be an NP-hard problem [61], and it can be
defined as follows: suppose there are two graphs, a query graph Q and a database graph
G, and the task is to identify exact matching instances of Q in G. To be specific, a
query graph Q = (VS , ES ) is regarded as a subgraph of G = (V, E) when VS ⊂ V and
ES ⊂ E. The induced subgraph is the subgraph whose edge set contains every edge of G
with both endpoints in VS . In other words, graph G = (V, E) contains graph
Q = (VS , ES ). Simply put, we enumerate the subgraphs of G and check which of them
are isomorphic to Q.
To achieve substructure matching results across a graph in this study, we must
build a set of candidate matches and compare the query graph to these candidate
graphs over the appropriate number of vertices and edges, which may significantly
decrease the number of candidate graphs. Clearly, the number of vertices and edges
in two isomorphic graphs must be the same. The corresponding subgraphs with the
same number of vertices have to be enumerated on the basis of the query graph's scale.
Then, the array of the vertex adjacency matrices and the array of the edge adjacency
matrices would be generated according to the triple tuples subset. Thirdly, the sum
of the array will be calculated according to the rules of the permutation theorem and
equinumerosity theorem in sequence to check if they are isomorphic. Our proposed
algorithm can work effectively and efficiently for subgraph matching. In this work,
our proposed algorithm is able to solve induced and partial substructure searching.
A subset of the vertices of the data graph forms an induced subgraph and it
contains all of the edges that have both endpoints in the subset, whereas a partial
subgraph may miss some edges in the subset. An induced subgraph is a special case
of a subgraph. Also, this subgraph isomorphism covers both directed and undirected
graphs and it applies the weighting just by replacing 1 with the corresponding weight
value as mentioned in Section 3.5 of Chapter 3.
We have already provided very detailed illustrations about the process of graph
representation method, permutation and equinumerosity theorem verification in Sec-
tion 3.2, 3.3 and 3.4 of Chapter 3, and they are also equally applicable to subgraph
isomorphism verification. Therefore, readers are referred back to Chapter 3 to find
the relevant contents. In order to clearly elaborate the idea of our algorithm, we first
provide some common definitions and then give two examples to illustrate.
2. Check if the number of vertices and edges are the same or not. If not, they are
not isomorphic. If yes, go to the next step.
3. Generate the vertex adjacency matrix for two graphs g and g ′ . (g ′ based on the
corresponding indices of row and column from the vertex and edge adjacency
matrices G)
Figure 4-1: Graphs with vertex and edge labeling.
v1 v2 v3 v4 v5
v1 0 1 0 1 1
v2 1 0 1 1 0
v3 0 1 0 1 0
v4 1 1 1 0 1
v5 1 0 0 1 0
We have graph G and subgraph g, shown in Figure 4-1, for checking. A total of 429
subgraphs of G were created according to the full combination of 5 (the number of
nodes in g) out of 13 (the number of nodes in G). For example, the subgraphs g1 shown
in Figure 4-2a, g2 shown in Figure 4-2b, and g3 shown in Figure 4-2c were created from
G. We input g1 and g into the algorithm of Figure 3-12 to demonstrate the process
(Step 1). The vertex and edge adjacency matrices of G are shown in
(a) g1 (b) g2
(c) g3
Table 4.3: Edge adjacency matrix of g.
e1 e2 e3 e4 e5 e6 e7
e1 0 1 0 0 1 1 1
e2 1 0 1 0 1 0 0
e3 0 1 0 1 1 1 0
e4 0 0 1 0 1 1 1
e5 1 1 1 1 0 1 0
e6 1 0 1 1 1 0 1
e7 1 0 0 1 0 1 0
Tables 4.1 and 4.3, and those of g are shown in Tables 4.2 and 4.4.
g, g1 and g2 have the same number of vertices and edges, so these two subgraphs can go
into the permutation theorem checking process, while g and g3 do not have the same
number of vertices and edges, so there is no need to check them. Next, we use the
results of g and g1 for demonstration purposes. The vertex and edge adjacency matrices
of g1 are shown in Tables 4.5 and 4.6.
According to the permutation theorem [54], the row sums of the vertex adjacency
matrices of g and g1 are permutated, and the row sums of the edge adjacency matrices
of g and g1 are permutated. By the equinumerosity theorem, the vertex and edge adjacency
Table 4.5: Vertex Adjacency Matrix of g1 .
       v2′ v4′ v5′ v9′ v13′
v2′     0   1   1   1   1
v4′     1   0   0   1   0
v5′     1   0   0   0   1
v9′     1   1   0   0   1
v13′    1   0   1   1   0
matrices are equinumerous. For a detailed calculation procedure, refer to Section 3.6.2
of the previous chapter.
In graph G of Figure 4-1, there are 34 subgraphs isomorphic to the query graph g (one of them, g1 , is shown in Figure 4-2a): (1, 3, 5,
11, 12), (1, 3, 7, 8, 11), (1, 3, 7, 10, 11), (1, 3, 7, 11, 12), (1, 3, 10, 11, 12), (1, 4, 6,
7, 8), (1, 4, 7, 10, 11), (1, 6, 7, 8, 11), (1, 7, 8, 10, 11), (1, 7, 10, 11, 12), (2, 3, 5, 9,
13), (2, 3, 5, 11, 12), (2, 4, 5, 9, 13), (2, 4, 7, 9, 10), (2, 4, 9, 10, 12), (2, 4, 9, 10, 13),
(2, 4, 9, 12, 13), (2, 9, 10, 12, 13), (3, 5, 9, 12, 13), (3, 5, 10, 11, 12), (3, 5, 11, 12,
13), (3, 7, 10, 11, 12), (3, 9, 10, 11, 12), (4, 6, 7, 8, 10), (4, 6, 7, 9, 10), (4, 6, 7, 10,
11), (4, 7, 9, 10, 11), (4, 7, 9, 10, 12), (4, 7, 10, 11, 12), (4, 9, 10, 11, 12), (4, 9, 10,
12, 13), (5, 9, 10, 12, 13), (7, 9, 10, 11, 12), (9, 10, 11, 12, 13).
A partial subgraph can have fewer edges between the same vertices than the query
graph.
Figure 4-3 below illustrates the subgraph spanned on the data graph by the vertex
subset {2, 5, 3, 12, 13}. The induced subgraph searching process proceeds in a similar
manner.
Algorithm 3 Subgraph Selection Scheme 1
Input: A pattern graph G and a subgraph g
Output: If g is a subgraph candidate of G
′
1: while choose the same number of vertices as g from G, termed as g do
′
2: if g has the same number of edges as g then
3: go to Algorithm 1
4: else
5: return false
6: end if
7: end while
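A Python sketch of the selection scheme in Algorithm 3 is given below. It assumes graphs are supplied as vertex sets and edge lists (an assumption for illustration); vertex subsets of G whose induced edge count matches the query graph g are yielded as candidates and would then be passed to the isomorphism check.

from itertools import combinations

def subgraph_candidates(G_vertices, G_edges, g_vertices, g_edges):
    """Yield vertex subsets of G whose induced subgraph has exactly
    as many edges as the query graph g (Algorithm 3's filter)."""
    k, target_edges = len(g_vertices), len(g_edges)
    edge_set = {frozenset(e) for e in G_edges}
    for subset in combinations(G_vertices, k):
        chosen = set(subset)
        induced = [e for e in edge_set if set(e) <= chosen]
        if len(induced) == target_edges:
            yield subset, induced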
(a) g (b) g4
Based on the algorithm presented in the previous Section 4.1.1, because g4 has an
extra edge between vertices 3 and 4, g is not a subgraph of g4 . In some scenarios,
however, this is not the desired outcome. For ease of understanding, we use a simple
case to illustrate the proposed scheme for dealing with this issue. The pseudo-code of
the inexact subgraph selection is shown in Algorithm 4.
2. Check if the number of edges of c′ is equal or more than c or not. If not, they
are not isomorphic. If yes, go to the next step.
3. Regenerate the vertex and edge adjacency matrices of c′ based on the corresponding
indices of rows and columns from the vertex and edge adjacency matrices of C, and go
to the next step. Ensure that the vertex and edge adjacency matrices of c and c′ have
the same size.
This section presents the experimental results on temporal and spatial performance
(see Section 4.2.1). We show the detailed verification process for undirected partial
and induced subgraph isomorphism in Section 4.2.2, following the algorithm flow chart
of Figure 3-12.
4.2.1 Complexity Analysis
1. Subgraph Selection Algorithm analysis (refer to Algorithms 3 and 4). The temporal
complexity is O(min(n^k , n^(n−k) )); if k ≥ n/2 it is more accurate to say
O(n^(n−k) ), where n indicates the number of vertices of the data graph and k
represents the number of vertices of the query graph; or n indicates the number of
edges of the candidate subgraph and k represents the number of edges of the query
graph.
The temporal and spatial complexity of these algorithms (Ullmann [13], VF2 [16]
and Quasipolynomial GI-algorithm [41]) are suitable for both graph isomorphism and
subgraph isomorphism. The detailed analysis and demonstration of permutation and
equinumerosity theorem are referred to Section 3.3 and 3.4.
For ease of understanding, we use two simple cases to illustrate further.
The generation process of the vertex adjacency matrix of graph c′ for induced matching
consists of choosing the corresponding indices according to the enumerated vertex
combinations from the vertex adjacency matrix of graph C.
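This index-selection step can be written directly with NumPy: the rows and columns of the data graph's vertex adjacency matrix corresponding to the enumerated vertex combination are picked out with np.ix_. The matrix C below is the 5-vertex data graph of Table 4.8 (shown after this sketch); the chosen combination is an example.

import numpy as np

C = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])       # vertex adjacency matrix of data graph C

chosen = [0, 1, 2, 3]                  # vertex combination (1, 2, 3, 4), 0-based
c_prime = C[np.ix_(chosen, chosen)]    # vertex adjacency matrix of candidate c'
print(c_prime)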
Table 4.7: Query graph c
     v1 v2 v3 v4
v1    0  1  0  0
v2    1  0  1  0
v3    0  1  0  1
v4    0  0  1  0

Table 4.8: Step 1: enumerate vertices from graph C
      v1′ v2′ v3′ v4′ v5′
v1′    0   1   1   0   0
v2′    1   0   1   0   0
v3′    1   1   0   1   0
v4′    0   0   1   0   1
v5′    0   0   0   1   0
(a) c  (b) step 1: enumerate vertices from graph C  (c) step 2a: enumerate edges from graph C  (d) step 2b: enumerate edges from graph C
Figure 4-4: The Procedure of Generating Graph c′ .
The generation process of the edge adjacency matrix of graph c′ for induced matching
consists of choosing the corresponding indices according to the enumerated edge
combinations from the edge adjacency matrix of graph C.
Table 4.9: step 2a: enumerate edges from Table 4.10: step 2b: enumerate edges from
graph C graph C
e′1 e′2 e′3 e′4 e′5 e′1 e′2 e′3 e′4 e′5
e′1 0 1 0 1 0 e′1 0 1 0 1 0
e′2 1 0 1 1 0 e′2 1 0 1 1 0
e′3 0 1 0 1 1 e′3 0 1 0 1 1
e′4 1 1 1 0 0 e′4 1 1 1 0 0
e′5 0 0 1 0 0 e′5 0 0 1 0 0
The subgraph c (Figure 4-4a) has 4 vertices and 3 edges, so we have to choose the same
number of vertices from the pattern graph C (Figure 4-4b). In this example, 5
subgraphs are generated with vertex IDs (1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5),
(1, 3, 4, 5) and (2, 3, 4, 5). Among these subgraphs, only the subgraph with the
vertex combination (1, 2, 4, 5) has 2 edges, which is fewer than the 3 edges of c.
Therefore, only (1, 2, 3, 4), (1, 2, 3, 5), (1, 3, 4, 5) and (2, 3, 4, 5) can be
potential subgraph candidates. Here, we use the subgraph c′ with vertex IDs
(1, 2, 3, 4) in C, together with c, for demonstration purposes. The vertex adjacency
matrix of subgraph c′ is shown in Table 4.8. In Algorithm 3, subgraph c′ would be
excluded in the initial selection process because there are 4 edges between the
corresponding vertices.
The vertex and edge matrices of c′ are generated by selecting the corresponding
rows and columns from C, refer to Table 4.8, 4.9 and 4.10. By going through the
permutation and equinumerosity theorem checking process, only subgraphs with edge
ID (1, 2, 3) and (1, 3, 4) are isomorphic to graph c. The specific calculation process
is as Section 3.6.2 in Chapter 3.
2. Enumerate the same number of edges as subgraph c among the vertices based
on step 1.
Based on the principle of induced subgraph matching, we can conclude that the
subgraphs with vertices combinations of (1, 2, 3, 4), (1, 3, 4, 5), (2, 3, 4, 5) in Figure
4-6c are isomorphic to Figure 4-6a.
(a) c (b) C
Table 4.11: Triple tuple of graph c. Table 4.12: Triple tuple of graph C
Table 4.13: Vertex adjacency matrix for query graph c
     v1 v2 v3 v4
v1    0  1  0  0
v2    1  0  1  0
v3    0  1  0  1
v4    0  0  1  0

Table 4.14: Vertex adjacency matrix for data graph C
      v1′ v2′ v3′ v4′ v5′
v1′    0   1   1   0   0
v2′    1   0   1   0   0
v3′    1   1   0   1   0
v4′    0   0   1   0   1
v5′    0   0   0   1   0
The query graph c (Figure 4-5a) has 4 vertices and 3 edges. In Algorithm 3, we only consider subgraphs that have the same number of vertices and the same number of edges as the query graph c as potential candidates, so we have to choose the same number of vertices from the pattern graph C (Figure 4-5b). If this condition holds, the matching process continues. In this example, 5 subgraphs with 4 vertices each are generated according to the full combination of 4 (the number of vertices in c) out of 5 (the number of vertices in C).
Table 4.15: Edge adjacency matrix for query graph c

     e1  e2  e3
e1    0   1   0
e2    1   0   1
e3    0   1   0

Table 4.16: Edge adjacency matrix for data graph C

      e′1  e′2  e′3  e′4  e′5
e′1    0    1    0    1    0
e′2    1    0    1    1    0
e′3    0    1    0    1    1
e′4    1    1    1    0    0
e′5    0    0    1    0    0
These subgraphs have vertex IDs (1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 3, 4, 5) and (2, 3, 4, 5). Among them, (1, 2, 3, 4) has 4 edges and (1, 2, 4, 5) has 2 edges, neither of which equals the 3 edges of the query graph c. Therefore, only (1, 2, 3, 5), (1, 3, 4, 5) and (2, 3, 4, 5) could be potential subgraph candidates. Here, we use the subgraph c′ with vertex IDs (1, 2, 3, 5) in C and c for demonstration purposes.
Figure 4-6: (a) query graph c; (b) data graph C; (c) data graph C.
The vertex and edge matrices of c′ are generated by selecting the corresponding rows and columns from C. Based on the principle of partial subgraph matching, and by going through the permutation and equinumerosity theorem checking process, we can conclude that the subgraphs with vertex combinations (1, 3,
Table 4.17: Triple tuple of graph c.
Table 4.18: Triple tuple of graph C.
Table 4.21: Edge adjacency matrix of c′

     e1  e2  e3
e1    0   1   0
e2    1   0   1
e3    0   1   0

Table 4.22: Edge adjacency matrix of C

      e′1  e′2  e′4
e′1    0    1    1
e′2    1    0    1
e′4    1    1    0
Table 4.23: Computation results of vertex adjacency matrix for graph g1 (step 8)
Sum of row    1st power    2nd power    3rd power    4th power
v1 1 1 1 1
v2 2 4 8 16
v3 2 4 8 16
v4 1 1 1 1
Sum 6 10 18 34
Table 4.24: Computation results of vertex adjacency matrix for graph g1 (step 8)
Sum of row    1st power    2nd power    3rd power    4th power
v1′ 2 4 8 16
v2′ 2 4 8 16
v3′ 2 4 8 16
v5′ 0 0 0 0
Sum 6 12 24 48
4, 5) and (2, 3, 4, 5), with edge IDs (1, 2, 3), (1, 3, 4) and (2, 3, 5), in Figure 4-6c are isomorphic to graph c (Figure 4-6a). Readers are referred to Section 3.6.2 of the previous chapter for more details about the calculation process.
4.3 Summary
In this chapter, we introduced the concept of subgraph isomorphism. Then, we discussed subgraph matching problems in which both partial and induced subgraph scenarios are considered. On the theoretical basis of the permutation and equinumerosity theorems proposed in Chapter 3, the experimental results suggest that our proposed subgraph isomorphism algorithm achieves good solution quality and is effective in both situations.
Chapter 5
In Chapters 3 and 4, we discussed undirected and directed graph and subgraph isomorphism verification. There, the distance between any two graphs takes only two values: 0 and 1. This representation, however, fails to express a real-valued distance between the two graphs. A real-valued distance can display the degree of difference, and its range should be [0, 1] rather than only {0, 1}. In this chapter, we propose a quantitative graph distance measurement that extends the distance between two graphs from {0, 1} to [0, 1]. Because the two graphs may have different numbers of vertices and edges, we transform them into two graphs with the same number of vertices and edges, so that they become comparable with each other. Our method defines a synopsis quantitative distance (or dissimilarity) measure for approximate matching.
The distance calculation between different chemical structures will be used as a running example in this chapter. Figuring out the connection between structure and function is an essential problem in biology, where biological structures are usually represented by graphs. Graph-isomorphism-based representations have led to breakthroughs in bioinformatics. Chemical compound identification can be converted into a graph querying problem, which is a knotty problem pressing for a solution.
We adopt adjacency-matrix graph representations, simplifying molecular structures to detect isomorphic topological patterns and to further improve substructure retrieval efficiency. Inspired by the graph isomorphism algorithm [54] and the SMILES representation, we created another representation for chemical compounds, as follows.
In this section, we illustrate the procedure of our proposed Quantitative Graph Dis-
tance model in detail. The fundamental structure of this algorithm is depicted in
Figure 5-1.
Figure 5-1: The fundamental structure of the algorithm. [Diagram showing the distances combined as 0.5S1 + 0.5S2 and 0.25S1 + 0.25S2 + 0.25S3 + 0.25S4.]
This chapter is organised as follows. Section 5.1 describes the Permutation Topo-
logical Distance calculation process of the vertex and edge adjacency matrices. In
Section 5.2, we will then describe the equinumerosity topological distance calculation
process of the vertex and edge adjacency matrices. In Section 5.3, we illustrate the
procedure of constructing the vertex and edge adjacency matrix of labelled graphs.
Section 5.4 analyses the complexity of our proposed graph topological distance algorithm and discusses a number of real-world applications. The work of this chapter is summarised in Section 5.5.
5.1 Permutation Topological Distance
The permutation topological distance is computed from the corresponding vertex/edge adjacency matrices. If the distance is 0, the two graphs are isomorphic. If one graph's array of row sums of the vertex/edge adjacency matrix is a permutation of the other's, then the two graphs are isomorphic. The computational procedure of the Permutation Topological Distance is as follows:
1. Construct a vertex-edge map based on the edge-vertex map: 〈key: ID of vertex v; value: all adjacent edges of vertex v〉.

2. Construct an edge-vertex map: 〈key: ID of edge e; value: starting point of edge e, ending point of edge e〉.
3. The row sum sequence of the vertex adjacency matrix is determined according
to the vertex-edge map.
4. The row sum sequence of the edge adjacency matrix is determined according to
the edge-vertex map.
5. Obtain the sum of the row sums, the sum of their squares, and so on up to the sum of their nth powers for the vertex adjacency matrix, and similarly up to the sum of their mth powers for the edge adjacency matrix, as shown in Tables 5.3 and 5.4.

Note that n and m in Tables 5.3 and 5.4 indicate the sizes of the vertex and edge adjacency matrices, respectively.
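A compact sketch of steps 1-5 is given below. It assumes the graph is supplied as a list of (edge ID, start vertex, end vertex) triples, in the spirit of the triple-tuple tables used throughout this chapter; the function and variable names are illustrative only, and the edge adjacency matrix is treated analogously.

```python
from collections import defaultdict

def build_maps(triples):
    """Steps 1-2: edge-vertex map <edge ID: (start, end)> and
    vertex-edge map <vertex ID: set of adjacent edge IDs>."""
    edge_vertex = {e: (u, v) for e, u, v in triples}
    vertex_edge = defaultdict(set)
    for e, u, v in triples:
        vertex_edge[u].add(e)
        vertex_edge[v].add(e)
    return edge_vertex, vertex_edge

def row_sum_powers(row_sums, max_power):
    """Step 5: sum of the row sums, of their squares, ..., up to max_power."""
    return [sum(r ** k for r in row_sums) for k in range(1, max_power + 1)]

# Example: the query graph c of Table 4.7 (path v1-v2-v3-v4).
triples = [(1, 1, 2), (2, 2, 3), (3, 3, 4)]
edge_vertex, vertex_edge = build_maps(triples)

# Step 3: vertex adjacency row sums are the vertex degrees.
vertex_rows = [len(vertex_edge[v]) for v in sorted(vertex_edge)]
# Step 4: edge adjacency row sums count the edges sharing an endpoint.
edge_rows = [sum(1 for f in edge_vertex
                 if f != e and set(edge_vertex[e]) & set(edge_vertex[f]))
             for e in sorted(edge_vertex)]

print(row_sum_powers(vertex_rows, 4))  # [6, 10, 18, 34], cf. Table 4.23
print(row_sum_powers(edge_rows, 3))
```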
A Euclidean-style distance can then express the dissimilarity or distance between the two candidates. Formula (5.1) shows the row sum distance of the vertex adjacency matrices:
Table 5.2: Row sums of edge matrices
\[
S_1 = \sqrt{\frac{(\alpha_1-\alpha_1')^2 + (\alpha_2-\alpha_2')^2 + \cdots + (\alpha_{n-1}-\alpha_{n-1}')^2 + (\alpha_n-\alpha_n')^2}{\alpha_1^2 + {\alpha_1'}^2 + \alpha_2^2 + {\alpha_2'}^2 + \cdots + \alpha_{n-1}^2 + {\alpha_{n-1}'}^2 + \alpha_n^2 + {\alpha_n'}^2}} \tag{5.1}
\]
Formula (5.2) shows the row sum distance of the edge adjacency matrices:

\[
S_2 = \sqrt{\frac{(\beta_1-\beta_1')^2 + (\beta_2-\beta_2')^2 + \cdots + (\beta_{m-1}-\beta_{m-1}')^2 + (\beta_m-\beta_m')^2}{\beta_1^2 + {\beta_1'}^2 + \beta_2^2 + {\beta_2'}^2 + \cdots + \beta_{m-1}^2 + {\beta_{m-1}'}^2 + \beta_m^2 + {\beta_m'}^2}} \tag{5.2}
\]
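Formulas (5.1) and (5.2) share the same normalised form, so a single helper can compute both S1 and S2. A minimal sketch, assuming the two sequences have equal length (graphs are padded beforehand if necessary); in the case studies of Section 5.4 the same formula is applied to the three-element power-sum sequences:

```python
import math

def row_sum_distance(seq_a, seq_b):
    """Normalised Euclidean distance of formulas (5.1)/(5.2):
    sqrt(sum of squared differences) / sqrt(sum of all squared entries)."""
    assert len(seq_a) == len(seq_b), "pad the graphs to equal size first"
    num = sum((a - b) ** 2 for a, b in zip(seq_a, seq_b))
    den = sum(a ** 2 for a in seq_a) + sum(b ** 2 for b in seq_b)
    return math.sqrt(num / den) if den else 0.0

# Example with the vertex power-sum sequences quoted later for Oxetane and Thietane:
print(round(row_sum_distance([20, 62, 200], [20, 60, 188]), 6))  # 0.042078
```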
5.2 Equinumerosity Topological Distance
This section presents the computational procedure of equinumerosity topological dis-
tance, which contains two parts: 1) equinumerosity topological distance of singular
values and 2) equinumerosity topological distance of the left and right singular matrix.
1. With the eigendecomposition of the vertex and edge matrices, we convert a singular value sequence into an equinumerous sequence. First, group the singular values of matrix S that have the same value; the number of elements in each group is termed a P-multiple eigenvalue (P > 1). (This step is named the mapping function; a sketch of this mapping is given after Formula (5.3).)
(3.6, 3.6, 3.2, 3.2, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4).
↓
〈singular value: index of the corresponding singular value 〉: {〈3.6: (1, 2)〉, 〈3.2:
(3, 4)〉, 〈2.4: (5, 6, 7, 8, 9, 10)〉}.
↓
Equinumerous sequence: {21, 22, 23, 24, 61, 62, 63, 64, 65, 66}.
↓
Calculate the sum of the equinumerous sequence, the sum of its squares, its cubes, and so on up to the sum of its nth powers (here n = 10):
471, 26241, 1585521, 99018069, 6263167401, 398174714421, 25371059927481,
1618644679753029, 103360482666590121, 6605253359089495701.
2. Perform step 1) on the corresponding vertex and edge matrices of the two given graphs to obtain the sum, the sum of squares, the sum of cubes, and so on, of the equinumerous sequence of singular values for each.
\[
S_3 = \sqrt{\frac{(\gamma_1-\gamma_1')^2 + (\gamma_2-\gamma_2')^2 + \cdots + (\gamma_{n-1}-\gamma_{n-1}')^2 + (\gamma_n-\gamma_n')^2}{\gamma_1^2 + {\gamma_1'}^2 + \gamma_2^2 + {\gamma_2'}^2 + \cdots + \gamma_{n-1}^2 + {\gamma_{n-1}'}^2 + \gamma_n^2 + {\gamma_n'}^2}} \tag{5.3}
\]
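The sketch below illustrates one plausible reading of the mapping function in step 1: each singular value is replaced by an integer formed from its multiplicity P and a running index over all values sharing that multiplicity, and the power sums of the resulting equinumerous sequence are then accumulated. The encoding rule is inferred from the worked example above and is an assumption, not a definitive specification; exact equality is used for grouping, whereas real singular values would need a tolerance.

```python
from collections import Counter, defaultdict

def equinumerous_sequence(singular_values):
    """Map each singular value to an integer built from its multiplicity P
    and a running index among values with that multiplicity (assumed rule)."""
    multiplicity = Counter(singular_values)   # exact-equality grouping
    counters = defaultdict(int)
    sequence = []
    for v in singular_values:
        p = multiplicity[v]
        counters[p] += 1
        sequence.append(int(f"{p}{counters[p]}"))
    return sequence

def power_sums(sequence):
    """Sum of the sequence, of its squares, ..., up to its nth powers."""
    n = len(sequence)
    return [sum(x ** k for x in sequence) for k in range(1, n + 1)]

values = (3.6, 3.6, 3.2, 3.2, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4)
seq = equinumerous_sequence(values)
print(seq)              # [21, 22, 23, 24, 61, 62, 63, 64, 65, 66]
print(power_sums(seq))  # 471, 26241, 1585521, ...
```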
5.2.2 Equinumerous topological distance of left and right singular matrix
The more detailed computing process of the equinumerous topological distance of the left and the right singular matrix is as follows:
• If num ≠ num′: the algorithm returns to step 1) to check the next P-multiple eigenvalue.
• If num = num′:
2. Perform Gaussian elimination on the maximally independent vector sets of the corresponding P-multiple eigenvalues of the left and right eigenvectors of candidate 1 and candidate 2, say UP, UP′, VP and VP′.

4. Similarly, we can apply the same calculation process to the right singular matrix, thus obtaining the equinumerous topological distances of the left and right singular matrices of the edge adjacency matrix, S7 and S8.
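One way to realise the Gaussian elimination step is to row-reduce the matrix formed by the singular-vector columns belonging to a P-multiple singular value, so that the comparison no longer depends on the arbitrary basis returned by the decomposition. The sketch below is only an illustration of that idea under this assumption; the exact canonical form used in the thesis is the one defined in the surrounding text.

```python
import numpy as np

def reduced_row_echelon(M, tol=1e-10):
    """Plain Gaussian elimination to reduced row echelon form."""
    A = np.array(M, dtype=float)
    rows, cols = A.shape
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = r + np.argmax(np.abs(A[r:, c]))   # partial pivoting
        if abs(A[pivot, c]) < tol:
            continue
        A[[r, pivot]] = A[[pivot, r]]             # swap pivot row up
        A[r] /= A[r, c]                           # normalise pivot to 1
        for i in range(rows):
            if i != r:
                A[i] -= A[i, c] * A[r]            # eliminate the column
        r += 1
    return A

# U_P: columns of the left singular matrix belonging to one P-multiple value
# (hypothetical example data).
U_P = np.array([[0.5,  0.5],
                [0.5, -0.5],
                [0.5,  0.5],
                [0.5, -0.5]])
print(reduced_row_echelon(U_P.T))   # canonical form of the spanned subspace
```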
Algorithm 6 Equinumerosity Topological Distance Measurement
Input: Vertex adjacency matrices A and B, edge adjacency matrices C and D of two candidates
Output: Equinumerosity Topological Distance
  A = Uv1 Σv1 Vv1^T
  B = Uv2 Σv2 Vv2^T
  C = Ue1 Σe1 Ve1^T
  D = Ue2 Σe2 Ve2^T
  if map(Σv1) = map(Σv2) & map(Σe1) = map(Σe2) then
      S3 = 0
      S4 = 0
      if map(Uv1) = map(Uv2) & map(Vv1) = map(Vv2) then
          S5 = 0
          S6 = 0
          if map(Ue1) = map(Ue2) & map(Ve1) = map(Ve2) then
              S = 0
          else
              S = 0.125 * S7 + 0.125 * S8
          end if
      else
          S = 0.125 * S5 + 0.125 * S6
      end if
  else
      S = 0.25 * S3 + 0.25 * S4
  end if
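A Python transcription of the control flow of Algorithm 6 might look as follows. It is a sketch under stated assumptions: `mapping` here crudely reduces its argument to the multiset of multiplicities of its rounded entries, purely to keep the skeleton runnable (the thesis's actual comparison of the singular matrices uses the P-multiple eigenvector procedure of Section 5.2.2), and the component distances S3-S8 are assumed to be precomputed by the routines described earlier and passed in.

```python
import numpy as np
from collections import Counter

def mapping(values, decimals=8):
    """Multiset of multiplicities of the (rounded) entries; a simplified
    stand-in for the map() used in Algorithm 6."""
    flat = np.round(np.asarray(values, dtype=float).ravel(), decimals)
    return tuple(sorted(Counter(flat.tolist()).values()))

def equinumerosity_topological_distance(A, B, C, D, s):
    """Control flow of Algorithm 6; s is a dict of precomputed component
    distances S3..S8 from Sections 5.1-5.2."""
    Uv1, Sv1, Vv1t = np.linalg.svd(A)
    Uv2, Sv2, Vv2t = np.linalg.svd(B)
    Ue1, Se1, Ve1t = np.linalg.svd(C)
    Ue2, Se2, Ve2t = np.linalg.svd(D)

    if mapping(Sv1) == mapping(Sv2) and mapping(Se1) == mapping(Se2):
        if mapping(Uv1) == mapping(Uv2) and mapping(Vv1t) == mapping(Vv2t):
            if mapping(Ue1) == mapping(Ue2) and mapping(Ve1t) == mapping(Ve2t):
                return 0.0
            return 0.125 * s["S7"] + 0.125 * s["S8"]
        return 0.125 * s["S5"] + 0.125 * s["S6"]
    return 0.25 * s["S3"] + 0.25 * s["S4"]
```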
theorem [20]. The mathematical proof is shown in Section 3.3.2. Therefore, the range of the topological distance should be (0, 1).
The value of the quantitative graph distance measurement is the dissimilarity value for every pair of graphs, and the formulas above are the corresponding distance functions. The difference between two graphs therefore quantifies the degree to which they are non-isomorphic, offering a detailed and accurate measurement of graph isomorphism. The design of the topological distance measurement follows the Euclidean distance between two graphs. Furthermore, a real-world application is shown in Section 5.4.
Labels are a kind of naming, so labelled nodes are either present or absent in graphs. We discussed shape-based isomorphism scenarios in the previous chapters because, without labelling, the only thing that differentiates two graphs is their shape; thus the number of unlabelled graphs with n vertices is the number of graph shapes with n vertices. Graphs with labelled nodes, however, are commonly used in practical applications. If we model a situation in which nodes represent distinguishable objects, it matters how those objects are connected by whatever the edges represent, and labelled vertices characterise the different objects. Two graphs can have the same shape yet not be the same graph. Take Chloroethane, shown in Figure 5-2a, as an example: if we split its chemical structure in the middle and do not label the left node as 'H' and the right node as 'Cl', the two halves are symmetrical. That is, without labelling the nodes, these two possible subgraphs are isomorphic.
Chemical compounds can be represented as graph structures in which the atoms
represent the nodes, and the bonds represent the links.
The SMILES code for Chloroethane is CCCl and for Propanol is CCCO (it can also be written as OCCC). Before calculating the distance between Chloroethane and Propanol, we have to convert the SMILES codes into graph form. We summarize the main problems to be addressed as follows:
Figure 5-2: (a) Chloroethane; (b) Propanol.
3. How to compare two SMILES strings with dissimilar numbers of atoms, that is, two graphs with different numbers of nodes;
4. How to compare two SMILES strings with dissimilar numbers of bonds, that is, two graphs with different numbers of edges.
In the next section, we will solve the problems above by considering the charac-
teristic chemical instances.
5.4.1 Complexity Analysis
The calculation process of the topological structure comparison between two given graphs is shown below:

1. Based on the two supplied graphs, generate the vertex and edge adjacency matrices. If the number of vertices is n, then the space complexity for a graph is O(n^4), since the edge adjacency matrix is m × m and the number of edges m can be of order n^2.
A detailed discussion of various practical instances may be found in the next
section.
5.4.2 Applications
The extension of the application domain for our proposed graph isomorphism algorithm includes numerous fields: chemical data analysis, computational biology and drug discovery. Chemical compound identification can be converted into a (sub)graph querying problem, where a node and an edge correspond to an atom and a chemical bond, respectively. This can facilitate the process of drug design. An underlying assumption is that the structure of a chemical compound is correlated with its properties and biological activities, a powerful concept in drug discovery research known as the structure-activity relationship (SAR) principle. Therefore, graph isomorphism can help uncover chemical and biological characteristics such as stability, toxicity, metabolism, absorption and activity. The essential features of chemical data graphs are that they are relatively small and that each node and edge carries a label (the names of the corresponding atoms and bonds may be repeated many times in one molecule).
A SMILES string is generated by traversing the molecular graph. Different starting atoms and different neighbour-atom traversal orders will yield different SMILES strings as a result. Generally, one molecular structure has a unique graph but a number of equally valid SMILES representations. For example, the structure of ethanol could be specified as CCO, OCC or C(O)C. The process of generating canonical SMILES by a canonicalisation algorithm is inconsistent among chemical toolkits, so it is hard to tell whether two given SMILES strings represent the same chemical structure. A SMILES string encodes a molecule, but the structure search problem cannot be reduced to a simple string matching problem. For example, the chemical structure of Aspirin is shown in Figure 5-3:
The chemical structure of Aspirin (C9H8O4) can be encoded into 10 very different SMILES strings, shown below:
1. C1CCCC(C1OC(=O)C)C(O)=O
2. C(=O)(C1CCCCC1OC(C)=O)O
3. C1C(C(OC(=O)C)CCC1)C(=O)O
4. C1(OC(C)=O)C(CCCC1)C(O)=O
5. C1CC(C(C(O)=O)CC1)OC(C)=O
6. C1C(C(=O)O)C(OC(C)=O)CCC1
7. C1(C(CCCC1)C(O)=O)OC(C)=O
8. C1CCCC(C(=O)O)C1OC(=O)C
9. C1CCCC(C1C(=O)O)OC(C)=O
10. C1CC(C(CC1)C(=O)O)OC(C)=O
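As a point of comparison, generic cheminformatics toolkits address the same ambiguity by canonicalising SMILES before string comparison. The snippet below is a hedged illustration assuming the third-party RDKit toolkit is installed; it is not part of the proposed algorithm.

```python
from rdkit import Chem  # third-party cheminformatics toolkit

def same_structure(smiles_a, smiles_b):
    """Compare two SMILES strings via their canonical forms."""
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("unparsable SMILES")
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)

# Compare two of the strings listed above.
print(same_structure("C1CCCC(C1OC(=O)C)C(O)=O",
                     "C(=O)(C1CCCCC1OC(C)=O)O"))
```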
This algorithm exploits molecular topological similarity and functional compound similarity, which can be made reliable through the concept of bioisosterism. Algorithms designed for chemical data need to take into account repetitions in node and edge labels, which is a knotty problem pressing for a solution. The design philosophy is to identify the structural symbols and the chemical atom and bond symbols, and to convert the corresponding chemical label information into structural information; the results are called transformed graphs. The method transforms SMILES expressions into undirected graphs and then generates vertex and edge adjacency matrices from these graphs. Chemical constitution matching verification in this study also uses two theorems: the permutation theorem and the equinumerosity theorem.
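A highly simplified sketch of the transformation from a SMILES expression to a vertex adjacency matrix is shown below. It handles only unbranched chains without rings (e.g. CCCl or CCCO) and, following the choice made later in Case Study 3, would treat '=' as a labelled node of its own; full SMILES with rings, branches or aromaticity requires a proper parser. All names here are illustrative.

```python
import numpy as np

TWO_LETTER = {"Cl", "Br"}  # two-character element symbols handled here

def tokenize_chain(smiles):
    """Split an unbranched SMILES chain into node labels (atoms and '=')."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in TWO_LETTER:
            tokens.append(smiles[i:i + 2]); i += 2
        else:
            tokens.append(smiles[i]); i += 1
    return tokens

def chain_adjacency(smiles):
    """Vertex adjacency matrix of a simple chain: consecutive nodes bonded."""
    labels = tokenize_chain(smiles)
    n = len(labels)
    A = np.zeros((n, n), dtype=int)
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1
    return labels, A

labels, A = chain_adjacency("CCCl")
print(labels)   # ['C', 'C', 'Cl']
print(A)
```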
The next section will present 4 chemical instances to further illustrate our graph
distance measurement method, which the aforementioned algorithm can support. The
calculation process of the algorithm is shown below:
Case Study 1 - The Identification of Multiple SMILES Notations for One
Chemical Compound
The structure of molecules with identical formulas may not be unique; such variations in the structure of a molecule are referred to as isomers. Geometrical isomers need to be distinguished from each other, and checking these structural isomers is a challenging problem. In this set of experiments, we look into the algorithm's performance at identifying equivalent SMILES strings for Oxetane. Oxetane can be represented by four equivalent SMILES strings: O1CCC1, C1OCC1, C1COC1 and C1CCO1. The presence of cycles makes the strings hard to compare directly; the corresponding isomer graphs are generated as in Figure 5-5.
Based on the permutation theorem, the row sums of O1CCC1 and C1COC1 for the vertex and edge adjacency matrices are both [18, 56, 180] and [18, 56, 180]. In this section, we only calculate the sums of the row sums up to the third power, and we refer the reader to Section 3.6.2 for more computational details; since the sequences coincide, the two graphs satisfy the permutation condition.
Table 5.5: Triple tuple of O1CCC1
Table 5.6: Triple tuple of C1COC1
Table 5.7: Vertex adjacency matrix of O1CCC1
Table 5.8: Vertex adjacency matrix of C1COC1
Table 5.9: Edge adjacency matrix of O1CCC1
e1 e2 e3 e4 e5 e6 e7 e8 e9
e1 0 1 1 1 1 0 0 0 0
e2 1 0 1 1 0 1 1 0 0
e3 1 1 0 1 0 1 0 1 0
e4 1 1 1 0 0 0 1 0 1
e5 1 0 0 0 0 0 0 1 1
e6 0 1 1 0 0 0 1 1 0
e7 0 1 0 1 0 1 0 0 1
e8 0 0 1 0 1 1 0 0 1
e9 0 0 0 1 1 0 1 1 0
Table 5.10: Edge adjacency matrix of C1COC1
      e′1  e′2  e′3  e′4  e′5  e′6  e′7  e′8  e′9
e′1    0    1    1    1    1    0    0    0    0
e′2    1    0    1    1    0    1    1    0    0
e′3    1    1    0    1    0    1    0    1    0
e′4    1    1    1    0    0    0    1    0    1
e′5    1    0    0    0    0    0    0    1    1
e′6    0    1    1    0    0    0    1    1    0
e′7    0    1    0    1    0    1    0    0    1
e′8    0    0    1    0    1    1    0    0    1
e′9    0    0    0    1    1    0    1    1    0
Graph G1 (O1CCC1) and graph G2 (C1COC1) satisfy the permutation theorem. The singular values of the two matrices are equinumerous, and the maximally independent vector sets of the corresponding P-multiple eigenvalues of the left and right singular vectors of the vertex and edge adjacency matrices are equinumerous, so graphs G1 and G2 satisfy the equinumerosity theorem. Therefore, graphs G1 and G2 are isomorphic.
We will clarify why we add an index list to the graph generation procedure using the examples in Figure 5-7. The index list maps the corresponding attributes to each vertex.
The chemical structures of Oxetane and Thietane are shown in Figures 5-4 and 5-6. It is evident that they share the same topological structure except for one node with a different label. This is the reason why we created an index list to the left of the element list; this mapping relationship can distinguish nodes with different types of atoms. Therefore we can calculate the distance between graphs at more than just the topological level.
The distance between Oxetane (O1CCC1) and Thietane (S1CCC1) is 0.064865. The row sums of Oxetane and Thietane for the vertex adjacency matrix are [20, 62, 200] and [20, 60, 188], so S1 = 0.042078. The row sums of Oxetane and Thietane for the edge adjacency matrix are [42, 184, 828] and [40, 168, 730], so S2 = 0.087651 and S = 0.5 S1 + 0.5 S2 = 0.064865.
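As a worked check of these figures, applying Formula (5.1) and Formula (5.2) to the two three-element sequences gives

\[
S_1 = \sqrt{\frac{(20-20)^2 + (62-60)^2 + (200-188)^2}{20^2 + 20^2 + 62^2 + 60^2 + 200^2 + 188^2}} = \sqrt{\frac{148}{83588}} \approx 0.042078,
\]
\[
S_2 = \sqrt{\frac{(42-40)^2 + (184-168)^2 + (828-730)^2}{42^2 + 40^2 + 184^2 + 168^2 + 828^2 + 730^2}} = \sqrt{\frac{9864}{1283928}} \approx 0.087651, \qquad S = \tfrac{1}{2}(S_1 + S_2) \approx 0.064865.
\]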
Figure 5-6: Chemical structure for Thietane
Table 5.11: Triple tuple of Oxetane (O1CCC1)
Table 5.12: Triple tuple of Thietane (S1CCC1)
Table 5.13: Vertex adjacency matrix of Oxetane (O1CCC1)
Table 5.14: Vertex adjacency matrix of Thietane (S1CCC1)
Table 5.15: Edge adjacency matrix of Oxetane (O1CCC1)
e1 e2 e3 e4 e5 e6 e7 e8 e9 e10
e1 0 1 1 1 1 1 0 0 0 0
e2 1 0 1 1 0 0 1 1 0 0
e3 1 1 0 1 0 0 1 0 1 0
e4 1 1 1 0 0 0 0 1 0 1
e5 1 0 0 0 0 1 0 0 0 0
e6 1 0 0 0 1 0 0 0 1 1
e7 0 1 1 0 0 0 0 1 1 0
e8 0 1 0 1 0 0 1 0 0 1
e9 0 0 1 0 0 1 1 0 0 1
e10 0 0 0 1 0 1 0 1 1 0
Table 5.16: Edge adjacency matrix of Thietane (S1CCC1)
e′1 e′2 e′3 e′4 e′5 e′6 e′7 e′8 e′9 e′10
e′1 0 1 1 1 1 0 0 0 0 0
e′2 1 0 1 1 0 0 1 1 0 0
e′3 1 1 0 1 0 0 1 0 1 0
e′4 1 1 1 0 0 0 0 1 0 1
e′5 1 0 0 0 0 1 0 0 0 0
e′6 0 0 0 0 1 0 0 0 1 1
e′7 0 1 1 0 0 0 0 1 1 0
e′8 0 1 0 1 0 0 1 0 0 1
e′9 0 0 1 0 0 1 1 0 0 1
e′10 0 0 0 1 0 1 0 1 1 0
In Figure 5-8, we show two chemical molecular structures of C3H6O for further explanation. The atom list on the left side (marked in blue) represents the index list, and the SMILES element list is on the right side (marked in purple). In this set of experiments, we look into the algorithm's performance at distinguishing the SMILES formulas of Propanal (CCC=O) and Methoxyethene (COC=C). In this case, we treat the double bond as a labelled node.
Figure 5-8: (a) Propanal (CCC=O); (b) Methoxyethene (COC=C).
Table 5.17: Triple tuple of Propanal (CCC=O)
Table 5.18: Triple tuple of Methoxyethene (COC=C)
Table 5.19: Vertex adjacency matrix of Propanal (CCC=O)
Table 5.20: Vertex adjacency matrix of Methoxyethene (COC=C)
e1 e2 e3 e4 e5 e6 e7 e8 e9
e1 0 1 1 1 1 0 0 0 0
e2 1 0 1 1 0 1 0 0 0
e3 1 1 0 1 0 1 1 0 0
e4 1 1 1 0 0 0 1 1 0
e5 1 0 0 0 0 0 0 0 1
e6 0 1 1 0 0 0 1 0 0
e7 0 0 1 1 0 1 0 1 0
e8 0 0 0 1 0 0 1 0 1
e9 0 0 0 0 1 0 0 1 0
Case Study 4 - Chemical Compound with Different Number of Atoms
Chloroethane (CCCl) and Propanol (CCCO) are shown in Figure 5-2. Because the two given SMILES strings are not equal in length, we must create an empty node and two empty edges for Chloroethane, ensuring that the vertex and edge adjacency matrices generated from the graphs in Figure 5-10 have the same dimensions. This means adding all-zero rows and all-zero columns at the end of the corresponding matrices. A padding sketch is given below.
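A short sketch of this padding step, assuming plain NumPy arrays; the function name is illustrative only.

```python
import numpy as np

def pad_to(matrix, size):
    """Append all-zero rows and columns so the matrix becomes size x size."""
    matrix = np.asarray(matrix)
    extra = size - matrix.shape[0]
    return np.pad(matrix, ((0, extra), (0, extra)), mode="constant")

# Equalise two adjacency matrices before computing the distance.
A = np.array([[0, 1], [1, 0]])                     # smaller graph
B = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])    # larger graph
n = max(A.shape[0], B.shape[0])
A_padded, B_padded = pad_to(A, n), pad_to(B, n)
print(A_padded)
```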
Table 5.23: Triple tuple of Chloroethane (CCCl)
Table 5.24: Triple tuple of Propanol (CCCO)
Table 5.25: Vertex adjacency matrix of Chloroethane (CCCl)
Table 5.26: Vertex adjacency matrix of Propanol (CCCO)
e1 e2 e3 e4 e5 e6 e7 e8 e9
e1 0 1 1 1 0 0 0 0 0
e2 1 0 1 0 0 1 0 0 0
e3 1 1 0 0 0 1 1 0 0
e4 1 0 0 0 1 0 0 0 0
e5 0 0 0 1 0 0 1 0 0
e6 0 1 1 0 0 0 1 0 0
e7 0 0 1 0 1 1 0 0 0
e8 0 0 0 0 0 0 0 0 0
e9 0 0 0 0 0 0 0 0 0
86] and [18, 52, 162], so S1 = 0.401077. The row sums of Propanal and Methoxyethene for the edge adjacency matrix are [20, 60, 188] and [34, 138, 592], so S2 = 0.642959.
5.5 Summary
In this chapter, we explored the problem of distance between graphs with labelled nodes. We elaborated on the theory of converting a chemical structure into the corresponding vertex and edge adjacency matrix representations. Built upon the permutation and equinumerosity theorems proposed in Chapter 3, we proposed the permutation and equinumerosity distance measurements to quantify the distance between two given candidates. We considered several possible situations with different structural features, and concrete results were given.
Chapter 6
Epilogue
Graphs reflect potential relationships among different data entities and have been widely recognised as an effective mechanism for responding to the proliferating demands across industry and academia, helping organisations to make operational and strategic decisions.
(Sub)graph isomorphism is one of the most important practical applications of graph mining, and it has been widely applied in a diversity of real-world applications. We propose a tripartite-architecture graph isomorphism-based algorithm to verify whether two graphs are identical, and a subgraph isomorphism-based approach to retrieve query subgraphs across huge graphs. First, we significantly reduce the computational complexity by comparing the numbers of vertices and edges between two graphs for graph isomorphism, and, for subgraph isomorphism, by extracting a succession of subgraphs to be matched and then comparing the numbers of vertices and edges between the putative isomorphic subgraphs and the query graph. Following this, the vertex and edge matrices, including vertex and edge positional relations, directional relations and distance relations, are constructed. Then, using the permutation theorem, the row sums of the vertex and edge adjacency matrices of the query graph and the prospective sample are computed. According to the equinumerosity theorem, we next determine whether the eigenvalues of the vertex and edge adjacency matrices of the two graphs are equal. The topological distance can be computed based on graph isomorphism, and subgraph isomorphism can be implemented after subgraph combination.
A wealth of applications can benefit from our subgraph matching technique. Even though the applications mentioned in this thesis are drawn from the chemical domain, the technique can be leveraged in other fields as well. Because of the linear structure of SMILES notations, we parse these strings into geometrical forms; that is, our proposed method handles linear data structures by translating them into graph structures, which can subsequently be translated into two-dimensional matrix forms used by the quantitative graph distance measurement.
Many approximate and sub-optimal polynomial algorithms have been proposed recently, but they do not guarantee efficient polynomial-time solutions to graph matching problems under every type of graph scenario. In order to enhance the effectiveness of checking whether two graphs are isomorphic, this thesis puts forward a polynomial-time algorithm to compute the metrics. A theoretical analysis of this algorithm has been carried out with respect to its time and space requirements, and it has been successfully applied to measure the similarity between two structures. The computational complexity is O(n^6) in the worst case and O(n^2) in the best case, where n is the number of vertices. The experimental results show that our proposed method achieves a much lower computational complexity than the state-of-the-art algorithms, which demonstrates that accuracy, effectiveness and efficiency have been dramatically enhanced by our proposed method. Our proposed algorithms can handle both exact and inexact graph matching. Therefore, the newly proposed subgraph isomorphism and graph distance algorithms can spur further research of breakthrough significance in the field of artificial intelligence theory.
This thesis studied graph isomorphism and graph distance problems and provided solutions for several downstream applications. There are several potential research directions to explore and further opportunities to build upon this research, such as the maximal common subgraph search problem, which attempts to match the maximum number of nodes between two graphs. Extending the work of this thesis in that direction is a promising avenue for future research.
Bibliography
[1] Chengyi Xia, Zhishuang Wang, Chunyuan Zheng, Quantong Guo, Yongtang
Shi, Matthias Dehmer, and Zengqiang Chen. A new coupled disease-awareness
spreading model with mass media on multiplex networks. Information Sciences,
471:185–200, 2019.
[2] Yang Liu, Yun Yuan, Jieyi Shen, and Wei Gao. Emergency response facility
location in transportation networks: a literature review. Journal of traffic and
transportation engineering (English edition), 8(2):153–169, 2021.
[3] Sancheng Peng, Yongmei Zhou, Lihong Cao, Shui Yu, Jianwei Niu, and Weijia
Jia. Influence analysis in social networks: A survey. Journal of Network and
Computer Applications, 106:17–32, 2018.
[5] Fen Zhao, Yi Zhang, Jianguo Lu, and Ofer Shai. Measuring academic influence
using heterogeneous author-citation networks. Scientometrics, 118(3):1119–1140,
2019.
[6] Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, and Tat-Seng
Chua. Hierarchical fashion graph network for personalized outfit recommen-
dation. In Proceedings of the 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieval, pages 159–168, 2020.
[7] Lucia Cavallaro, Ovidiu Bagdasar, Pasquale De Meo, Giacomo Fiumara, and
Antonio Liotta. Graph and network theory for the analysis of criminal networks.
In Data Science and Internet of Things, pages 139–156. Springer, 2021.
[9] Donatello Conte, Pasquale Foggia, Carlo Sansone, and Mario Vento. Thirty
years of graph matching in pattern recognition. International journal of pattern
recognition and artificial intelligence, 18(03):265–298, 2004.
[10] Kaspar Riesen, Xiaoyi Jiang, and Horst Bunke. Exact and inexact graph match-
ing: Methodology and applications. Managing and Mining Graph Data, pages
217–247, 2010.
[11] Tibério S Caetano, Julian J McAuley, Li Cheng, Quoc V Le, and Alex J Smola.
Learning graph matching. IEEE transactions on pattern analysis and machine
intelligence, 31(6):1048–1058, 2009.
[12] Yinghui Wu and Arijit Khan. Graph Pattern Matching, pages 871–875. Springer
International Publishing, Cham, 2019.
[13] Julian R Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM
(JACM), 23(1):31–42, 1976.
[14] Luigi P Cordella, Pasquale Foggia, Carlo Sansone, Francesco Tortorella, and
Mario Vento. Graph matching: a fast algorithm and its evaluation. In Pro-
ceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.
98EX170), volume 2, pages 1582–1584. IEEE, 1998.
[15] Luigi P Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. Performance
evaluation of the vf graph matching algorithm. In Proceedings 10th International
Conference on Image Analysis and Processing, pages 1172–1177. IEEE, 1999.
[16] Luigi Pietro Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. An
improved algorithm for matching large graphs. In 3rd IAPR-TC15 workshop on
graph-based representations in pattern recognition, pages 149–159, 2001.
[17] Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, and Al-
fredo Ferro. A subgraph isomorphism algorithm and its application to biochem-
ical data. BMC bioinformatics, 14(7):1–13, 2013.
[18] Vincenzo Carletti, Pasquale Foggia, and Mario Vento. Vf2 plus: An improved
version of vf2 for biological graphs. In International Workshop on Graph-Based
Representations in Pattern Recognition, pages 168–177. Springer, 2015.
[19] Alpár Jüttner and Péter Madarasi. Vf2++—an improved subgraph isomorphism
algorithm. Discrete Applied Mathematics, 242:69–81, 2018.
[20] Vincenzo Bonnici and Rosalba Giugno. On the variable ordering in subgraph
isomorphism algorithms. IEEE/ACM transactions on computational biology and
bioinformatics, 14(1):193–203, 2016.
[21] Vincenzo Carletti, Pasquale Foggia, Alessia Saggese, and Mario Vento. Intro-
ducing vf3: A new algorithm for subgraph isomorphism. In International Work-
shop on Graph-Based Representations in Pattern Recognition, pages 128–139.
Springer, 2017.
[22] Haichuan Shang, Ying Zhang, Xuemin Lin, and Jeffrey Xu Yu. Taming verifi-
cation hardness: an efficient algorithm for testing subgraph isomorphism. Pro-
ceedings of the VLDB Endowment, 1(1):364–375, 2008.
[23] Huahai He and Ambuj K Singh. Graphs-at-a-time: query language and access
methods for graph databases. In Proceedings of the 2008 ACM SIGMOD inter-
national conference on Management of data, pages 405–418, 2008.
[24] Shijie Zhang, Shirong Li, and Jiong Yang. Gaddi: distance index based sub-
graph matching in biological networks. In Proceedings of the 12th international
conference on extending database technology: advances in database technology,
pages 192–203, 2009.
[25] Peixiang Zhao and Jiawei Han. On graph query optimization in large networks.
Proceedings of the VLDB Endowment, 3(1-2):340–351, 2010.
[26] Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. Turboiso: towards ultrafast
and robust subgraph isomorphism search in large graph databases. In Proceedings
of the 2013 ACM SIGMOD International Conference on Management of Data,
pages 337–348, 2013.
[27] Xuguang Ren and Junhu Wang. Exploiting vertex relationships in speeding up
subgraph isomorphism over large graphs. Proceedings of the VLDB Endowment,
8(5):617–628, 2015.
[28] Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. Efficient sub-
graph matching by postponing cartesian products. In Proceedings of the 2016
International Conference on Management of Data, pages 1199–1214, 2016.
[29] Bibek Bhattarai, Hang Liu, and H Howie Huang. Ceci: Compact embedding
cluster index for scalable subgraph matching. In Proceedings of the 2019 Inter-
national Conference on Management of Data, pages 1447–1462, 2019.
[30] Myoungji Han, Hyunjoon Kim, Geonmo Gu, Kunsoo Park, and Wook-Shin
Han. Efficient subgraph matching: Harmonizing dynamic programming, adap-
tive matching order, and failing set together. In Proceedings of the 2019 Inter-
national Conference on Management of Data, pages 1429–1446, 2019.
[32] Javier Larrosa and Gabriel Valiente. Constraint satisfaction algorithms for graph
pattern matching. Mathematical structures in computer science, 12(4):403–422,
2002.
[34] Julian R Ullmann. Bit-vector algorithms for binary constraint satisfaction and
subgraph isomorphism. Journal of Experimental Algorithmics (JEA), 15:1–1,
2011.
[35] Brendan D McKay et al. Practical graph isomorphism. 1981.
[36] Brendan D McKay and Adolfo Piperno. Practical graph isomorphism, ii. Journal
of symbolic computation, 60:94–112, 2014.
[37] Bruno T Messmer and Horst Bunke. A decision tree approach to graph and
subgraph isomorphism detection. Pattern recognition, 32(12):1979–1998, 1999.
[38] Bruno T Messmer and Horst Bunke. Efficient subgraph isomorphism detection: a
decomposition approach. IEEE transactions on knowledge and data engineering,
12(2):307–323, 2000.
[39] Markus Weber, Marcus Liwicki, and Andreas Dengel. Indexing with well-founded
total order for faster subgraph isomorphism detection. In International Work-
shop on Graph-Based Representations in Pattern Recognition, pages 185–194.
Springer, 2011.
[40] Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. Efficient
subgraph matching on billion node graphs. arXiv preprint arXiv:1205.6691, 2012.
[41] László Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the
forty-eighth annual ACM symposium on Theory of Computing, pages 684–697,
2016.
[42] Alfred V Aho and John E Hopcroft. The design and analysis of computer algo-
rithms. Pearson Education India, 1974.
[43] Eugene M Luks. Isomorphism of graphs of bounded valence can be tested in
polynomial time. Journal of computer and system sciences, 25(1):42–65, 1982.
[44] Xiaoyi Jiang and Horst Bunke. Optimal quadratic-time isomorphism of ordered
graphs. Pattern Recognition, 32(7):1273–1283, 1999.
[45] Peter J Dickinson, Horst Bunke, Arek Dadej, and Miro Kraetzl. Matching graphs
with unique node labels. Pattern Analysis and Applications, 7(3):243–254, 2004.
[46] John E Hopcroft and Jin-Kue Wong. Linear time algorithm for isomorphism
of planar graphs (preliminary report). In Proceedings of the sixth annual ACM
symposium on Theory of computing, pages 172–184, 1974.
[47] Shinji Umeyama. An eigendecomposition approach to weighted graph match-
ing problems. IEEE transactions on pattern analysis and machine intelligence,
10(5):695–703, 1988.
[48] Bin Luo, Richard C Wilson, and Edwin R Hancock. Spectral embedding of
graphs. Pattern recognition, 36(10):2213–2230, 2003.
[49] Richard C Wilson, Edwin R Hancock, and Bin Luo. Pattern vectors from alge-
braic graph theory. IEEE transactions on pattern analysis and machine intelli-
gence, 27(7):1112–1124, 2005.
[51] Terry Caelli and Serhiy Kosinov. An eigenspace projection clustering method
for inexact graph matching. IEEE transactions on pattern analysis and machine
intelligence, 26(4):515–519, 2004.
[52] Terry Caelli and Serhiy Kosinov. Inexact graph matching using eigen-subspace
projection clustering. International Journal of Pattern Recognition and Artificial
Intelligence, 18(03):329–354, 2004.
[53] Ali Shokoufandeh, Diego Macrini, Sven Dickinson, Kaleem Siddiqi, and Steven W
Zucker. Indexing hierarchical structures using graph spectra. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 27(7):1125–1140, 2005.
[54] Jing He, Jinjun Chen, Guangyan Huang, Jie Cao, Zhiwang Zhang, Hui Zheng,
Peng Zhang, Roozbeh Zarei, Ferry Sansoto, Ruchuan Wang, et al. A polynomial-
time algorithm for simple undirected graph isomorphism. Concurrency and Com-
putation: Practice and Experience, 33(7):1–1, 2021.
[55] Jing He, Jinjun Chen, Guangyan Huang, Mengjiao Guo, Zhiwang Zhang, Hui
Zheng, Yunyao Li, Ruchuan Wang, Weibei Fan, Chi-Huang Chi, et al. A fuzzy
theory based topological distance measurement for undirected multigraphs. In
2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pages 1–
10. IEEE, 2020.
[57] Mengjiao Guo, Chi-Hung Chi, Hui Zheng, Jing He, and Xiaoting Zhang. A sub-
graph isomorphism-based attack towards social networks. In IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent Agent Technology,
pages 520–528, 2021.
[59] John W Raymond, Eleanor J Gardiner, and Peter Willett. Heuristics for sim-
ilarity searching of chemical graphs using a maximum common edge subgraph
algorithm. Journal of chemical information and computer sciences, 42(2):305–
316, 2002.
[60] Daniel Rehfeldt and Thorsten Koch. Combining np-hard reduction techniques
and strong heuristics in an exact algorithm for the maximum-weight connected
subgraph problem. SIAM Journal on Optimization, 29(1):369–398, 2019.